idnits 2.17.1 draft-ietf-ntp-ntpv4-proto-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 4777. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4788. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4795. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4801. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 119 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. -- The draft header indicates that this document obsoletes RFC1305, but the abstract doesn't seem to directly say this. It does mention RFC1305 though, so this could be OK. -- The draft header indicates that this document obsoletes RFC4330, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 2698 has weird spacing: '... ipaddr srcad...' == Line 2699 has weird spacing: '... ipaddr dstad...' == Line 2700 has weird spacing: '... char ver...' == Line 2701 has weird spacing: '... char lea...' == Line 2702 has weird spacing: '... char mod...' == (114 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 16, 2007) is 6006 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'NSTAGE' on line 3571 -- Looks like a reference, but probably isn't: 'NMAX' on line 2843 == Missing Reference: '0' is mentioned on line 4178, but not defined -- Obsolete informational reference (is this intentional?): RFC 1305 (ref. '1') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 4330 (ref. '2') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 2434 (ref. '10') (Obsoleted by RFC 5226) ** Obsolete normative reference: RFC 793 (ref. '11') (Obsoleted by RFC 9293) ** Downref: Normative reference to an Informational RFC: RFC 1345 (ref. '15') ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. '16') Summary: 5 errors (**), 0 flaws (~~), 11 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NTP WG J. Burbank, Ed. 3 Internet-Draft W. Kasch, Ed. 4 Obsoletes: RFC 4330, RFC 1305 JHU/APL 5 (if approved) J. Martin, Ed. 6 Intended status: Standards Track Daedelus 7 Expires: May 19, 2008 D. Mills 8 U. Delaware 9 November 16, 2007 11 Network Time Protocol Version 4 Protocol And Algorithms Specification 12 draft-ietf-ntp-ntpv4-proto-08 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on May 19, 2008. 39 Copyright Notice 41 Copyright (C) The IETF Trust (2007). 43 Abstract 45 The Network Time Protocol (NTP) is widely used to synchronize 46 computer clocks in the Internet. This document describes NTP Version 47 4 (NTPv4), which is backwards compatible with NTP Version 3 (NTPv3) 48 described in RFC 1305, as well as previous versions of the protocol. 50 NTPv4 includes a modified protocol header to accommodate the Internet 51 Protocol Version 6 address family. NTPv4 includes fundamental 52 improvements in the mitigation and discipline algorithms which extend 53 the potential accuracy to the tens of microseconds with modern 54 workstations and fast LANs. It includes a dynamic server discovery 55 scheme, so that in many cases specific server configuration is not 56 required. It corrects certain errors in the NTPv3 design and 57 implementation and includes an optional extension mechanism. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 62 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 5 63 2. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 5 64 3. Protocol Modes . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.1. Dynamic Server Discovery . . . . . . . . . . . . . . . . 7 66 4. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 8 67 5. Implementation Model . . . . . . . . . . . . . . . . . . . . 10 68 6. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 12 69 7. Data Structures . . . . . . . . . . . . . . . . . . . . . . . 16 70 7.1. Structure Conventions . . . . . . . . . . . . . . . . . . 16 71 7.2. Global Parameters . . . . . . . . . . . . . . . . . . . . 16 72 7.3. Packet Header Variables . . . . . . . . . . . . . . . . . 17 73 7.4. The Kiss-o'-Death Packet . . . . . . . . . . . . . . . . 23 74 7.5. NTP Extension Field Format . . . . . . . . . . . . . . . 25 75 8. On Wire Protocol . . . . . . . . . . . . . . . . . . . . . . 25 76 9. Peer Process . . . . . . . . . . . . . . . . . . . . . . . . 29 77 9.1. Peer Process Variables . . . . . . . . . . . . . . . . . 30 78 9.2. Peer Process Operations . . . . . . . . . . . . . . . . . 32 79 10. Clock Filter Algorithm . . . . . . . . . . . . . . . . . . . 36 80 11. System Process . . . . . . . . . . . . . . . . . . . . . . . 38 81 11.1. System Process Variables . . . . . . . . . . . . . . . . 38 82 11.2. System Process Operations . . . . . . . . . . . . . . . . 39 83 11.2.1. Selection Algorithm . . . . . . . . . . . . . . . . 42 84 11.2.2. Cluster Algorithm . . . . . . . . . . . . . . . . . 43 85 11.2.3. Combine Algorithm . . . . . . . . . . . . . . . . . 44 86 11.3. Clock Discipline Algorithm . . . . . . . . . . . . . . . 46 87 12. Clock Adjust Process . . . . . . . . . . . . . . . . . . . . 50 88 13. Poll Process . . . . . . . . . . . . . . . . . . . . . . . . 50 89 13.1. Poll Process Variables . . . . . . . . . . . . . . . . . 50 90 13.2. Poll Process Operations . . . . . . . . . . . . . . . . . 51 91 14. Simple Network Time Protocol (SNTP) . . . . . . . . . . . . . 53 92 15. Security Considerations . . . . . . . . . . . . . . . . . . . 54 93 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 94 17. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 56 95 18. References . . . . . . . . . . . . . . . . . . . . . . . . . 56 96 18.1. Informative References . . . . . . . . . . . . . . . . . 56 97 18.2. Normative References . . . . . . . . . . . . . . . . . . 57 98 Appendix A. Code Skeleton . . . . . . . . . . . . . . . . . . . 57 99 A.1. Global Definitions . . . . . . . . . . . . . . . . . . . 58 100 A.1.1. Definitions, Constants, Parameters . . . . . . . . . 58 101 A.1.2. Packet Data Structures . . . . . . . . . . . . . . . 61 102 A.1.3. Association Data Structures . . . . . . . . . . . . 62 103 A.1.4. System Data Structures . . . . . . . . . . . . . . . 64 104 A.1.5. Local Clock Data Structures . . . . . . . . . . . . 65 105 A.1.6. Function Prototypes . . . . . . . . . . . . . . . . 65 106 A.2. Main Program and Utility Routines . . . . . . . . . . . . 66 107 A.3. Kernel Input/Output Interface . . . . . . . . . . . . . . 69 108 A.4. Kernel System Clock Interface . . . . . . . . . . . . . . 69 109 A.5. Peer Process . . . . . . . . . . . . . . . . . . . . . . 71 110 A.5.1. receive() . . . . . . . . . . . . . . . . . . . . . 72 111 A.5.2. clock_filter() . . . . . . . . . . . . . . . . . . . 79 112 A.5.3. fast_xmit() . . . . . . . . . . . . . . . . . . . . 83 113 A.5.4. access() . . . . . . . . . . . . . . . . . . . . . . 85 114 A.5.5. System Process . . . . . . . . . . . . . . . . . . . 85 115 A.5.6. Clock Adjust Process . . . . . . . . . . . . . . . . 99 116 A.5.7. Poll Process . . . . . . . . . . . . . . . . . . . . 100 117 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 107 118 Intellectual Property and Copyright Statements . . . . . . . . . 108 120 1. Introduction 122 This document defines the Network Time Protocol Version 4 (NTPv4), 123 which is widely used to synchronize system clocks among a set of 124 distributed time servers and clients. It describes the core 125 architecture, protocol, state machines, data structures and 126 algorithms. NTPv4 introduces new functionality to NTPv3, as 127 described in [1], and functionality expanded from SNTPv4 as described 128 in [2] (SNTPv4 is a subset of NTPv4). This document obsoletes [1], 129 and [2]. While certain minor changes have been made in some protocol 130 header fields, these do not affect the interoperability between NTPv4 131 and previous versions of NTP and SNTP. 133 The NTP subnet model includes a number of widely accessible primary 134 time servers synchronized by wire or radio to national standards. 135 The purpose of the NTP protocol is to convey timekeeping information 136 from these primary servers to secondary time servers and clients via 137 both private networks and the public Internet. Precisely tuned 138 algorithms mitigate errors that may result from network disruptions, 139 server failures and possible hostile actions. Servers and clients 140 are configured such that values flow towards clients from the primary 141 servers at the root via branching secondary servers. 143 The NTPv4 design overcomes significant shortcomings in the NTPv3 144 design, corrects certain bugs and incorporates new features. In 145 particular, expanded NTP timestamp definitions encourage the use of 146 the floating double data type throughout the implementation. As a 147 result, the time resolution is better than one nanosecond and 148 frequency resolution is less than one nanosecond per second. 149 Additional improvements include a new clock discipline algorithm 150 which is more responsive to system clock hardware frequency 151 fluctuations. Typical primary servers using modern machines are 152 precise within a few tens of microseconds. Typical secondary servers 153 and clients on fast LANs are within a few hundred microseconds with 154 poll intervals up to 1024 seconds, which was the maximum with NTPv3. 155 With NTPv4, servers and clients are precise within a few tens of 156 milliseconds with poll intervals up to 36 hours. 158 The main body of this document describes the core protocol and data 159 structures necessary to interoperate between conforming 160 implementations. Appendix A contains additional detail in the form 161 of a skeleton program, including data structures and code segments 162 for the core algorithms as well as the mitigation algorithms used to 163 enhance reliability and accuracy. While the skeleton program and 164 other descriptions in this document apply to a particular 165 implementation, they are not intended as the only way the required 166 functions can be implemented. While the NTPv3 symmetric key 167 authentication scheme described in this document has been carried 168 over from NTPv3, the Autokey public key authentication scheme new to 169 NTPv4 is described in [3]. 171 The NTP protocol includes modes of operation described in Section 2 172 using data types described in Section 6 and data structures described 173 in Section 7. The implementation model described in Section 5 is 174 based on a threaded, multi-process architecture, although other 175 architectures could be used as well. The on-wire protocol described 176 in Section 8 is based on a returnable-time design which depends only 177 on measured clock offsets, but does not require reliable message 178 delivery. Reliable message delivery such as TCP[11] can actually 179 make the delivered NTP packet less reliable since retries would 180 increase the delay value and other errors. The synchronization 181 subnet is a self-organizing, hierarchical, master-slave network with 182 synchronization paths determined by a shortest-path spanning tree and 183 defined metric. While multiple masters (primary servers) may exist, 184 there is no requirement for an election protocol. 186 This document includes material from [4], which contains flow charts 187 and equations unsuited for RFC format. There is much additional 188 information in [5], including an extensive technical analysis and 189 performance assessment of the protocol and algorithms in this 190 document. The reference implementation is available at www.ntp.org. 192 The remainder of this document contains numerous variables and 193 mathematical expressions. Some variables take the form of Greek 194 characters, which are spelled out by their full case-sensitive name. 195 For example DELTA refers to the uppercase Greek character, while 196 delta refers to the lowercase character. Furthermore, subscripts are 197 denoted with '_', for example theta_i refers to the lowercase Greek 198 character theta with subscript i, or phonetically theta sub i. In 199 this document all time values are in seconds (s), and all frequencies 200 will be specified as fractional frequency offsets (FFO) (pure 201 number). It is often convenient to express these FFOs in parts per 202 million (ppm). 204 1.1. Requirements Notation 206 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 207 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 208 document are to be interpreted as described in [12]. 210 2. Modes of Operation 212 An NTP implementation operates as a primary server, secondary server 213 or client. A primary server is synchronized to a reference clock 214 directly traceable to UTC (eg, GPS, Galileo, etc). A client 215 synchronizes to one or more upstream servers, but does not provide 216 synchronization to dependent clients. A secondary server has one or 217 more upstream servers and one or more downstream servers or clients. 218 All servers and clients who are fully NTPv4 compliant MUST implement 219 the entire suite of algorithms described in this document. In order 220 to maintain stability in large NTP subnets, secondary servers SHOULD 221 be fully NTPv4 compliant. Alternative algorithms MAY be used, but 222 their output MUST be identical to the algorithms described in this 223 specification. 225 3. Protocol Modes 227 There are three NTP protocol variants, symmetric, client/server and 228 broadcast. Each is associated with an association mode as shown in 229 Figure 1. In addition, persistent associations are mobilized upon 230 startup and are never demobilized. Ephemeral associations are 231 mobilized upon the arrival of a packet and are demobilized upon error 232 or timeout. 234 +-------------------+--------------+-------------+ 235 | Association Mode | Assoc. Mode | Packet Mode | 236 +-------------------+--------------+-------------+ 237 | Symmetric Active | 1 | 1 or 2 | 238 | Symmetric Passive | 2 | 1 | 239 | Client | 3 | 4 | 240 | Server | 4 | 3 | 241 | Broadcast Server | 5 | 5 | 242 | Broadcast Client | 6 | N/A | 243 +-------------------+--------------+-------------+ 245 Figure 1: Association and Packet Modes 247 In the client/server variant a persistent client sends client (mode 248 3) packets to a server, which returns server (mode 4) packets. 249 Servers provide synchronization to one or more clients, but do not 250 accept synchronization from them. A server can also be a reference 251 clock driver which obtains time directly from a standard source such 252 as a GPS receiver or telephone modem service. In this varient, 253 clients pull synchronization from servers. 255 In the symmetric variant a peer operates as both a server and client 256 using either a symmetric active or symmetric passive association. A 257 persistent symmetric active association sends symmetric active (mode 258 1) packets to a symmetric active peer association. Alternatively, an 259 ephemeral symmetric passive association can be mobilized upon arrival 260 of a symmetric active packet with no matching association. That 261 association sends symmetric passive (mode 2) packets and persists 262 until error or timeout. Peers both push and pull synchronization to 263 and from each other. For the purposes of this document, a peer 264 operates like a client, so references to client imply peer as well. 266 In the broadcast variant a persistent broadcast server association 267 sends periodic broadcast server (mode 5) packets which can be 268 received by multiple clients. Upon reception of a broadcast server 269 packet without a matching association, an ephemeral broadcast client 270 (mode 6) association is mobilized and persists until error or 271 timeout. It is useful to provide an initial volley where the client 272 operating in client mode exchanges several packets with the server, 273 so as to calibrate the propagation delay and to run the Autokey 274 security protocol, after which the client reverts to broadcast client 275 mode. A broadcast server pushes synchronization to clients and other 276 servers. 278 Following loosely the conventions established by the telephone 279 industry, the level of each server in the hierarchy is defined by a 280 stratum number. Primary servers are assigned stratum one; secondary 281 servers at each lower level are assigned stratum numbers one greater 282 than the preceding level. As the stratum number increases, its 283 accuracy degrades depending on the particular network path and system 284 clock stability. Mean errors, measured by synchronization distances, 285 increase approximately in proportion to stratum numbers and measured 286 roundtrip delay. 288 As a standard practice, timing network topology should be organized 289 to avoid timing loops and minimize the synchronization distance. In 290 NTP the subnet topology is determined using a variant of the Bellman- 291 Ford distributed routing algorithm, which computes the shortest-path 292 spanning tree rooted on the primary servers. As a result of this 293 design, the algorithm automatically reorganizes the subnet, so as to 294 produce the most accurate and reliable time, even when there are 295 failures in the timing network. 297 3.1. Dynamic Server Discovery 299 There are two special associations, manycast client and manycast 300 server, which provide a dynamic server discovery function. There are 301 two types of manycast client associations, persistent and ephemeral. 302 The persistent manycast client sends client (mode 3) packets to a 303 designated IPv4 or IPv6 broadcast or multicast group address. 304 Designated manycast servers within range of the time-to-live (TTL) 305 field in the packet header listen for packets with that address. If 306 a server is suitable for synchronization, it returns an ordinary 307 server (mode 4) packet using the client's unicast address. Upon 308 receiving this packet, the client mobilizes an ephemeral client (mode 309 3) association. The ephemeral client association persists until 310 error or timeout. 312 A manycast client continues sending packets to search for a minimum 313 number of associations. It starts with a TTL equal to one and 314 continuously adding one to it until the minimum number of 315 associations is made or when the TTL reaches a maximum value. If the 316 TTL reaches its maximum value and yet not enough associations are 317 mobilized, the client stops transmission for a time-out period to 318 clear all associations, and then repeats the search cycle. If a 319 minimum number of associations has been mobilized, then the client 320 starts transmitting one packet per time-out period to maintain the 321 associations. Field constraints limit the minimum value to 1 and the 322 maximum to 255. These limits may be tuned for individual application 323 needs. 325 The ephemeral associations compete among themselves. As new 326 ephemeral associations are mobilized, the client runs the mitigation 327 algorithms described in Section 10 and Section 11.2 for the best 328 candidates out of the population, the remaining ephemeral 329 associations are timed out and demobilized. In this way the 330 population includes only the best candidates that have most recently 331 responded with an NTP packet to discipline the system clock. 333 4. Definitions 335 A number of technical terms are defined in this section. A timescale 336 is a frame of reference where time is expressed as the value of a 337 monotonically increasing binary counter with an indefinite number of 338 bits. It counts in seconds and fractions of a second, when a decimal 339 point is employed. The Coordinated Universal Time (UTC) timescale is 340 defined by ITU-R TF.460[6]. Under the auspices of the Metre 341 Convention of 1865, in 1975 the CGPM[7] strongly endorsed the use of 342 UTC as the basis for civil time. 344 The Coordinated Universal Time (UTC) timescale represents mean solar 345 time as disseminated by national standards laboratories. The system 346 time is represented by the system clock maintained by the hardware 347 and operating system. The goal of the NTP algorithms is to minimize 348 both the time difference and frequency difference between UTC and the 349 system clock. When these differences have been reduced below nominal 350 tolerances, the system clock is said to be synchronized to UTC. 352 The date of an event is the UTC time at which the event takes place. 353 Dates are ephemeral values designated with upper case T. Running time 354 is another timescale that is coincident to the synchronization 355 function of the NTP program. 357 A timestamp T(t) represents either the UTC date or time offset from 358 UTC at running time t. Which meaning is intended should be clear 359 from the context. Let T(t) be the time offset, R(t) the frequency 360 offset, D(t) the aging rate (first derivative of R(t) with respect to 361 t). Then, if T(t_0) is the UTC time offset determined at t = t_0, 362 the UTC time offset at time t is 364 T(t) = T(t_0) + R(t_0)(t-t_0) + 1/2 * D(t_0)(t-t_0)^2 + e, 366 where e is a stochastic error term discussed later in this document. 367 While the D(t) term is important when characterizing precision 368 oscillators, it is ordinarily neglected for computer oscillators. In 369 this document all time values are in seconds (s) and all frequency 370 values are in seconds-per-second (s/s). It is sometimes convenient 371 to express frequency offsets in parts-per-million (PPM), where 1 PPM 372 is equal to 10^(-6) seconds/second. 374 It is important in computer timekeeping applications to assess the 375 performance of the timekeeping function. The NTP performance model 376 includes four statistics which are updated each time a client makes a 377 measurement with a server. The offset (theta) represents the 378 maximum-likelihood time offset of the server clock relative to the 379 system clock. The delay (delta) represents the round trip delay 380 between the client and server. The dispersion (epsilon) represents 381 the maximum error inherent in the measurement. It increases at a 382 rate equal to the maximum disciplined system clock frequency 383 tolerance (PHI), typically 15 PPM. The jitter (psi) is defined as 384 the root-mean-square (RMS) average of the most recent offset 385 differences, represents the nominal error in estimating the offset. 387 While the theta, delta, epsilon, and psi statistics represent 388 measurements of the system clock relative to each server clock 389 separately, the NTP protocol includes mechanisms to combine the 390 statistics of several servers to more accurately discipline and 391 calibrate the system clock. The system offset (THETA) represents the 392 maximum-likelihood offset estimate for the server population. The 393 system jitter (PSI) represents the nominal error in estimating the 394 system offset. The delta and epsilon statistics are accumulated at 395 each stratum level from the reference clock to produce the root delay 396 (DELTA) and root dispersion (EPSILON) statistics. The 397 synchronization distance (LAMBDA) equal to EPSILON + DELTA / 2 398 represents the maximum error due all causes. The detailed 399 formulations of these statistics are given in Section 11.2. They are 400 available to the dependent applications in order to assess the 401 performance of the synchronization function. 403 5. Implementation Model 405 Figure 2 shows the architecture of a typical, multi-threaded 406 implementation. It includes two processes dedicated to each server, 407 a peer process to receive messages from the server or reference clock 408 and a poll process to transmit messages to the server or reference 409 clock. 411 ..................................................................... 412 . Remote . Peer/Poll . System . Clock . 413 . Servers . Processes . Process .Discipline. 414 . . . . Process . 415 .+--------+. +-----------+. +------------+ . . 416 .| |->| |. | | . . 417 .|Server 1| |Peer/Poll 1|->| | . . 418 .| |<-| |. | | . . 419 .+--------+. +-----------+. | | . . 420 . . ^ . | | . . 421 . . | . | | . . 422 .+--------+. +-----------+. | | +-----------+. . 423 .| |->| |. | Selection |->| |. +------+ . 424 .|Server 2| |Peer/Poll 2|->| and | | Combine |->| Loop | . 425 .| |<-| |. | Cluster | | Algorithm |. |Filter| . 426 .+--------+. +-----------+. | Algorithms |->| |. +------+ . 427 . . ^ . | | +-----------+. | . 428 . . | . | | . | . 429 .+--------+. +-----------+. | | . | . 430 .| |->| |. | | . | . 431 .|Server 3| |Peer/Poll 3|->| | . | . 432 .| |<-| |. | | . | . 433 .+--------+. +-----------+. +------------+ . | . 434 ....................^.........................................|...... 435 | . V . 436 | . +-----+ . 437 +--------------------------------------| VFO | . 438 . +-----+ . 439 . Clock . 440 . Adjust . 441 . Process . 442 ............ 444 Figure 2: Implementation Model 446 These processes operate on a common data structure, called an 447 association, which contains the statistics described above along with 448 various other data described in Section 9. A client sends packets to 449 one or more servers and then processes returned packets when they are 450 received. The server interchanges source and destination addresses 451 and ports, overwrites certain fields in the packet and returns it 452 immediately (in the client/server mode) or at some time later (in the 453 symmetric modes). As each NTP message is received, the offset theta 454 between the peer clock and the system clock is computed along with 455 the associated statistics delta, epsilon and psi. 457 The system process includes the selection, cluster and combine 458 algorithms that mitigate among the various servers and reference 459 clocks to determine the most accurate and reliable candidates to 460 synchronize the system clock. The selection algorithm uses Byzantine 461 fault detection principles to discard the presumably incorrect 462 candidates called "falsetickers" from the incident population, 463 leaving only good candidates called "truechimers". A truechimer is a 464 clock that maintains timekeeping accuracy to a previously published 465 and trusted standard, while a falseticker is a clock that shows 466 misleading or inconsistent time. The cluster algorithm uses 467 statistical principles to find the most accurate set of truechimers. 468 The combine algorithm computes the final clock offset by 469 statistically averaging the surviving truechimers. 471 The clock discipline process is a system process that controls the 472 time and frequency of the system clock, here represented as a 473 variable frequency oscillator (VFO). Timestamps struck from the VFO 474 close the feedback loop which maintains the system clock time. 475 Associated with the clock discipline process is the clock adjust 476 process, which runs once each second to inject a computed time offset 477 and maintain constant frequency. The RMS average of past time offset 478 differences represents the nominal error or system clock jitter. The 479 RMS average of past frequency offset differences represents the 480 oscillator frequency stability or frequency wander. These terms are 481 given precise interpretation in Section 11.3. 483 A client sends messages to each server with a poll interval of 2^tau 484 seconds, as determined by the poll exponent tau. In NTPv4, tau 485 ranges from 4 (16 s) through 17 (36 h). The value of tau is 486 determined by the clock discipline algorithm to match the loop time 487 constant T_c = 2^tau. In client/server mode the server responds 488 immediately; however, in symmetric modes each of two peers manages 489 tau as a function of current system offset and system jitter, so may 490 not agree with the same value. It is important that the dynamic 491 behavior of the clock discipline algorithm be carefully controlled in 492 order to maintain stability in the NTP subnet at large. This 493 requires that the peers agree on a common tau equal to the minimum 494 poll exponent of both peers. The NTP protocol includes provisions to 495 properly negotiate this value. 497 The implementation model includes some means to set and adjust the 498 system clock. The operating system is assumed to provide two 499 functions, one to set the time directly, for example the Unix 500 settimeofday() function, and another to adjust the time in small 501 increments advancing or retarding the time by a designated amount, 502 for example the Unix adjtime() function. In this and following 503 references, parentheses following a name indicate reference to a 504 function rather than a simple variable. In the intended design the 505 clock discipline process uses the adjtime() function if the 506 adjustment is less than a designated threshold, and the 507 settimeofday() function if above the threshold. The manner in which 508 this is done and the value of the threshold as described in 509 Section 10. 511 6. Data Types 513 All NTP time values are represented in twos-complement format, with 514 bits numbered in big-endian (as described in Appendix A of [13]) 515 fashion from zero starting at the left, or high-order, position. 516 There are three NTP time formats, a 128-bit date format, a 64-bit 517 timestamp format and a 32-bit short format, as shown in Figure 3. 518 The 128-bit date format is used where sufficient storage and word 519 size are available. It includes a 64-bit signed seconds field 520 spanning 584 billion years and a 64-bit fraction field resolving .05 521 attosecond (i.e., 0.5e-18). For convenience in mapping between 522 formats, the seconds field is divided into a 32-bit Era Number field 523 and a 32-bit Era Offset field. Eras cannot be produced by NTP 524 directly, nor is there need to do so. When necessary, they can be 525 derived from external means, such as the filesystem or dedicated 526 hardware. 528 0 1 2 3 529 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 531 | Seconds | Fraction | 532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 534 NTP Short Format 536 0 1 2 3 537 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 539 | Seconds | 540 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 541 | Fraction | 542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 544 NTP Timestamp Format 546 0 1 2 3 547 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 549 | Era Number | 550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 551 | Era Offset | 552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 553 | | 554 | Fraction | 555 | | 556 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 558 NTP Date Format 560 Figure 3: NTP Time Formats 562 The 64-bit timestamp format is used in packet headers and other 563 places with limited word size. It includes a 32-bit unsigned seconds 564 field spanning 136 years and a 32-bit fraction field resolving 232 565 picoseconds. The 32-bit short format is used in delay and dispersion 566 header fields where the full resolution and range of the other 567 formats are not justified. It includes a 16-bit unsigned seconds 568 field and a 16-bit fraction field. 570 In the date and timestamp formats the prime epoch, or base date of 571 era 0, is 0 h 1 January 1900 UTC, when all bits are zero. It should 572 be noted that strictly speaking, UTC did not exist prior to 1 January 573 1972, but it is convenient to assume it has existed for all eternity, 574 even if all knowledge of historic leap seconds has been lost. Dates 575 are relative to the prime epoch; values greater than zero represent 576 times after that date; values less than zero represent times before 577 it. Note that the Era Offset field of the date format and the 578 Seconds field of the timestamp format have the same interpretation. 580 Timestamps are unsigned values and operations on them produce a 581 result in the same or adjacent eras. Era 0 includes dates from the 582 prime epoch to some time in 2036, when the timestamp field wraps 583 around and the base date for era 1 is established. In either format 584 a value of zero is a special case representing unknown or 585 unsynchronized time. Figure 4 shows a number of historic NTP dates 586 together with their corresponding Modified Julian Day (MJD), NTP era 587 and NTP timestamp. 589 +-------------+------------+-----+---------------+------------------+ 590 | Date | MJD | NTP | NTP Timestamp | Epoch | 591 | | | Era | Era Offset | | 592 +-------------+------------+-----+---------------+------------------+ 593 | 1 Jan -4712 | -2,400,001 | -49 | 1,795,583,104 | 1st day Julian | 594 | 1 Jan -1 | -679,306 | -14 | 139,775,744 | 2 BCE | 595 | 1 Jan 0 | -678,491 | -14 | 171,311,744 | 1 BCE | 596 | 1 Jan 1 | -678,575 | -14 | 202,939,144 | 1 CE | 597 | 4 Oct 1582 | -100,851 | -3 | 2,873,647,488 | Last day Julian | 598 | 15 Oct 1582 | -100,840 | -3 | 2,874,597,888 | First day | 599 | | | | | Gregorian | 600 | 31 Dec 1899 | 15019 | -1 | 4,294,880,896 | Last day NTP Era | 601 | | | | | -1 | 602 | 1 Jan 1900 | 15020 | 0 | 0 | First day NTP | 603 | | | | | Era 0 | 604 | 1 Jan 1970 | 40,587 | 0 | 2,208,988,800 | First day UNIX | 605 | 1 Jan 1972 | 41,317 | 0 | 2,272,060,800 | First day UTC | 606 | 31 Dec 1999 | 51,543 | 0 | 3,155,587,200 | Last day 20th | 607 | | | | | Century | 608 | 8 Feb 2036 | 64,731 | 1 | 63,104 | First day NTP | 609 | | | | | Era 1 | 610 +-------------+------------+-----+---------------+------------------+ 612 Figure 4: Interesting Historic NTP Dates 614 Let p be the number of significant bits in the second fraction. The 615 clock resolution is defined 2^(-p), in seconds. In order to minimize 616 bias and help make timestamps unpredictable to an intruder, the non- 617 significant bits should be set to an unbiased random bit string. The 618 clock precision is defined as the running time to read the system 619 clock, in seconds. Note that the precision defined in this way can 620 be larger or smaller than the resolution. The term rho, representing 621 the precision used in the protocol, is the larger of the two. 623 The only arithmetic operation permitted on dates and timestamps is 624 twos-complement subtraction, yielding a 127-bit or 63-bit signed 625 result. It is critical that the first-order differences between two 626 dates preserve the full 128-bit precision and the first-order 627 differences between two timestamps preserve the full 64-bit 628 precision. However, the differences are ordinarily small compared to 629 the seconds span, so they can be converted to floating double format 630 for further processing and without compromising the precision. 632 It is important to note that twos-complement arithmetic does not 633 distinguish between signed and unsigned values (although comparisons 634 can take sign into account); only the conditional branch instructions 635 do. Thus, although the distinction is made between signed dates and 636 unsigned timestamps, they are processed the same way. A perceived 637 hazard with 64-bit timestamp calculations spanning an era, such as 638 possible in 2036, might result in over-run. In point of fact, if the 639 client is set within 68 years of the server before the protocol is 640 started, correct values are obtained even if the client and server 641 are in adjacent eras. 643 Some time values are represented in exponent format, including the 644 precision, time constant and poll interval. These are in 8-bit 645 signed integer format in log2 (log base 2) seconds. The only 646 arithmetic operations permitted on them are increment and decrement. 647 For the purpose of this document and to simplify the presentation, a 648 reference to one of these variables by name means the exponentiated 649 value, e.g., the poll interval is 1024 s, while reference by name and 650 exponent means the actual value, e.g., the poll exponent is 10. 652 To convert system time in any format to NTP date and timestamp 653 formats requires that the number of seconds s from the prime epoch to 654 the system time be determined. To determine the integer era and 655 timestamp given s, 657 era = s / 2^(32) and timestamp = s - era * 2^(32), 659 which works for positive and negative dates. To determine s given 660 the era and timestamp, 662 s = era * 2^(32) + timestamp. 664 Converting between NTP and system time can be a little messy, and 665 beyond the scope of this document. Note that the number of days in 666 era 0 is one more than the number of days in most other eras and this 667 won't happen again until the year 2400 in era 3. 669 In the description of state variables to follow, explicit reference 670 to integer type implies a 32-bit unsigned integer. This simplifies 671 bounds checks, since only the upper limit needs to be defined. 673 Without explicit reference, the default type is 64-bit floating 674 double. Exceptions will be noted as necessary. 676 7. Data Structures 678 The NTP protocol state machines described in the following sections 679 are defined using state variables and code fragments defined in 680 Appendix A. State variables are separated into classes according to 681 their function in packet headers, peer and poll processes, the system 682 process and the clock discipline process. Packet variables represent 683 the NTP header values in transmitted and received packets. Peer and 684 poll variables represent the contents of the association for each 685 server separately. System variables represent the state of the 686 server as seen by its dependent clients. Clock discipline variables 687 represent the internal workings of the clock discipline algorithm. 688 Additional parameters and variable classes are defined in Appendix A. 690 7.1. Structure Conventions 692 In order to distinguish between different variables of the same name 693 but used in different processes, the naming convention summarized in 694 Figure 5 is adopted. A receive packet variable v is a member of the 695 packet structure r with fully qualified name r.v. In a similar 696 manner x.v is a transmit packet variable, p.v is a peer variable, s.v 697 is a system variable and c.v is a clock discipline variable. There 698 is a set of peer variables for each association; there is only one 699 set of system and clock variables. 701 +------+---------------------------------+ 702 | Name | Description | 703 +------+---------------------------------+ 704 | r. | receive packet header variable | 705 | x. | transmit packet header variable | 706 | p. | peer/poll variable | 707 | s. | system variable | 708 | c. | clock discipline variable | 709 +------+---------------------------------+ 711 Figure 5: Prefix Conventions 713 7.2. Global Parameters 715 In addition to the variable classes a number of global parameters are 716 defined in this document, including those shown with values in 717 Figure 6. 719 +-----------+-------+----------------------------------+ 720 | Name | Value | Description | 721 +-----------+-------+----------------------------------+ 722 | PORT | 123 | NTP port number | 723 | VERSION | 4 | version number | 724 | TOLERANCE | 15e-6 | frequency tolerance PHI (s/s) | 725 | MINPOLL | 4 | minimum poll exponent (16 s) | 726 | MAXPOLL | 17 | maximum poll exponent (36 h) | 727 | MAXDISP | 16 | maximum dispersion (16 s) | 728 | MINDISP | .005 | minimum dispersion increment (s) | 729 | MAXDIST | 1 | distance threshold (1 s) | 730 | MAXSTRAT | 16 | maximum stratum number | 731 +-----------+-------+----------------------------------+ 733 Figure 6: Global Parameters 735 While these are the only global parameters needed in this document, a 736 larger collection is necessary in the skeleton and larger still for 737 any implementation. Appendix A.1.1 contains those used by the 738 skeleton for the mitigation algorithms, clock discipline algorithm 739 and related implementation-dependent functions. Some of these 740 parameter values are cast in stone, like the NTP port number assigned 741 by the IANA and the version number assigned NTPv4 itself. Others 742 like the frequency tolerance (also called PHI), involve an assumption 743 about the worst case behavior of a system clock once synchronized and 744 then allowed to drift when its sources have become unreachable. The 745 minimum and maximum parameters define the limits of state variables 746 as described in later sections of this document. 748 While shown with fixed values in this document, some implementations 749 may make them variables adjustable by configuration commands. For 750 instance, the reference implementation computes the value of 751 PRECISION as log2 of the minimum time in several iterations to read 752 the system clock. 754 7.3. Packet Header Variables 756 The most important state variables from an external point of view are 757 the packet header variables described in Figure 7 and below. The NTP 758 packet header consists of an integral number of 32-bit (4 octet) 759 words in network byte order. The packet format consists of three 760 components, the header itself, one or more optional extension fields 761 and an optional message authentication code (MAC). The header 762 component is identical to the NTPv3 header and previous versions. 763 The optional extension fields are used by the Autokey public key 764 cryptographic algorithms described in [3]. The optional MAC is used 765 by both Autokey and the symmetric key cryptographic algorithm 766 described in this report. 768 +-----------+------------+-----------------------+ 769 | Name | Formula | Description | 770 +-----------+------------+-----------------------+ 771 | leap | leap | leap indicator (LI) | 772 | version | version | version number (VN) | 773 | mode | mode | mode | 774 | stratum | stratum | stratum | 775 | poll | poll | poll exponent | 776 | precision | rho | precision exponent | 777 | rootdelay | delta_r | root delay | 778 | rootdisp | epsilon_r | root dispersion | 779 | refid | refid | reference ID | 780 | reftime | reftime | reference timestamp | 781 | org | T1 | origin timestamp | 782 | rec | T2 | receive timestamp | 783 | xmt | T3 | transmit timestamp | 784 | dst | T4 | destination timestamp | 785 | keyid | keyid | key ID | 786 | MAC | MAC | message digest | 787 +-----------+------------+-----------------------+ 789 Figure 7: Packet Header Variables 791 The NTP packet is a UDP datagram[14]. Some fields use multiple words 792 and others are packed in smaller fields within a word. The NTP 793 packet header shown in Figure 8 has 12 words followed by optional 794 extension fields and finally an optional message authentication code 795 (MAC) consisting of the key identifier field and message digest 796 field. 798 0 1 2 3 799 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 800 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 801 |LI | VN |Mode | Stratum | Poll | Precision | 802 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 803 | Root Delay | 804 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 805 | Root Dispersion | 806 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 807 | Reference ID | 808 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 809 | | 810 + Reference Timestamp (64) + 811 | | 812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 813 | | 814 + Origin Timestamp (64) + 815 | | 816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 817 | | 818 + Receive Timestamp (64) + 819 | | 820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 821 | | 822 + Transmit Timestamp (64) + 823 | | 824 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 825 | | 826 . . 827 . Extension Field 1 (variable) . 828 . . 829 | | 830 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 831 | | 832 . . 833 . Extension Field 2 (variable) . 834 . . 835 | | 836 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 837 | Key Identifier | 838 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 839 | | 840 | MAC (128) | 841 | | 842 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 844 Figure 8: Packet Header Format 846 The extension fields are used to add optional capabilities, for 847 example, the Autokey security protocol [3]. The extension field 848 format is presented in order that the packet can be parsed without 849 knowledge of the extension field functions. The MAC is used by both 850 Autokey and the symmetric key authentication scheme described in 851 Appendix A. 853 A list of the packet header variables is shown in Figure 7 and 854 described in detail below. Except for a minor variation when using 855 the IPv6 address family, these fields are backwards compatible with 856 NTPv3. The packet header fields apply to both transmitted packets (x 857 prefix) and received packets (r prefix). In Figure 8 the size of 858 some multiple-word fields is shown in bits if not the default 32 859 bits. The basic header extends from the beginning of the packet to 860 the end of the Transmit Timestamp field. 862 The fields and associated packet variables (in parentheses) are 863 interpreted as follows: 865 LI Leap Indicator (leap): 2-bit integer warning of an impending leap 866 second to be inserted or deleted in the last minute of the current 867 month with values defined in Figure 9. 869 +-------+-------------------------------------------------+ 870 | Value | Meaning | 871 +-------+-------------------------------------------------+ 872 | 0 | no warning | 873 | 1 | last minute of the day has 61 seconds | 874 | 2 | last minute of the day has 59 seconds | 875 | 3 | alarm condition (the clock is not synchronized) | 876 +-------+-------------------------------------------------+ 878 Figure 9: Leap Indicator 880 VN Version Number (version): 3-bit integer representing the NTP 881 version number, currently 4. 883 Mode (mode): 3-bit integer representing the mode, with values defined 884 in Figure 10. 886 +-------+--------------------------+ 887 | Value | Meaning | 888 +-------+--------------------------+ 889 | 0 | reserved | 890 | 1 | symmetric active | 891 | 2 | symmetric passive | 892 | 3 | client | 893 | 4 | server | 894 | 5 | broadcast | 895 | 6 | NTP control message | 896 | 7 | reserved for private use | 897 +-------+--------------------------+ 899 Figure 10: Association Modes 901 Stratum (stratum): 8-bit integer representing the stratum, with 902 values defined in Figure 11. 904 +--------+-----------------------------------------------------+ 905 | Value | Meaning | 906 +--------+-----------------------------------------------------+ 907 | 0 | unspecified or invalid | 908 | 1 | primary server (e.g., equipped with a GPS receiver) | 909 | 2-15 | secondary server (via NTP) | 910 | 16 | unsynchronized | 911 | 17-255 | reserved | 912 +--------+-----------------------------------------------------+ 914 Figure 11: Packet Stratum 916 It is customary to map the stratum value 0 in received packets to 917 MAXSTRAT (16) in the peer variable p.stratum and to map p.stratum 918 values of MAXSTRAT or greater to 0 in transmitted packets. This 919 allows reference clocks, which normally appear at stratum 0, to be 920 conveniently mitigated using the same algorithms used for external 921 sources (See Appendix A.5.5.1). 923 Poll: 8-bit signed integer representing the maximum interval between 924 successive messages, in log2 seconds. Suggested default limits for 925 minimum and maximum poll intervals are 6 and 10, respectively. 927 Precision: 8-bit signed integer representing the precision of the 928 system clock, in log2 seconds. For instance a value of -18 929 corresponds to a precision of about one microsecond. The precision 930 can be determined when the service first starts up as the minimum 931 time of several iterations to read the system clock. 933 Root Delay (rootdelay): Total round trip delay to the reference 934 clock, in NTP short format. 936 Root Dispersion (rootdisp): Total dispersion to the reference clock, 937 in NTP short format. 939 Reference ID (refid): 32-bit code identifying the particular server 940 or reference clock. The interpretation depends on the value in the 941 stratum field. For packet stratum 0 (unspecified or invalid) this is 942 a four-character ASCII[15] string, called the kiss code, used for 943 debugging and monitoring purposes. For stratum 1 (reference clock) 944 this is a four-octet, left-justified, zero-padded ASCII string 945 assigned to the reference clock. The authoritative list of Reference 946 Identifiers is maintained by IANA, however any string begining with 947 the ASCII character "X" is reserved for unregistered experimentation 948 and developent.The identifiers in Figure 12 have been used as ASCII 949 identifiers: 951 +------+----------------------------------------------------------+ 952 | ID | Clock Source | 953 +------+----------------------------------------------------------+ 954 | GOES | Geosynchronous Orbit Environment Satellite | 955 | GPS | Global Position System | 956 | GAL | Galileo Positioning System | 957 | PPS | Generic pulse-per-second | 958 | IRIG | Inter-Range Instrumentation Group | 959 | WWVB | LF Radio WWVB Ft. Collins, CO 60 kHz | 960 | DCF | LF Radio DCF77 Mainflingen, DE 77.5 kHz | 961 | HBG | LF Radio HBG Prangins, HB 75 kHz | 962 | MSF | LF Radio MSF Anthorn, UK 60 kHz | 963 | JJY | LF Radio JJY Fukushima, JP 40 kHz, Saga, JP 60 kHz | 964 | LORC | MF Radio LORAN C station, 100 kHz | 965 | TDF | MF Radio Allouis, FR 162 kHz | 966 | CHU | HF Radio CHU Ottawa, Ontario | 967 | WWV | HF Radio WWV Ft. Collins, CO | 968 | WWVH | HF Radio WWVH Kauai, HI | 969 | NIST | NIST telephone modem | 970 | ACTS | NIST telephone modem | 971 | USNO | USNO telephone modem | 972 | PTB | European telephone modem | 973 +------+----------------------------------------------------------+ 975 Figure 12: Reference Identifiers 977 Above stratum 1 (secondary servers and clients) this is the reference 978 identifier of the server and can be used to detect timing loops. If 979 using the IPv4 address family, the identifier is the four-octet IPv4 980 address. If using the IPv6 address family, it is the first four 981 octets of the MD5 hash of the IPv6 address. Note that, when using 982 the IPv6 address family on an NTPv4 server with a NTPv3 client, the 983 Reference Identifier field appears to be a random value and a timing 984 loop might not be detected. 986 Reference Timestamp: Time when the system clock was last set or 987 corrected, in NTP timestamp format. 989 Origin Timestamp (org): Time at the client when the request departed 990 for the server, in NTP timestamp format. 992 Receive Timestamp (rec): Time at the server when the request arrived 993 from the client, in NTP timestamp format. 995 Transmit Timestamp (xmt): Time at the server when the response left 996 for the client, in NTP timestamp format. 998 Destination Timestamp (dst): Time at the client when the reply 999 arrived from the server, in NTP timestamp format. 1001 Note: Destination Timestamp field is not included as a header field; 1002 it is determined upon arrival of the packet and made available in the 1003 packet buffer data structure. 1005 The MAC consists of the Key Identifier followed by the Message 1006 Digest. The message digest, or cryptosum, is calculated as in [16] 1007 over all NTP header and optional extension fields, but not the MAC 1008 itself. 1010 Extension Field n: See Section 7.5 for a description of the format of 1011 this field. 1013 Key Identifier (keyid): 32-bit unsigned integer used by the client 1014 and server to designate a secret 128-bit MD5 key. 1016 Message Digest (digest): 128-bit bitstring computed by the keyed MD5 1017 message digest computed over all the words in the header and 1018 extension fields, but not the MAC itself. 1020 7.4. The Kiss-o'-Death Packet 1022 If the Stratum field is 0, which implies unspecified or invalid, the 1023 Reference Identifier field can be used to convey messages useful for 1024 status reporting and access control. These are called Kiss-o'-Death 1025 (KoD) packets and the ASCII messages they convey are called kiss 1026 codes. The KoD packets got their name because an early use was to 1027 tell clients to stop sending packets that violate server access 1028 controls. The kiss codes can provide useful information for an 1029 intelligent client, either NTPv4 or SNTPv4. Kiss codes are encoded 1030 in four-character ASCII strings left justified and zero filled. The 1031 strings are designed for character displays and log files. A list of 1032 the currently-defined kiss codes is given in Figure 13. Recipients 1033 of kiss codes MUST inspect them and in the following cases take these 1034 actions: 1036 a. For kiss codes: DENY, RSTR the client MUST demobilize any 1037 associations to that server and stop sending packets to that 1038 server; 1040 b. For kiss code: RATE the client MUST immediately reduce its 1041 polling interval to that server and continue to reduce it each 1042 time it receives a RATE kiss code. 1044 c. Kiss codes begining with the ASCII character "X" are for 1045 unregistered experimentation and development and MUST be ignored 1046 if not recognized. 1048 d. Other than the above conditions, KoD packets have no protocol 1049 significance and are discarded after inspection. 1051 +------+------------------------------------------------------------+ 1052 | Code | Meaning | 1053 +------+------------------------------------------------------------+ 1054 | ACST | The association belongs to a unicast server | 1055 | AUTH | Server authentication failed | 1056 | AUTO | Autokey sequence failed | 1057 | BCST | The association belongs to a broadcast server | 1058 | CRYP | Cryptographic authentication or identification failed | 1059 | DENY | Access denied by remote server | 1060 | DROP | Lost peer in symmetric mode | 1061 | RSTR | Access denied due to local policy | 1062 | INIT | The association has not yet synchronized for the first | 1063 | | time | 1064 | MCST | The association belongs to a dynamically discovered server | 1065 | NKEY | No key found. Either the key was never installed or is | 1066 | | not trusted | 1067 | RATE | Rate exceeded. The server has temporarily denied access | 1068 | | because the client exceeded the rate threshold | 1069 | RMOT | Alteration of association from a remote host running | 1070 | | ntpdc. | 1071 | STEP | A step change in system time has occurred, but the | 1072 | | association has not yet resynchronized | 1073 +------+------------------------------------------------------------+ 1075 Figure 13: Kiss Codes 1077 7.5. NTP Extension Field Format 1079 In NTPv4 one or more extension fields can be inserted after the 1080 header and before the MAC, which is always present when an extension 1081 field is present. Other than defining the field format, this 1082 document makes no use of the field contents. An extension field 1083 contains a request or response message in the format shown in 1084 Figure 14. 1086 0 1 2 3 1087 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1088 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1089 | Field Type | Length | 1090 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1091 . . 1092 . Value . 1093 . . 1094 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1095 | Padding (as needed) | 1096 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1098 Figure 14: Extension Field Format 1100 All extension fields are zero-padded to a word (4 octets) boundary. 1101 The Field Type field is specific to the defined function and is not 1102 elaborated here. While the minimum field length containing required 1103 fields is 4 words (16 octets), a maximum field length remains to be 1104 established. 1106 The Length field is a 16-bit unsigned integer which indicates the 1107 length of the entire extension field in octets, including the Padding 1108 field. 1110 8. On Wire Protocol 1112 The heart of the NTP on-wire protocol is the core mechanism which 1113 exchanges time values between servers, peers and clients. It is 1114 inherently resistant to lost or duplicate packets. Data integrity is 1115 provided by the IP and UDP checksums. No flow control or 1116 retransmission facilities are provided or necessary. The protocol 1117 uses timestamps, either extracted from packet headers or struck from 1118 the system clock upon the arrival or departure of a packet. 1119 Timestamps are precision data and should be restruck in case of link 1120 level retransmission and corrected for the time to compute a MAC on 1121 transmit. 1123 NTP messages make use of two different communication modes, one-to- 1124 one and one-to-many, commonly referred to as unicast and broadcast. 1125 For the purposes of this document, the term broadcast is interpreted 1126 as any available one-to-many mechanism. For IPv4 this equates to 1127 either IPv4 broadcast or IPv4 multicast. For IPv6 this equates to 1128 IPv6 multicast. For this purpose, IANA has allocated the IPv4 1129 multicast address 224.0.1.1 and the IPv6 multicast address ending 1130 :101, with prefix determined by scoping rules. 1132 The on-wire protocol uses four timestamps numbered t1 through t4 and 1133 three state variables org, rec and xmt, as shown in Figure 15. This 1134 figure shows the most general case where each of two peers, A and B, 1135 independently measure the offset and delay relative to the other. 1136 For purposes of illustration the packet timestamps are shown in lower 1137 case, while the state variables are shown in upper case. The state 1138 variables are copied from the packet timestamps upon arrival or 1139 departure of a packet. 1141 t2 t3 t6 t7 1142 +---------+ +---------+ +---------+ +---------+ 1143 | 0 | | t1 | | t3 | | t5 | 1144 +---------+ +---------+ +---------+ +---------+ 1145 | 0 | | t2 | | t4 | | t6 | Packet 1146 +---------+ +---------+ +---------+ +---------+ Timestamps 1147 | t1 | |t3=clock | | t5 | |t7=clock | 1148 +---------+ +---------+ +---------+ +---------+ 1149 |t2=clock | |t6=clock | 1150 +---------+ +---------+ 1151 Peer B 1152 +---------+ +---------+ +---------+ +---------+ 1153 org | T1 | | T1 | | t5<>T1? | | T5 | 1154 +---------+ +---------+ +---------+ +---------+ State 1155 rec | T2 | | T2 | | T6 | | T6 | Variables 1156 +---------+ +---------+ +---------+ +---------+ 1157 xmt | 0 | | T3 | | t3=T3? | | T7 | 1158 +---------+ +---------+ +---------+ +---------+ 1160 t2 t3 t6 t7 1161 --------------------------------------------------------- 1162 /\ \ /\ \ 1163 / \ / \ 1164 / \ / \ 1165 / \/ / \/ 1166 --------------------------------------------------------- 1167 t1 t4 t5 t8 1169 t1 t4 t5 t8 1170 +---------+ +---------+ +---------+ +---------+ 1171 | 0 | | t1 | | t3 | | t5 | 1172 +---------+ +---------+ +---------+ +---------+ 1173 | 0 | | t2 | | t4 | | t6 | Packet 1174 +---------+ +---------+ +---------+ +---------+ Timestamps 1175 |t1=clock | | t3 | |t5=clock | | t7 | 1176 +---------+ +---------+ +---------+ +---------+ 1177 |t4=clock | |t8=clock | 1178 +---------+ +---------+ 1179 Peer A 1180 +---------+ +---------+ +---------+ +---------+ 1181 org | 0 | | t3<>0? | | T3 | | t7<>T3? | 1182 +---------+ +---------+ +---------+ +---------+ State 1183 rec | 0 | | T4 | | T4 | | T8 | Variables 1184 +---------+ +---------+ +---------+ +---------+ 1185 xmt | T1 | | t1=T1? | | T5 | | t5=T5? | 1186 +---------+ +---------+ +---------+ +---------+ 1187 Figure 15: On-Wire Protocol 1189 In the figure the first packet transmitted by A contains only the 1190 origin timestamp t1, which is then copied to T1. B receives the 1191 packet at t2 and copies t1 to T1 and the receive timestamp t2 to T2. 1192 At this time or some time later at t3, B sends a packet to A 1193 containing t1 and t2 and in addition the transmit timestamp t3. All 1194 three timestamps are copied to the corresponding state variables. A 1195 receives the packet at t4 containing the three timestamps t1, t2 and 1196 t3 and in addition the destination timestamp t4. These four 1197 timestamps are used to compute the offset and delay of B relative to 1198 A, as described below. 1200 Before the xmt and org state variables are updated, two sanity checks 1201 are performed in order to protect against duplicate, bogus or 1202 replayed packets. In the exchange above, a packet is duplicate or 1203 replay if the transmit timestamp t3 in the packet matches the org 1204 state variable T3. A packet is bogus if the origin timestamp t1 in 1205 the packet does not match the xmt state variable T1. In either of 1206 these cases the state variables are updated, then the packet is 1207 discarded. To protect against replay of the last transmitted packet, 1208 the xmt state variable is set to zero immediately after a successful 1209 bogus check. 1211 The four most recent timestamps, T1 through T4, are used to compute 1212 the offset of B relative to A 1214 theta = T(B) - T(A) = 1/2 * [(T2-T1) + (T3-T4)] 1216 and the round trip delay 1218 delta = T(ABA) = (T4-T1) - (T3-T2). 1220 Note that the quantities within parentheses are computed from 64-bit 1221 unsigned timestamps and result in signed values with 63 significant 1222 bits plus sign. These values can represent dates from 68 years in 1223 the past to 68 years in the future. However, the offset and delay 1224 are computed as sums and differences of these values, which contain 1225 62 significant bits and two sign bits, so can represent unambiguous 1226 values from 34 years in the past to 34 years in the future. In other 1227 words, the time of the client must be set within 34 years of the 1228 server before the service is started. This is a fundamental 1229 limitation with 64-bit integer arithmetic. 1231 In implementations where floating double arithmetic is available, the 1232 first-order differences can be converted to floating double and the 1233 second-order sums and differences computed in that arithmetic. Since 1234 the second-order terms are typically very small relative to the 1235 timestamp magnitudes, there is no loss in significance, yet the 1236 unambiguous range is restored from 34 years to 68 years. 1238 In some scenarios where the initial frequency offset of the client is 1239 relatively large and the actual propagation time small, it is 1240 possible for the delay computation to becomes negative. For 1241 instance, if the frequency difference is 100 PPM and the interval 1242 T4-T1 is 64 s, the apparent delay is -6.4 ms. Since negative values 1243 are misleading in subsequent computations, the value of delta should 1244 be clamped not less than s.rho, where s.rho is the system precision 1245 described in Section 11.1, expressed in seconds. 1247 The discussion above assumes the most general case where two 1248 symmetric peers independently measure the offsets and delays between 1249 them. In the case of a stateless server, the protocol can be 1250 simplified. A stateless server copies T3 and T4 from the client 1251 packet to T1 and T2 of the server packet and tacks on the transmit 1252 timestamp T3 before sending it to the client. Additional details for 1253 filling in the remaining protocol fields are given in a Section 9 and 1254 following sections and in the appendix. 1256 Note that the on-wire protocol as described resists replay of a 1257 server response packet. However, it does not resist replay of the 1258 client request packet, which would result in a server reply packet 1259 with new values of T2 and T3 and result in incorrect offset and 1260 delay. This vulnerability can be avoided by setting the xmt state 1261 variable to zero after computing the offset and delay. 1263 9. Peer Process 1265 The process descriptions to follow include a listing of the important 1266 state variables followed by an overview of the process operations 1267 implemented as routines. Frequent reference is made to the skeleton 1268 in the appendix. The skeleton includes C-language fragments that 1269 describe the functions in more detail. It includes the parameters, 1270 variables and declarations necessary for a conforming NTPv4 1271 implementation. However, many additional variables and routines may 1272 be necessary in a working implementation. 1274 The peer process is called upon arrival of a server or peer packet. 1275 It runs the on-wire protocol to determine the clock offset and round 1276 trip delay and in addition computes statistics used by the system and 1277 poll processes. Peer variables are instantiated in the association 1278 data structure when the structure is initialized and updated by 1279 arriving packets. There is a peer process, poll process and 1280 association for each server. 1282 9.1. Peer Process Variables 1284 Figure 16, Figure 17, Figure 18 and Figure 19 summarize the common 1285 names, formula names and a short description of the peer variables. 1286 The common names and formula names are interchangeable; formula names 1287 are intended to increase readability of equations in this 1288 specification. Unless noted otherwise, all peer variables have 1289 assumed prefix p. 1291 +---------+----------+-----------------------+ 1292 | Name | Formula | Description | 1293 +---------+----------+-----------------------+ 1294 | srcaddr | srcaddr | source address | 1295 | srcport | srcport | source port | 1296 | dstaddr | dstaddr | destination address | 1297 | dstport | destport | destination port | 1298 | keyid | keyid | key identifier key ID | 1299 +---------+----------+-----------------------+ 1301 Figure 16: Peer Process Configuration Variables 1303 +-----------+------------+---------------------+ 1304 | Name | Formula | Description | 1305 +-----------+------------+---------------------+ 1306 | leap | leap | leap indicator | 1307 | version | version | version number | 1308 | mode | mode | mode | 1309 | stratum | stratum | stratum | 1310 | ppoll | ppoll | peer poll exponent | 1311 | rootdelay | delta_r | root delay | 1312 | rootdisp | epsilon_r | root dispersion | 1313 | refid | refid | reference ID | 1314 | reftime | reftime | reference timestamp | 1315 +-----------+------------+---------------------+ 1317 Figure 17: Peer Process Packet Variables 1319 +------+---------+--------------------+ 1320 | Name | Formula | Description | 1321 +------+---------+--------------------+ 1322 | org | T1 | origin timestamp | 1323 | rec | T2 | receive timestamp | 1324 | xmt | T3 | transmit timestamp | 1325 | t | t | packet time | 1326 +------+---------+--------------------+ 1328 Figure 18: Peer Process Timestamp Variables 1330 +--------+---------+-----------------+ 1331 | Name | Formula | Description | 1332 +--------+---------+-----------------+ 1333 | offset | theta | clock offset | 1334 | delay | delta | roundtrip delay | 1335 | disp | epsilon | dispersion | 1336 | jitter | psi | jitter | 1337 | filter | filter | clock filter | 1338 | tp | t_p | filter time | 1339 +--------+---------+-----------------+ 1341 Figure 19: Peer Process Statistics Variables 1343 The following configuration variables are normally initialized when 1344 the association is mobilized, either from a configuration file or 1345 upon the arrival of the first packet for an unknown association. 1347 srcaddr: IP address of the remote server or reference clock. This 1348 becomes the destination IP address in packets sent from this 1349 association. 1351 srcport: UDP port number of the server or reference clock. This 1352 becomes the destination port number in packets sent from this 1353 association. When operating in symmetric modes (1 and 2) this field 1354 must contain the NTP port number PORT (123) assigned by the IANA. In 1355 other modes it can contain any number consistent with local policy. 1357 dstaddr: IP address of the client. This becomes the source IP 1358 address in packets sent from this association. 1360 dstport: UDP port number of the client, ordinarily the NTP port 1361 number PORT (123) assigned by the IANA. This becomes the source port 1362 number in packets sent from this association. 1364 keyid: Symmetric key ID for the 128-bit MD5 key used to generate and 1365 verify the MAC. The client and server or peer can use different 1366 values, but they must map to the same key. 1368 The variables defined in Figure 17 are updated from the packet header 1369 as each packet arrives. They are interpreted in the same way as the 1370 packet variables of the same names. It is convenient for later 1371 processing to convert the NTP short format packet values r.rootdelay 1372 and r.rootdisp to floating doubles as peer variables. 1374 The variables defined in Figure 18 include the timestamps exchanged 1375 by the on-wire protocol in Section 8. The t variable is the seconds 1376 counter c.t associated with these values. The c.t variable is 1377 maintained by the clock adjust process described in Section 12. It 1378 counts the seconds since the service was started. The variables 1379 defined in Figure 19 include the statistics computed by the 1380 clock_filter() routine described in Section 10. The tp variable is 1381 the seconds counter associated with these values. 1383 9.2. Peer Process Operations 1385 The receive() routine in Appendix A.5.1 shows the peer process code 1386 flow upon the arrival of a packet. The access() routine in 1387 Appendix A.5.4 implements access restrictions using an access control 1388 list (ACL). There is no specific method required for access control, 1389 although it is recommended that implementations include such a 1390 scheme, which is similar to many others now in widespread use. 1391 Format checks require correct field length and alignment, acceptable 1392 version number (1-4) and correct extension field syntax, if present. 1394 There is no specific requirement for authentication; however, if 1395 authentication is implemented, the symmetric key scheme described in 1396 Appendix A.2 must be among the supported schemes. This scheme uses 1397 the MD5 keyed hash algorithm described in [16]. 1399 Next, the association table is searched for matching source address 1400 and source port using the find_assoc() routine in Appendix A.5.1. 1401 Figure 20 is a dispatch table where the columns correspond to the 1402 packet mode and rows correspond to the association mode. The 1403 intersection of the association and packet modes dispatches 1404 processing to one of the following steps. 1406 +------------------+---------------------------------------+ 1407 | | Packet Mode | 1408 +------------------+-------+-------+-------+-------+-------+ 1409 | Association Mode | 1 | 2 | 3 | 4 | 5 | 1410 +------------------+-------+-------+-------+-------+-------+ 1411 | No Association 0 | NEWPS | DSCRD | FXMIT | MANY | NEWBC | 1412 | Symm. Active 1 | PROC | PROC | DSCRD | DSCRD | DSCRD | 1413 | Symm. Passive 2 | PROC | ERR | DSCRD | DSCRD | DSCRD | 1414 | Client 3 | DSCRD | DSCRD | DSCRD | PROC | DSCRD | 1415 | Server 4 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1416 | Broadcast 5 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1417 | Bcast Client 6 | DSCRD | DSCRD | DSCRD | DSCRD | PROC | 1418 +------------------+-------+-------+-------+-------+-------+ 1420 Figure 20: Peer Dispatch Table 1422 DSCRD. This indicates a nonfatal violation of protocol as the result 1423 of a programming error, long delayed packet or replayed packet. The 1424 peer process discards the packet and exits. 1426 ERR. This indicates a fatal violation of protocol as the result of a 1427 programming error, long delayed packet or replayed packet. The peer 1428 process discards the packet, demobilizes the symmetric passive 1429 association and exits. 1431 FXMIT. This indicates a client (mode 3) packet matching no 1432 association (mode 0). If the destination address is not a broadcast 1433 address, the server constructs a server (mode 4) packet and returns 1434 it to the client without retaining state. The server packet header 1435 is constructed by the fast_xmit() routine in Appendix A.5.3. The 1436 packet header is assembled from the receive packet and system 1437 variables as shown in Figure 21. If the s.rootdelay and s.rootdisp 1438 system variables are stored in floating double, they must be 1439 converted to NTP short format first. 1441 +-----------------------------------+ 1442 | Packet Variable --> Variable | 1443 +-----------------------------------+ 1444 | r.leap --> p.leap | 1445 | r.mode --> p.mode | 1446 | r.stratum --> p.stratum | 1447 | r.poll --> p.ppoll | 1448 | r.rootdelay --> p.rootdelay | 1449 | r.rootdisp --> p.rootdisp | 1450 | r.refid --> p.refid | 1451 | r.reftime --> p.reftime | 1452 | r.keyid --> p.keyid | 1453 +-----------------------------------+ 1455 Figure 21: Receive Packet Header 1457 Note that, if authentication fails, the server returns a special 1458 message called a crypto-NAK. This message includes the normal NTP 1459 header data shown in Figure 8, but with a MAC consisting of four 1460 octets of zeros. The client MAY accept or reject the data in the 1461 message. After these actions the peer process exits. 1463 If the destination address is a multicast address, the sender is 1464 operating in manycast client mode. If the packet is valid and the 1465 server stratum is less than the client stratum, the server sends an 1466 ordinary server (mode 4) packet, but using its unicast destination 1467 address. A crypto-NAK is not sent if authentication fails. After 1468 these actions the peer process exits. 1470 MANY: This indicates a server (mode 4) packet matching no 1471 association. Ordinarily, this can happen only as the result of a 1472 manycast server reply to a previously sent multicast client packet. 1473 If the packet is valid, an ordinary client (mode 3) association is 1474 mobilized and operation continues as if the association was mobilized 1475 by the configuration file. 1477 NEWBC. This indicates a broadcast (mode 5) packet matching no 1478 association. The client mobilizes either a client (mode 3) or 1479 broadcast client (mode 6) association as shown in the mobilize() and 1480 clear() routines in Appendix A.2. Then the packet() routine in 1481 Appendix A.5.1.1 validates the packet and initializes the peer 1482 variables. 1484 If the implementation supports no additional security or calibration 1485 functions, the association mode is set to broadcast client (mode 6) 1486 and the peer process exits. Implementations supporting public key 1487 authentication MAY run the Autokey or equivalent security protocol. 1488 Implementations SHOULD set the association mode to 3 and run a short 1489 client/server exchange to determine the propagation delay. Following 1490 the exchange the association mode is set to 6 and the peer process 1491 continues in listen-only mode. Note the distinction between a mode-6 1492 packet, which is reserved for the NTP monitor and control functions, 1493 and a mode-6 association. 1495 NEWPS. This indicates a symmetric active (mode 1) packet matching no 1496 association. The client mobilizes a symmetric passive (mode 2) 1497 association as shown in the mobilize() routine and clear() routines 1498 in Appendix A.2. Processing continues in the PROC section below. 1500 PROC. This indicates a packet matching an existing association. The 1501 packet timestamps are carefully checked to avoid invalid, duplicate 1502 or bogus packets. Additional checks are summarized in Figure 22. 1503 Note that all packets, including a crypto-NAK, are considered valid 1504 only if they survive these tests. 1506 +--------------------------+----------------------------------------+ 1507 | Packet Type | Description | 1508 +--------------------------+----------------------------------------+ 1509 | 1 duplicate packet | The packet is at best an old duplicate | 1510 | | or at worst a replay by a hacker. | 1511 | | This can happen in symmetric modes if | 1512 | | the poll intervals are uneven. | 1513 | 2 bogus packet | | 1514 | 3 invalid | One or more timestamp fields are | 1515 | | invalid. This normally happens in | 1516 | | symmetric modes when one peer sends | 1517 | | the first packet to the other and | 1518 | | before the other has received its | 1519 | | first reply. | 1520 | 4 access denied | The access controls have blacklisted | 1521 | | the source. | 1522 | 5 authentication failure | The cryptographic message digest does | 1523 | | not match the MAC. | 1524 | 6 unsynchronized | The server is not synchronized to a | 1525 | | valid source. | 1526 | 7 bad header data | One or more header fields are invalid. | 1527 +--------------------------+----------------------------------------+ 1529 Figure 22: Packet Error Checks 1531 Processing continues in the packet() routine in Appendix A.5.1.1. It 1532 copies the packet variables to the peer variables as shown in 1533 Figure 21 and the packet() routine in Appendix A.5.2. The receive() 1534 routine implements tests 1-5 in Figure 22; the packet() routine 1535 implements tests 6-7. If errors are found the packet is discarded 1536 and the peer process exits. 1538 The on-wire protocol calculates the clock offset theta and round trip 1539 delay delta from the four most recent timestamps as described in 1540 Section 8. While it is in principle possible to do all calculations 1541 except the first-order timestamp differences in fixed-point 1542 arithmetic, it is much easier to convert the first-order differences 1543 to floating doubles and do the remaining calculations in that 1544 arithmetic, and this will be assumed in the following description. 1546 Next, the 8-bit p.reach shift register in the poll process described 1547 in Section 13 is used to determine whether the server is reachable 1548 and the data are fresh. The register is shifted left by one bit when 1549 a packet is sent and the rightmost bit is set to zero. As valid 1550 packets arrive, the packet() routine sets the rightmost bit to one. 1551 If the register contains any nonzero bits, the server is considered 1552 reachable; otherwise, it is unreachable. Since the peer poll 1553 interval might have changed since the last packet, the poll_update() 1554 routine in Appendix A.5.7.2 is called to redetermine the host poll 1555 interval. 1557 The dispersion statistic epsilon(t) represents the maximum error due 1558 to the frequency tolerance and time since the last packet was sent. 1559 It is initialized 1561 epsilon(t_0) = r.rho + s.rho + PHI * (T4-T1) 1563 when the measurement is made at t_0 according to the seconds counter. 1564 Here r.rho is the packet precision described in Section 7.3 and s.rho 1565 is the system precision described in Section 11.1, both expressed in 1566 seconds. These terms are necessary to account for the uncertainty in 1567 reading the system clock in both the server and the client. 1569 The dispersion then grows at constant rate PHI; in other words, at 1570 time t, epsilon(t) = epsilon(t_0) + PHI * (t-t_0). With the default 1571 value PHI = 15 PPM, this amounts to about 1.3 s per day. With this 1572 understanding, the argument t will be dropped and the dispersion 1573 represented simply as epsilon. The remaining statistics are computed 1574 by the clock filter algorithm described in the next section. 1576 10. Clock Filter Algorithm 1578 The clock filter algorithm, part of the peer process, is implemented 1579 in the clock_filter() routine of Appendix A.5.2. It grooms the 1580 stream of on-wire data to select the samples most likely to represent 1581 accurate time. The algorithm produces the variables shown in 1582 Figure 19, including the offset (theta), delay (delta), dispersion 1583 (epsilon), jitter (psi) and time of arrival (t). These data are used 1584 by the mitigation algorithms to determine the best and final offset 1585 used to discipline the system clock. They are also used to determine 1586 the server health and whether it is suitable for synchronization. 1588 The clock filter algorithm saves the most recent sample tuples 1589 (theta, delta, epsilon, t) in the filter structure, which functions 1590 as an 8-stage shift register. The tuples are saved in the order that 1591 packets arrive. Here t is the packet time of arrival according to 1592 the seconds counter and should not be confused with the peer variable 1593 tp. 1595 The following scheme is used to insure sufficient samples are in the 1596 filter and that old stale data are discarded. Initially, the tuples 1597 of all stages are set to the dummy tuple (0, MAXDISP, MAXDISP, 0). 1598 As valid packets arrive, tuples are shifted into the filter causing 1599 old tuples to be discarded, so eventually only valid tuples remain. 1600 If the three low order bits of the reach register are zero, 1601 indicating three poll intervals have expired with no valid packets 1602 received, the poll process calls the clock filter algorithm with a 1603 dummy tuple just as if the tuple had arrived from the network. If 1604 this persists for eight poll intervals, the register returns to the 1605 initial condition. 1607 In the next step the shift register stages are copied to a temporary 1608 list and the list sorted by increasing delta. Let i index the stages 1609 starting with the lowest delta. If the first tuple epoch t_0 is not 1610 later than the last valid sample epoch tp, the routine exits without 1611 affecting the current peer variables. Otherwise, let epsilon_i be 1612 the dispersion of the ith entry, then 1613 i=n-1 1614 --- epsilon_i 1615 epsilon = \ ---------- 1616 / (i+1) 1617 --- 2 1618 i=0 1620 is the peer dispersion p.disp. Note the overload of epsilon, whether 1621 input to the clock filter or output, the meaning should be clear from 1622 context. 1624 The observer should note (a) if all stages contain the dummy tuple 1625 with dispersion MAXDISP, the computed dispersion is a little less 1626 than 16 s, (b) each time a valid tuple is shifted into the register, 1627 the dispersion drops by a little less than half, depending on the 1628 valid tuples dispersion, (c) after the fourth valid packet the 1629 dispersion is usually a little less than 1 s, which is the assumed 1630 value of the MAXDIST parameter used by the selection algorithm to 1631 determine whether the peer variables are acceptable or not. 1633 Let the first stage offset in the sorted list be theta_0; then, for 1634 the other stages in any order, the jitter is the RMS average 1635 +----- -----+^1/2 1636 | n-1 | 1637 | --- | 1638 1 | \ 2 | 1639 psi = -------- * | / (theta_0-theta_j) | 1640 (n-1) | --- | 1641 | j=1 | 1642 +----- -----+ 1644 where n is the number of valid tuples in the filter (n > 1). In 1645 order to insure consistency and avoid divide exceptions in other 1646 computations, the psi is bounded from below by the system precision 1647 s.rho expressed in seconds. While not in general considered a major 1648 factor in ranking server quality, jitter is a valuable indicator of 1649 fundamental timekeeping performance and network congestion state. Of 1650 particular importance to the mitigation algorithms is the peer 1651 synchronization distance, which is computed from the delay and 1652 dispersion. 1654 lambda = (delta / 2) + epsilon. 1656 Note that epsilon and therefore lambda increase at rate PHI. The 1657 lambda is not a state variable, since lambda is recalculated at each 1658 use. It is a component of the root synchronization distance used by 1659 the mitigation algorithms as a metric to evaluate the quality of time 1660 available from each server. 1662 It is important to note that, unlike NTPv3, NTPv4 associations do not 1663 show a timeout condition by setting the stratum to 16 and leap 1664 indicator to 3. The association variables retain the values 1665 determined upon arrival of the last packet. In NTPv4 lambda 1666 increases with time, so eventually the synchronization distance 1667 exceeds the distance threshold MAXDIST, in which case the association 1668 is considered unfit for synchronization. 1670 11. System Process 1672 As each new sample (theta, delta, epsilon, jitter, t) is produced by 1673 the clock filter algorithm, all peer processes are scanned by the 1674 mitigation algorithms consisting of the selection, cluster, combine 1675 and clock discipline algorithms in the system process. The selection 1676 algorithm scans all associations and casts off the falsetickers, 1677 which have demonstrably incorrect time, leaving the truechimers as 1678 result. In a series of rounds the cluster algorithm discards the 1679 association statistically furthest from the centroid until a 1680 specified minimum number of survivors remain. The combine algorithm 1681 produces the best and final statistics on a weighted average basis. 1682 The final offset is passed to the clock discipline algorithm to steer 1683 the system clock to the correct time. 1685 The cluster algorithm selects one of the survivors as the system 1686 peer. The associated statistics (theta, delta, epsilon, jitter, t) 1687 are used to construct the system variables inherited by dependent 1688 servers and clients and made available to other applications running 1689 on the same machine. 1691 11.1. System Process Variables 1693 Figure 25 summarizes the common names, formula names and a short 1694 description of each system variable. Unless noted otherwise, all 1695 variables have assumed prefix s. 1697 +-----------+------------+------------------------+ 1698 | Name | Formula | Description | 1699 +-----------+------------+------------------------+ 1700 | t | t | update time | 1701 | p | p | system peer identifier | 1702 | leap | leap | leap indicator | 1703 | stratum | stratum | stratum | 1704 | precision | rho | precision | 1705 | offset | THETA | combined offset | 1706 | jitter | PSI | combined jitter | 1707 | rootdelay | DELTA | root delay | 1708 | rootdisp | EPSILON | root dispersion | 1709 | v | v | survivor list | 1710 | refid | refid | reference ID | 1711 | reftime | reftime | reference time | 1712 | NMIN | 3 | minimum survivors | 1713 | CMIN | 1 | minimum candidates | 1714 +-----------+------------+------------------------+ 1716 Figure 25: System Process Variables 1718 Except for the t, p, offset and jitter variables and the NMIN and 1719 CMIN constants, the variables have the same format and interpretation 1720 as the peer variables of the same name. The NMIN and CMIN parameters 1721 are used by the selection and cluster algorithms described in the 1722 next section. 1724 The t variable is the seconds counter at the last update determined 1725 by the clock_update() routine in Appendix A.5.5.4. The p variable is 1726 the system peer identifier determined by the cluster() routine in 1727 Section 11.2.2. The precision variable has the same format as the 1728 packet variable of the same name. The precision is defined as the 1729 larger of the resolution and time to read the clock, in log2 units. 1730 For instance, the precision of a mains-frequency clock incrementing 1731 at 60 Hz is 16 ms, even when the system clock hardware representation 1732 is to the nanosecond. 1734 The offset and jitter variables are determined by the combine() 1735 routine in Section 11.2.3. These values represent the best and final 1736 offset and jitter used to discipline the system clock. Initially, 1737 all variables are cleared to zero, then the leap is set to 3 1738 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember that 1739 MAXSTRAT is mapped to zero in the transmitted packet. 1741 11.2. System Process Operations 1743 Figure 26 summarizes the system process operations performed by the 1744 clock_select() routine. The selection algorithm described in 1745 Section 11.2.1 produces a majority clique of presumed correct 1746 candidates (truechimers) based on agreement principles. The cluster 1747 algorithm described in Section 11.2.2 discards outlyers to produce 1748 the most accurate survivors. The combine algorithm described in 1749 Section 11.2.3 provides the best and final offset for the clock 1750 discipline algorithm described in Appendix A.5.5.6. If the selection 1751 algorithm cannot produce a majority clique, or if it cannot produce 1752 at least CMIN survivors, the system process exits without 1753 disciplining the system clock. If successful, the cluster algorithm 1754 selects the statistically best candidate as the system peer and its 1755 variables are inherited as the system variables. 1757 +-----------------+ 1758 | clock_select() | 1759 +-----------------+ 1760 ................................|........... 1761 . V . 1762 . yes +---------+ +-----------------+ . 1763 . +--| accept? | | scan candidates | . 1764 . | +---------+ | | . 1765 . V no | | | . 1766 . +---------+ | | | . 1767 . | add peer| | | | . 1768 . +---------- | | | . 1769 . | V | | . 1770 . +---------->-->| | . 1771 . | | . 1772 . Selection Algorithm +-----------------+ . 1773 .................................|.......... 1774 V 1775 no +-------------------+ 1776 +-------------| survivors? | 1777 | +-------------------+ 1778 | | yes 1779 | V 1780 | +-------------------+ 1781 | | Cluster Algorithm | 1782 | +-------------------+ 1783 | | 1784 | V 1785 V yes +-------------------+ 1786 |<------------| n < CMIN? | 1787 | +-------------------+ 1788 V | 1789 +-----------------+ V no 1790 | s.p = NULL | +-------------------+ 1791 +-----------------+ | s.p = v_0.p | 1792 | +-------------------+ 1793 V | 1794 +-----------------+ V 1795 | return (UNSYNC) | +-------------------+ 1796 +-----------------+ | return (SYNC) | 1797 +-------------------+ 1799 Figure 26: clock_select() Routine 1801 11.2.1. Selection Algorithm 1803 Note that the selection and cluster algorithms are described 1804 separately, but combined in the code skeleton. The selection 1805 algorithm operates to find an intersection interval containing a 1806 majority clique of truechimers using Byzantine agreement principles 1807 originally proposed by Marzullo [8], but modified to improve 1808 accuracy. An overview of the algorithm is given below and in the 1809 first half of the clock_select() routine in Appendix A.5.5.1. 1811 First, those servers which are unusable according to the rules of the 1812 protocol are detected and discarded by the accept() routine in 1813 Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is generated 1814 for the remaining candidates. Here, p is the association identifier 1815 and type identifies the upper (+1), middle (0) and lower (-1) 1816 endpoints of a correctness interval centered on theta for that 1817 candidate. This results in three tuples, lowpoint (p, -1, theta - 1818 lambda), midpoint (p, 0, theta) and highpoint (p, +1, theta + 1819 lambda), where lambda is the root synchronization distance calculated 1820 on each use by the rootdist() routine in Appendix A.5.1.1. The steps 1821 of the algorithm are: 1823 1. For each of m associations, place three tuples as defined above 1824 on the candidate list. 1826 2. Sort the tuples on the list by the edge component. Order the 1827 lowpoint, midpoint and highpoint of these intervals from lowest to 1828 highest. Set the number of falsetickers f = 0. 1830 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest 1831 endpoint to highest. Add one to c for every lowpoint, subtract one 1832 for every highpoint, add one to d for every midpoint. If c >= m - f, 1833 stop; set l = current lowpoint. 1835 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c 1836 for every highpoint, subtract one for every lowpoint, add one to d 1837 for every midpoint. If c >= m - f, stop; set u = current highpoint. 1839 5. Is d = f and l < u? If yes, then follow step 5A; else, follow 1840 step 5B. 1842 5A. Success: the intersection interval is [l, u]. 1844 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. 1845 If no, then go to step 6. 1847 6. Failure; a majority clique could not be found. There are no 1848 suitable candidates to discipline the system clock. 1850 The algorithm is described in detail in Appendix A.5.5.1. Note that 1851 it starts with the assumption that there are no falsetickers (f = 0) 1852 and attempts to find a nonempty intersection interval containing the 1853 midpoints of all correct servers, i.e., truechimers. If a nonempty 1854 interval cannot be found, it increases the number of assumed 1855 falsetickers by one and tries again. If a nonempty interval is found 1856 and the number of falsetickers is less than the number of 1857 truechimers, a majority clique has been found and the midpoint of 1858 each truechimer (theta) represents the candidates available to the 1859 cluster algorithm. 1861 If a majority clique is not found, or if the number of truechimers is 1862 less than CMIN, there are insufficient candidates to discipline the 1863 system clock. CMIN defines the minimum number of servers consistent 1864 with the correctness requirements. Suspicious operators would set 1865 CMIN to insure multiple redundant servers are available for the 1866 algorithms to mitigate properly. However, for historic reasons the 1867 default value for CMIN is one. 1869 11.2.2. Cluster Algorithm 1871 The candidates of the majority clique are placed on the survivor list 1872 v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an 1873 association identifier, theta_p, psi_p, and stratum_p the current 1874 offset, jitter and stratum of association p, respectively, and 1875 lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, 1876 where lambda is the root synchronization distance for association p. 1877 The list is processed by the cluster algorithm below and the second 1878 half of the clock_select() algorithm in Appendix A.5.5.1. 1880 1. Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate. 1882 2. Sort the candidates by increasing lambda_p. Let n be the number 1883 of candidates and NMIN the minimum required number of survivors. 1885 3. For each candidate compute the selection jitter psi_s: 1887 +----- -----+^1/2 1888 | n-1 | 1889 | --- | 1890 | 1 \ 2 | 1891 psi_s = | ---- * / (theta_s - theta_j) | 1892 | n-1 --- | 1893 | j=1 | 1894 +----- -----+ 1896 4. Select psi_max as the candidate with maximum psi_s. 1898 5. Select psi_min as the candidate with minimum psi_p. 1900 6. Is psi_max < psi_min or n <= NMIN? If yes, follow step 6A; 1901 otherwise, follow step 6B. 1903 6A. Done. The remaining candidates on the survivor list are ranked 1904 in the order of preference. The first entry on the list represents 1905 the system peer; its variables are used later to update the system 1906 variables. 1908 6B. Delete the outlyer candidate with psi_max; reduce n by one and go 1909 back to step 3. 1911 The algorithm operates in a series of rounds where each round 1912 discards the statistical outlyer with maximum selection jitter psi_s. 1913 However, if psi_s is less than the minimum peer jitter psi_p, no 1914 improvement is possible by discarding outlyers. This and the minimum 1915 number of survivors represent the terminating conditions of the 1916 algorithm. Upon termination, the final value of psi_max is saved as 1917 the system selection jitter PSI_s for use later. 1919 11.2.3. Combine Algorithm 1921 The remaining survivors are processed by the clock_combine() routine 1922 in Appendix A.5.5.5 to produce the best and final data for the clock 1923 discipline algorithm. The clock_combine() routine processes peer 1924 offset and jitter statistics to produce the combined system offset 1925 THETA and system peer jitter PSI_p, where each server statistic is 1926 weighted by the reciprocal of the root synchronization distance and 1927 the result normalized. 1929 The combined THETA is passed to the clock_update() routine in 1930 Appendix A.5.5.4. The first candidate on the survivor list is 1931 nominated as the system peer with identifier p. The system peer 1932 jitter PSI_p is a component of the system jitter PSI. It is used 1933 along with the selection jitter PSI_s to produce the system jitter: 1935 PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2 1937 Each time an update is received from the system peer, the 1938 clock_update() routine in Appendix A.5.5.4 is called. By rule, an 1939 update is discarded if its time of arrival p.t is not strictly later 1940 than the last update used s.t. The labels IGNOR, PANIC, ADJ and STEP 1941 refer to return codes from the local_clock() routine described in the 1942 next section. 1944 IGNORE means the update has been ignored as an outlyer. PANIC means 1945 the offset is greater than the panic threshold PANICT (1000 s) and 1946 SHOULD cause the program to exit with a diagnostic message to the 1947 system log. STEP means the offset is less than the panic threshold, 1948 but greater than the step threshold STEPT (125 ms). In this case the 1949 clock is stepped to the correct offset, but since this means all peer 1950 data have been invalidated, all associations MUST be reset and the 1951 client begins as at initial start. 1953 ADJ means the offset is less than the step threshold and thus a valid 1954 update. In this case the system variables are updated from the peer 1955 variables as shown in Figure 28. 1957 +-------------------------------------------+ 1958 | System Variable <-- System Peer Variable | | 1959 +-------------------------------------------+ 1960 | s.leap <-- p.leap | 1961 | s.stratum <-- p.stratum + 1 | 1962 | s.offset <-- THETA | 1963 | s.jitter <-- PSI | 1964 | s.rootdelay <-- p.delta_r + delta | 1965 | s.rootdisp <-- p.epsilon_r + p.epsilon + | 1966 | p.psi + PHI * (s.t - p.t) | 1967 | + |THETA| | 1968 | s.refid <-- p.refid | 1969 | s.reftime <-- p.reftime | 1970 | s.t <-- p.t | 1971 +-------------------------------------------+ 1973 Figure 28: System Variables Update 1975 There is an important detail not shown. The dispersion increment 1976 (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from 1977 below by MINDISP. In subnets with very fast processors and networks 1978 and very small delay and dispersion this forces a monotone-definite 1979 increase in s.rootdisp (EPSILON), which avoids loops between peers 1980 operating at the same stratum. 1982 The system variables are available to dependent application programs 1983 as nominal performance statistics. The system offset THETA is the 1984 clock offset relative to the available synchronization sources. The 1985 system jitter PSI is an estimate of the error in determining this 1986 value, elsewhere called the expected error. The root delay DELTA is 1987 the total round trip delay relative to the primary server. The root 1988 dispersion EPSILON is the dispersion accumulated over the network 1989 from the primary server. Finally, the root synchronization distance 1990 is defined 1992 LAMBDA = EPSILON + DELTA / 2, 1993 which represents the maximum error due all causes and is designated 1994 the root synchronization distance. 1996 11.3. Clock Discipline Algorithm 1998 The NTPv4 clock discipline algorithm, shortened to discipline in the 1999 following, functions as a combination of two philosophically quite 2000 different feedback control systems. In a phase-locked loop (PLL) 2001 design, periodic phase updates at update intervals mu seconds are 2002 used directly to minimize the time error and indirectly the frequency 2003 error. In a frequency-locked loop (FLL) design, periodic frequency 2004 updates at intervals mu are used directly to minimize the frequency 2005 error and indirectly the time error. As shown in [5], a PLL usually 2006 works better when network jitter dominates, while a FLL works better 2007 when oscillator wander dominates. This section contains an outline 2008 of how the NTPv4 design works. An in-depth discussion of the design 2009 principles is provided in [5], which also includes a performance 2010 analysis. 2012 The discipline is implemented as the feedback control system shown in 2013 Figure 29. The variable theta_r represents the combine algorithm 2014 offset (reference phase) and theta_c the VFO offset (control phase). 2015 Each update produces a signal V_d representing the instantaneous 2016 phase difference theta_r - theta_c. The clock filter for each server 2017 functions as a tapped delay line, with the output taken at the tap 2018 selected by the clock filter algorithm. The selection, cluster and 2019 combine algorithms combine the data from multiple filters to produce 2020 the signal V_s. The loop filter, with impulse response F(t), 2021 produces the signal V_c which controls the VFO frequency omega_c and 2022 thus the integral of the phase theta_c which closes the loop. The 2023 V_c signal is generated by the clock adjust process in Section 12. 2024 The detailed equations that implement these functions are best 2025 presented in the routines of Appendix A.5.5.6 and Appendix A.5.6.1. 2027 theta_r + +---------\ +----------------+ 2028 NTP --------->| Phase \ V_d | | V_s 2029 theta_c - | Detector ------>| Clock Filter |----+ 2030 +-------->| / | | | 2031 | +---------/ +----------------+ | 2032 | | 2033 ----- | 2034 / \ | 2035 | VFO | | 2036 \ / | 2037 ----- ....................................... | 2038 ^ . Loop Filter . | 2039 | . +---------+ x +-------------+ . | 2040 | V_c . | |<-----| | . | 2041 +------.-| Clock | y | Phase/Freq |<---------+ 2042 . | Adjust |<-----| Prediction | . 2043 . | | | | . 2044 . +---------+ +-------------+ . 2045 ....................................... 2047 Figure 29: Clock Discipline Feedback Loop 2049 Ordinarily, the pseudo-linear feedback loop described above operates 2050 to discipline the system clock. However, there are cases where a 2051 nonlinear algorithm offers considerable improvement. One case is 2052 when the discipline starts without knowledge of the intrinsic clock 2053 frequency. The pseudo-linear loop takes several hours to develop an 2054 accurate measurement and during most of that time the poll interval 2055 cannot be increased. The nonlinear loop described below does this in 2056 15 minutes. Another case is when occasional bursts of large jitter 2057 are present due to congested network links. The state machine 2058 described below resists error bursts lasting less than 15 minutes. 2060 Figure 30 contains a summary of the variables and parameters 2061 including the variables (lower case) or parameters (upper case) name, 2062 formula name and short description. Unless noted otherwise, all 2063 variables have assumed prefix c. The variables t, tc, state, hyster 2064 and count are integers; the remaining variables are floating doubles. 2065 The function of each will be explained in the algorithm descriptions 2066 below. 2068 +--------+------------+--------------------------+ 2069 | Name | Formula | Description | 2070 +--------+------------+--------------------------+ 2071 | t | timer | seconds counter | 2072 | offset | theta | combined offset | 2073 | resid | theta_r | residual offset | 2074 | freq | phi | clock frequency | 2075 | jitter | psi | clock offset jitter | 2076 | wander | omega | clock frequency wander | 2077 | tc | tau | time constant (log2) | 2078 | state | state | state | 2079 | adj | adj | frequency adjustment | 2080 | hyster | hyster | hysteresis counter | 2081 | STEPT | 125 | step threshold (.125 s) | 2082 | WATCH | 900 | stepout thresh(s) | 2083 | PANICT | 1000 | panic threshold (1000 s) | 2084 | LIMIT | 30 | hysteresis limit | 2085 | PGATE | 4 | hysteresis gate | 2086 | TC | 16 | time constant scale | 2087 | AVG | 8 | averaging constant | 2088 +--------+------------+--------------------------+ 2090 Figure 30: Clock Discipline Variables and Parameters 2092 The discipline is implemented by the local_clock() routine, which is 2093 called from the clock_update() routine. The local_clock() routine in 2094 Appendix A.5.5.6 has two parts; the first implements the clock state 2095 machine and the second determines the time constant and thus the poll 2096 interval. 2098 The local_clock() routine exits immediately if the offset is greater 2099 than the panic threshold PANICT (1000 s). The state transition 2100 function is implemented by the rstclock() function in 2101 Appendix A.5.5.7. Figure 31 shows the state transition function used 2102 by this routine. It has four columns showing respectively the state 2103 name, predicate and action if the offset theta is less than the step 2104 threshold, the predicate and actions otherwise, and finally some 2105 comments. 2107 +-------+---------------------+-------------------+--------------+ 2108 | State | theta < STEP | theta > STEP | Comments | 2109 +-------+---------------------+-------------------+--------------+ 2110 | NSET | ->FREQ | ->FREQ | no frequency | 2111 | | adjust time | step time | file | 2112 +-------+---------------------+-------------------+--------------+ 2113 | FSET | ->SYNC | ->SYNC | frequency | 2114 | | adjust time | step time | file | 2115 +-------+---------------------+-------------------+--------------+ 2116 | SPIK | ->SYNC | if < 900 s ->SPIK | outlyer | 2117 | | adjust freq | else ->SYNC | detected | 2118 | | adjust time | step freq | | 2119 | | | step time | | 2120 +-------+---------------------+-------------------+--------------+ 2121 | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | 2122 | | else ->SYNC | else ->SYNC | frequency | 2123 | | step freq | step freq | | 2124 | | adjust time | adjust time | | 2125 +-------+---------------------+-------------------+--------------+ 2126 | SYNC | ->SYNC | if < 900 s ->SPIK | normal | 2127 | | adjust freq | else ->SYNC | operation | 2128 | | adjust time | step freq | | 2129 | | | step time | | 2130 +-------+---------------------+-------------------+--------------+ 2132 Figure 31: State Transition Function 2134 In the table entries the next state is identified by the arrow -> 2135 with the actions listed below. Actions such as adjust time and 2136 adjust frequency are implemented by the PLL/FLL feedback loop in the 2137 local_clock() routine. A step clock action is implemented by setting 2138 the clock directly, but this is done only after the stepout threshold 2139 WATCH (900 s) when the offset is more than the step threshold STEPT 2140 (.125 s). This resists clock steps under conditions of extreme 2141 network congestion. 2143 The jitter (psi) and wander (omega) statistics are computed using an 2144 exponential average with weight factor AVG. The time constant 2145 exponent (tau) is determined by comparing psi with the magnitude of 2146 the current offset theta. If the offset is greater than PGATE (4) 2147 times the clock jitter, the hysteresis counter hyster is reduced by 2148 two; otherwise, it is increased by one. If hyster increases to the 2149 upper limit LIMIT (30), tau is increased by one; if it decreases to 2150 the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau 2151 hovers near MAXPOLL, but quickly decreases if a temperature spike 2152 causes a frequency surge. 2154 12. Clock Adjust Process 2156 The actual clock adjustment is performed by the clock_adjust() 2157 routine in Appendix A.5.6.1. It runs at one-second intervals to add 2158 the frequency correction and a fixed percentage of the residual 2159 offset theta_r. The theta_r is in effect the exponential decay of 2160 the theta value produced by the loop filter at each update. The TC 2161 parameter scales the time constant to match the poll interval for 2162 convenience. Note that the dispersion EPSILON increases by PHI at 2163 each second. 2165 The clock adjust process includes a timer interrupt facility driving 2166 the seconds counter c.t. It begins at zero when the service starts 2167 and increments once each second. At each interrupt the 2168 clock_adjust() routine is called to incorporate the clock discipline 2169 time and frequency adjustments, then the associations are scanned to 2170 determine if the seconds counter equals or exceeds the p.next state 2171 variable defined in the next section. If so, the poll process is 2172 called to send a packet and compute the next p.next value. 2174 13. Poll Process 2176 Each association supports a poll process that runs at regular 2177 intervals to construct and send packets in symmetric, client and 2178 broadcast server associations. It runs continuously, whether or not 2179 servers are reachable in order to manage the clock filter and reach 2180 register. 2182 13.1. Poll Process Variables 2184 Figure 32 summarizes the common names, formula names and a short 2185 description of the poll process variables(lower case) and parameters 2186 (upper case). Unless noted otherwise, all variables have assumed 2187 prefix p. 2189 +---------+---------+--------------------+ 2190 | Name | Formula | Description | 2191 +---------+---------+--------------------+ 2192 | hpoll | hpoll | host poll exponent | 2193 | last | last | last poll time | 2194 | next | next | next poll time | 2195 | reach | reach | reach register | 2196 | unreach | unreach | unreach counter | 2197 | UNREACH | 24 | unreach limit | 2198 | BCOUNT | 8 | burst count | 2199 | BURST | flag | burst enable | 2200 | IBURST | flag | iburst enable | 2201 +---------+---------+--------------------+ 2203 Figure 32: Poll Process Variables and Parameters 2205 The poll process variables are allocated in the association data 2206 structure along with the peer process variables. Following is a 2207 detailed description of the variables. The parameters will be called 2208 out in the following text. 2210 hpoll: signed integer representing the poll exponent, in log2 seconds 2212 last: integer representing the seconds counter when the most recent 2213 packet was sent 2215 next: integer representing the seconds counter when the next packet 2216 is to be sent 2218 reach: 8-bit integer shift register shared by the peer and poll 2219 processes 2221 unreach: integer representing the number of seconds the server has 2222 been unreachable 2224 13.2. Poll Process Operations 2226 As described previously, once each second the clock_adjust() routine 2227 in the clock adjust process is called. This routine calls the poll() 2228 routine in Appendix A.5.7.1 for each association in turn. If the 2229 time for the next poll message is greater than the seconds counter, 2230 the routine returns immediately. Symmetric (modes 1, 2), client 2231 (mode 3) and broadcast server (mode 5) associations routinely send 2232 packets. A broadcast client (mode 6) association runs the routine to 2233 update the reach and unreach variables, but does not send packets. 2234 The poll() routine calls the peer_xmit() routine in Appendix A.5.7.3 2235 to send a packet. If in a burst (burst > 0), nothing further is done 2236 except call the poll_update() routine to set the next poll interval. 2238 If not in a burst, the reach variable is shifted left by one bit, 2239 with zero replacing the rightmost bit. If the server has not been 2240 heard for the last three poll intervals, the clock_filter() routine 2241 is called to increase the dispersion as described in 2242 Appendix A.5.7.3. 2244 If the BURST flag is lit and the server is reachable and a valid 2245 source of synchronization is available, the client sends a burst of 2246 BCOUNT (8) packets at each poll interval. The interval between 2247 packets in the burst is two seconds. This is useful to accurately 2248 measure jitter with long poll intervals. If the IBURST flag is lit 2249 and this is the first packet sent when the server has been 2250 unreachable, the client sends a burst. This is useful to quickly 2251 reduce the synchronization distance below the distance threshold and 2252 synchronize the clock. 2254 If the P_MANY flag is lit in the p.flags word of the association, 2255 this is a manycast client association. Manycast client associations 2256 send client mode packets to designated multicast group addresses at 2257 MINPOLL intervals. The association starts out with a TTL of 1. If 2258 by the time of the next poll there are fewer than MINCLOCK servers 2259 have been mobilized, the ttl is increased by one. If the ttl reaches 2260 the limit TTLMAX, without finding MINCLOCK servers, the poll interval 2261 increases until reaching BEACON, when it starts over from the 2262 beginning. 2264 The poll() routine includes a feature that backs off the poll 2265 interval if the server becomes unreachable. If reach is nonzero, the 2266 server is reachable and unreach is set to zero; otherwise, unreach is 2267 incremented by one for each poll to the maximum UNREACH. Thereafter 2268 for each poll hpoll is increased by one, which doubles the poll 2269 interval up to the maximum MAXPOLL determined by the poll_update() 2270 routine. When the server again becomes reachable, unreach is set to 2271 zero, hpoll is reset to the tc system variable and operation resumes 2272 normally. 2274 A packet is sent by the xmit_packet() routine in Appendix A.3. Some 2275 header values are copied from the peer variables left by a previous 2276 packet and others from the system variables. Figure 33 shows which 2277 values are copied to each header field. In those implementations 2278 using floating double data types for root delay and root dispersion, 2279 these must be converted to NTP short format. All other fields are 2280 either copied intact from peer and system variables or struck as a 2281 timestamp from the system clock. 2283 +-----------------------------------+ 2284 | Packet Variable <-- Variable | 2285 +-----------------------------------+ 2286 | x.leap <-- s.leap | 2287 | x.version <-- s.version | 2288 | x.mode <-- s.mode | 2289 | x.stratum <-- s.stratum | 2290 | x.poll <-- s.poll | 2291 | x.precision <-- s.precision | 2292 | x.rootdelay <-- s.rootdelay | 2293 | x.rootdisp <-- s.rootdisp | 2294 | x.refid <-- s.refid | 2295 | x.reftime <-- s.reftime | 2296 | x.org <-- p.xmt | 2297 | x.rec <-- p.dst | 2298 | x.xmt <-- clock | 2299 | x.keyid <-- p.keyid | 2300 | x.digest <-- md5 digest | 2301 +-----------------------------------+ 2303 Figure 33: xmit_packet Packet Header 2305 The poll_update() routine shown in Appendix A.5.7.2 is called when a 2306 valid packet is received and immediately after a poll message has 2307 been sent. If in a burst, the poll interval is fixed at 2 s; 2308 otherwise, the host poll exponent hpoll is set to the minimum of 2309 ppoll from the last packet received and hpoll from the poll() 2310 routine, but not less than MINPOLL nor greater than MAXPOLL. Thus 2311 the clock discipline can be oversampled, but not undersampled. This 2312 is necessary to preserve subnet dynamic behavior and protect against 2313 protocol errors. 2315 The poll exponent is converted to an interval which when added to the 2316 last variable determines the next variable and thus the time for the 2317 next poll. Finally, the last variable is set to the current seconds 2318 counter. 2320 14. Simple Network Time Protocol (SNTP) 2322 Primary servers and clients complying with a subset of NTP, called 2323 the Simple Network Time Protocol (SNTPv4) [2], do not need to 2324 implement the mitigation algorithms described in Section 9 and 2325 following sections. SNTP is intended for primary servers equipped 2326 with a single reference clock, as well as for clients with a single 2327 upstream server and no dependent clients. The fully developed NTPv4 2328 implementation is intended for secondary servers with multiple 2329 upstream servers and multiple downstream servers or clients. Other 2330 than these considerations, NTP and SNTP servers and clients are 2331 completely interoperable and can be intermixed in NTP subnets. 2333 An SNTP primary server implementing the on-wire protocol described in 2334 Section 8 has no upstream servers except a single reference clock. 2335 In principle, it is indistinguishable from an NTP primary server that 2336 has the mitigation algorithms and therefore capable of mitigating 2337 between multiple reference clocks. 2339 Upon receiving a client request, an SNTP primary server constructs 2340 and sends the reply packet as described in Figure 34. Note that the 2341 dispersion field in the packet header must be updated as described in 2342 Section 5. 2344 +-----------------------------------+ 2345 | Packet Variable <-- Variable | 2346 +-----------------------------------+ 2347 | x.leap <-- s.leap | 2348 | x.version <-- r.version | 2349 | x.mode <-- 4 | 2350 | x.stratum <-- s.stratum | 2351 | x.poll <-- r.poll | 2352 | x.precision <-- s.precision | 2353 | x.rootdelay <-- s.rootdelay | 2354 | x.rootdisp <-- s.rootdisp | 2355 | x.refid <-- s.refid | 2356 | x.reftime <-- s.reftime | 2357 | x.org <-- r.xmt | 2358 | x.rec <-- r.dst | 2359 | x.xmt <-- clock | 2360 | x.keyid <-- r.keyid | 2361 | x.digest <-- md5 digest | 2362 +-----------------------------------+ 2364 Figure 34: fast_xmit Packet Header 2366 A SNTP client implementing the on-wire protocol has a single server 2367 and no dependent clients. It can operate with any subset of the NTP 2368 on-wire protocol, the simplest approach using only the transmit 2369 timestamp of the server packet and ignoring all other fields. 2370 However, the additional complexity to implement the full on-wire 2371 protocol is minimal so that a full implementation is encouraged. 2373 15. Security Considerations 2375 NTPv4 provides an optional authentication field that utilizes the MD5 2376 algorithm. MD5, as the case for SHA-1, is derived from MD4, which 2377 has long been known to be weak. In 2004, techniques for efficiently 2378 finding collisions in MD5 were announced. A summary of the 2379 apropriateness of MD5 can be found in [9]. 2381 In the case of NTP as specified herein, NTP broadcast clients are 2382 vulnerable to disruption by misbehaving or hostile SNTP or NTP 2383 broadcast servers elsewhere in the Internet. Access controls and/or 2384 cryptographic authentication means should be provided for additional 2385 security in such cases. 2387 Section 8 describes a potential security concern with the replay of 2388 client requests. Following the recommendations in that section 2389 provides protection against such attacks. 2391 16. IANA Considerations 2393 UDP/TCP Port 123 was previously assigned by IANA for this protocol. 2394 The IANA has assigned the IPv4 multicast group address 224.0.1.1 and 2395 the IPv6 multicast address ending :101 for NTP. This document 2396 introduces NTP extension fields allowing for the development of 2397 future extensions to the protocol, where a particular extension is to 2398 be identified by the Field Type sub-field within the extension field. 2399 IANA is requested to establish and maintain a registry for Extension 2400 Field Types associated with this protocol, populating this registry 2401 with no initial entries. As future needs arise, new Extension Field 2402 Types may be defined. Following the policies outlined in [10], new 2403 values are to be defined by IETF consensus. 2405 The IANA is requested to create a new registry for NTP Reference 2406 Identifier codes. This should include the current codes defined in 2407 Section 7.3, and may be extended on a FCFS basis. The format of the 2408 registry is: 2410 +------+----------------------------------------------------------+ 2411 | ID | Clock Source | 2412 +------+----------------------------------------------------------+ 2413 | GOES | Geosynchronous Orbit Environment Satellite | 2414 | GPS | Global Position System | 2415 | ... | ... | 2416 +------+----------------------------------------------------------+ 2418 Figure 35: Reference Identifier Codes 2420 The IANA is requested to create a new registry for NTP Kiss-o'-Death 2421 codes. This should include the current codes defined in Section 7.4, 2422 and may be extended on a FCFS basis. The format of the registry is: 2424 +------+------------------------------------------------------------+ 2425 | Code | Meaning | 2426 +------+------------------------------------------------------------+ 2427 | ACST | The association belongs to a unicast server | 2428 | AUTH | Server authentication failed | 2429 | ... | ... | 2430 +------+------------------------------------------------------------+ 2432 Figure 36: Kiss Codes 2434 For both Reference Identifiers and Kiss-o'-Death codes, IANA is 2435 requested to never assign a code begining with the character "X", as 2436 this is reserved for experimentation and development. 2438 17. Acknowledgements 2440 The editors would like to thank Karen O'Donoghue, Brian Haberman, 2441 Greg Dowd, Mark Elliot, and Harlan Stenn for technical reviews of 2442 this document. 2444 18. References 2446 18.1. Informative References 2448 [1] Mills, D., "Network Time Protocol (Version 3) Specification, 2449 Implementation", RFC 1305, March 1992. 2451 [2] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 for 2452 IPv4, IPv6 and OSI", RFC 4330, January 2006. 2454 [3] Mills, D.L., "The Autokey security architecture, protocol and 2455 algorithms. Electrical and Computer Engineering Technical 2456 Report 06-1-1", NDSS , January 2006. 2458 [4] Mills, D.L., Electrical and Computer Engineering Technical 2459 Report 06-6-1, NDSS, June 2006., "Network Time Protocol Version 2460 4 Reference and Implementation Guide.", 2006. 2462 [5] Mills, D.L., "Computer Network Time Synchronization - the 2463 Network Time Protocol. CRC Press, 304pp.", 2006. 2465 [6] International Telecommunications Union, "ITU-R TF.460 Standard- 2466 frequency and time-signal emissions", February 2002. 2468 [7] Bureau International des Poids et Mesures, "Comptes Rendus de 2469 la 15e CGPM", 1976. 2471 [8] Marzullo and S. Owicki, "Maintaining the time in a distributed 2472 system.", ACM Operating Systems Review 19 , July 1985. 2474 [9] Bellovin, S. and E. Rescorla, Proceedings of the 13th annual 2475 ISOC Network and Distributed System Security Symposium, 2476 "Deploying a new Hash Algorithm", February 2006. 2478 [10] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 2479 Considerations Section in RFCs", BCP 26, RFC 2434, 2480 October 1998. 2482 18.2. Normative References 2484 [11] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 2485 September 1981. 2487 [12] Bradner, S., "Key words for use in RFCs to Indicate Requirement 2488 Levels", BCP 14, RFC 2119, March 1997. 2490 [13] Postel, J., "Internet Protocol", STD 5, RFC 791, 2491 September 1981. 2493 [14] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 2494 August 1980. 2496 [15] Simonsen, K., "Character Mnemonics and Character Sets", 2497 RFC 1345, June 1992. 2499 [16] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, 2500 April 1992. 2502 Appendix A. Code Skeleton 2504 This appendix is intended to describe the protocol and algorithms of 2505 an implementation in a general way using what is called a code 2506 skeleton program. This consists of a set of definitions, structures 2507 and code fragments which illustrate the protocol operations without 2508 the complexities of an actual implementation of the protocol. This 2509 program is not an executable and is not designed to run in the 2510 ordinary sense. It is designed to be compiled only in order to 2511 verify consistent variable and type usage. 2513 Most of the features of the reference implementation are included 2514 here, with the following exceptions: There are no provisions for 2515 reference clocks or public key (Autokey) cryptography. There is no 2516 huff-n'-puff filter, anti-clockhop hysteresis or monitoring 2517 provisions. Many of the values that can be tinkered in the reference 2518 implementation are assumed constants here. There are only minimal 2519 provisions for the kiss-o'death packet and no responding code. 2521 The program is not intended to be fast or compact, just to 2522 demonstrate the algorithms with sufficient fidelity to understand how 2523 they work. The code skeleton consists of eight segments, a header 2524 segment included by each of the other segments, plus a code segment 2525 for the main program, kernel I/O and system clock interfaces, and 2526 peer, system, clock_adjust and poll processes. These are presented 2527 in order below along with definitions and variables specific to each 2528 process. 2530 A.1. Global Definitions 2532 A.1.1. Definitions, Constants, Parameters 2534 #include /* avoids complaints about sqrt() */ 2535 #include /* for gettimeofday() and friends */ 2536 #include /* for malloc() and friends */ 2538 /* 2539 * Data types 2540 * 2541 * This program assumes the int data type is 32 bits and the long data 2542 * type is 64 bits. The native data type used in most calculations is 2543 * floating double. The data types used in some packet header fields 2544 * require conversion to and from this representation. Some header 2545 * fields involve partitioning an octet, here represeted by individual 2546 * octets. 2547 * 2548 * The 64-bit NTP timestamp format used in timestamp calculations is 2549 * unsigned seconds and fraction with the decimal point to the left of 2550 * bit 32. The only operation permitted with these values is 2551 * subtraction, yielding a signed 31-bit difference. The 32-bit NTP 2552 * short format used in delay and dispersion calculations is seconds and 2553 * fraction with the decimal point to the left of bit 16. The only 2554 * operations permitted with these values are addition and 2555 * multiplication by a constant. 2556 * 2557 * The IPv4 address is 32 bits, while the IPv6 address is 128 bits. The 2558 * message digest field is 128 bits as constructed by the MD5 algorithm. 2559 * The precision and poll interval fields are signed log2 seconds. 2560 */ 2561 typedef unsigned long tstamp; /* NTP timestamp format */ 2562 typedef unsigned int tdist; /* NTP short format */ 2563 typedef unsigned long ipaddr; /* IPv4 or IPv6 address */ 2564 typedef unsigned long digest; /* md5 digest */ 2565 typedef signed char s_char; /* precision and poll interval (log2) */ 2566 /* 2567 * Timestamp conversion macroni 2568 */ 2569 #define FRIC 65536. /* 2^16 as a double */ 2570 #define D2FP(r) ((tdist)((r) * FRIC)) /* NTP short */ 2571 #define FP2D(r) ((double)(r) / FRIC) 2573 #define FRAC 4294967296. /* 2^32 as a double */ 2574 #define D2LFP(a) ((tstamp)((a) * FRAC)) /* NTP timestamp */ 2575 #define LFP2D(a) ((double)(a) / FRAC) 2576 #define U2LFP(a) ((a).tv_sec + JAN_1970 << 32 + (unsigned long) \ 2577 ((a).tv_usec / 1e6 * FRAC)) 2579 /* 2580 * Arithmetic conversions 2581 */ 2582 #define LOG2D(a) ((a) < 0 ? 1. / (1L << -(a)) : \ 2583 1L << (a)) /* poll, etc. */ 2584 #define SQUARE(x) (x * x) 2585 #define SQRT(x) (sqrt(x)) 2587 /* 2588 * Global constants. Some of these might be converted to variables 2589 * which can be tinkered by configuration or computed on-fly. For 2590 * instance, the reference implementation computes PRECISION on-fly and 2591 * provides performance tuning for the defines marked with % below. 2592 */ 2593 #define VERSION 4 /* version number */ 2594 #define MINDISP .01 /* % minimum dispersion (s) */ 2595 #define MAXDISP 16 /* maximum dispersion (s) */ 2596 #define MAXDIST 1 /* % distance threshold (s) */ 2597 #define NOSYNC 0x3 /* leap unsync */ 2598 #define MAXSTRAT 16 /* maximum stratum (infinity metric) */ 2599 #define MINPOLL 6 /* % minimum poll interval (64 s)*/ 2600 #define MAXPOLL 17 /* % maximum poll interval (36.4 h) */ 2601 #define MINCLOCK 3 /* minimum manycast survivors */ 2602 #define MAXCLOCK 10 /* maximum manycast candidates */ 2603 #define TTLMAX 8 /* max ttl manycast */ 2604 #define BEACON 15 /* max interval between beacons */ 2606 #define PHI 15e-6 /* % frequency tolerance (15 PPM) */ 2607 #define NSTAGE 8 /* clock register stages */ 2608 #define NMAX 50 /* maximum number of peers */ 2609 #define NSANE 1 /* % minimum intersection survivors */ 2610 #define NMIN 3 /* % minimum cluster survivors */ 2612 /* 2613 * Global return values 2614 */ 2615 #define TRUE 1 /* boolean true */ 2616 #define FALSE 0 /* boolean false */ 2617 #define NULL 0 /* empty pointer */ 2619 /* 2620 * Local clock process return codes 2621 */ 2622 #define IGNORE 0 /* ignore */ 2623 #define SLEW 1 /* slew adjustment */ 2624 #define STEP 2 /* step adjustment */ 2625 #define PANIC 3 /* panic - no adjustment */ 2627 /* 2628 * System flags 2629 */ 2630 #define S_FLAGS 0 /* any system flags */ 2631 #define S_BCSTENAB 0x1 /* enable broadcast client */ 2633 /* 2634 * Peer flags 2635 */ 2636 #define P_FLAGS 0 /* any peer flags */ 2637 #define P_EPHEM 0x01 /* association is ephemeral */ 2638 #define P_BURST 0x02 /* burst enable */ 2639 #define P_IBURST 0x04 /* intial burst enable */ 2640 #define P_NOTRUST 0x08 /* authenticated access */ 2641 #define P_NOPEER 0x10 /* authenticated mobilization */ 2642 #define P_MANY 0x20 /* manycast client */ 2644 /* 2645 * Authentication codes 2646 */ 2647 #define A_NONE 0 /* no authentication */ 2648 #define A_OK 1 /* authentication OK */ 2649 #define A_ERROR 2 /* authentication error */ 2650 #define A_CRYPTO 3 /* crypto-NAK */ 2652 /* 2653 * Association state codes 2654 */ 2655 #define X_INIT 0 /* initialization */ 2656 #define X_STALE 1 /* timeout */ 2657 #define X_STEP 2 /* time step */ 2658 #define X_ERROR 3 /* authentication error */ 2659 #define X_CRYPTO 4 /* crypto-NAK received */ 2660 #define X_NKEY 5 /* untrusted key */ 2661 /* 2662 * Protocol mode definitionss 2663 */ 2664 #define M_RSVD 0 /* reserved */ 2665 #define M_SACT 1 /* symmetric active */ 2666 #define M_PASV 2 /* symmetric passive */ 2667 #define M_CLNT 3 /* client */ 2668 #define M_SERV 4 /* server */ 2669 #define M_BCST 5 /* broadcast server */ 2670 #define M_BCLN 6 /* broadcast client */ 2672 /* 2673 * Clock state definitions 2674 */ 2675 #define NSET 0 /* clock never set */ 2676 #define FSET 1 /* frequency set from file */ 2677 #define SPIK 2 /* spike detected */ 2678 #define FREQ 3 /* frequency mode */ 2679 #define SYNC 4 /* clock synchronized */ 2681 A.1.2. Packet Data Structures 2683 /* 2684 * The receive and transmit packets may contain an optional message 2685 * authentication code (MAC) consisting of a key identifier (keyid) and 2686 * message digest (mac). NTPv4 supports optional extension fields which 2687 * are inserted after the the header and before the MAC, but these are 2688 * not described here. 2689 * 2690 * Receive packet 2691 * 2692 * Note the dst timestamp is not part of the packet itself. It is 2693 * captured upon arrival and returned in the receive buffer along with 2694 * the buffer length and data. Note that some of the char fields are 2695 * packed in the actual header, but the details are omited here. 2696 */ 2697 struct r { 2698 ipaddr srcaddr; /* source (remote) address */ 2699 ipaddr dstaddr; /* destination (local) address */ 2700 char version; /* version number */ 2701 char leap; /* leap indicator */ 2702 char mode; /* mode */ 2703 char stratum; /* stratum */ 2704 char poll; /* poll interval */ 2705 s_char precision; /* precision */ 2706 tdist rootdelay; /* root delay */ 2707 tdist rootdisp; /* root dispersion */ 2708 char refid; /* reference ID */ 2709 tstamp reftime; /* reference time */ 2710 tstamp org; /* origin timestamp */ 2711 tstamp rec; /* receive timestamp */ 2712 tstamp xmt; /* transmit timestamp */ 2713 int keyid; /* key ID */ 2714 digest mac; /* message digest */ 2715 tstamp dst; /* destination timestamp */ 2716 } r; 2718 /* 2719 * Transmit packet 2720 */ 2721 struct x { 2722 ipaddr dstaddr; /* source (local) address */ 2723 ipaddr srcaddr; /* destination (remote) address */ 2724 char version; /* version number */ 2725 char leap; /* leap indicator */ 2726 char mode; /* mode */ 2727 char stratum; /* stratum */ 2728 char poll; /* poll interval */ 2729 s_char precision; /* precision */ 2730 tdist rootdelay; /* root delay */ 2731 tdist rootdisp; /* root dispersion */ 2732 char refid; /* reference ID */ 2733 tstamp reftime; /* reference time */ 2734 tstamp org; /* origin timestamp */ 2735 tstamp rec; /* receive timestamp */ 2736 tstamp xmt; /* transmit timestamp */ 2737 int keyid; /* key ID */ 2738 digest mac; /* message digest */ 2739 } x; 2741 A.1.3. Association Data Structures 2743 /* 2744 * Filter stage structure. Note the t member in this and other 2745 * structures refers to process time, not real time. Process time 2746 * increments by one second for every elapsed second of real time. 2747 */ 2748 struct f { 2749 tstamp t; /* update time */ 2750 double offset; /* clock ofset */ 2751 double delay; /* roundtrip delay */ 2752 double disp; /* dispersion */ 2753 } f; 2755 /* 2756 * Association structure. This is shared between the peer process and 2757 * poll process. 2758 */ 2759 struct p { 2761 /* 2762 * Variables set by configuration 2763 */ 2764 ipaddr srcaddr; /* source (remote) address */ 2765 ipaddr dstaddr; /* destination (local) address */ 2766 char version; /* version number */ 2767 char hmode; /* host mode */ 2768 int keyid; /* key identifier */ 2769 int flags; /* option flags */ 2771 /* 2772 * Variables set by received packet 2773 */ 2774 char leap; /* leap indicator */ 2775 char pmode; /* peer mode */ 2776 char stratum; /* stratum */ 2777 char ppoll; /* peer poll interval */ 2778 double rootdelay; /* root delay */ 2779 double rootdisp; /* root dispersion */ 2780 char refid; /* reference ID */ 2781 tstamp reftime; /* reference time */ 2782 #define begin_clear org /* beginning of clear area */ 2783 tstamp org; /* originate timestamp */ 2784 tstamp rec; /* receive timestamp */ 2785 tstamp xmt; /* transmit timestamp */ 2787 /* 2788 * Computed data 2789 */ 2790 double t; /* update time */ 2791 struct f f[NSTAGE]; /* clock filter */ 2792 double offset; /* peer offset */ 2793 double delay; /* peer delay */ 2794 double disp; /* peer dispersion */ 2795 double jitter; /* RMS jitter */ 2797 /* 2798 * Poll process variables 2799 */ 2800 char hpoll; /* host poll interval */ 2801 int burst; /* burst counter */ 2802 int reach; /* reach register */ 2803 int ttl; /* ttl (manycast) */ 2804 #define end_clear unreach /* end of clear area */ 2805 int unreach; /* unreach counter */ 2806 int outdate; /* last poll time */ 2807 int nextdate; /* next poll time */ 2808 } p; 2810 A.1.4. System Data Structures 2812 /* 2813 * Chime list. This is used by the intersection algorithm. 2814 */ 2815 struct m { /* m is for Marzullo */ 2816 struct p *p; /* peer structure pointer */ 2817 int type; /* high +1, mid 0, low -1 */ 2818 double edge; /* correctness interval edge */ 2819 } m; 2821 /* 2822 * Survivor list. This is used by the clustering algorithm. 2823 */ 2824 struct v { 2825 struct p *p; /* peer structure pointer */ 2826 double metric; /* sort metric */ 2827 } v; 2829 /* 2830 * System structure 2831 */ 2832 struct s { 2833 tstamp t; /* update time */ 2834 char leap; /* leap indicator */ 2835 char stratum; /* stratum */ 2836 char poll; /* poll interval */ 2837 char precision; /* precision */ 2838 double rootdelay; /* root delay */ 2839 double rootdisp; /* root dispersion */ 2840 char refid; /* reference ID */ 2841 tstamp reftime; /* reference time */ 2842 struct m m[NMAX]; /* chime list */ 2843 struct v v[NMAX]; /* survivor list */ 2844 struct p *p; /* association ID */ 2845 double offset; /* combined offset */ 2846 double jitter; /* combined jitter */ 2847 int flags; /* option flags */ 2848 int n; /* number of survivors */ 2849 } s; 2851 A.1.5. Local Clock Data Structures 2853 /* 2854 * Local clock structure 2855 */ 2856 struct c { 2857 tstamp t; /* update time */ 2858 int state; /* current state */ 2859 double offset; /* current offset */ 2860 double last; /* previous offset */ 2861 int count; /* jiggle counter */ 2862 double freq; /* frequency */ 2863 double jitter; /* RMS jitter */ 2864 double wander; /* RMS wander */ 2865 } c; 2867 A.1.6. Function Prototypes 2869 /* 2870 * Peer process 2871 */ 2872 void receive(struct r *); /* receive packet */ 2873 void packet(struct p *, struct r *); /* process packet */ 2874 void clock_filter(struct p *, double, double, double); /* filter */ 2875 double root_dist(struct p *); /* calculate root distance */ 2876 int fit(struct p *); /* determine fitness of server */ 2877 void clear(struct p *, int); /* clear association */ 2878 int access(struct r *); /* determine access restrictions */ 2880 /* 2881 * System process 2882 */ 2883 int main(); /* main program */ 2884 void clock_select(); /* find the best clocks */ 2885 void clock_update(struct p *); /* update the system clock */ 2886 void clock_combine(); /* combine the offsets */ 2888 /* 2889 * Local clock process 2890 */ 2891 int local_clock(struct p *, double); /* clock discipline */ 2892 void rstclock(int, double, double); /* clock state transition */ 2894 /* 2895 * Clock adjust process 2896 */ 2897 void clock_adjust(); /* one-second timer process */ 2898 /* 2899 * Poll process 2900 */ 2901 void poll(struct p *); /* poll process */ 2902 void poll_update(struct p *, int); /* update the poll interval */ 2903 void peer_xmit(struct p *); /* transmit a packet */ 2904 void fast_xmit(struct r *, int, int); /* transmit a reply packet */ 2906 /* 2907 * Utility routines 2908 */ 2909 digest md5(int); /* generate a message digest */ 2910 struct p *mobilize(ipaddr, ipaddr, int, int, int, int); /* mobilize */ 2911 struct p *find_assoc(struct r *); /* search the association table */ 2913 /* 2914 * Kernel interface 2915 */ 2916 struct r *recv_packet(); /* wait for packet */ 2917 void xmit_packet(struct x *); /* send packet */ 2918 void step_time(double); /* step time */ 2919 void adjust_time(double); /* adjust (slew) time */ 2920 tstamp get_time(); /* read time */ 2922 A.2. Main Program and Utility Routines 2924 /* 2925 * Definitions 2926 */ 2927 #define PRECISION -18 /* precision (log2 s) */ 2928 #define IPADDR 0 /* any IP address */ 2929 #define MODE 0 /* any NTP mode */ 2930 #define KEYID 0 /* any key identifier */ 2932 /* 2933 * main() - main program 2934 */ 2935 int 2936 main() 2937 { 2938 struct p *p; /* peer structure pointer */ 2939 struct r *r; /* receive packet pointer */ 2941 /* 2942 * Read command line options and initialize system variables. 2943 * The reference implementation measures the precision specific 2944 * to each machine by measuring the clock increments to read the 2945 * system clock. 2947 */ 2948 memset(&s, sizeof(s), 0); 2949 s.leap = NOSYNC; 2950 s.stratum = MAXSTRAT; 2951 s.poll = MINPOLL; 2952 s.precision = PRECISION; 2953 s.p = NULL; 2955 /* 2956 * Initialize local clock variables 2957 */ 2958 memset(&c, sizeof(c), 0); 2959 if (/* frequency file */ 0) { 2960 c.freq = /* freq */ 0; 2961 rstclock(FSET, 0, 0); 2962 } else { 2963 rstclock(NSET, 0, 0); 2964 } 2965 c.jitter = LOG2D(s.precision); 2967 /* 2968 * Read the configuration file and mobilize persistent 2969 * associations with spcified addresses, version, mode, key ID 2970 * and flags. 2971 */ 2972 while (/* mobilize configurated associations */ 0) { 2973 p = mobilize(IPADDR, IPADDR, VERSION, MODE, KEYID, 2974 P_FLAGS); 2975 } 2977 /* 2978 * Start the system timer, which ticks once per second. Then 2979 * read packets as they arrive, strike receive timestamp and 2980 * call the receive() routine. 2981 */ 2982 while (0) { 2983 r = recv_packet(); 2984 r->dst = get_time(); 2985 receive(r); 2986 } 2987 } 2989 /* 2990 * mobilize() - mobilize and initialize an association 2991 */ 2992 struct p 2993 *mobilize( 2994 ipaddr srcaddr, /* IP source address */ 2995 ipaddr dstaddr, /* IP destination address */ 2996 int version, /* version */ 2997 int mode, /* host mode */ 2998 int keyid, /* key identifier */ 2999 int flags /* peer flags */ 3000 ) 3001 { 3002 struct p *p; /* peer process pointer */ 3004 /* 3005 * Allocate and initialize association memory 3006 */ 3007 p = malloc(sizeof(struct p)); 3008 p->srcaddr = srcaddr; 3009 p->dstaddr = dstaddr; 3010 p->version = version; 3011 p->hmode = mode; 3012 p->keyid = keyid; 3013 p->hpoll = MINPOLL; 3014 clear(p, X_INIT); 3015 p->flags == flags; 3016 return (p); 3017 } 3019 /* 3020 * find_assoc() - find a matching association 3021 */ 3022 struct p /* peer structure pointer or NULL */ 3023 *find_assoc( 3024 struct r *r /* receive packet pointer */ 3025 ) 3026 { 3027 struct p *p; /* dummy peer structure pointer */ 3029 /* 3030 * Search association table for matching source 3031 * address, source port and mode. 3032 */ 3033 while (/* all associations */ 0) { 3034 if (r->srcaddr == p->srcaddr && r->mode == p->hmode) 3035 return(p); 3036 } 3037 return (NULL); 3038 } 3040 /* 3041 * md5() - compute message digest 3042 */ 3044 digest 3045 md5( 3046 int keyid /* key identifier */ 3047 ) 3048 { 3049 /* 3050 * Compute a keyed cryptographic message digest. The key 3051 * identifier is associated with a key in the local key cache. 3052 * The key is prepended to the packet header and extension fieds 3053 * and the result hashed by the MD5 algorithm as described in 3054 * RFC-1321. Return a MAC consisting of the 32-bit key ID 3055 * concatenated with the 128-bit digest. 3056 */ 3057 return (/* MD5 digest */ 0); 3058 } 3060 A.3. Kernel Input/Output Interface 3062 /* 3063 * Kernel interface to transmit and receive packets. Details are 3064 * deliberately vague and depend on the operating system. 3065 * 3066 * recv_packet - receive packet from network 3067 */ 3068 struct r /* receive packet pointer*/ 3069 *recv_packet() { 3070 return (/* receive packet r */ 0); 3071 } 3073 /* 3074 * xmit_packet - transmit packet to network 3075 */ 3076 void 3077 xmit_packet( 3078 struct x *x /* transmit packet pointer */ 3079 ) 3080 { 3081 /* send packet x */ 3082 } 3084 A.4. Kernel System Clock Interface 3086 /* 3087 * System clock utility functions 3088 * 3089 * There are three time formats: native (Unix), NTP and floating double. 3090 * The get_time() routine returns the time in NTP long format. The Unix 3091 * routines expect arguments as a structure of two signed 32-bit words 3092 * in seconds and microseconds (timeval) or nanoseconds (timespec). The 3093 * step_time() and adjust_time() routines expect signed arguments in 3094 * floating double. The simplified code shown here is for illustration 3095 * only and has not been verified. 3096 */ 3097 #define JAN_1970 2208988800UL /* 1970 - 1900 in seconds */ 3099 /* 3100 * get_time - read system time and convert to NTP format 3101 */ 3102 tstamp 3103 get_time() 3104 { 3105 struct timeval unix_time; 3107 /* 3108 * There are only two calls on this routine in the program. One 3109 * when a packet arrives from the network and the other when a 3110 * packet is placed on the send queue. Call the kernel time of 3111 * day routine (such as gettimeofday()) and convert to NTP 3112 * format. 3113 */ 3114 gettimeofday(&unix_time, NULL); 3115 return (U2LFP(unix_time)); 3116 } 3118 /* 3119 * step_time() - step system time to given offset valuet 3120 */ 3121 void 3122 step_time( 3123 double offset /* clock offset */ 3124 ) 3125 { 3126 struct timeval unix_time; 3127 tstamp ntp_time; 3129 /* 3130 * Convert from double to native format (signed) and add to the 3131 * current time. Note the addition is done in native format to 3132 * avoid overflow or loss of precision. 3133 */ 3134 gettimeofday(&unix_time, NULL); 3135 ntp_time = D2LFP(offset) + U2LFP(unix_time);; 3136 unix_time.tv_sec = ntp_time >> 32; 3137 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3138 32) / FRAC * 1e6); 3139 settimeofday(&unix_time, NULL); 3140 } 3142 /* 3143 * adjust_time() - slew system clock to given offset value 3144 */ 3145 void 3146 adjust_time( 3147 double offset /* clock offset */ 3148 ) 3149 { 3150 struct timeval unix_time; 3151 tstamp ntp_time; 3153 /* 3154 * Convert from double to native format (signed) and add to the 3155 * current time. 3156 */ 3157 ntp_time = D2LFP(offset); 3158 unix_time.tv_sec = ntp_time >> 32; 3159 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3160 32) / FRAC * 1e6); 3161 adjtime(&unix_time, NULL); 3162 } 3164 A.5. Peer Process 3166 /* 3167 * A crypto-NAK packet includes the NTP header followed by a MAC 3168 * consisting only of the key identifier with value zero. It tells the 3169 * receiver that a prior request could not be properly authenticated, 3170 * but the NTP header fields are correct. 3171 * 3172 * A kiss-o'-death packet is an NTP header with leap 0x3 (NOSYNC) and 3173 * stratum 16 (MAXSTRAT. It tells the receiver that something drastic 3174 * has happened, as revealled by the kiss code in the refid field. The 3175 * NTP header fields may or may not be correct. 3176 */ 3177 /* 3178 * Peer process parameters and constants 3179 */ 3180 #define SGATE 3 /* spike gate (clock filter */ 3181 #define BDELAY .004 /* broadcast delay (s) */ 3183 /* 3184 * Dispatch codes 3185 */ 3186 #define ERR -1 /* error */ 3187 #define DSCRD 0 /* discard packet */ 3188 #define PROC 1 /* process packet */ 3189 #define BCST 2 /* broadcast packet */ 3190 #define FXMIT 3 /* client packet */ 3191 #define NEWPS 4 /* new symmetric passive client */ 3192 #define NEWBC 5 /* new broadcast client */ 3194 /* 3195 * Dispatch matrix 3196 * active passv client server bcast */ 3197 int table[7][5] = { 3198 /* nopeer */ { NEWPS, DSCRD, FXMIT, MANY, NEWBC }, 3199 /* active */ { PROC, PROC, DSCRD, DSCRD, DSCRD }, 3200 /* passv */ { PROC, ERR, DSCRD, DSCRD, DSCRD }, 3201 /* client */ { DSCRD, DSCRD, DSCRD, PROC, DSCRD }, 3202 /* server */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3203 /* bcast */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3204 /* bclient */ { DSCRD, DSCRD, DSCRD, DSCRD, PROC} 3205 }; 3207 /* 3208 * Miscellaneous macroni 3209 * 3210 * This macro defines the authentication state. If x is 0, 3211 * authentication is optional, othewise it is required. 3212 */ 3213 #define AUTH(x, y) ((x) ? (y) == A_OK : (y) == A_OK || \ 3214 (y) == A_NONE) 3216 /* 3217 * These are used by the clear() routine 3218 */ 3219 #define BEGIN_CLEAR(p) ((char *)&((p)->begin_clear)) 3220 #define END_CLEAR(p) ((char *)&((p)->end_clear)) 3221 #define LEN_CLEAR (END_CLEAR((struct p *)0) - \ 3222 BEGIN_CLEAR((struct p *)0)) 3224 A.5.1. receive() 3226 /* 3227 * receive() - receive packet and decode modes 3228 */ 3229 void 3230 receive( 3231 struct r *r /* receive packet pointer */ 3232 ) 3234 { 3235 struct p *p; /* peer structure pointer 3236 int auth; /* authentication code */ 3237 int has_mac; /* size of MAC */ 3238 int synch; /* synchronized switch */ 3239 int auth; /* authentication code */ 3241 /* 3242 * Check access control lists. The intent here is to implement a 3243 * whitelist of those IP addresses specifically accepted and/or 3244 * a blacklist of those IP addresses specifically rejected. 3245 * There could be different lists for authenticated clients and 3246 * unauthenticated clients. 3247 */ 3248 if (!access(r)) 3249 return; /* access denied */ 3251 /* 3252 * The version must not be in the future. Format checks include 3253 * packet length, MAC length and extension field lengths, if 3254 * present. 3255 */ 3256 if (r->version > VERSION /* or format error */) 3257 return; /* format error */ 3259 /* 3260 * Authentication is conditioned by two switches which can be 3261 * specified on a per-client basis. 3262 * 3263 * P_NOPEER do not mobilize an association unless 3264 * authenticated 3265 * P_NOTRUST do not allow access unless authenticated 3266 * (implies P_NOPEER) 3267 * 3268 * There are four outcomes: 3269 * 3270 * A_NONE the packet has no MAC 3271 * A_OK the packet has a MAC and authentication 3272 * succeeds 3273 * A_ERROR the packet has a MAC and authentication fails 3274 * A_CRYPTO crypto-NAK. The MAC has four octets only. 3275 * 3276 * Note: The AUTH(x, y) macro is used to filter outcomes. If x 3277 * is zero, acceptable outcomes of y are NONE and OK. If x is 3278 * one, the only acceptable outcome of y is OK. 3279 */ 3280 has_mac = /* length of MAC field */ 0; 3281 if (has_mac == 0) { 3282 auth = A_NONE; /* not required */ 3283 } else if (has_mac == 4) { 3284 auth == A_CRYPTO; /* crypto-NAK */ 3285 } else { 3286 if (r->mac != md5(r->keyid)) 3287 auth = A_ERROR; /* auth error */ 3288 else 3289 auth = A_OK; /* auth OK */ 3290 } 3292 /* 3293 * Find association and dispatch code. If there is no 3294 * association to match, the value of p->hmode is assumed NULL. 3295 */ 3296 p = find_assoc(r); 3297 switch(table[p->hmode][r->mode]) { 3299 /* 3300 * Client packet and no association. Send server reply without 3301 * saving state. 3302 */ 3303 case FXMIT: 3305 /* 3306 * If unicast destination address, send server packet. 3307 * If authentication fails, send a crypto-NAK packet. 3308 /* 3309 if (/* not multicast dstaddr */0) { 3310 if (AUTH(p->flags & P_NOTRUST, auth)) 3311 fast_xmit(r, M_SERV, auth); 3312 else if (auth == A_ERROR) 3313 fast_xmit(r, M_SERV, A_CRYPTO); 3314 return; /* M_SERV packet sent */ 3315 } 3317 /* 3318 * This must be manycast. Do not repspond if we are not 3319 * synchronized or if our stratum is above the 3320 * manycaster. 3321 */ 3322 if (s.leap == NOSYNC || s.stratum > r->stratum) 3323 return; 3325 /* 3326 * Respond only if authentication is ok. Note that the 3327 * unicast addreess is used, not the multicast. 3328 */ 3329 if (AUTH(p->flags & P_NOTRUST, auth)) 3330 fast_xmit(r, M_SERV, auth); 3331 return; 3333 /* 3334 * New manycase client ephemeral association. It is mobilized in 3335 * the same version as in the packet. If authentication fails, 3336 * ignore the packet. Verify the server packet by comparing the 3337 * r->org timestamp in the packet with the p->xmt timestamp in 3338 * the multicast client association. If they match, the server 3339 * packet is authentic. Details omitted. 3340 */ 3341 case MANY: 3342 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3343 return; /* authentication error */ 3345 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_CLNT, 3346 r->keyid, P_EPHEM); 3347 break; 3349 /* 3350 * New symmetric passive ssociation. It is mobilized in the same 3351 * version as in the packet. If authentication fails, send a 3352 * crypto-NAK packet. If restrict no-moblize, send a symmetric 3353 * active packet instead. 3354 */ 3355 case NEWPS: 3356 if (!AUTH(p->flags & P_NOTRUST, auth)) { 3357 if (auth == A_ERROR) 3358 fast_xmit(r, M_SACT, A_CRYPTO); 3359 return; /* crypto-NAK packet sent */ 3360 } 3361 if (!AUTH(p->flags & P_NOPEER, auth)) { 3362 fast_xmit(r, M_SACT, auth); 3363 return; /* M_SACT packet sent */ 3364 } 3365 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_PASV, 3366 r->keyid, P_EPHEM); 3367 break; 3369 /* 3370 * New broadcast client association. It is mobilized in the same 3371 * version as in the packet. If authentication fails, ignore the 3372 * packet. Note this code does not support the initial volley 3373 * feature in the reference implementation. 3374 */ 3375 case NEWBC: 3376 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3377 return; /* authentication error */ 3379 if (!(s.flags & S_BCSTENAB)) 3380 return; /* broadcast not enabled */ 3382 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_BCLN, 3383 r->keyid, P_EPHEM); 3384 break; /* processing continues */ 3386 /* 3387 * Process packet. Placeholdler only. 3388 */ 3389 case PROC: 3390 break; /* processing continues */ 3392 /* 3393 * Invalid mode combination. We get here only in case of 3394 * ephemeral associations, so the correct action is simply to 3395 * toss it. 3396 */ 3397 case ERR: 3398 clear(p, X_ERROR); 3399 return; /* invalid mode combination */ 3401 /* 3402 * No match; just discard the packet. 3403 */ 3404 case DSCRD: 3405 return; /* orphan abandoned */ 3406 } 3408 /* 3409 * Next comes a rigorous schedule of timestamp checking. If the 3410 * transmit timestamp is zero, the server is horribly broken. 3411 */ 3412 if (r->xmt == 0) 3413 return; /* invalid timestamp */ 3415 /* 3416 * If the transmit timestamp duplicates a previous one, the 3417 * packet is a replay. 3418 */ 3419 if (r->xmt == p->xmt) 3420 return; /* duplicate packet */ 3422 /* 3423 * If this is a broadcast mode packet, skip further checking. 3424 * If the origin timestamp is zero, the sender has not yet heard 3425 * from us. Otherwise, if the origin timestamp does not match 3426 * the transmit timestamp, the packet is bogus. 3428 */ 3429 synch = TRUE; 3430 if (r->mode != M_BCST) { 3431 if (r->org == 0) 3432 synch = FALSE; /* unsynchronized */ 3434 else if (r->org != p->xmt) 3435 synch = FALSE; /* bogus packet */ 3436 } 3438 /* 3439 * Update the origin and destination timestamps. If 3440 * unsynchronized or bogus, abandon ship. 3441 */ 3442 p->org = r->xmt; 3443 p->rec = r->dst; 3444 if (!synch) 3445 return; /* unsynch */ 3447 /* 3448 * The timestamps are valid and the receive packet matches the 3449 * last one sent. If the packet is a crypto-NAK, the server 3450 * might have just changed keys. We demobilize the association 3451 * and wait for better times. 3452 */ 3453 if (auth == A_CRYPTO) { 3454 clear(p, X_CRYPTO); 3455 return; /* crypto-NAK */ 3456 } 3458 /* 3459 * If the association is authenticated, the key ID is nonzero 3460 * and received packets must be authenticated. This is designed 3461 * to avoid a bait-and-switch attack, which was possible in past 3462 * versions. 3463 */ 3464 if (!AUTH(p->keyid || (p->flags & P_NOTRUST), auth)) 3465 return; /* bad auth */ 3467 /* 3468 * Everything possible has been done to validate the timestamps 3469 * and prevent bad guys from disrupting the protocol or 3470 * injecting bogus data. Earn some revenue. 3471 */ 3472 packet(p, r); 3473 } 3475 A.5.1.1. packet() 3476 /* 3477 * packet() - process packet and compute offset, delay and 3478 * dispersion. 3479 */ 3480 void 3481 packet( 3482 struct p *p, /* peer structure pointer */ 3483 struct r *r /* receive packet pointer */ 3484 ) 3485 { 3486 double offset; /* sample offsset */ 3487 double delay; /* sample delay */ 3488 double disp; /* sample dispersion */ 3490 /* 3491 * By golly the packet is valid. Light up the remaining header 3492 * fields. Note that we map stratum 0 (unspecified) to MAXSTRAT 3493 * to make stratum comparisons simpler and to provide a natural 3494 * interface for radio clock drivers that operate for 3495 * convenience at stratum 0. 3496 */ 3497 p->leap = r->leap; 3498 if (r->stratum == 0) 3499 p->stratum = MAXSTRAT; 3500 else 3501 p->stratum = r->stratum; 3502 p->pmode = r->mode; 3503 p->ppoll = r->poll; 3504 p->rootdelay = FP2D(r->rootdelay); 3505 p->rootdisp = FP2D(r->rootdisp); 3506 p->refid = r->refid; 3507 p->reftime = r->reftime; 3509 /* 3510 * Verify the server is synchronized with valid stratum and 3511 * reference time not later than the transmit time. 3512 */ 3513 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3514 return; /* unsynchronized */ 3516 /* 3517 * Verify valid root distance. 3518 */ 3519 if (r->rootdelay / 2 + r->rootdisp >= MAXDISP || p->reftime > 3520 r->xmt) 3521 return; /* invalid header values */ 3523 poll_update(p, p->hpoll); 3524 p->reach |= 1; 3526 /* 3527 * Calculate offset, delay and dispersion, then pass to the 3528 * clock filter. Note carefully the implied processing. The 3529 * first-order difference is done directly in 64-bit arithmetic, 3530 * then the result is converted to floating double. All further 3531 * processing is in floating double arithmetic with rounding 3532 * done by the hardware. This is necessary in order to avoid 3533 * overflow and preseve precision. 3534 * 3535 * The delay calculation is a special case. In cases where the 3536 * server and client clocks are running at different rates and 3537 * with very fast networks, the delay can appear negative. In 3538 * order to avoid violating the Principle of Least Astonishment, 3539 * the delay is clamped not less than the system precision. 3540 */ 3541 if (p->pmode == M_BCST) { 3542 offset = LFP2D(r->xmt - r->dst); 3543 delay = BDELAY; 3544 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3545 2 * BDELAY; 3546 } else { 3547 offset = (LFP2D(r->rec - r->org) + LFP2D(r->dst - 3548 r->xmt)) / 2; 3549 delay = max(LFP2D(r->dst - r->org) - LFP2D(r->rec - 3550 r->xmt), LOG2D(s.precision)); 3551 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3552 LFP2D(r->dst - r->org); 3553 } 3554 clock_filter(p, offset, delay, disp); 3555 } 3557 A.5.2. clock_filter() 3559 /* 3560 * clock_filter(p, offset, delay, dispersion) - select the best from the 3561 * latest eight delay/offset samples. 3562 */ 3563 void 3564 clock_filter( 3565 struct p *p, /* peer structure pointer */ 3566 double offset, /* clock offset */ 3567 double delay, /* roundtrip delay */ 3568 double disp /* dispersion */ 3569 ) 3570 { 3571 struct f f[NSTAGE]; /* sorted list */ 3572 double dtemp; 3573 int i; 3575 /* 3576 * The clock filter contents consist of eight tuples (offset, 3577 * delay, dispersion, time). Shift each tuple to the left, 3578 * discarding the leftmost one. As each tuple is shifted, 3579 * increase the dispersion since the last filter update. At the 3580 * same time, copy each tuple to a temporary list. After this, 3581 * place the (offset, delay, disp, time) in the vacated 3582 * rightmost tuple. 3583 */ 3584 for (i = 1; i < NSTAGE; i++) { 3585 p->f[i] = p->f[i - 1]; 3586 p->f[i].disp += PHI * (c.t - p->t); 3587 f[i] = p->f[i]; 3588 } 3589 p->f[0].t = c.t; 3590 p->f[0].offset = offset; 3591 p->f[0].delay = delay; 3592 p->f[0].disp = disp; 3593 f[0] = p->f[0]; 3595 /* 3596 * Sort the temporary list of tuples by increasing f[].delay. 3597 * The first entry on the sorted list represents the best 3598 * sample, but it might be old. 3599 */ 3600 dtemp = p->offset; 3601 p->offset = f[0].offset; 3602 p->delay = f[0].delay; 3603 for (i = 0; i < NSTAGE; i++) { 3604 p->disp += f[i].disp / (2 ^ (i + 1)); 3605 p->jitter += SQUARE(f[i].offset - f[0].offset); 3606 } 3607 p->jitter = max(SQRT(p->jitter), LOG2D(s.precision)); 3609 /* 3610 * Prime directive: use a sample only once and never a sample 3611 * older than the latest one, but anything goes before first 3612 * synchronized. 3613 */ 3614 if (f[0].t - p->t <= 0 && s.leap != NOSYNC) 3615 return; 3617 /* 3618 * Popcorn spike suppressor. Compare the difference between the 3619 * last and current offsets to the current jitter. If greater 3620 * than SGATE (3) and if the interval since the last offset is 3621 * less than twice the system poll interval, dump the spike. 3622 * Otherwise, and if not in a burst, shake out the truechimers. 3623 */ 3624 if (fabs(p->offset - dtemp) > SGATE * p->jitter && (f[0].t - 3625 p->t) < 2 * s.poll) 3626 return; 3628 p->t = f[0].t; 3629 if (p->burst == 0) 3630 clock_select(); 3631 return; 3632 } 3634 /* 3635 * root_dist() - calculate root distance 3636 */ 3637 double 3638 root_dist( 3639 struct p *p /* peer structure pointer */ 3640 ) 3641 { 3642 /* 3643 * The root synchronization distance is the maximum error due to 3644 * all causes of the local clock relative to the primary server. 3645 * It is defined as half the total delay plus total dispersion 3646 * plus peer jitter. 3647 */ 3648 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3649 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3650 } 3652 /* 3653 * fit() - test if association p is acceptable for synchronization 3654 */ 3655 int 3656 fit( 3657 struct p *p /* peer structure pointer */ 3658 ) 3659 { 3660 /* 3661 * A stratum error occurs if (1) the server has never been 3662 * synchronized, (2) the server stratum is invalid. 3663 */ 3664 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3665 return (FALSE); 3667 /* 3668 * A distance error occurs if the root distance exceeds the 3669 * distance threshold plus an increment equal to one poll 3670 * interval. 3671 */ 3672 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 3673 return (FALSE); 3675 /* 3676 * A loop error occurs if the remote peer is synchronized to the 3677 * local peer or the remote peer is synchronized to the current 3678 * system peer. Note this is the behavior for IPv4; for IPv6 the 3679 * MD5 hash is used instead. 3680 */ 3681 if (p->refid == p->dstaddr || p->refid == s.refid) 3682 return (FALSE); 3684 /* 3685 * An unreachable error occurs if the server is unreachable. 3686 */ 3687 if (p->reach == 0) 3688 return (FALSE); 3690 return (TRUE); 3691 } 3693 /* 3694 * clear() - reinitialize for persistent association, demobilize 3695 * for ephemeral association. 3696 */ 3697 void 3698 clear( 3699 struct p *p, /* peer structure pointer */ 3700 int kiss /* kiss code */ 3701 ) 3702 { 3703 int i; 3705 /* 3706 * The first thing to do is return all resources to the bank. 3707 * Typical resources are not detailed here, but they include 3708 * dynamically allocated structures for keys, certificates, etc. 3709 * If an ephemeral association and not initialization, return 3710 * the association memory as well. 3711 */ 3712 /* return resources */ 3713 if (s.p == p) 3714 s.p = NULL; 3715 if (kiss != X_INIT && (p->flags & P_EPHEM)) { 3716 free(p); 3717 return; 3718 } 3720 /* 3721 * Initialize the association fields for general reset. 3722 */ 3723 memset(BEGIN_CLEAR(p), LEN_CLEAR, 0); 3724 p->leap = NOSYNC; 3725 p->stratum = MAXSTRAT; 3726 p->ppoll = MAXPOLL; 3727 p->hpoll = MINPOLL; 3728 p->disp = MAXDISP; 3729 p->jitter = LOG2D(s.precision); 3730 p->refid = kiss; 3731 for (i = 0; i < NSTAGE; i++) 3732 p->f[i].disp = MAXDISP; 3734 /* 3735 * Randomize the first poll just in case thousands of broadcast 3736 * clients have just been stirred up after a long absence of the 3737 * broadcast server. 3738 */ 3739 p->outdate = p->t = c.t; 3740 p->nextdate = p->outdate + (random() & ((1 << MINPOLL) - 1)); 3741 } 3743 A.5.3. fast_xmit() 3745 /* 3746 * fast_xmit() - transmit a reply packet for receive packet r 3747 */ 3748 void 3749 fast_xmit( 3750 struct r *r, /* receive packet pointer */ 3751 int mode, /* association mode */ 3752 int auth /* authentication code */ 3753 ) 3754 { 3755 struct x x; 3757 /* 3758 * Initialize header and transmit timestamp. Note that the 3759 * transmit version is copied from the receive version. This is 3760 * for backward compatibility. 3761 */ 3762 x.version = r->version; 3763 x.srcaddr = r->dstaddr; 3764 x.dstaddr = r->srcaddr; 3765 x.leap = s.leap; 3766 x.mode = mode; 3767 if (s.stratum == MAXSTRAT) 3768 x.stratum = 0; 3769 else 3770 x.stratum = s.stratum; 3771 x.poll = r->poll; 3772 x.precision = s.precision; 3773 x.rootdelay = D2FP(s.rootdelay); 3774 x.rootdisp = D2FP(s.rootdisp); 3775 x.refid = s.refid; 3776 x.reftime = s.reftime; 3777 x.org = r->xmt; 3778 x.rec = r->dst; 3779 x.xmt = get_time(); 3781 /* 3782 * If the authentication code is A.NONE, include only the 3783 * header; if A.CRYPTO, send a crypto-NAK; if A.OK, send a valid 3784 * MAC. Use the key ID in the received packet and the key in the 3785 * local key cache. 3786 */ 3787 if (auth != A_NONE) { 3788 if (auth == A_CRYPTO) { 3789 x.keyid = 0; 3790 } else { 3791 x.keyid = r->keyid; 3792 x.mac = md5(x.keyid); 3793 } 3794 } 3795 xmit_packet(&x); 3796 } 3797 A.5.4. access() 3799 /* 3800 * access() - determine access restrictions 3801 */ 3802 int 3803 access( 3804 struct r *r /* receive packet pointer */ 3805 ) 3806 { 3807 /* 3808 * The access control list is an ordered set of tuples 3809 * consisting of an address, mask and restrict word containing 3810 * defined bits. The list is searched for the first match on the 3811 * source address (r->srcaddr) and the associated restrict word 3812 * is returned. 3813 */ 3814 return (/* access bits */ 0); 3815 } 3817 A.5.5. System Process 3819 A.5.5.1. clock_select() 3821 /* 3822 * clock_select() - find the best clocks 3823 */ 3824 void 3825 clock_select() { 3826 struct p *p, *osys; /* peer structure pointers */ 3827 double low, high; /* correctness interval extents */ 3828 int allow, found, chime; /* used by intersecion algorithm */ 3829 int n, i, j; 3831 /* 3832 * We first cull the falsetickers from the server population, 3833 * leaving only the truechimers. The correctness interval for 3834 * association p is the interval from offset - root_dist() to 3835 * offset + root_dist(). The object of the game is to find a 3836 * majority clique; that is, an intersection of correctness 3837 * intervals numbering more than half the server population. 3838 * 3839 * First construct the chime list of tuples (p, type, edge) as 3840 * shown below, then sort the list by edge from lowest to 3841 * highest. 3842 */ 3843 osys = s.p; 3844 s.p = NULL; 3845 n = 0; 3846 while (fit(p)) { 3847 s.m[n].p = p; 3848 s.m[n].type = +1; 3849 s.m[n].edge = p->offset + root_dist(p); 3850 n++; 3851 s.m[n].p = p; 3852 s.m[n].type = 0; 3853 s.m[n].edge = p->offset; 3854 n++; 3855 s.m[n].p = p; 3856 s.m[n].type = -1; 3857 s.m[n].edge = p->offset - root_dist(p); 3858 n++; 3859 } 3861 /* 3862 * Find the largest contiguous intersection of correctness 3863 * intervals. Allow is the number of allowed falsetickers; found 3864 * is the number of midpoints. Note that the edge values are 3865 * limited to the range +-(2 ^ 30) < +-2e9 by the timestamp 3866 * calculations. 3867 */ 3868 low = 2e9; high = -2e9; 3869 for (allow = 0; 2 * allow < n; allow++) { 3871 /* 3872 * Scan the chime list from lowest to highest to find 3873 * the lower endpoint. 3874 */ 3875 found = 0; 3876 chime = 0; 3877 for (i = 0; i < n; i++) { 3878 chime -= s.m[i].type; 3879 if (chime >= n - found) { 3880 low = s.m[i].edge; 3881 break; 3882 } 3883 if (s.m[i].type == 0) 3884 found++; 3885 } 3887 /* 3888 * Scan the chime list from highest to lowest to find 3889 * the upper endpoint. 3890 */ 3891 chime = 0; 3892 for (i = n - 1; i >= 0; i--) { 3893 chime += s.m[i].type; 3894 if (chime >= n - found) { 3895 high = s.m[i].edge; 3896 break; 3897 } 3898 if (s.m[i].type == 0) 3899 found++; 3900 } 3902 /* 3903 * If the number of midpoints is greater than the number 3904 * of allowed falsetickers, the intersection contains at 3905 * least one truechimer with no midpoint. If so, 3906 * increment the number of allowed falsetickers and go 3907 * around again. If not and the intersection is 3908 * nonempty, declare success. 3909 */ 3910 if (found > allow) 3911 continue; 3913 if (high > low) 3914 break; 3915 } 3917 /* 3918 * Clustering algorithm. Construct a list of survivors (p, 3919 * metric) from the chime list, where metric is dominated first 3920 * by stratum and then by root distance. All other things being 3921 * equal, this is the order of preference. 3922 */ 3923 s.n = 0; 3924 for (i = 0; i < n; i++) { 3925 if (s.m[i].edge < low || s.m[i].edge > high) 3926 continue; 3928 p = s.m[i].p; 3929 s.v[n].p = p; 3930 s.v[n].metric = MAXDIST * p->stratum + root_dist(p); 3931 s.n++; 3932 } 3934 /* 3935 * There must be at least NSANE survivors to satisfy the 3936 * correctness assertions. Ordinarily, the Byzantine criteria 3937 * require four, survivors, but for the demonstration here, one 3938 * is acceptable. 3939 */ 3940 if (s.n < NSANE) 3941 return; 3943 /* 3944 * For each association p in turn, calculate the selection 3945 * jitter p->sjitter as the square root of the sum of squares 3946 * (p->offset - q->offset) over all q associations. The idea is 3947 * to repeatedly discard the survivor with maximum selection 3948 * jitter until a termination condition is met. 3949 */ 3950 while (1) { 3951 struct p *p, *q, *qmax; /* peer structure pointers */ 3952 double max, min, dtemp; 3954 max = -2e9; min = 2e9; 3955 for (i = 0; i < s.n; i++) { 3956 p = s.v[i].p; 3957 if (p->jitter < min) 3958 min = p->jitter; 3959 dtemp = 0; 3960 for (j = 0; j < n; j++) { 3961 q = s.v[j].p; 3962 dtemp += SQUARE(p->offset - q->offset); 3963 } 3964 dtemp = SQRT(dtemp); 3965 if (dtemp > max) { 3966 max = dtemp; 3967 qmax = q; 3968 } 3969 } 3971 /* 3972 * If the maximum selection jitter is less than the 3973 * minimum peer jitter, then tossing out more survivors 3974 * will not lower the minimum peer jitter, so we might 3975 * as well stop. To make sure a few survivors are left 3976 * for the clustering algorithm to chew on, we also stop 3977 * if the number of survivors is less than or equal to 3978 * NMIN (3). 3979 */ 3980 if (max < min || n <= NMIN) 3981 break; 3983 /* 3984 * Delete survivor qmax from the list and go around 3985 * again. 3986 */ 3987 s.n--; 3988 } 3989 /* 3990 * Pick the best clock. If the old system peer is on the list 3991 * and at the same stratum as the first survivor on the list, 3992 * then don't do a clock hop. Otherwise, select the first 3993 * survivor on the list as the new system peer. 3994 */ 3995 if (osys->stratum == s.v[0].p->stratum) 3996 s.p = osys; 3997 else 3998 s.p = s.v[0].p; 3999 clock_update(s.p); 4000 } 4002 A.5.5.2. root_dist() 4004 /* 4005 * root_dist() - calculate root distance 4006 */ 4007 double 4008 root_dist( 4009 struct p *p /* peer structure pointer */ 4010 ) 4011 { 4013 /* 4014 * The root synchronization distance is the maximum error due to 4015 * all causes of the local clock relative to the primary server. 4016 * It is defined as half the total delay plus total dispersion 4017 * plus peer jitter. 4018 */ 4019 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 4020 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 4021 } 4022 A.5.5.3. accept() 4024 /* 4025 * accept() - test if association p is acceptable for synchronization 4026 */ 4027 int 4028 accept( 4029 struct p *p /* peer structure pointer */ 4030 ) 4031 { 4032 /* 4033 * A stratum error occurs if (1) the server has never been 4034 * synchronized, (2) the server stratum is invalid. 4035 */ 4036 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 4037 return (FALSE); 4039 /* 4040 * A distance error occurs if the root distance exceeds the 4041 * distance threshold plus an increment equal to one poll 4042 * interval. 4043 */ 4044 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 4045 return (FALSE); 4047 /* 4048 * A loop error occurs if the remote peer is synchronized to the 4049 * local peer or the remote peer is synchronized to the current 4050 * system peer. Note this is the behavior for IPv4; for IPv6 the 4051 * MD5 hash is used instead. 4052 */ 4053 if (p->refid == p->dstaddr || p->refid == s.refid) 4054 return (FALSE); 4056 /* 4057 * An unreachable error occurs if the server is unreachable. 4058 */ 4059 if (p->reach == 0) 4060 return (FALSE); 4062 return (TRUE); 4063 } 4065 A.5.5.4. clock_update() 4067 /* 4068 * clock_update() - update the system clock 4069 */ 4071 void 4072 clock_update( 4073 struct p *p /* peer structure pointer */ 4074 ) 4075 { 4076 double dtemp; 4078 /* 4079 * If this is an old update, for instance as the result of a 4080 * system peer change, avoid it. We never use an old sample or 4081 * the same sample twice. 4082 * 4083 if (s.t >= p->t) 4084 return; 4086 /* 4087 * Combine the survivor offsets and update the system clock; the 4088 * local_clock() routine will tell us the good or bad news. 4089 */ 4090 s.t = p->t; 4091 clock_combine(); 4092 switch (local_clock(p, s.offset)) { 4094 /* 4095 * The offset is too large and probably bogus. Complain to the 4096 * system log and order the operator to set the clock manually 4097 * within PANIC range. The reference implementation includes a 4098 * command line option to disable this check and to change the 4099 * panic threshold from the default 1000 s as required. 4100 */ 4101 case PANIC: 4102 exit (0); 4104 /* 4105 * The offset is more than the step threshold (0.125 s by 4106 * default). After a step, all associations now have 4107 * inconsistent time valurs, so they are reset and started 4108 * fresh. The step threshold can be changed in the reference 4109 * implementation in order to lessen the chance the clock might 4110 * be stepped backwards. However, there may be serious 4111 * consequences, as noted in the white papers at the NTP project 4112 * site. 4113 */ 4114 case STEP: 4115 while (/* all associations */ 0) 4116 clear(p, X_STEP); 4117 s.stratum = MAXSTRAT; 4118 s.poll = MINPOLL; 4119 break; 4121 /* 4122 * The offset was less than the step threshold, which is the 4123 * normal case. Update the system variables from the peer 4124 * variables. The lower clamp on the dispersion increase is to 4125 * avoid timing loops and clockhopping when highly precise 4126 * sources are in play. The clamp can be changed from the 4127 * default .01 s in the reference implementation. 4128 */ 4129 case SLEW: 4130 s.leap = p->leap; 4131 s.stratum = p->stratum + 1; 4132 s.refid = p->refid; 4133 s.reftime = p->reftime; 4134 s.rootdelay = p->rootdelay + p->delay; 4135 dtemp = SQRT(SQUARE(p->jitter) + SQUARE(s.jitter)); 4136 dtemp += max(p->disp + PHI * (c.t - p->t) + 4137 fabs(p->offset), MINDISP); 4138 s.rootdisp = p->rootdisp + dtemp; 4139 break; 4141 /* 4142 * Some samples are discarded while, for instance, a direct 4143 * frequency measurement is being made. 4144 */ 4145 case IGNORE: 4146 break; 4147 } 4148 } 4149 A.5.5.5. clock_combine() 4151 /* 4152 * clock_combine() - combine offsets 4153 */ 4154 void 4155 clock_combine() 4156 { 4157 struct p *p; /* peer structure pointer */ 4158 double x, y, z, w; 4159 int i; 4161 /* 4162 * Combine the offsets of the clustering algorithm survivors 4163 * using a weighted average with weight determined by the root 4164 * distance. Compute the selection jitter as the weighted RMS 4165 * difference between the first survivor and the remaining 4166 * survivors. In some cases the inherent clock jitter can be 4167 * reduced by not using this algorithm, especially when frequent 4168 * clockhopping is involved. The reference implementation can be 4169 * configured to avoid this algorithm by designating a preferred 4170 * peer. 4171 */ 4172 y = z = w = 0; 4173 for (i = 0; s.v[i].p != NULL; i++) { 4174 p = s.v[i].p; 4175 x = root_dist(p); 4176 y += 1 / x; 4177 z += p->offset / x; 4178 w += SQUARE(p->offset - s.v[0].p->offset) / x; 4179 } 4180 s.offset = z / y; 4181 s.jitter = SQRT(w / y); 4182 } 4184 A.5.5.6. local_clock() 4186 /* 4187 * Clock discipline parameters and constants 4188 */ 4189 #define STEPT .128 /* step threshold (s) */ 4190 #define WATCH 900 /* stepout threshold (s) */ 4191 #define PANICT 1000 /* panic threshold (s) */ 4192 #define PLL 65536 /* PLL loop gain */ 4193 #define FLL MAXPOLL + 1 /* FLL loop gain */ 4194 #define AVG 4 /* parameter averaging constant */ 4195 #define ALLAN 1500 /* compromise Allan intercept (s) */ 4196 #define LIMIT 30 /* poll-adjust threshold */ 4197 #define MAXFREQ 500e-6 /* frequency tolerance (500 PPM) */ 4198 #define PGATE 4 /* poll-adjust gate */ 4200 /* 4201 * local_clock() - discipline the local clock 4202 */ 4203 int /* return code */ 4204 local_clock( 4205 struct p *p, /* peer structure pointer */ 4206 double offset /* clock offset from combine() */ 4207 ) 4208 { 4209 int state; /* clock discipline state */ 4210 double freq; /* frequency */ 4211 double mu; /* interval since last update */ 4212 int rval; 4213 double etemp, dtemp; 4215 /* 4216 * If the offset is too large, give up and go home. 4217 */ 4218 if (fabs(offset) > PANICT) 4219 return (PANIC); 4221 /* 4222 * Clock state machine transition function. This is where the 4223 * action is and defines how the system reacts to large time 4224 * and frequency errors. There are two main regimes: when the 4225 * offset exceeds the step threshold and when it does not. 4226 */ 4227 rval = SLEW; 4228 mu = p->t - s.t; 4229 freq = 0; 4230 if (fabs(offset) > STEPT) { 4231 switch (c.state) { 4233 /* 4234 * In S_SYNC state we ignore the first outlyer amd 4235 * switch to S_SPIK state. 4236 */ 4237 case SYNC: 4238 state = SPIK; 4239 return (rval); 4241 /* 4242 * In S_FREQ state we ignore outlyers and inlyers. At 4243 * the first outlyer after the stepout threshold, 4244 * compute the apparent frequency correction and step 4245 * the time. 4246 */ 4247 case FREQ: 4248 if (mu < WATCH) 4249 return (IGNORE); 4251 freq = (offset - c.offset) / mu; 4252 /* fall through to S_SPIK */ 4254 /* 4255 * In S_SPIK state we ignore succeeding outlyers until 4256 * either an inlyer is found or the stepout threshold is 4257 * exceeded. 4258 */ 4259 case SPIK: 4260 if (mu < WATCH) 4261 return (IGNORE); 4263 /* fall through to default */ 4265 /* 4266 * We get here by default in S_NSET and S_FSET states 4267 * and from above in S_FREQ state. Step the time and 4268 * clamp down the poll interval. 4269 * 4270 * In S_NSET state an initial frequency correction is 4271 * not available, usually because the frequency file has 4272 * not yet been written. Since the time is outside the 4273 * capture range, the clock is stepped. The frequency 4274 * will be set directly following the stepout interval. 4275 * 4276 * In S_FSET state the initial frequency has been set 4277 * from the frequency file. Since the time is outside 4278 * the capture range, the clock is stepped immediately, 4279 * rather than after the stepout interval. Guys get 4280 * nervous if it takes 17 minutes to set the clock for 4281 * the first time. 4282 * 4283 * In S_SPIK state the stepout threshold has expired and 4284 * the phase is still above the step threshold. Note 4285 * that a single spike greater than the step threshold 4286 * is always suppressed, even at the longer poll 4287 * intervals. 4288 */ 4289 default: 4291 /* 4292 * This is the kernel set time function, usually 4293 * implemented by the Unix settimeofday() system 4294 * call. 4295 */ 4296 step_time(offset); 4297 c.count = 0; 4298 s.poll = MINPOLL; 4299 rval = STEP; 4300 if (state == NSET) { 4301 rstclock(FREQ, p->t, 0); 4302 return (rval); 4303 } 4304 break; 4305 } 4306 rstclock(SYNC, p->t, 0); 4307 } else { 4309 /* 4310 * Compute the clock jitter as the RMS of exponentially 4311 * weighted offset differences. This is used by the 4312 * poll-adjust code. 4313 */ 4314 etemp = SQUARE(c.jitter); 4315 dtemp = SQUARE(max(fabs(offset - c.last), 4316 LOG2D(s.precision))); 4317 c.jitter = SQRT(etemp + (dtemp - etemp) / AVG); 4318 switch (c.state) { 4320 /* 4321 * In S_NSET state this is the first update received and 4322 * the frequency has not been initialized. The first 4323 * thing to do is directly measure the oscillator 4324 * frequency. 4325 */ 4326 case NSET: 4327 rstclock(FREQ, p->t, offset); 4328 return (IGNORE); 4330 /* 4331 * In S_FSET state this is the first update and the 4332 * frequency has been initialized. Adjust the phase, but 4333 * don't adjust the frequency until the next update. 4334 */ 4335 case FSET: 4336 rstclock(SYNC, p->t, offset); 4337 break; 4339 /* 4340 * In S_FREQ state ignore updates until the stepout 4341 * threshold. After that, correct the phase and 4342 * frequency and switch to S_SYNC state. 4343 */ 4344 case FREQ: 4345 if (c.t - s.t < WATCH) 4346 return (IGNORE); 4348 freq = (offset - c.offset) / mu; 4349 break; 4351 /* 4352 * We get here by default in S_SYNC and S_SPIK states. 4353 * Here we compute the frequency update due to PLL and 4354 * FLL contributions. 4355 */ 4356 default: 4358 /* 4359 * The FLL and PLL frequency gain constants 4360 * depend on the poll interval and Allan 4361 * intercept. The FLL is not used below one-half 4362 * the Allan intercept. Above that the loop gain 4363 * increases in steps to 1 / AVG. 4364 */ 4365 if (LOG2D(s.poll) > ALLAN / 2) { 4366 etemp = FLL - s.poll; 4367 if (etemp < AVG) 4368 etemp = AVG; 4369 freq += (offset - c.offset) / (max(mu, 4370 ALLAN) * etemp); 4371 } 4373 /* 4374 * For the PLL the integration interval 4375 * (numerator) is the minimum of the update 4376 * interval and poll interval. This allows 4377 * oversampling, but not undersampling. 4378 */ 4379 etemp = min(mu, LOG2D(s.poll)); 4380 dtemp = 4 * PLL * LOG2D(s.poll); 4381 freq += offset * etemp / (dtemp * dtemp); 4382 rstclock(SYNC, p->t, offset); 4383 break; 4384 } 4385 } 4387 /* 4388 * Calculate the new frequency and frequency stability (wander). 4390 * Compute the clock wander as the RMS of exponentially weighted 4391 * frequency differences. This is not used directly, but can, 4392 * along withthe jitter, be a highly useful monitoring and 4393 * debugging tool 4394 */ 4395 freq += c.freq; 4396 c.freq = max(min(MAXFREQ, freq), -MAXFREQ); 4397 etemp = SQUARE(c.wander); 4398 dtemp = SQUARE(freq); 4399 c.wander = SQRT(etemp + (dtemp - etemp) / AVG); 4401 /* 4402 * Here we adjust the poll interval by comparing the current 4403 * offset with the clock jitter. If the offset is less than the 4404 * clock jitter times a constant, then the averaging interval is 4405 * increased, otherwise it is decreased. A bit of hysteresis 4406 * helps calm the dance. Works best using burst mode. 4407 */ 4408 if (fabs(c.offset) < PGATE * c.jitter) { 4409 c.count += s.poll; 4410 if (c.count > LIMIT) { 4411 c.count = LIMIT; 4412 if (s.poll < MAXPOLL) { 4413 c.count = 0; 4414 s.poll++; 4415 } 4416 } 4417 } else { 4418 c.count -= s.poll << 1; 4419 if (c.count < -LIMIT) { 4420 c.count = -LIMIT; 4421 if (s.poll > MINPOLL) { 4422 c.count = 0; 4423 s.poll--; 4424 } 4425 } 4426 } 4427 return (rval); 4428 } 4429 A.5.5.7. rstclock() 4431 /* 4432 * rstclock() - clock state machine 4433 */ 4434 void 4435 rstclock( 4436 int state, /* new state */ 4437 double offset, /* new offset */ 4438 double t /* new update time */ 4439 ) 4440 { 4441 /* 4442 * Enter new state and set state variables. Note we use the time 4443 * of the last clock filter sample, which must be earlier than 4444 * the current time. 4445 */ 4446 c.state = state; 4447 c.last = c.offset = offset; 4448 s.t = t; 4449 } 4451 A.5.6. Clock Adjust Process 4453 A.5.6.1. clock_adjust() 4455 /* 4456 * clock_adjust() - runs at one-second intervals 4457 */ 4458 void 4459 clock_adjust() { 4460 double dtemp; 4462 /* 4463 * Update the process time c.t. Also increase the dispersion 4464 * since the last update. In contrast to NTPv3, NTPv4 does not 4465 * declare unsynchronized after one day, since the dispersion 4466 * threshold serves this function. When the dispersion exceeds 4467 * MAXDIST (1 s), the server is considered unfit for 4468 * synchroniztion. 4469 */ 4470 c.t++; 4471 s.rootdisp += PHI; 4473 /* 4474 * Implement the phase and frequency adjustments. The gain 4475 * factor (denominator) is not allowed to increase beyond the 4476 * Allan intercept. It doesn't make sense to average phase noise 4477 * beyond this point and it helps to damp residual offset at the 4478 * longer poll intervals. 4479 */ 4480 dtemp = c.offset / (PLL * min(LOG2D(s.poll), ALLAN)); 4481 c.offset -= dtemp; 4483 /* 4484 * This is the kernel adjust time function, usually implemented 4485 * by the Unix adjtime() system call. 4486 */ 4487 adjust_time(c.freq + dtemp); 4489 /* 4490 * Peer timer. Call the poll() routine when the poll timer 4491 * expires. 4492 */ 4493 while (/* all associations */ 0) { 4494 struct p *p; /* dummy peer structure pointer */ 4496 if (c.t >= p->nextdate) 4497 poll(p); 4498 } 4500 /* 4501 * Once per hour write the clock frequency to a file 4502 */ 4503 if (c.t % 3600 == 3599) 4504 /* write c.freq to file */ 0; 4505 } 4507 A.5.7. Poll Process 4509 /* 4510 * Poll process parameters and constants 4511 */ 4512 #define UNREACH 12 /* unreach counter threshold */ 4513 #define BCOUNT 8 /* packets in a burst */ 4514 #define BTIME 2 /* burst interval (s) */ 4516 A.5.7.1. poll() 4518 /* 4519 * poll() - determine when to send a packet for association p-> 4520 */ 4521 void 4522 poll( 4523 struct p *p /* peer structure pointer */ 4524 ) 4526 { 4527 int hpoll; 4528 int oreach; 4530 /* 4531 * This routine is called when the current time c.t catches up 4532 * to the next poll time p->nextdate. The value p->outdate is 4533 * the last time this routine was executed. The poll_update() 4534 * routine determines the next execution time p->nextdate. 4535 * 4536 * If broadcasting, just do it, but only if we are synchronized. 4537 */ 4538 hpoll = p->hpoll; 4539 if (p->hmode == M_BCST) { 4540 p->outdate = c.t; 4541 if (s.p != NULL) 4542 peer_xmit(p); 4543 poll_update(p, hpoll); 4544 return; 4545 } 4547 /* 4548 * If manycasting, start with ttl = 1. The ttl is increased by 4549 * one for each poll until MAXCLOCK servers have been found or 4550 * ttl reaches TTLMAX. If reaching MAXCLOCK, stop polling until 4551 * the number of servers falls below MINCLOCK, then start all 4552 * over. 4553 */ 4554 if (p->hmode == M_CLNT && p->flags & P_MANY) { 4555 p->outdate = c.t; 4556 if (p->unreach > BEACON) { 4557 p->unreach = 0; 4558 p->ttl = 1; 4559 peer_xmit(p); 4560 } else if (s.n < MINCLOCK) { 4561 if (p->ttl < TTLMAX) 4562 p->ttl++; 4563 peer_xmit(p); 4564 } 4565 p->unreach++; 4566 poll_update(p, hpoll); 4567 return; 4568 } 4569 if (p->burst == 0) { 4571 /* 4572 * We are not in a burst. Shift the reachability 4573 * register to the left. Hopefully, some time before the 4574 * next poll a packet will arrive and set the rightmost 4575 * bit. 4576 */ 4577 oreach = p->reach; 4578 p->outdate = c.t; 4579 p->reach << 1; 4580 if (!(p->reach & 0x7)) 4581 clock_filter(p, 0, 0, MAXDISP); 4582 if (!p->reach) { 4584 /* 4585 * The server is unreachable, so bump the 4586 * unreach counter. If the unreach threshold has 4587 * been reached, double the poll interval to 4588 * minimize wasted network traffic. Send a burst 4589 * only if enabled and the unreach threshold has 4590 * not been reached. 4591 */ 4592 if (p->flags & P_IBURST && p->unreach == 0) { 4593 p->burst = BCOUNT; 4594 } else if (p->unreach < UNREACH) 4595 p->unreach++; 4596 else 4597 hpoll++; 4598 p->unreach++; 4599 } else { 4601 /* 4602 * The server is reachable. Set the poll 4603 * interval to the system poll interval. Send a 4604 * burst only if enabled and the peer is fit. 4605 */ 4606 p->unreach = 0; 4607 hpoll = s.poll; 4608 if (p->flags & P_BURST && fit(p)) 4609 p->burst = BCOUNT; 4610 } 4611 } else { 4613 /* 4614 * If in a burst, count it down. When the reply comes 4615 * back the clock_filter() routine will call 4616 * clock_select() to process the results of the burst. 4617 */ 4618 p->burst--; 4619 } 4621 /* 4622 * Do not transmit if in broadcast client mode. 4623 */ 4624 if (p->hmode != M_BCLN) 4625 peer_xmit(p); 4626 poll_update(p, hpoll); 4627 } 4629 A.5.7.2. poll_update() 4630 /* 4631 * poll_update() - update the poll interval for association p 4632 * 4633 * Note: This routine is called by both the packet() and poll() routine. 4634 * Since the packet() routine is executed when a network packet arrives 4635 * and the poll() routine is executed as the result of timeout, a 4636 * potential race can occur, possibly causing an incorrect interval for 4637 * the next poll. This is considered so unlikely as to be negligible. 4638 */ 4639 void 4640 poll_update( 4641 struct p *p, /* peer structure pointer */ 4642 int poll /* poll interval (log2 s) */ 4643 ) 4644 { 4645 /* 4646 * This routine is called by both the poll() and packet() 4647 * routines to determine the next poll time. If within a burst 4648 * the poll interval is two seconds. Otherwise, it is the 4649 * minimum of the host poll interval and peer poll interval, but 4650 * not greater than MAXPOLL and not less than MINPOLL. The 4651 * design insures that a longer interval can be preempted by a 4652 * shorter one if required for rapid response. 4653 */ 4654 p->hpoll = max(min(MAXPOLL, poll), MINPOLL); 4655 if (p->burst > 0) { 4656 if (p->nextdate != c.t) 4657 return; 4658 else 4659 p->nextdate += BTIME; 4660 } else { 4662 /* 4663 * While not shown here, the reference implementation 4664 * randonizes the poll interval by a small factor. 4665 */ 4666 p->nextdate = p->outdate + (1 << max(min(p->ppoll, 4667 p->hpoll), MINPOLL)); 4668 } 4670 /* 4671 * It might happen that the due time has already passed. If so, 4672 * make it one second in the future. 4673 */ 4674 if (p->nextdate <= c.t) 4675 p->nextdate = c.t + 1; 4676 } 4677 A.5.7.3. peer_xmit() 4678 /* 4679 * transmit() - transmit a packet for association p 4680 */ 4681 void 4682 peer_xmit( 4683 struct p *p /* peer structure pointer */ 4684 ) 4685 { 4686 struct x x; /* transmit packet */ 4688 /* 4689 * Initialize header and transmit timestamp 4690 */ 4691 x.srcaddr = p->dstaddr; 4692 x.dstaddr = p->srcaddr; 4693 x.leap = s.leap; 4694 x.version = p->version; 4695 x.mode = p->hmode; 4696 if (s.stratum == MAXSTRAT) 4697 x.stratum = 0; 4698 else 4699 x.stratum = s.stratum; 4700 x.poll = p->hpoll; 4701 x.precision = s.precision; 4702 x.rootdelay = D2FP(s.rootdelay); 4703 x.rootdisp = D2FP(s.rootdisp); 4704 x.refid = s.refid; 4705 x.reftime = s.reftime; 4706 x.org = p->org; 4707 x.rec = p->rec; 4708 x.xmt = get_time(); 4709 p->xmt = x.xmt; 4711 /* 4712 * If the key ID is nonzero, send a valid MAC using the key ID 4713 * of the association and the key in the local key cache. If 4714 * something breaks, like a missing trusted key, don't send the 4715 * packet; just reset the association and stop until the problem 4716 * is fixed. 4717 */ 4718 if (p->keyid) 4719 if (/* p->keyid invalid */ 0) { 4720 clear(p, X_NKEY); 4721 return; 4722 } 4723 x.mac = md5(p->keyid); 4724 xmit_packet(&x); 4725 } 4726 Authors' Addresses 4728 Jack Burbank (editor) 4729 The Johns Hopkins University Applied Physics Laboratory 4730 11100 Johns Hopkins Road 4731 Laurel, MD 20723-6099 4732 US 4734 Phone: +1 443 778 7127 4735 Email: jack.burbank@jhuapl.edu 4737 William Kasch (editor) 4738 The Johns Hopkins University Applied Physics Laboratory 4739 11100 Johns Hopkins Road 4740 Laurel, MD 20723-6099 4741 US 4743 Phone: +1 443 778 7463 4744 Email: william.kasch@jhuapl.edu 4746 Jim Martin (editor) 4747 The Daedelus Group 4748 721 Alameda de las Pulgas 4749 Belmont, CA 94002 4750 US 4752 Phone: +1 408 799-2788 4753 Email: jim@daedelus.com 4755 Dr. David L. Mills 4756 University of Delaware 4757 Newark, DE 19716 4758 US 4760 Phone: +1 302 831 8247 4761 Email: mills@udel.edu 4763 Full Copyright Statement 4765 Copyright (C) The IETF Trust (2007). 4767 This document is subject to the rights, licenses and restrictions 4768 contained in BCP 78, and except as set forth therein, the authors 4769 retain all their rights. 4771 This document and the information contained herein are provided on an 4772 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4773 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4774 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4775 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4776 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4777 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4779 Intellectual Property 4781 The IETF takes no position regarding the validity or scope of any 4782 Intellectual Property Rights or other rights that might be claimed to 4783 pertain to the implementation or use of the technology described in 4784 this document or the extent to which any license under such rights 4785 might or might not be available; nor does it represent that it has 4786 made any independent effort to identify any such rights. Information 4787 on the procedures with respect to rights in RFC documents can be 4788 found in BCP 78 and BCP 79. 4790 Copies of IPR disclosures made to the IETF Secretariat and any 4791 assurances of licenses to be made available, or the result of an 4792 attempt made to obtain a general license or permission for the use of 4793 such proprietary rights by implementers or users of this 4794 specification can be obtained from the IETF on-line IPR repository at 4795 http://www.ietf.org/ipr. 4797 The IETF invites any interested party to bring to its attention any 4798 copyrights, patents or patent applications, or other proprietary 4799 rights that may cover technology that may be required to implement 4800 this standard. Please address the information to the IETF at 4801 ietf-ipr@ietf.org. 4803 Acknowledgment 4805 Funding for the RFC Editor function is provided by the IETF 4806 Administrative Support Activity (IASA).