idnits 2.17.1 draft-ietf-ntp-ntpv4-proto-11.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 4790. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4801. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4808. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4814. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 119 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? ** There are 3 instances of too long lines in the document, the longest one being 1 character in excess of 72. == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. -- The draft header indicates that this document obsoletes RFC1305, but the abstract doesn't seem to directly say this. It does mention RFC1305 though, so this could be OK. -- The draft header indicates that this document obsoletes RFC4330, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 2711 has weird spacing: '... ipaddr srcad...' == Line 2712 has weird spacing: '... ipaddr dstad...' == Line 2713 has weird spacing: '... char ver...' == Line 2714 has weird spacing: '... char lea...' == Line 2715 has weird spacing: '... char mod...' == (114 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 5, 2008) is 5683 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'NSTAGE' is mentioned on line 3584, but not defined == Missing Reference: 'NMAX' is mentioned on line 2856, but not defined -- Looks like a reference, but probably isn't: '7' on line 3210 -- Looks like a reference, but probably isn't: '5' on line 3210 -- Looks like a reference, but probably isn't: '0' on line 4191 -- Obsolete informational reference (is this intentional?): RFC 1305 (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 2434 (Obsoleted by RFC 5226) -- Obsolete informational reference (is this intentional?): RFC 4330 (Obsoleted by RFC 5905) ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Downref: Normative reference to an Informational RFC: RFC 1321 ** Downref: Normative reference to an Informational RFC: RFC 1345 Summary: 6 errors (**), 0 flaws (~~), 12 warnings (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NTP WG J. Burbank, Ed. 3 Internet-Draft W. Kasch, Ed. 4 Obsoletes: RFC 4330, RFC 1305 JHU/APL 5 (if approved) J. Martin, Ed. 6 Intended status: Standards Track Woven 7 Expires: March 9, 2009 D. Mills 8 U. Delaware 9 September 5, 2008 11 Network Time Protocol Version 4 Protocol And Algorithms Specification 12 draft-ietf-ntp-ntpv4-proto-11 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on March 9, 2009. 39 Abstract 41 The Network Time Protocol (NTP) is widely used to synchronize 42 computer clocks in the Internet. This document describes NTP Version 43 4 (NTPv4), which is backwards compatible with NTP Version 3 (NTPv3) 44 described in RFC 1305, as well as previous versions of the protocol. 45 NTPv4 includes a modified protocol header to accommodate the Internet 46 Protocol Version 6 address family. NTPv4 includes fundamental 47 improvements in the mitigation and discipline algorithms which extend 48 the potential accuracy to the tens of microseconds with modern 49 workstations and fast LANs. It includes a dynamic server discovery 50 scheme, so that in many cases specific server configuration is not 51 required. It corrects certain errors in the NTPv3 design and 52 implementation and includes an optional extension mechanism. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 57 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 5 58 2. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 5 59 3. Protocol Modes . . . . . . . . . . . . . . . . . . . . . . . 6 60 3.1. Dynamic Server Discovery . . . . . . . . . . . . . . . . 7 61 4. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 8 62 5. Implementation Model . . . . . . . . . . . . . . . . . . . . 10 63 6. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 12 64 7. Data Structures . . . . . . . . . . . . . . . . . . . . . . . 16 65 7.1. Structure Conventions . . . . . . . . . . . . . . . . . . 16 66 7.2. Global Parameters . . . . . . . . . . . . . . . . . . . . 16 67 7.3. Packet Header Variables . . . . . . . . . . . . . . . . . 17 68 7.4. The Kiss-o'-Death Packet . . . . . . . . . . . . . . . . 23 69 7.5. NTP Extension Field Format . . . . . . . . . . . . . . . 25 70 8. On Wire Protocol . . . . . . . . . . . . . . . . . . . . . . 26 71 9. Peer Process . . . . . . . . . . . . . . . . . . . . . . . . 30 72 9.1. Peer Process Variables . . . . . . . . . . . . . . . . . 31 73 9.2. Peer Process Operations . . . . . . . . . . . . . . . . . 33 74 10. Clock Filter Algorithm . . . . . . . . . . . . . . . . . . . 37 75 11. System Process . . . . . . . . . . . . . . . . . . . . . . . 39 76 11.1. System Process Variables . . . . . . . . . . . . . . . . 40 77 11.2. System Process Operations . . . . . . . . . . . . . . . . 41 78 11.2.1. Selection Algorithm . . . . . . . . . . . . . . . . 43 79 11.2.2. Cluster Algorithm . . . . . . . . . . . . . . . . . 44 80 11.2.3. Combine Algorithm . . . . . . . . . . . . . . . . . 45 81 11.3. Clock Discipline Algorithm . . . . . . . . . . . . . . . 47 82 12. Clock Adjust Process . . . . . . . . . . . . . . . . . . . . 51 83 13. Poll Process . . . . . . . . . . . . . . . . . . . . . . . . 51 84 13.1. Poll Process Variables . . . . . . . . . . . . . . . . . 51 85 13.2. Poll Process Operations . . . . . . . . . . . . . . . . . 52 86 14. Simple Network Time Protocol (SNTP) . . . . . . . . . . . . . 54 87 15. Security Considerations . . . . . . . . . . . . . . . . . . . 55 88 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 56 89 17. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 57 90 18. References . . . . . . . . . . . . . . . . . . . . . . . . . 57 91 18.1. Informative References . . . . . . . . . . . . . . . . . 57 92 18.2. Normative References . . . . . . . . . . . . . . . . . . 58 93 Appendix A. Code Skeleton . . . . . . . . . . . . . . . . . . . 58 94 A.1. Global Definitions . . . . . . . . . . . . . . . . . . . 59 95 A.1.1. Definitions, Constants, Parameters . . . . . . . . . 59 96 A.1.2. Packet Data Structures . . . . . . . . . . . . . . . 62 97 A.1.3. Association Data Structures . . . . . . . . . . . . 63 98 A.1.4. System Data Structures . . . . . . . . . . . . . . . 66 99 A.1.5. Local Clock Data Structures . . . . . . . . . . . . 67 100 A.1.6. Function Prototypes . . . . . . . . . . . . . . . . 67 101 A.2. Main Program and Utility Routines . . . . . . . . . . . . 68 102 A.3. Kernel Input/Output Interface . . . . . . . . . . . . . . 71 103 A.4. Kernel System Clock Interface . . . . . . . . . . . . . . 71 104 A.5. Peer Process . . . . . . . . . . . . . . . . . . . . . . 73 105 A.5.1. receive() . . . . . . . . . . . . . . . . . . . . . 74 106 A.5.2. clock_filter() . . . . . . . . . . . . . . . . . . . 81 107 A.5.3. fast_xmit() . . . . . . . . . . . . . . . . . . . . 85 108 A.5.4. access() . . . . . . . . . . . . . . . . . . . . . . 87 109 A.5.5. System Process . . . . . . . . . . . . . . . . . . . 87 110 A.5.6. Clock Adjust Process . . . . . . . . . . . . . . . . 101 111 A.5.7. Poll Process . . . . . . . . . . . . . . . . . . . . 102 112 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 109 113 Intellectual Property and Copyright Statements . . . . . . . . . 110 115 1. Introduction 117 This document defines the Network Time Protocol Version 4 (NTPv4), 118 which is widely used to synchronize system clocks among a set of 119 distributed time servers and clients. It describes the core 120 architecture, protocol, state machines, data structures and 121 algorithms. NTPv4 introduces new functionality to NTPv3, as 122 described in [RFC1305], and functionality expanded from SNTPv4 as 123 described in [RFC4330] (SNTPv4 is a subset of NTPv4). This document 124 obsoletes [RFC1305], and [RFC4330]. While certain minor changes have 125 been made in some protocol header fields, these do not affect the 126 interoperability between NTPv4 and previous versions of NTP and SNTP. 128 The NTP subnet model includes a number of widely accessible primary 129 time servers synchronized by wire or radio to national standards. 130 The purpose of the NTP protocol is to convey timekeeping information 131 from these primary servers to secondary time servers and clients via 132 both private networks and the public Internet. Precisely tuned 133 algorithms mitigate errors that may result from network disruptions, 134 server failures and possible hostile actions. Servers and clients 135 are configured such that values flow towards clients from the primary 136 servers at the root via branching secondary servers. 138 The NTPv4 design overcomes significant shortcomings in the NTPv3 139 design, corrects certain bugs and incorporates new features. In 140 particular, expanded NTP timestamp definitions encourage the use of 141 the floating double data type throughout the implementation. As a 142 result, the time resolution is better than one nanosecond and 143 frequency resolution is less than one nanosecond per second. 144 Additional improvements include a new clock discipline algorithm 145 which is more responsive to system clock hardware frequency 146 fluctuations. Typical primary servers using modern machines are 147 precise within a few tens of microseconds. Typical secondary servers 148 and clients on fast LANs are within a few hundred microseconds with 149 poll intervals up to 1024 seconds, which was the maximum with NTPv3. 150 With NTPv4, servers and clients are precise within a few tens of 151 milliseconds with poll intervals up to 36 hours. 153 The main body of this document describes the core protocol and data 154 structures necessary to interoperate between conforming 155 implementations. Appendix A contains a full featured example in the 156 form of a skeleton program, including data structures and code 157 segments for the core algorithms as well as the mitigation algorithms 158 used to enhance reliability and accuracy. While the skeleton program 159 and other descriptions in this document apply to a particular 160 implementation, they are not intended as the only way the required 161 functions can be implemented. While the NTPv3 symmetric key 162 authentication scheme described in this document has been carried 163 over from NTPv3, the Autokey public key authentication scheme new to 164 NTPv4 is described in [ref3]. 166 The NTP protocol includes modes of operation described in Section 2 167 using data types described in Section 6 and data structures described 168 in Section 7. The implementation model described in Section 5 is 169 based on a threaded, multi-process architecture, although other 170 architectures could be used as well. The on-wire protocol described 171 in Section 8 is based on a returnable-time design which depends only 172 on measured clock offsets, but does not require reliable message 173 delivery. Reliable message delivery such as TCP[RFC0793] can 174 actually make the delivered NTP packet less reliable since retries 175 would increase the delay value and other errors. The synchronization 176 subnet is a self-organizing, hierarchical, master-slave network with 177 synchronization paths determined by a shortest-path spanning tree and 178 defined metric. While multiple masters (primary servers) may exist, 179 there is no requirement for an election protocol. 181 This document includes material from [ref9], which contains flow 182 charts and equations unsuited for RFC format. There is much 183 additional information in [ref7], including an extensive technical 184 analysis and performance assessment of the protocol and algorithms in 185 this document. The reference implementation is available at 186 www.ntp.org. 188 The remainder of this document contains numerous variables and 189 mathematical expressions. Some variables take the form of Greek 190 characters, which are spelled out by their full case-sensitive name. 191 For example DELTA refers to the uppercase Greek character, while 192 delta refers to the lowercase character. Furthermore, subscripts are 193 denoted with '_', for example theta_i refers to the lowercase Greek 194 character theta with subscript i, or phonetically theta sub i. In 195 this document all time values are in seconds (s), and all frequencies 196 will be specified as fractional frequency offsets (FFO) (pure 197 number). It is often convenient to express these FFOs in parts per 198 million (ppm). 200 1.1. Requirements Notation 202 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 203 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 204 document are to be interpreted as described in [RFC2119]. 206 2. Modes of Operation 208 An NTP implementation operates as a primary server, secondary server 209 or client. A primary server is synchronized to a reference clock 210 directly traceable to UTC (eg, GPS, Galileo, etc). A client 211 synchronizes to one or more upstream servers, but does not provide 212 synchronization to dependent clients. A secondary server has one or 213 more upstream servers and one or more downstream servers or clients. 214 All servers and clients who are fully NTPv4 compliant MUST implement 215 the entire suite of algorithms described in this document. In order 216 to maintain stability in large NTP subnets, secondary servers SHOULD 217 be fully NTPv4 compliant. Alternative algorithms MAY be used, but 218 their output MUST be identical to the algorithms described in this 219 specification. 221 3. Protocol Modes 223 There are three NTP protocol variants, symmetric, client/server and 224 broadcast. Each is associated with an association mode as shown in 225 Figure 1. In addition, persistent associations are mobilized upon 226 startup and are never demobilized. Ephemeral associations are 227 mobilized upon the arrival of a packet and are demobilized upon error 228 or timeout. 230 +-------------------+--------------+-------------+ 231 | Association Mode | Assoc. Mode | Packet Mode | 232 +-------------------+--------------+-------------+ 233 | Symmetric Active | 1 | 1 or 2 | 234 | Symmetric Passive | 2 | 1 | 235 | Client | 3 | 4 | 236 | Server | 4 | 3 | 237 | Broadcast Server | 5 | 5 | 238 | Broadcast Client | 6 | N/A | 239 +-------------------+--------------+-------------+ 241 Figure 1: Association and Packet Modes 243 In the client/server variant a persistent client sends client (mode 244 3) packets to a server, which returns server (mode 4) packets. 245 Servers provide synchronization to one or more clients, but do not 246 accept synchronization from them. A server can also be a reference 247 clock driver which obtains time directly from a standard source such 248 as a GPS receiver or telephone modem service. In this variant, 249 clients pull synchronization from servers. 251 In the symmetric variant a peer operates as both a server and client 252 using either a symmetric active or symmetric passive association. A 253 persistent symmetric active association sends symmetric active (mode 254 1) packets to a symmetric active peer association. Alternatively, an 255 ephemeral symmetric passive association can be mobilized upon arrival 256 of a symmetric active packet with no matching association. That 257 association sends symmetric passive (mode 2) packets and persists 258 until error or timeout. Peers both push and pull synchronization to 259 and from each other. For the purposes of this document, a peer 260 operates like a client, so references to client imply peer as well. 262 In the broadcast variant a persistent broadcast server association 263 sends periodic broadcast server (mode 5) packets which can be 264 received by multiple clients. Upon reception of a broadcast server 265 packet without a matching association, an ephemeral broadcast client 266 (mode 6) association is mobilized and persists until error or 267 timeout. It is useful to provide an initial volley where the client 268 operating in client mode exchanges several packets with the server, 269 so as to calibrate the propagation delay and to run the Autokey 270 security protocol, after which the client reverts to broadcast client 271 mode. A broadcast server pushes synchronization to clients and other 272 servers. 274 Following loosely the conventions established by the telephone 275 industry, the level of each server in the hierarchy is defined by a 276 stratum number. Primary servers are assigned stratum one; secondary 277 servers at each lower level are assigned stratum numbers one greater 278 than the preceding level. As the stratum number increases, its 279 accuracy degrades depending on the particular network path and system 280 clock stability. Mean errors, measured by synchronization distances, 281 increase approximately in proportion to stratum numbers and measured 282 roundtrip delay. 284 As a standard practice, timing network topology should be organized 285 to avoid timing loops and minimize the synchronization distance. In 286 NTP the subnet topology is determined using a variant of the Bellman- 287 Ford distributed routing algorithm, which computes the shortest-path 288 spanning tree rooted on the primary servers. As a result of this 289 design, the algorithm automatically reorganizes the subnet, so as to 290 produce the most accurate and reliable time, even when there are 291 failures in the timing network. 293 3.1. Dynamic Server Discovery 295 There are two special associations, manycast client and manycast 296 server, which provide a dynamic server discovery function. There are 297 two types of manycast client associations, persistent and ephemeral. 298 The persistent manycast client sends client (mode 3) packets to a 299 designated IPv4 or IPv6 broadcast or multicast group address. 300 Designated manycast servers within range of the time-to-live (TTL) 301 field in the packet header listen for packets with that address. If 302 a server is suitable for synchronization, it returns an ordinary 303 server (mode 4) packet using the client's unicast address. Upon 304 receiving this packet, the client mobilizes an ephemeral client (mode 305 3) association. The ephemeral client association persists until 306 error or timeout. 308 A manycast client continues sending packets to search for a minimum 309 number of associations. It starts with a TTL equal to one and 310 continuously adding one to it until the minimum number of 311 associations is made or when the TTL reaches a maximum value. If the 312 TTL reaches its maximum value and yet not enough associations are 313 mobilized, the client stops transmission for a time-out period to 314 clear all associations, and then repeats the search cycle. If a 315 minimum number of associations has been mobilized, then the client 316 starts transmitting one packet per time-out period to maintain the 317 associations. Field constraints limit the minimum value to 1 and the 318 maximum to 255. These limits may be tuned for individual application 319 needs. 321 The ephemeral associations compete among themselves. As new 322 ephemeral associations are mobilized, the client runs the mitigation 323 algorithms described in Section 10 and Section 11.2 for the best 324 candidates out of the population, the remaining ephemeral 325 associations are timed out and demobilized. In this way the 326 population includes only the best candidates that have most recently 327 responded with an NTP packet to discipline the system clock. 329 4. Definitions 331 A number of technical terms are defined in this section. A timescale 332 is a frame of reference where time is expressed as the value of a 333 monotonically increasing binary counter with an indefinite number of 334 bits. It counts in seconds and fractions of a second, when a decimal 335 point is employed. The Coordinated Universal Time (UTC) timescale is 336 defined by ITU-R TF.460[ITU-R_TF.460]. Under the auspices of the 337 Metre Convention of 1865, in 1975 the CGPM[CGPM] strongly endorsed 338 the use of UTC as the basis for civil time. 340 The Coordinated Universal Time (UTC) timescale represents mean solar 341 time as disseminated by national standards laboratories. The system 342 time is represented by the system clock maintained by the hardware 343 and operating system. The goal of the NTP algorithms is to minimize 344 both the time difference and frequency difference between UTC and the 345 system clock. When these differences have been reduced below nominal 346 tolerances, the system clock is said to be synchronized to UTC. 348 The date of an event is the UTC time at which the event takes place. 349 Dates are ephemeral values designated with upper case T. Running time 350 is another timescale that is coincident to the synchronization 351 function of the NTP program. 353 A timestamp T(t) represents either the UTC date or time offset from 354 UTC at running time t. Which meaning is intended should be clear 355 from the context. Let T(t) be the time offset, R(t) the frequency 356 offset, D(t) the aging rate (first derivative of R(t) with respect to 357 t). Then, if T(t_0) is the UTC time offset determined at t = t_0, 358 the UTC time offset at time t is 360 T(t) = T(t_0) + R(t_0)(t-t_0) + 1/2 * D(t_0)(t-t_0)^2 + e, 362 where e is a stochastic error term discussed later in this document. 363 While the D(t) term is important when characterizing precision 364 oscillators, it is ordinarily neglected for computer oscillators. In 365 this document all time values are in seconds (s) and all frequency 366 values are in seconds-per-second (s/s). It is sometimes convenient 367 to express frequency offsets in parts-per-million (PPM), where 1 PPM 368 is equal to 10^(-6) seconds/second. 370 It is important in computer timekeeping applications to assess the 371 performance of the timekeeping function. The NTP performance model 372 includes four statistics which are updated each time a client makes a 373 measurement with a server. The offset (theta) represents the 374 maximum-likelihood time offset of the server clock relative to the 375 system clock. The delay (delta) represents the round trip delay 376 between the client and server. The dispersion (epsilon) represents 377 the maximum error inherent in the measurement. It increases at a 378 rate equal to the maximum disciplined system clock frequency 379 tolerance (PHI), typically 15 PPM. The jitter (psi) is defined as 380 the root-mean-square (RMS) average of the most recent offset 381 differences, represents the nominal error in estimating the offset. 383 While the theta, delta, epsilon, and psi statistics represent 384 measurements of the system clock relative to each server clock 385 separately, the NTP protocol includes mechanisms to combine the 386 statistics of several servers to more accurately discipline and 387 calibrate the system clock. The system offset (THETA) represents the 388 maximum-likelihood offset estimate for the server population. The 389 system jitter (PSI) represents the nominal error in estimating the 390 system offset. The delta and epsilon statistics are accumulated at 391 each stratum level from the reference clock to produce the root delay 392 (DELTA) and root dispersion (EPSILON) statistics. The 393 synchronization distance (LAMBDA) equal to EPSILON + DELTA / 2 394 represents the maximum error due all causes. The detailed 395 formulations of these statistics are given in Section 11.2. They are 396 available to the dependent applications in order to assess the 397 performance of the synchronization function. 399 5. Implementation Model 401 Figure 2 shows the architecture of a typical, multi-threaded 402 implementation. It includes two processes dedicated to each server, 403 a peer process to receive messages from the server or reference clock 404 and a poll process to transmit messages to the server or reference 405 clock. 407 ..................................................................... 408 . Remote . Peer/Poll . System . Clock . 409 . Servers . Processes . Process .Discipline. 410 . . . . Process . 411 .+--------+. +-----------+. +------------+ . . 412 .| |->| |. | | . . 413 .|Server 1| |Peer/Poll 1|->| | . . 414 .| |<-| |. | | . . 415 .+--------+. +-----------+. | | . . 416 . . ^ . | | . . 417 . . | . | | . . 418 .+--------+. +-----------+. | | +-----------+. . 419 .| |->| |. | Selection |->| |. +------+ . 420 .|Server 2| |Peer/Poll 2|->| and | | Combine |->| Loop | . 421 .| |<-| |. | Cluster | | Algorithm |. |Filter| . 422 .+--------+. +-----------+. | Algorithms |->| |. +------+ . 423 . . ^ . | | +-----------+. | . 424 . . | . | | . | . 425 .+--------+. +-----------+. | | . | . 426 .| |->| |. | | . | . 427 .|Server 3| |Peer/Poll 3|->| | . | . 428 .| |<-| |. | | . | . 429 .+--------+. +-----------+. +------------+ . | . 430 ....................^.........................................|...... 431 | . V . 432 | . +-----+ . 433 +--------------------------------------| VFO | . 434 . +-----+ . 435 . Clock . 436 . Adjust . 437 . Process . 438 ............ 440 Figure 2: Implementation Model 442 These processes operate on a common data structure, called an 443 association, which contains the statistics described above along with 444 various other data described in Section 9. A client sends packets to 445 one or more servers and then processes returned packets when they are 446 received. The server interchanges source and destination addresses 447 and ports, overwrites certain fields in the packet and returns it 448 immediately (in the client/server mode) or at some time later (in the 449 symmetric modes). As each NTP message is received, the offset theta 450 between the peer clock and the system clock is computed along with 451 the associated statistics delta, epsilon and psi. 453 The system process includes the selection, cluster and combine 454 algorithms that mitigate among the various servers and reference 455 clocks to determine the most accurate and reliable candidates to 456 synchronize the system clock. The selection algorithm uses Byzantine 457 fault detection principles to discard the presumably incorrect 458 candidates called "falsetickers" from the incident population, 459 leaving only good candidates called "truechimers". A truechimer is a 460 clock that maintains timekeeping accuracy to a previously published 461 and trusted standard, while a falseticker is a clock that shows 462 misleading or inconsistent time. The cluster algorithm uses 463 statistical principles to find the most accurate set of truechimers. 464 The combine algorithm computes the final clock offset by 465 statistically averaging the surviving truechimers. 467 The clock discipline process is a system process that controls the 468 time and frequency of the system clock, here represented as a 469 variable frequency oscillator (VFO). Timestamps struck from the VFO 470 close the feedback loop which maintains the system clock time. 471 Associated with the clock discipline process is the clock adjust 472 process, which runs once each second to inject a computed time offset 473 and maintain constant frequency. The RMS average of past time offset 474 differences represents the nominal error or system clock jitter. The 475 RMS average of past frequency offset differences represents the 476 oscillator frequency stability or frequency wander. These terms are 477 given precise interpretation in Section 11.3. 479 A client sends messages to each server with a poll interval of 2^tau 480 seconds, as determined by the poll exponent tau. In NTPv4, tau 481 ranges from 4 (16 s) through 17 (36 h). The value of tau is 482 determined by the clock discipline algorithm to match the loop time 483 constant T_c = 2^tau. In client/server mode the server responds 484 immediately; however, in symmetric modes each of two peers manages 485 tau as a function of current system offset and system jitter, so may 486 not agree with the same value. It is important that the dynamic 487 behavior of the clock discipline algorithm be carefully controlled in 488 order to maintain stability in the NTP subnet at large. This 489 requires that the peers agree on a common tau equal to the minimum 490 poll exponent of both peers. The NTP protocol includes provisions to 491 properly negotiate this value. 493 The implementation model includes some means to set and adjust the 494 system clock. The operating system is assumed to provide two 495 functions, one to set the time directly, for example the Unix 496 settimeofday() function, and another to adjust the time in small 497 increments advancing or retarding the time by a designated amount, 498 for example the Unix adjtime() function. In this and following 499 references, parentheses following a name indicate reference to a 500 function rather than a simple variable. In the intended design the 501 clock discipline process uses the adjtime() function if the 502 adjustment is less than a designated threshold, and the 503 settimeofday() function if above the threshold. The manner in which 504 this is done and the value of the threshold as described in 505 Section 10. 507 6. Data Types 509 All NTP time values are represented in twos-complement format, with 510 bits numbered in big-endian (as described in Appendix A of [RFC0791]) 511 fashion from zero starting at the left, or high-order, position. 512 There are three NTP time formats, a 128-bit date format, a 64-bit 513 timestamp format and a 32-bit short format, as shown in Figure 3. 514 The 128-bit date format is used where sufficient storage and word 515 size are available. It includes a 64-bit signed seconds field 516 spanning 584 billion years and a 64-bit fraction field resolving .05 517 attosecond (i.e., 0.5e-18). For convenience in mapping between 518 formats, the seconds field is divided into a 32-bit Era Number field 519 and a 32-bit Era Offset field. Eras cannot be produced by NTP 520 directly, nor is there need to do so. When necessary, they can be 521 derived from external means, such as the filesystem or dedicated 522 hardware. 524 0 1 2 3 525 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 526 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 527 | Seconds | Fraction | 528 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 530 NTP Short Format 532 0 1 2 3 533 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 534 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 535 | Seconds | 536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 537 | Fraction | 538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 NTP Timestamp Format 542 0 1 2 3 543 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 544 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 545 | Era Number | 546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 547 | Era Offset | 548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 549 | | 550 | Fraction | 551 | | 552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 554 NTP Date Format 556 Figure 3: NTP Time Formats 558 The 64-bit timestamp format is used in packet headers and other 559 places with limited word size. It includes a 32-bit unsigned seconds 560 field spanning 136 years and a 32-bit fraction field resolving 232 561 picoseconds. The 32-bit short format is used in delay and dispersion 562 header fields where the full resolution and range of the other 563 formats are not justified. It includes a 16-bit unsigned seconds 564 field and a 16-bit fraction field. 566 In the date and timestamp formats the prime epoch, or base date of 567 era 0, is 0 h 1 January 1900 UTC, when all bits are zero. It should 568 be noted that strictly speaking, UTC did not exist prior to 1 January 569 1972, but it is convenient to assume it has existed for all eternity, 570 even if all knowledge of historic leap seconds has been lost. Dates 571 are relative to the prime epoch; values greater than zero represent 572 times after that date; values less than zero represent times before 573 it. Note that the Era Offset field of the date format and the 574 Seconds field of the timestamp format have the same interpretation. 576 Timestamps are unsigned values and operations on them produce a 577 result in the same or adjacent eras. Era 0 includes dates from the 578 prime epoch to some time in 2036, when the timestamp field wraps 579 around and the base date for era 1 is established. In either format 580 a value of zero is a special case representing unknown or 581 unsynchronized time. Figure 4 shows a number of historic NTP dates 582 together with their corresponding Modified Julian Day (MJD), NTP era 583 and NTP timestamp. 585 +-------------+------------+-----+---------------+------------------+ 586 | Date | MJD | NTP | NTP Timestamp | Epoch | 587 | | | Era | Era Offset | | 588 +-------------+------------+-----+---------------+------------------+ 589 | 1 Jan -4712 | -2,400,001 | -49 | 1,795,583,104 | 1st day Julian | 590 | 1 Jan -1 | -679,306 | -14 | 139,775,744 | 2 BCE | 591 | 1 Jan 0 | -678,491 | -14 | 171,311,744 | 1 BCE | 592 | 1 Jan 1 | -678,575 | -14 | 202,939,144 | 1 CE | 593 | 4 Oct 1582 | -100,851 | -3 | 2,873,647,488 | Last day Julian | 594 | 15 Oct 1582 | -100,840 | -3 | 2,874,597,888 | First day | 595 | | | | | Gregorian | 596 | 31 Dec 1899 | 15019 | -1 | 4,294,880,896 | Last day NTP Era | 597 | | | | | -1 | 598 | 1 Jan 1900 | 15020 | 0 | 0 | First day NTP | 599 | | | | | Era 0 | 600 | 1 Jan 1970 | 40,587 | 0 | 2,208,988,800 | First day UNIX | 601 | 1 Jan 1972 | 41,317 | 0 | 2,272,060,800 | First day UTC | 602 | 31 Dec 1999 | 51,543 | 0 | 3,155,587,200 | Last day 20th | 603 | | | | | Century | 604 | 8 Feb 2036 | 64,731 | 1 | 63,104 | First day NTP | 605 | | | | | Era 1 | 606 +-------------+------------+-----+---------------+------------------+ 608 Figure 4: Interesting Historic NTP Dates 610 Let p be the number of significant bits in the second fraction. The 611 clock resolution is defined 2^(-p), in seconds. In order to minimize 612 bias and help make timestamps unpredictable to an intruder, the non- 613 significant bits should be set to an unbiased random bit string. The 614 clock precision is defined as the running time to read the system 615 clock, in seconds. Note that the precision defined in this way can 616 be larger or smaller than the resolution. The term rho, representing 617 the precision used in the protocol, is the larger of the two. 619 The only arithmetic operation permitted on dates and timestamps is 620 twos-complement subtraction, yielding a 127-bit or 63-bit signed 621 result. It is critical that the first-order differences between two 622 dates preserve the full 128-bit precision and the first-order 623 differences between two timestamps preserve the full 64-bit 624 precision. However, the differences are ordinarily small compared to 625 the seconds span, so they can be converted to floating double format 626 for further processing and without compromising the precision. 628 It is important to note that twos-complement arithmetic does not 629 distinguish between signed and unsigned values (although comparisons 630 can take sign into account); only the conditional branch instructions 631 do. Thus, although the distinction is made between signed dates and 632 unsigned timestamps, they are processed the same way. A perceived 633 hazard with 64-bit timestamp calculations spanning an era, such as 634 possible in 2036, might result in over-run. In point of fact, if the 635 client is set within 68 years of the server before the protocol is 636 started, correct values are obtained even if the client and server 637 are in adjacent eras. 639 Some time values are represented in exponent format, including the 640 precision, time constant and poll interval. These are in 8-bit 641 signed integer format in log2 (log base 2) seconds. The only 642 arithmetic operations permitted on them are increment and decrement. 643 For the purpose of this document and to simplify the presentation, a 644 reference to one of these variables by name means the exponentiated 645 value, e.g., the poll interval is 1024 s, while reference by name and 646 exponent means the actual value, e.g., the poll exponent is 10. 648 To convert system time in any format to NTP date and timestamp 649 formats requires that the number of seconds s from the prime epoch to 650 the system time be determined. To determine the integer era and 651 timestamp given s, 653 era = s / 2^(32) and timestamp = s - era * 2^(32), 655 which works for positive and negative dates. To determine s given 656 the era and timestamp, 658 s = era * 2^(32) + timestamp. 660 Converting between NTP and system time can be a little messy, and 661 beyond the scope of this document. Note that the number of days in 662 era 0 is one more than the number of days in most other eras and this 663 won't happen again until the year 2400 in era 3. 665 In the description of state variables to follow, explicit reference 666 to integer type implies a 32-bit unsigned integer. This simplifies 667 bounds checks, since only the upper limit needs to be defined. 669 Without explicit reference, the default type is 64-bit floating 670 double. Exceptions will be noted as necessary. 672 7. Data Structures 674 The NTP protocol state machines are defined in the following 675 sections. State variables are separated into classes according to 676 their function in packet headers, peer and poll processes, the system 677 process and the clock discipline process. Packet variables represent 678 the NTP header values in transmitted and received packets. Peer and 679 poll variables represent the contents of the association for each 680 server separately. System variables represent the state of the 681 server as seen by its dependent clients. Clock discipline variables 682 represent the internal workings of the clock discipline algorithm. 683 An example is described in Appendix A. 685 7.1. Structure Conventions 687 In order to distinguish between different variables of the same name 688 but used in different processes, the naming convention summarized in 689 Figure 5 is adopted. A receive packet variable v is a member of the 690 packet structure r with fully qualified name r.v. In a similar 691 manner x.v is a transmit packet variable, p.v is a peer variable, s.v 692 is a system variable and c.v is a clock discipline variable. There 693 is a set of peer variables for each association; there is only one 694 set of system and clock variables. 696 +------+---------------------------------+ 697 | Name | Description | 698 +------+---------------------------------+ 699 | r. | receive packet header variable | 700 | x. | transmit packet header variable | 701 | p. | peer/poll variable | 702 | s. | system variable | 703 | c. | clock discipline variable | 704 +------+---------------------------------+ 706 Figure 5: Prefix Conventions 708 7.2. Global Parameters 710 In addition to the variable classes a number of global parameters are 711 defined in this document, including those shown with values in 712 Figure 6. 714 +-----------+-------+----------------------------------+ 715 | Name | Value | Description | 716 +-----------+-------+----------------------------------+ 717 | PORT | 123 | NTP port number | 718 | VERSION | 4 | NTP version number | 719 | TOLERANCE | 15e-6 | frequency tolerance PHI (s/s) | 720 | MINPOLL | 4 | minimum poll exponent (16 s) | 721 | MAXPOLL | 17 | maximum poll exponent (36 h) | 722 | MAXDISP | 16 | maximum dispersion (16 s) | 723 | MINDISP | .005 | minimum dispersion increment (s) | 724 | MAXDIST | 1 | distance threshold (1 s) | 725 | MAXSTRAT | 16 | maximum stratum number | 726 +-----------+-------+----------------------------------+ 728 Figure 6: Global Parameters 730 While these are the only global parameters needed for 731 interoperability, a larger collection is necessary in any 732 implementation. Appendix A.1.1 contains those used by the skeleton 733 for the mitigation algorithms, clock discipline algorithm and related 734 implementation-dependent functions. Some of these parameter values 735 are cast in stone, like the NTP port number assigned by the IANA and 736 the version number assigned NTPv4 itself. Others like the frequency 737 tolerance (also called PHI), involve an assumption about the worst 738 case behavior of a system clock once synchronized and then allowed to 739 drift when its sources have become unreachable. The minimum and 740 maximum parameters define the limits of state variables as described 741 in later sections of this document. 743 While shown with fixed values in this document, some implementations 744 may make them variables adjustable by configuration commands. For 745 instance, the reference implementation computes the value of 746 PRECISION as log2 of the minimum time in several iterations to read 747 the system clock. 749 7.3. Packet Header Variables 751 The most important state variables from an external point of view are 752 the packet header variables described in Figure 7 and below. The NTP 753 packet header consists of an integral number of 32-bit (4 octet) 754 words in network byte order. The packet format consists of three 755 components, the header itself, one or more optional extension fields 756 and an optional message authentication code (MAC). The header 757 component is identical to the NTPv3 header and previous versions. 758 The optional extension fields are used by the Autokey public key 759 cryptographic algorithms described in [ref3]. The optional MAC is 760 used by both Autokey and the symmetric key cryptographic algorithm 761 described in this RFC. 763 +-----------+------------+-----------------------+ 764 | Name | Formula | Description | 765 +-----------+------------+-----------------------+ 766 | leap | leap | leap indicator (LI) | 767 | version | version | version number (VN) | 768 | mode | mode | mode | 769 | stratum | stratum | stratum | 770 | poll | poll | poll exponent | 771 | precision | rho | precision exponent | 772 | rootdelay | delta_r | root delay | 773 | rootdisp | epsilon_r | root dispersion | 774 | refid | refid | reference ID | 775 | reftime | reftime | reference timestamp | 776 | org | T1 | origin timestamp | 777 | rec | T2 | receive timestamp | 778 | xmt | T3 | transmit timestamp | 779 | dst | T4 | destination timestamp | 780 | keyid | keyid | key ID | 781 | MAC | MAC | message digest | 782 +-----------+------------+-----------------------+ 784 Figure 7: Packet Header Variables 786 The NTP packet is a UDP datagram[RFC0768]. Some fields use multiple 787 words and others are packed in smaller fields within a word. The NTP 788 packet header shown in Figure 8 has 12 words followed by optional 789 extension fields and finally an optional message authentication code 790 (MAC) consisting of the key identifier field and message digest 791 field. 793 0 1 2 3 794 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 795 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 796 |LI | VN |Mode | Stratum | Poll | Precision | 797 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 798 | Root Delay | 799 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 800 | Root Dispersion | 801 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 802 | Reference ID | 803 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 804 | | 805 + Reference Timestamp (64) + 806 | | 807 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 808 | | 809 + Origin Timestamp (64) + 810 | | 811 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 812 | | 813 + Receive Timestamp (64) + 814 | | 815 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 816 | | 817 + Transmit Timestamp (64) + 818 | | 819 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 820 | | 821 . . 822 . Extension Field 1 (variable) . 823 . . 824 | | 825 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 826 | | 827 . . 828 . Extension Field 2 (variable) . 829 . . 830 | | 831 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 832 | Key Identifier | 833 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 834 | | 835 | MAC (128) | 836 | | 837 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 839 Figure 8: Packet Header Format 841 The extension fields are used to add optional capabilities, for 842 example, the Autokey security protocol [ref3]. The extension field 843 format is presented in order that the packet can be parsed without 844 knowledge of the extension field functions. The MAC is used by both 845 Autokey and the symmetric key authentication scheme. 847 A list of the packet header variables is shown in Figure 7 and 848 described in detail below. Except for a minor variation when using 849 the IPv6 address family, these fields are backwards compatible with 850 NTPv3. The packet header fields apply to both transmitted packets (x 851 prefix) and received packets (r prefix). In Figure 8 the size of 852 some multiple-word fields is shown in bits if not the default 32 853 bits. The basic header extends from the beginning of the packet to 854 the end of the Transmit Timestamp field. 856 The fields and associated packet variables (in parentheses) are 857 interpreted as follows: 859 LI Leap Indicator (leap): 2-bit integer warning of an impending leap 860 second to be inserted or deleted in the last minute of the current 861 month with values defined in Figure 9. 863 +-------+-------------------------------------------------+ 864 | Value | Meaning | 865 +-------+-------------------------------------------------+ 866 | 0 | no warning | 867 | 1 | last minute of the day has 61 seconds | 868 | 2 | last minute of the day has 59 seconds | 869 | 3 | alarm condition (the clock is not synchronized) | 870 +-------+-------------------------------------------------+ 872 Figure 9: Leap Indicator 874 VN Version Number (version): 3-bit integer representing the NTP 875 version number, currently 4. 877 Mode (mode): 3-bit integer representing the mode, with values defined 878 in Figure 10. 880 +-------+--------------------------+ 881 | Value | Meaning | 882 +-------+--------------------------+ 883 | 0 | reserved | 884 | 1 | symmetric active | 885 | 2 | symmetric passive | 886 | 3 | client | 887 | 4 | server | 888 | 5 | broadcast | 889 | 6 | NTP control message | 890 | 7 | reserved for private use | 891 +-------+--------------------------+ 893 Figure 10: Association Modes 895 Stratum (stratum): 8-bit integer representing the stratum, with 896 values defined in Figure 11. 898 +--------+-----------------------------------------------------+ 899 | Value | Meaning | 900 +--------+-----------------------------------------------------+ 901 | 0 | unspecified or invalid | 902 | 1 | primary server (e.g., equipped with a GPS receiver) | 903 | 2-15 | secondary server (via NTP) | 904 | 16 | unsynchronized | 905 | 17-255 | reserved | 906 +--------+-----------------------------------------------------+ 908 Figure 11: Packet Stratum 910 It is customary to map the stratum value 0 in received packets to 911 MAXSTRAT (16) in the peer variable p.stratum and to map p.stratum 912 values of MAXSTRAT or greater to 0 in transmitted packets. This 913 allows reference clocks, which normally appear at stratum 0, to be 914 conveniently mitigated using the same clock selection algorithms used 915 for external sources (See Appendix A.5.5.1 for an example). 917 Poll: 8-bit signed integer representing the maximum interval between 918 successive messages, in log2 seconds. Suggested default limits for 919 minimum and maximum poll intervals are 6 and 10, respectively. 921 Precision: 8-bit signed integer representing the precision of the 922 system clock, in log2 seconds. For instance a value of -18 923 corresponds to a precision of about one microsecond. The precision 924 can be determined when the service first starts up as the minimum 925 time of several iterations to read the system clock. 927 Root Delay (rootdelay): Total round trip delay to the reference 928 clock, in NTP short format. 930 Root Dispersion (rootdisp): Total dispersion to the reference clock, 931 in NTP short format. 933 Reference ID (refid): 32-bit code identifying the particular server 934 or reference clock. The interpretation depends on the value in the 935 stratum field. For packet stratum 0 (unspecified or invalid) this is 936 a four-character ASCII[RFC1345] string, called the kiss code, used 937 for debugging and monitoring purposes. For stratum 1 (reference 938 clock) this is a four-octet, left-justified, zero-padded ASCII string 939 assigned to the reference clock. The authoritative list of Reference 940 Identifiers is maintained by IANA, however any string beginning with 941 the ASCII character "X" is reserved for unregistered experimentation 942 and developent.The identifiers in Figure 12 have been used as ASCII 943 identifiers: 945 +------+----------------------------------------------------------+ 946 | ID | Clock Source | 947 +------+----------------------------------------------------------+ 948 | GOES | Geosynchronous Orbit Environment Satellite | 949 | GPS | Global Position System | 950 | GAL | Galileo Positioning System | 951 | PPS | Generic pulse-per-second | 952 | IRIG | Inter-Range Instrumentation Group | 953 | WWVB | LF Radio WWVB Ft. Collins, CO 60 kHz | 954 | DCF | LF Radio DCF77 Mainflingen, DE 77.5 kHz | 955 | HBG | LF Radio HBG Prangins, HB 75 kHz | 956 | MSF | LF Radio MSF Anthorn, UK 60 kHz | 957 | JJY | LF Radio JJY Fukushima, JP 40 kHz, Saga, JP 60 kHz | 958 | LORC | MF Radio LORAN C station, 100 kHz | 959 | TDF | MF Radio Allouis, FR 162 kHz | 960 | CHU | HF Radio CHU Ottawa, Ontario | 961 | WWV | HF Radio WWV Ft. Collins, CO | 962 | WWVH | HF Radio WWVH Kauai, HI | 963 | NIST | NIST telephone modem | 964 | ACTS | NIST telephone modem | 965 | USNO | USNO telephone modem | 966 | PTB | European telephone modem | 967 +------+----------------------------------------------------------+ 969 Figure 12: Reference Identifiers 971 Above stratum 1 (secondary servers and clients) this is the reference 972 identifier of the server and can be used to detect timing loops. If 973 using the IPv4 address family, the identifier is the four-octet IPv4 974 address. If using the IPv6 address family, it is the first four 975 octets of the MD5 hash of the IPv6 address. Note that, when using 976 the IPv6 address family on an NTPv4 server with a NTPv3 client, the 977 Reference Identifier field appears to be a random value and a timing 978 loop might not be detected. 980 Reference Timestamp: Time when the system clock was last set or 981 corrected, in NTP timestamp format. 983 Origin Timestamp (org): Time at the client when the request departed 984 for the server, in NTP timestamp format. 986 Receive Timestamp (rec): Time at the server when the request arrived 987 from the client, in NTP timestamp format. 989 Transmit Timestamp (xmt): Time at the server when the response left 990 for the client, in NTP timestamp format. 992 Destination Timestamp (dst): Time at the client when the reply 993 arrived from the server, in NTP timestamp format. 995 Note: Destination Timestamp field is not included as a header field; 996 it is determined upon arrival of the packet and made available in the 997 packet buffer data structure. 999 If NTP has access to the physical layer, then the timestamps are 1000 associated with the beginning of the symbol after the start of frame. 1001 Otherwise, implementations should attempt to associate the timestamp 1002 to the earliest accessible point in the frame. 1004 The MAC consists of the Key Identifier followed by the Message 1005 Digest. The message digest, or cryptosum, is calculated as in 1006 [RFC1321] over all NTP header and optional extension fields, but not 1007 the MAC itself. 1009 Extension Field n: See Section 7.5 for a description of the format of 1010 this field. 1012 Key Identifier (keyid): 32-bit unsigned integer used by the client 1013 and server to designate a secret 128-bit MD5 key. 1015 Message Digest (digest): 128-bit bitstring computed by the keyed MD5 1016 message digest computed over all the words in the header and 1017 extension fields, but not the MAC itself. 1019 7.4. The Kiss-o'-Death Packet 1021 If the Stratum field is 0, which implies unspecified or invalid, the 1022 Reference Identifier field can be used to convey messages useful for 1023 status reporting and access control. These are called Kiss-o'-Death 1024 (KoD) packets and the ASCII messages they convey are called kiss 1025 codes. The KoD packets got their name because an early use was to 1026 tell clients to stop sending packets that violate server access 1027 controls. The kiss codes can provide useful information for an 1028 intelligent client, either NTPv4 or SNTPv4. Kiss codes are encoded 1029 in four-character ASCII strings left justified and zero filled. The 1030 strings are designed for character displays and log files. A list of 1031 the currently-defined kiss codes is given in Figure 13. Recipients 1032 of kiss codes MUST inspect them and in the following cases take these 1033 actions: 1035 a. For kiss codes: DENY, RSTR the client MUST demobilize any 1036 associations to that server and stop sending packets to that 1037 server; 1039 b. For kiss code: RATE the client MUST immediately reduce its 1040 polling interval to that server and continue to reduce it each 1041 time it receives a RATE kiss code. 1043 c. Kiss codes beginning with the ASCII character "X" are for 1044 unregistered experimentation and development and MUST be ignored 1045 if not recognized. 1047 d. Other than the above conditions, KoD packets have no protocol 1048 significance and are discarded after inspection. 1050 +------+------------------------------------------------------------+ 1051 | Code | Meaning | 1052 +------+------------------------------------------------------------+ 1053 | ACST | The association belongs to a unicast server | 1054 | AUTH | Server authentication failed | 1055 | AUTO | Autokey sequence failed | 1056 | BCST | The association belongs to a broadcast server | 1057 | CRYP | Cryptographic authentication or identification failed | 1058 | DENY | Access denied by remote server | 1059 | DROP | Lost peer in symmetric mode | 1060 | RSTR | Access denied due to local policy | 1061 | INIT | The association has not yet synchronized for the first | 1062 | | time | 1063 | MCST | The association belongs to a dynamically discovered server | 1064 | NKEY | No key found. Either the key was never installed or is | 1065 | | not trusted | 1066 | RATE | Rate exceeded. The server has temporarily denied access | 1067 | | because the client exceeded the rate threshold | 1068 | RMOT | Alteration of association from a remote host running | 1069 | | ntpdc. | 1070 | STEP | A step change in system time has occurred, but the | 1071 | | association has not yet resynchronized | 1072 +------+------------------------------------------------------------+ 1074 Figure 13: Kiss Codes 1076 The Receive Timestamp and the Transmit Timestamp (set by the server) 1077 are undefined when in a KoD packet and MUST NOT be relied upon to 1078 have valid values and MUST be discarded. 1080 7.5. NTP Extension Field Format 1082 In NTPv4 one or more extension fields can be inserted after the 1083 header and before the MAC, which is always present when an extension 1084 field is present. Other than defining the field format, this 1085 document makes no use of the field contents. An extension field 1086 contains a request or response message in the format shown in 1087 Figure 14. 1089 0 1 2 3 1090 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1091 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1092 | Field Type | Length | 1093 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1094 . . 1095 . Value . 1096 . . 1097 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1098 | Padding (as needed) | 1099 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1101 Figure 14: Extension Field Format 1103 All extension fields are zero-padded to a word (4 octets) boundary. 1104 The Field Type field is specific to the defined function and is not 1105 elaborated here. While the minimum field length containing required 1106 fields is 4 words (16 octets), a maximum field length remains to be 1107 established. 1109 The Length field is a 16-bit unsigned integer which indicates the 1110 length of the entire extension field in octets, including the Padding 1111 field. 1113 8. On Wire Protocol 1115 The heart of the NTP on-wire protocol is the core mechanism which 1116 exchanges time values between servers, peers and clients. It is 1117 inherently resistant to lost or duplicate packets. Data integrity is 1118 provided by the IP and UDP checksums. No flow control or 1119 retransmission facilities are provided or necessary. The protocol 1120 uses timestamps, either extracted from packet headers or struck from 1121 the system clock upon the arrival or departure of a packet. 1122 Timestamps are precision data and should be restruck in case of link 1123 level retransmission and corrected for the time to compute a MAC on 1124 transmit. 1126 NTP messages make use of two different communication modes, one-to- 1127 one and one-to-many, commonly referred to as unicast and broadcast. 1128 For the purposes of this document, the term broadcast is interpreted 1129 as any available one-to-many mechanism. For IPv4 this equates to 1130 either IPv4 broadcast or IPv4 multicast. For IPv6 this equates to 1131 IPv6 multicast. For this purpose, IANA has allocated the IPv4 1132 multicast address 224.0.1.1 and the IPv6 multicast address ending 1133 :101, with prefix determined by scoping rules. Any other non- 1134 allocated multicast address may also be used in addition to these 1135 allocated multicast addresses. 1137 The on-wire protocol uses four timestamps numbered t1 through t4 and 1138 three state variables org, rec and xmt, as shown in Figure 15. This 1139 figure shows the most general case where each of two peers, A and B, 1140 independently measure the offset and delay relative to the other. 1141 For purposes of illustration the packet timestamps are shown in lower 1142 case, while the state variables are shown in upper case. The state 1143 variables are copied from the packet timestamps upon arrival or 1144 departure of a packet. 1146 t2 t3 t6 t7 1147 +---------+ +---------+ +---------+ +---------+ 1148 | 0 | | t1 | | t3 | | t5 | 1149 +---------+ +---------+ +---------+ +---------+ 1150 | 0 | | t2 | | t4 | | t6 | Packet 1151 +---------+ +---------+ +---------+ +---------+ Timestamps 1152 | t1 | |t3=clock | | t5 | |t7=clock | 1153 +---------+ +---------+ +---------+ +---------+ 1154 |t2=clock | |t6=clock | 1155 +---------+ +---------+ 1156 Peer B 1157 +---------+ +---------+ +---------+ +---------+ 1158 org | T1 | | T1 | | t5<>T1? | | T5 | 1159 +---------+ +---------+ +---------+ +---------+ State 1160 rec | T2 | | T2 | | T6 | | T6 | Variables 1161 +---------+ +---------+ +---------+ +---------+ 1162 xmt | 0 | | T3 | | t3=T3? | | T7 | 1163 +---------+ +---------+ +---------+ +---------+ 1165 t2 t3 t6 t7 1166 --------------------------------------------------------- 1167 /\ \ /\ \ 1168 / \ / \ 1169 / \ / \ 1170 / \/ / \/ 1171 --------------------------------------------------------- 1172 t1 t4 t5 t8 1174 t1 t4 t5 t8 1175 +---------+ +---------+ +---------+ +---------+ 1176 | 0 | | t1 | | t3 | | t5 | 1177 +---------+ +---------+ +---------+ +---------+ 1178 | 0 | | t2 | | t4 | | t6 | Packet 1179 +---------+ +---------+ +---------+ +---------+ Timestamps 1180 |t1=clock | | t3 | |t5=clock | | t7 | 1181 +---------+ +---------+ +---------+ +---------+ 1182 |t4=clock | |t8=clock | 1183 +---------+ +---------+ 1184 Peer A 1185 +---------+ +---------+ +---------+ +---------+ 1186 org | 0 | | t3<>0? | | T3 | | t7<>T3? | 1187 +---------+ +---------+ +---------+ +---------+ State 1188 rec | 0 | | T4 | | T4 | | T8 | Variables 1189 +---------+ +---------+ +---------+ +---------+ 1190 xmt | T1 | | t1=T1? | | T5 | | t5=T5? | 1191 +---------+ +---------+ +---------+ +---------+ 1192 Figure 15: On-Wire Protocol 1194 In the figure the first packet transmitted by A contains only the 1195 origin timestamp t1, which is then copied to T1. B receives the 1196 packet at t2 and copies t1 to T1 and the receive timestamp t2 to T2. 1197 At this time or some time later at t3, B sends a packet to A 1198 containing t1 and t2 and in addition the transmit timestamp t3. All 1199 three timestamps are copied to the corresponding state variables. A 1200 receives the packet at t4 containing the three timestamps t1, t2 and 1201 t3 and in addition the destination timestamp t4. These four 1202 timestamps are used to compute the offset and delay of B relative to 1203 A, as described below. 1205 Before the xmt and org state variables are updated, two sanity checks 1206 are performed in order to protect against duplicate, bogus or 1207 replayed packets. In the exchange above, a packet is duplicate or 1208 replay if the transmit timestamp t3 in the packet matches the org 1209 state variable T3. A packet is bogus if the origin timestamp t1 in 1210 the packet does not match the xmt state variable T1. In either of 1211 these cases the state variables are updated, then the packet is 1212 discarded. To protect against replay of the last transmitted packet, 1213 the xmt state variable is set to zero immediately after a successful 1214 bogus check. 1216 The four most recent timestamps, T1 through T4, are used to compute 1217 the offset of B relative to A 1219 theta = T(B) - T(A) = 1/2 * [(T2-T1) + (T3-T4)] 1221 and the round trip delay 1223 delta = T(ABA) = (T4-T1) - (T3-T2). 1225 Note that the quantities within parentheses are computed from 64-bit 1226 unsigned timestamps and result in signed values with 63 significant 1227 bits plus sign. These values can represent dates from 68 years in 1228 the past to 68 years in the future. However, the offset and delay 1229 are computed as sums and differences of these values, which contain 1230 62 significant bits and two sign bits, so can represent unambiguous 1231 values from 34 years in the past to 34 years in the future. In other 1232 words, the time of the client must be set within 34 years of the 1233 server before the service is started. This is a fundamental 1234 limitation with 64-bit integer arithmetic. 1236 In implementations where floating double arithmetic is available, the 1237 first-order differences can be converted to floating double and the 1238 second-order sums and differences computed in that arithmetic. Since 1239 the second-order terms are typically very small relative to the 1240 timestamp magnitudes, there is no loss in significance, yet the 1241 unambiguous range is restored from 34 years to 68 years. 1243 In some scenarios where the initial frequency offset of the client is 1244 relatively large and the actual propagation time small, it is 1245 possible for the delay computation to becomes negative. For 1246 instance, if the frequency difference is 100 PPM and the interval 1247 T4-T1 is 64 s, the apparent delay is -6.4 ms. Since negative values 1248 are misleading in subsequent computations, the value of delta should 1249 be clamped not less than s.rho, where s.rho is the system precision 1250 described in Section 11.1, expressed in seconds. 1252 The discussion above assumes the most general case where two 1253 symmetric peers independently measure the offsets and delays between 1254 them. In the case of a stateless server, the protocol can be 1255 simplified. A stateless server copies T3 and T4 from the client 1256 packet to T1 and T2 of the server packet and tacks on the transmit 1257 timestamp T3 before sending it to the client. Additional details for 1258 filling in the remaining protocol fields are given in a Section 9 and 1259 following sections and in the appendix. 1261 Note that the on-wire protocol as described resists replay of a 1262 server response packet. However, it does not resist replay of the 1263 client request packet, which would result in a server reply packet 1264 with new values of T2 and T3 and result in incorrect offset and 1265 delay. This vulnerability can be avoided by setting the xmt state 1266 variable to zero after computing the offset and delay. 1268 9. Peer Process 1270 The process descriptions to follow include a listing of the important 1271 state variables followed by an overview of the process operations 1272 implemented as routines. Frequent reference is made to the skeleton 1273 in the appendix. The skeleton includes C-language fragments that 1274 describe the functions in more detail. It includes the parameters, 1275 variables and declarations necessary for a conforming NTPv4 1276 implementation. However, many additional variables and routines may 1277 be necessary in a working implementation. 1279 The peer process is called upon arrival of a server or peer packet. 1280 It runs the on-wire protocol to determine the clock offset and round 1281 trip delay and in addition computes statistics used by the system and 1282 poll processes. Peer variables are instantiated in the association 1283 data structure when the structure is initialized and updated by 1284 arriving packets. There is a peer process, poll process and 1285 association for each server. 1287 9.1. Peer Process Variables 1289 Figure 16, Figure 17, Figure 18 and Figure 19 summarize the common 1290 names, formula names and a short description of the peer variables. 1291 The common names and formula names are interchangeable; formula names 1292 are intended to increase readability of equations in this 1293 specification. Unless noted otherwise, all peer variables have 1294 assumed prefix p. 1296 +---------+----------+-----------------------+ 1297 | Name | Formula | Description | 1298 +---------+----------+-----------------------+ 1299 | srcaddr | srcaddr | source address | 1300 | srcport | srcport | source port | 1301 | dstaddr | dstaddr | destination address | 1302 | dstport | destport | destination port | 1303 | keyid | keyid | key identifier key ID | 1304 +---------+----------+-----------------------+ 1306 Figure 16: Peer Process Configuration Variables 1308 +-----------+------------+---------------------+ 1309 | Name | Formula | Description | 1310 +-----------+------------+---------------------+ 1311 | leap | leap | leap indicator | 1312 | version | version | version number | 1313 | mode | mode | mode | 1314 | stratum | stratum | stratum | 1315 | ppoll | ppoll | peer poll exponent | 1316 | rootdelay | delta_r | root delay | 1317 | rootdisp | epsilon_r | root dispersion | 1318 | refid | refid | reference ID | 1319 | reftime | reftime | reference timestamp | 1320 +-----------+------------+---------------------+ 1322 Figure 17: Peer Process Packet Variables 1324 +------+---------+--------------------+ 1325 | Name | Formula | Description | 1326 +------+---------+--------------------+ 1327 | org | T1 | origin timestamp | 1328 | rec | T2 | receive timestamp | 1329 | xmt | T3 | transmit timestamp | 1330 | t | t | packet time | 1331 +------+---------+--------------------+ 1333 Figure 18: Peer Process Timestamp Variables 1335 +--------+---------+-----------------+ 1336 | Name | Formula | Description | 1337 +--------+---------+-----------------+ 1338 | offset | theta | clock offset | 1339 | delay | delta | roundtrip delay | 1340 | disp | epsilon | dispersion | 1341 | jitter | psi | jitter | 1342 | filter | filter | clock filter | 1343 | tp | t_p | filter time | 1344 +--------+---------+-----------------+ 1346 Figure 19: Peer Process Statistics Variables 1348 The following configuration variables are normally initialized when 1349 the association is mobilized, either from a configuration file or 1350 upon the arrival of the first packet for an unknown association. 1352 srcaddr: IP address of the remote server or reference clock. This 1353 becomes the destination IP address in packets sent from this 1354 association. 1356 srcport: UDP port number of the server or reference clock. This 1357 becomes the destination port number in packets sent from this 1358 association. When operating in symmetric modes (1 and 2) this field 1359 must contain the NTP port number PORT (123) assigned by the IANA. In 1360 other modes it can contain any number consistent with local policy. 1362 dstaddr: IP address of the client. This becomes the source IP 1363 address in packets sent from this association. 1365 dstport: UDP port number of the client, ordinarily the NTP port 1366 number PORT (123) assigned by the IANA. This becomes the source port 1367 number in packets sent from this association. 1369 keyid: Symmetric key ID for the 128-bit MD5 key used to generate and 1370 verify the MAC. The client and server or peer can use different 1371 values, but they must map to the same key. 1373 The variables defined in Figure 17 are updated from the packet header 1374 as each packet arrives. They are interpreted in the same way as the 1375 packet variables of the same names. It is convenient for later 1376 processing to convert the NTP short format packet values r.rootdelay 1377 and r.rootdisp to floating doubles as peer variables. 1379 The variables defined in Figure 18 include the timestamps exchanged 1380 by the on-wire protocol in Section 8. The t variable is the seconds 1381 counter c.t associated with these values. The c.t variable is 1382 maintained by the clock adjust process described in Section 12. It 1383 counts the seconds since the service was started. The variables 1384 defined in Figure 19 include the statistics computed by the 1385 clock_filter() routine described in Section 10. The tp variable is 1386 the seconds counter associated with these values. 1388 9.2. Peer Process Operations 1390 The receive routine defines the process flow upon the arrival of a 1391 packet. An example is described by the receive() routine in 1392 Appendix A.5.1. There is no specific method required for access 1393 control, although it is recommended that implementations include such 1394 a scheme, which is similar to many others now in widespread use. The 1395 access() routine in Appendix A.5.4 describes a method of implementing 1396 access restrictions using an access control list (ACL). Format 1397 checks require correct field length and alignment, acceptable version 1398 number (1-4) and correct extension field syntax, if present. 1400 There is no specific requirement for authentication; however, if 1401 authentication is implemented, then the MD5 keyed hash algorithm 1402 described in [RFC1321] must be supported. 1404 Next, the association table is searched for matching source address 1405 and source port, for example using the find_assoc() routine in 1406 Appendix A.5.1. Figure 20 is a dispatch table where the columns 1407 correspond to the packet mode and rows correspond to the association 1408 mode. The intersection of the association and packet modes 1409 dispatches processing to one of the following steps. 1411 +------------------+---------------------------------------+ 1412 | | Packet Mode | 1413 +------------------+-------+-------+-------+-------+-------+ 1414 | Association Mode | 1 | 2 | 3 | 4 | 5 | 1415 +------------------+-------+-------+-------+-------+-------+ 1416 | No Association 0 | NEWPS | DSCRD | FXMIT | MANY | NEWBC | 1417 | Symm. Active 1 | PROC | PROC | DSCRD | DSCRD | DSCRD | 1418 | Symm. Passive 2 | PROC | ERR | DSCRD | DSCRD | DSCRD | 1419 | Client 3 | DSCRD | DSCRD | DSCRD | PROC | DSCRD | 1420 | Server 4 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1421 | Broadcast 5 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1422 | Bcast Client 6 | DSCRD | DSCRD | DSCRD | DSCRD | PROC | 1423 +------------------+-------+-------+-------+-------+-------+ 1425 Figure 20: Peer Dispatch Table 1427 DSCRD. This indicates a nonfatal violation of protocol as the result 1428 of a programming error, long delayed packet or replayed packet. The 1429 peer process discards the packet and exits. 1431 ERR. This indicates a fatal violation of protocol as the result of a 1432 programming error, long delayed packet or replayed packet. The peer 1433 process discards the packet, demobilizes the symmetric passive 1434 association and exits. 1436 FXMIT. This indicates a client (mode 3) packet matching no 1437 association (mode 0). If the destination address is not a broadcast 1438 address, the server constructs a server (mode 4) packet and returns 1439 it to the client without retaining state. The server packet header 1440 is constructed. An example is described by the fast_xmit() routine 1441 in Appendix A.5.3. The packet header is assembled from the receive 1442 packet and system variables as shown in Figure 21. If the 1443 s.rootdelay and s.rootdisp system variables are stored in floating 1444 double, they must be converted to NTP short format first. 1446 +-----------------------------------+ 1447 | Packet Variable --> Variable | 1448 +-----------------------------------+ 1449 | r.leap --> p.leap | 1450 | r.mode --> p.mode | 1451 | r.stratum --> p.stratum | 1452 | r.poll --> p.ppoll | 1453 | r.rootdelay --> p.rootdelay | 1454 | r.rootdisp --> p.rootdisp | 1455 | r.refid --> p.refid | 1456 | r.reftime --> p.reftime | 1457 | r.keyid --> p.keyid | 1458 +-----------------------------------+ 1460 Figure 21: Receive Packet Header 1462 Note that, if authentication fails, the server returns a special 1463 message called a crypto-NAK. This message includes the normal NTP 1464 header data shown in Figure 8, but with a MAC consisting of four 1465 octets of zeros. The client MAY accept or reject the data in the 1466 message. After these actions the peer process exits. 1468 If the destination address is a multicast address, the sender is 1469 operating in manycast client mode. If the packet is valid and the 1470 server stratum is less than the client stratum, the server sends an 1471 ordinary server (mode 4) packet, but using its unicast destination 1472 address. A crypto-NAK is not sent if authentication fails. After 1473 these actions the peer process exits. 1475 MANY: This indicates a server (mode 4) packet matching no 1476 association. Ordinarily, this can happen only as the result of a 1477 manycast server reply to a previously sent multicast client packet. 1478 If the packet is valid, an ordinary client (mode 3) association is 1479 mobilized and operation continues as if the association was mobilized 1480 by the configuration file. 1482 NEWBC. This indicates a broadcast (mode 5) packet matching no 1483 association. The client mobilizes either a client (mode 3) or 1484 broadcast client (mode 6) association. Examples are shown in the 1485 mobilize() and clear() routines in Appendix A.2. Then the packet is 1486 validated and the peer variables initialized. An example is provided 1487 by the packet() routine in Appendix A.5.1.1. 1489 If the implementation supports no additional security or calibration 1490 functions, the association mode is set to broadcast client (mode 6) 1491 and the peer process exits. Implementations supporting public key 1492 authentication MAY run the Autokey or equivalent security protocol. 1493 Implementations SHOULD set the association mode to 3 and run a short 1494 client/server exchange to determine the propagation delay. Following 1495 the exchange the association mode is set to 6 and the peer process 1496 continues in listen-only mode. Note the distinction between a mode-6 1497 packet, which is reserved for the NTP monitor and control functions, 1498 and a mode-6 association. 1500 NEWPS. This indicates a symmetric active (mode 1) packet matching no 1501 association. The client mobilizes a symmetric passive (mode 2) 1502 association. An example is shown in the mobilize() and clear() 1503 routines in Appendix A.2. Processing continues in the PROC section 1504 below. 1506 PROC. This indicates a packet matching an existing association. The 1507 packet timestamps are carefully checked to avoid invalid, duplicate 1508 or bogus packets. Additional checks are summarized in Figure 22. 1509 Note that all packets, including a crypto-NAK, are considered valid 1510 only if they survive these tests. 1512 +--------------------------+----------------------------------------+ 1513 | Packet Type | Description | 1514 +--------------------------+----------------------------------------+ 1515 | 1 duplicate packet | The packet is at best an old duplicate | 1516 | | or at worst a replay by a hacker. | 1517 | | This can happen in symmetric modes if | 1518 | | the poll intervals are uneven. | 1519 | 2 bogus packet | | 1520 | 3 invalid | One or more timestamp fields are | 1521 | | invalid. This normally happens in | 1522 | | symmetric modes when one peer sends | 1523 | | the first packet to the other and | 1524 | | before the other has received its | 1525 | | first reply. | 1526 | 4 access denied | The access controls have blacklisted | 1527 | | the source. | 1528 | 5 authentication failure | The cryptographic message digest does | 1529 | | not match the MAC. | 1530 | 6 unsynchronized | The server is not synchronized to a | 1531 | | valid source. | 1532 | 7 bad header data | One or more header fields are invalid. | 1533 +--------------------------+----------------------------------------+ 1535 Figure 22: Packet Error Checks 1537 Processing continues by coping the packet variables to the peer 1538 variables as shown in Figure 21. An example is described by the 1539 packet() routine in Appendix A.5.1.1. The receive() routine 1540 implements tests 1-5 in Figure 22; the packet() routine implements 1541 tests 6-7. If errors are found the packet is discarded and the peer 1542 process exits. 1544 The on-wire protocol calculates the clock offset theta and round trip 1545 delay delta from the four most recent timestamps as described in 1546 Section 8. While it is in principle possible to do all calculations 1547 except the first-order timestamp differences in fixed-point 1548 arithmetic, it is much easier to convert the first-order differences 1549 to floating doubles and do the remaining calculations in that 1550 arithmetic, and this will be assumed in the following description. 1552 Next, the 8-bit p.reach shift register in the poll process described 1553 in Section 13 is used to determine whether the server is reachable 1554 and the data are fresh. The register is shifted left by one bit when 1555 a packet is sent and the rightmost bit is set to zero. As valid 1556 packets arrive, the rightmost bit is set to one. If the register 1557 contains any nonzero bits, the server is considered reachable; 1558 otherwise, it is unreachable. Since the peer poll interval might 1559 have changed since the last packet, the host poll interval is 1560 reviewed. An example is provided by the poll_update() routine in 1561 Appendix A.5.7.2. 1563 The dispersion statistic epsilon(t) represents the maximum error due 1564 to the frequency tolerance and time since the last packet was sent. 1565 It is initialized 1567 epsilon(t_0) = r.rho + s.rho + PHI * (T4-T1) 1569 when the measurement is made at t_0 according to the seconds counter. 1570 Here r.rho is the packet precision described in Section 7.3 and s.rho 1571 is the system precision described in Section 11.1, both expressed in 1572 seconds. These terms are necessary to account for the uncertainty in 1573 reading the system clock in both the server and the client. 1575 The dispersion then grows at constant rate PHI; in other words, at 1576 time t, epsilon(t) = epsilon(t_0) + PHI * (t-t_0). With the default 1577 value PHI = 15 PPM, this amounts to about 1.3 s per day. With this 1578 understanding, the argument t will be dropped and the dispersion 1579 represented simply as epsilon. The remaining statistics are computed 1580 by the clock filter algorithm described in the next section. 1582 10. Clock Filter Algorithm 1584 The clock filter algorithm is part of the peer process. It grooms 1585 the stream of on-wire data to select the samples most likely to 1586 represent accurate time. The algorithm produces the variables shown 1587 in Figure 19, including the offset (theta), delay (delta), dispersion 1588 (epsilon), jitter (psi) and time of arrival (t). These data are used 1589 by the mitigation algorithms to determine the best and final offset 1590 used to discipline the system clock. They are also used to determine 1591 the server health and whether it is suitable for synchronization. 1593 The clock filter algorithm saves the most recent sample tuples 1594 (theta, delta, epsilon, t) in the filter structure, which functions 1595 as an 8-stage shift register. The tuples are saved in the order that 1596 packets arrive. Here t is the packet time of arrival according to 1597 the seconds counter and should not be confused with the peer variable 1598 tp. 1600 The following scheme is used to insure sufficient samples are in the 1601 filter and that old stale data are discarded. Initially, the tuples 1602 of all stages are set to the dummy tuple (0, MAXDISP, MAXDISP, 0). 1603 As valid packets arrive, tuples are shifted into the filter causing 1604 old tuples to be discarded, so eventually only valid tuples remain. 1605 If the three low order bits of the reach register are zero, 1606 indicating three poll intervals have expired with no valid packets 1607 received, the poll process calls the clock filter algorithm with a 1608 dummy tuple just as if the tuple had arrived from the network. If 1609 this persists for eight poll intervals, the register returns to the 1610 initial condition. 1612 In the next step the shift register stages are copied to a temporary 1613 list and the list sorted by increasing delta. Let i index the stages 1614 starting with the lowest delta. If the first tuple epoch t_0 is not 1615 later than the last valid sample epoch tp, the routine exits without 1616 affecting the current peer variables. Otherwise, let epsilon_i be 1617 the dispersion of the ith entry, then 1618 i=n-1 1619 --- epsilon_i 1620 epsilon = \ ---------- 1621 / (i+1) 1622 --- 2 1623 i=0 1625 is the peer dispersion p.disp. Note the overload of epsilon, whether 1626 input to the clock filter or output, the meaning should be clear from 1627 context. 1629 The observer should note (a) if all stages contain the dummy tuple 1630 with dispersion MAXDISP, the computed dispersion is a little less 1631 than 16 s, (b) each time a valid tuple is shifted into the register, 1632 the dispersion drops by a little less than half, depending on the 1633 valid tuples dispersion, (c) after the fourth valid packet the 1634 dispersion is usually a little less than 1 s, which is the assumed 1635 value of the MAXDIST parameter used by the selection algorithm to 1636 determine whether the peer variables are acceptable or not. 1638 Let the first stage offset in the sorted list be theta_0; then, for 1639 the other stages in any order, the jitter is the RMS average 1640 +----- -----+^1/2 1641 | n-1 | 1642 | --- | 1643 1 | \ 2 | 1644 psi = -------- * | / (theta_0-theta_j) | 1645 (n-1) | --- | 1646 | j=1 | 1647 +----- -----+ 1649 where n is the number of valid tuples in the filter (n > 1). In 1650 order to insure consistency and avoid divide exceptions in other 1651 computations, the psi is bounded from below by the system precision 1652 s.rho expressed in seconds. While not in general considered a major 1653 factor in ranking server quality, jitter is a valuable indicator of 1654 fundamental timekeeping performance and network congestion state. Of 1655 particular importance to the mitigation algorithms is the peer 1656 synchronization distance, which is computed from the delay and 1657 dispersion. 1659 lambda = (delta / 2) + epsilon. 1661 Note that epsilon and therefore lambda increase at rate PHI. The 1662 lambda is not a state variable, since lambda is recalculated at each 1663 use. It is a component of the root synchronization distance used by 1664 the mitigation algorithms as a metric to evaluate the quality of time 1665 available from each server. 1667 It is important to note that, unlike NTPv3, NTPv4 associations do not 1668 show a timeout condition by setting the stratum to 16 and leap 1669 indicator to 3. The association variables retain the values 1670 determined upon arrival of the last packet. In NTPv4 lambda 1671 increases with time, so eventually the synchronization distance 1672 exceeds the distance threshold MAXDIST, in which case the association 1673 is considered unfit for synchronization. 1675 An example implementation of the clock filter algorithm is shown in 1676 the clock_filter() routine of Appendix A.5.2. 1678 11. System Process 1680 As each new sample (theta, delta, epsilon, jitter, t) is produced by 1681 the clock filter algorithm, all peer processes are scanned by the 1682 mitigation algorithms consisting of the selection, cluster, combine 1683 and clock discipline algorithms in the system process. The selection 1684 algorithm scans all associations and casts off the falsetickers, 1685 which have demonstrably incorrect time, leaving the truechimers as 1686 result. In a series of rounds the cluster algorithm discards the 1687 association statistically furthest from the centroid until a 1688 specified minimum number of survivors remain. The combine algorithm 1689 produces the best and final statistics on a weighted average basis. 1690 The final offset is passed to the clock discipline algorithm to steer 1691 the system clock to the correct time. 1693 The cluster algorithm selects one of the survivors as the system 1694 peer. The associated statistics (theta, delta, epsilon, jitter, t) 1695 are used to construct the system variables inherited by dependent 1696 servers and clients and made available to other applications running 1697 on the same machine. 1699 11.1. System Process Variables 1701 Figure 23 summarizes the common names, formula names and a short 1702 description of each system variable. Unless noted otherwise, all 1703 variables have assumed prefix s. 1705 +-----------+------------+------------------------+ 1706 | Name | Formula | Description | 1707 +-----------+------------+------------------------+ 1708 | t | t | update time | 1709 | p | p | system peer identifier | 1710 | leap | leap | leap indicator | 1711 | stratum | stratum | stratum | 1712 | precision | rho | precision | 1713 | offset | THETA | combined offset | 1714 | jitter | PSI | combined jitter | 1715 | rootdelay | DELTA | root delay | 1716 | rootdisp | EPSILON | root dispersion | 1717 | v | v | survivor list | 1718 | refid | refid | reference ID | 1719 | reftime | reftime | reference time | 1720 | NMIN | 3 | minimum survivors | 1721 | CMIN | 1 | minimum candidates | 1722 +-----------+------------+------------------------+ 1724 Figure 23: System Process Variables 1726 Except for the t, p, offset and jitter variables and the NMIN and 1727 CMIN constants, the variables have the same format and interpretation 1728 as the peer variables of the same name. The NMIN and CMIN parameters 1729 are used by the selection and cluster algorithms described in the 1730 next section. 1732 The t variable is the seconds counter at the time of the last update. 1733 An example is shown by the clock_update() routine in 1734 Appendix A.5.5.4. The p variable is the system peer identifier 1735 determined by the cluster() routine in Section 11.2.2. The precision 1736 variable has the same format as the packet variable of the same name. 1737 The precision is defined as the larger of the resolution and time to 1738 read the clock, in log2 units. For instance, the precision of a 1739 mains-frequency clock incrementing at 60 Hz is 16 ms, even when the 1740 system clock hardware representation is to the nanosecond. 1742 The offset and jitter variables are determined by the combine 1743 algorithm in Section 11.2.3. These values represent the best and 1744 final offset and jitter used to discipline the system clock. 1745 Initially, all variables are cleared to zero, then the leap is set to 1746 3 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember 1747 that MAXSTRAT is mapped to zero in the transmitted packet. 1749 11.2. System Process Operations 1751 Figure 24 summarizes the system process operations performed by the 1752 clock select routine. The selection algorithm described in 1753 Section 11.2.1 produces a majority clique of presumed correct 1754 candidates (truechimers) based on agreement principles. The cluster 1755 algorithm described in Section 11.2.2 discards outlyers to produce 1756 the most accurate survivors. The combine algorithm described in 1757 Section 11.2.3 provides the best and final offset for the clock 1758 discipline algorithm. An example is described in Appendix A.5.5.6. 1759 If the selection algorithm cannot produce a majority clique, or if it 1760 cannot produce at least CMIN survivors, the system process exits 1761 without disciplining the system clock. If successful, the cluster 1762 algorithm selects the statistically best candidate as the system peer 1763 and its variables are inherited as the system variables. 1765 +-----------------+ 1766 | clock_select() | 1767 +-----------------+ 1768 ................................|........... 1769 . V . 1770 . yes +---------+ +-----------------+ . 1771 . +--| accept? | | scan candidates | . 1772 . | +---------+ | | . 1773 . V no | | | . 1774 . +---------+ | | | . 1775 . | add peer| | | | . 1776 . +---------- | | | . 1777 . | V | | . 1778 . +---------->-->| | . 1779 . | | . 1780 . Selection Algorithm +-----------------+ . 1781 .................................|.......... 1782 V 1783 no +-------------------+ 1784 +-------------| survivors? | 1785 | +-------------------+ 1786 | | yes 1787 | V 1788 | +-------------------+ 1789 | | Cluster Algorithm | 1790 | +-------------------+ 1791 | | 1792 | V 1793 V yes +-------------------+ 1794 |<------------| n < CMIN? | 1795 | +-------------------+ 1796 V | 1797 +-----------------+ V no 1798 | s.p = NULL | +-------------------+ 1799 +-----------------+ | s.p = v_0.p | 1800 | +-------------------+ 1801 V | 1802 +-----------------+ V 1803 | return (UNSYNC) | +-------------------+ 1804 +-----------------+ | return (SYNC) | 1805 +-------------------+ 1807 Figure 24: Clock Select Routine 1809 11.2.1. Selection Algorithm 1811 Note that the selection and cluster algorithms are described 1812 separately, but combined in the code skeleton. The selection 1813 algorithm operates to find an intersection interval containing a 1814 majority clique of truechimers using Byzantine agreement principles 1815 originally proposed by Marzullo [ref6], but modified to improve 1816 accuracy. An overview of the algorithm is given below and described 1817 in the first half of the clock_select() routine in Appendix A.5.5.1. 1819 First, those servers which are unusable according to the rules of the 1820 protocol are detected and discarded as shown by the accept() routine 1821 in Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is 1822 generated for the remaining candidates. Here, p is the association 1823 identifier and type identifies the upper (+1), middle (0) and lower 1824 (-1) endpoints of a correctness interval centered on theta for that 1825 candidate. This results in three tuples, lowpoint (p, -1, theta - 1826 lambda), midpoint (p, 0, theta) and highpoint (p, +1, theta + 1827 lambda), where lambda is the root synchronization distance. An 1828 example of this calculation is shown by the rootdist() routine in 1829 Appendix A.5.1.1. The steps of the algorithm are: 1831 1. For each of m associations, place three tuples as defined above 1832 on the candidate list. 1834 2. Sort the tuples on the list by the edge component. Order the 1835 lowpoint, midpoint and highpoint of these intervals from lowest to 1836 highest. Set the number of falsetickers f = 0. 1838 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest 1839 endpoint to highest. Add one to c for every lowpoint, subtract one 1840 for every highpoint, add one to d for every midpoint. If c >= m - f, 1841 stop; set l = current lowpoint. 1843 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c 1844 for every highpoint, subtract one for every lowpoint, add one to d 1845 for every midpoint. If c >= m - f, stop; set u = current highpoint. 1847 5. Is d = f and l < u? If yes, then follow step 5A; else, follow 1848 step 5B. 1850 5A. Success: the intersection interval is [l, u]. 1852 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. 1853 If no, then go to step 6. 1855 6. Failure; a majority clique could not be found. There are no 1856 suitable candidates to discipline the system clock. 1858 The algorithm is described in detail in Appendix A.5.5.1. Note that 1859 it starts with the assumption that there are no falsetickers (f = 0) 1860 and attempts to find a nonempty intersection interval containing the 1861 midpoints of all correct servers, i.e., truechimers. If a nonempty 1862 interval cannot be found, it increases the number of assumed 1863 falsetickers by one and tries again. If a nonempty interval is found 1864 and the number of falsetickers is less than the number of 1865 truechimers, a majority clique has been found and the midpoint of 1866 each truechimer (theta) represents the candidates available to the 1867 cluster algorithm. 1869 If a majority clique is not found, or if the number of truechimers is 1870 less than CMIN, there are insufficient candidates to discipline the 1871 system clock. CMIN defines the minimum number of servers consistent 1872 with the correctness requirements. Suspicious operators would set 1873 CMIN to insure multiple redundant servers are available for the 1874 algorithms to mitigate properly. However, for historic reasons the 1875 default value for CMIN is one. 1877 11.2.2. Cluster Algorithm 1879 The candidates of the majority clique are placed on the survivor list 1880 v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an 1881 association identifier, theta_p, psi_p, and stratum_p the current 1882 offset, jitter and stratum of association p, respectively, and 1883 lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, 1884 where lambda is the root synchronization distance for association p. 1885 The list is processed by the cluster algorithm below. An example is 1886 shown by the second half of the clock_select() algorithm in 1887 Appendix A.5.5.1. 1889 1. Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate. 1891 2. Sort the candidates by increasing lambda_p. Let n be the number 1892 of candidates and NMIN the minimum required number of survivors. 1894 3. For each candidate compute the selection jitter psi_s: 1896 +----- -----+^1/2 1897 | n-1 | 1898 | --- | 1899 | 1 \ 2 | 1900 psi_s = | ---- * / (theta_s - theta_j) | 1901 | n-1 --- | 1902 | j=1 | 1903 +----- -----+ 1905 4. Select psi_max as the candidate with maximum psi_s. 1907 5. Select psi_min as the candidate with minimum psi_p. 1909 6. Is psi_max < psi_min or n <= NMIN? If yes, follow step 6A; 1910 otherwise, follow step 6B. 1912 6A. Done. The remaining candidates on the survivor list are ranked 1913 in the order of preference. The first entry on the list represents 1914 the system peer; its variables are used later to update the system 1915 variables. 1917 6B. Delete the outlyer candidate with psi_max; reduce n by one and go 1918 back to step 3. 1920 The algorithm operates in a series of rounds where each round 1921 discards the statistical outlyer with maximum selection jitter psi_s. 1922 However, if psi_s is less than the minimum peer jitter psi_p, no 1923 improvement is possible by discarding outlyers. This and the minimum 1924 number of survivors represent the terminating conditions of the 1925 algorithm. Upon termination, the final value of psi_max is saved as 1926 the system selection jitter PSI_s for use later. 1928 11.2.3. Combine Algorithm 1930 The clock combine route processes the remaining survivors to produce 1931 the best and final data for the clock discipline algorithm. The 1932 routine processes peer offset and jitter statistics to produce the 1933 combined system offset THETA and system peer jitter PSI_p, where each 1934 server statistic is weighted by the reciprocal of the root 1935 synchronization distance and the result normalized. An example is 1936 shown by the clock_combine() routine in Appendix A.5.5.5 1938 The combined THETA is passed to the clock update routine. The first 1939 candidate on the survivor list is nominated as the system peer with 1940 identifier p. The system peer jitter PSI_p is a component of the 1941 system jitter PSI. It is used along with the selection jitter PSI_s 1942 to produce the system jitter: 1944 PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2 1946 Each time an update is received from the system peer, the clock 1947 update routine is called. By rule, an update is discarded if its 1948 time of arrival p.t is not strictly later than the last update used 1949 s.t. The labels IGNOR, PANIC, ADJ and STEP refer to return codes 1950 from the local clock routine described in the next section. 1952 IGNORE means the update has been ignored as an outlyer. PANIC means 1953 the offset is greater than the panic threshold PANICT (1000 s) and 1954 SHOULD cause the program to exit with a diagnostic message to the 1955 system log. STEP means the offset is less than the panic threshold, 1956 but greater than the step threshold STEPT (125 ms). In this case the 1957 clock is stepped to the correct offset, but since this means all peer 1958 data have been invalidated, all associations MUST be reset and the 1959 client begins as at initial start. 1961 ADJ means the offset is less than the step threshold and thus a valid 1962 update. In this case the system variables are updated from the peer 1963 variables as shown in Figure 25. 1965 +-------------------------------------------+ 1966 | System Variable <-- System Peer Variable | | 1967 +-------------------------------------------+ 1968 | s.leap <-- p.leap | 1969 | s.stratum <-- p.stratum + 1 | 1970 | s.offset <-- THETA | 1971 | s.jitter <-- PSI | 1972 | s.rootdelay <-- p.delta_r + delta | 1973 | s.rootdisp <-- p.epsilon_r + p.epsilon + | 1974 | p.psi + PHI * (s.t - p.t) | 1975 | + |THETA| | 1976 | s.refid <-- p.refid | 1977 | s.reftime <-- p.reftime | 1978 | s.t <-- p.t | 1979 +-------------------------------------------+ 1981 Figure 25: System Variables Update 1983 There is an important detail not shown. The dispersion increment 1984 (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from 1985 below by MINDISP. In subnets with very fast processors and networks 1986 and very small delay and dispersion this forces a monotone-definite 1987 increase in s.rootdisp (EPSILON), which avoids loops between peers 1988 operating at the same stratum. 1990 The system variables are available to dependent application programs 1991 as nominal performance statistics. The system offset THETA is the 1992 clock offset relative to the available synchronization sources. The 1993 system jitter PSI is an estimate of the error in determining this 1994 value, elsewhere called the expected error. The root delay DELTA is 1995 the total round trip delay relative to the primary server. The root 1996 dispersion EPSILON is the dispersion accumulated over the network 1997 from the primary server. Finally, the root synchronization distance 1998 is defined 2000 LAMBDA = EPSILON + DELTA / 2, 2002 which represents the maximum error due all causes and is designated 2003 the root synchronization distance. 2005 An example of the clock update routine is provided 2006 inAppendix A.5.5.4. 2008 11.3. Clock Discipline Algorithm 2010 The NTPv4 clock discipline algorithm, shortened to discipline in the 2011 following, functions as a combination of two philosophically quite 2012 different feedback control systems. In a phase-locked loop (PLL) 2013 design, periodic phase updates at update intervals mu seconds are 2014 used directly to minimize the time error and indirectly the frequency 2015 error. In a frequency-locked loop (FLL) design, periodic frequency 2016 updates at intervals mu are used directly to minimize the frequency 2017 error and indirectly the time error. As shown in [ref7], a PLL 2018 usually works better when network jitter dominates, while a FLL works 2019 better when oscillator wander dominates. This section contains an 2020 outline of how the NTPv4 design works. An in-depth discussion of the 2021 design principles is provided in [ref7], which also includes a 2022 performance analysis. 2024 The discipline is implemented as the feedback control system shown in 2025 Figure 26. The variable theta_r represents the combine algorithm 2026 offset (reference phase) and theta_c the VFO offset (control phase). 2027 Each update produces a signal V_d representing the instantaneous 2028 phase difference theta_r - theta_c. The clock filter for each server 2029 functions as a tapped delay line, with the output taken at the tap 2030 selected by the clock filter algorithm. The selection, cluster and 2031 combine algorithms combine the data from multiple filters to produce 2032 the signal V_s. The loop filter, with impulse response F(t), 2033 produces the signal V_c which controls the VFO frequency omega_c and 2034 thus the integral of the phase theta_c which closes the loop. The 2035 V_c signal is generated by the clock adjust process in Section 12. 2036 The detailed equations that implement these functions are best 2037 presented in the routines of Appendix A.5.5.6 and Appendix A.5.6.1. 2039 theta_r + +---------\ +----------------+ 2040 NTP --------->| Phase \ V_d | | V_s 2041 theta_c - | Detector ------>| Clock Filter |----+ 2042 +-------->| / | | | 2043 | +---------/ +----------------+ | 2044 | | 2045 ----- | 2046 / \ | 2047 | VFO | | 2048 \ / | 2049 ----- ....................................... | 2050 ^ . Loop Filter . | 2051 | . +---------+ x +-------------+ . | 2052 | V_c . | |<-----| | . | 2053 +------.-| Clock | y | Phase/Freq |<---------+ 2054 . | Adjust |<-----| Prediction | . 2055 . | | | | . 2056 . +---------+ +-------------+ . 2057 ....................................... 2059 Figure 26: Clock Discipline Feedback Loop 2061 Ordinarily, the pseudo-linear feedback loop described above operates 2062 to discipline the system clock. However, there are cases where a 2063 nonlinear algorithm offers considerable improvement. One case is 2064 when the discipline starts without knowledge of the intrinsic clock 2065 frequency. The pseudo-linear loop takes several hours to develop an 2066 accurate measurement and during most of that time the poll interval 2067 cannot be increased. The nonlinear loop described below does this in 2068 15 minutes. Another case is when occasional bursts of large jitter 2069 are present due to congested network links. The state machine 2070 described below resists error bursts lasting less than 15 minutes. 2072 Figure 27 contains a summary of the variables and parameters 2073 including the variables (lower case) or parameters (upper case) name, 2074 formula name and short description. Unless noted otherwise, all 2075 variables have assumed prefix c. The variables t, tc, state, hyster 2076 and count are integers; the remaining variables are floating doubles. 2077 The function of each will be explained in the algorithm descriptions 2078 below. 2080 +--------+------------+--------------------------+ 2081 | Name | Formula | Description | 2082 +--------+------------+--------------------------+ 2083 | t | timer | seconds counter | 2084 | offset | theta | combined offset | 2085 | resid | theta_r | residual offset | 2086 | freq | phi | clock frequency | 2087 | jitter | psi | clock offset jitter | 2088 | wander | omega | clock frequency wander | 2089 | tc | tau | time constant (log2) | 2090 | state | state | state | 2091 | adj | adj | frequency adjustment | 2092 | hyster | hyster | hysteresis counter | 2093 | STEPT | 125 | step threshold (.125 s) | 2094 | WATCH | 900 | stepout thresh(s) | 2095 | PANICT | 1000 | panic threshold (1000 s) | 2096 | LIMIT | 30 | hysteresis limit | 2097 | PGATE | 4 | hysteresis gate | 2098 | TC | 16 | time constant scale | 2099 | AVG | 8 | averaging constant | 2100 +--------+------------+--------------------------+ 2102 Figure 27: Clock Discipline Variables and Parameters 2104 The process terminates immediately if the offset is greater than the 2105 panic threshold PANICT (1000 s). The state transition function is 2106 described by the rstclock() function in Appendix A.5.5.7. Figure 28 2107 shows the state transition function used by this routine. It has 2108 four columns showing respectively the state name, predicate and 2109 action if the offset theta is less than the step threshold, the 2110 predicate and actions otherwise, and finally some comments. 2112 +-------+---------------------+-------------------+--------------+ 2113 | State | theta < STEP | theta > STEP | Comments | 2114 +-------+---------------------+-------------------+--------------+ 2115 | NSET | ->FREQ | ->FREQ | no frequency | 2116 | | adjust time | step time | file | 2117 +-------+---------------------+-------------------+--------------+ 2118 | FSET | ->SYNC | ->SYNC | frequency | 2119 | | adjust time | step time | file | 2120 +-------+---------------------+-------------------+--------------+ 2121 | SPIK | ->SYNC | if < 900 s ->SPIK | outlyer | 2122 | | adjust freq | else ->SYNC | detected | 2123 | | adjust time | step freq | | 2124 | | | step time | | 2125 +-------+---------------------+-------------------+--------------+ 2126 | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | 2127 | | else ->SYNC | else ->SYNC | frequency | 2128 | | step freq | step freq | | 2129 | | adjust time | adjust time | | 2130 +-------+---------------------+-------------------+--------------+ 2131 | SYNC | ->SYNC | if < 900 s ->SPIK | normal | 2132 | | adjust freq | else ->SYNC | operation | 2133 | | adjust time | step freq | | 2134 | | | step time | | 2135 +-------+---------------------+-------------------+--------------+ 2137 Figure 28: State Transition Function 2139 In the table entries the next state is identified by the arrow -> 2140 with the actions listed below. Actions such as adjust time and 2141 adjust frequency are implemented by the PLL/FLL feedback loop in the 2142 local_clock() routine. A step clock action is implemented by setting 2143 the clock directly, but this is done only after the stepout threshold 2144 WATCH (900 s) when the offset is more than the step threshold STEPT 2145 (.125 s). This resists clock steps under conditions of extreme 2146 network congestion. 2148 The jitter (psi) and wander (omega) statistics are computed using an 2149 exponential average with weight factor AVG. The time constant 2150 exponent (tau) is determined by comparing psi with the magnitude of 2151 the current offset theta. If the offset is greater than PGATE (4) 2152 times the clock jitter, the hysteresis counter hyster is reduced by 2153 two; otherwise, it is increased by one. If hyster increases to the 2154 upper limit LIMIT (30), tau is increased by one; if it decreases to 2155 the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau 2156 hovers near MAXPOLL, but quickly decreases if a temperature spike 2157 causes a frequency surge. 2159 12. Clock Adjust Process 2161 The actual clock adjustment process runs at one-second intervals to 2162 add the frequency correction and a fixed percentage of the residual 2163 offset theta_r. The theta_r is in effect the exponential decay of 2164 the theta value produced by the loop filter at each update. The TC 2165 parameter scales the time constant to match the poll interval for 2166 convenience. Note that the dispersion EPSILON increases by PHI at 2167 each second. 2169 The clock adjust process includes a timer interrupt facility driving 2170 the seconds counter c.t. It begins at zero when the service starts 2171 and increments once each second. At each interrupt the 2172 clock_adjust() routine is called to incorporate the clock discipline 2173 time and frequency adjustments, then the associations are scanned to 2174 determine if the seconds counter equals or exceeds the p.next state 2175 variable defined in the next section. If so, the poll process is 2176 called to send a packet and compute the next p.next value. 2178 An example of the clock adjustment process is shown by the 2179 clock_adjust() routine in Appendix A.5.6.1. 2181 13. Poll Process 2183 Each association supports a poll process that runs at regular 2184 intervals to construct and send packets in symmetric, client and 2185 broadcast server associations. It runs continuously, whether or not 2186 servers are reachable in order to manage the clock filter and reach 2187 register. 2189 13.1. Poll Process Variables 2191 Figure 29 summarizes the common names, formula names and a short 2192 description of the poll process variables(lower case) and parameters 2193 (upper case). Unless noted otherwise, all variables have assumed 2194 prefix p. 2196 +---------+---------+--------------------+ 2197 | Name | Formula | Description | 2198 +---------+---------+--------------------+ 2199 | hpoll | hpoll | host poll exponent | 2200 | last | last | last poll time | 2201 | next | next | next poll time | 2202 | reach | reach | reach register | 2203 | unreach | unreach | unreach counter | 2204 | UNREACH | 24 | unreach limit | 2205 | BCOUNT | 8 | burst count | 2206 | BURST | flag | burst enable | 2207 | IBURST | flag | iburst enable | 2208 +---------+---------+--------------------+ 2210 Figure 29: Poll Process Variables and Parameters 2212 The poll process variables are allocated in the association data 2213 structure along with the peer process variables. Following is a 2214 detailed description of the variables. The parameters will be called 2215 out in the following text. 2217 hpoll: signed integer representing the poll exponent, in log2 seconds 2219 last: integer representing the seconds counter when the most recent 2220 packet was sent 2222 next: integer representing the seconds counter when the next packet 2223 is to be sent 2225 reach: 8-bit integer shift register shared by the peer and poll 2226 processes 2228 unreach: integer representing the number of seconds the server has 2229 been unreachable 2231 13.2. Poll Process Operations 2233 As described previously, once each second the clock adjust process is 2234 called. This routine calls the poll routine for each association in 2235 turn. If the time for the next poll message is greater than the 2236 seconds counter, the routine returns immediately. Symmetric (modes 2237 1, 2), client (mode 3) and broadcast server (mode 5) associations 2238 routinely send packets. A broadcast client (mode 6) association runs 2239 the routine to update the reach and unreach variables, but does not 2240 send packets. The poll process calls the transmit process to send a 2241 packet. If in a burst (burst > 0), nothing further is done except 2242 call the poll update routine to set the next poll interval. 2244 If not in a burst, the reach variable is shifted left by one bit, 2245 with zero replacing the rightmost bit. If the server has not been 2246 heard for the last three poll intervals, the clock filter routine is 2247 called to increase the dispersion. An example is shown in 2248 Appendix A.5.7.3. 2250 If the BURST flag is lit and the server is reachable and a valid 2251 source of synchronization is available, the client sends a burst of 2252 BCOUNT (8) packets at each poll interval. The interval between 2253 packets in the burst is two seconds. This is useful to accurately 2254 measure jitter with long poll intervals. If the IBURST flag is lit 2255 and this is the first packet sent when the server has been 2256 unreachable, the client sends a burst. This is useful to quickly 2257 reduce the synchronization distance below the distance threshold and 2258 synchronize the clock. 2260 If the P_MANY flag is lit in the p.flags word of the association, 2261 this is a manycast client association. Manycast client associations 2262 send client mode packets to designated multicast group addresses at 2263 MINPOLL intervals. The association starts out with a TTL of 1. If 2264 by the time of the next poll there are fewer than MINCLOCK servers 2265 have been mobilized, the ttl is increased by one. If the ttl reaches 2266 the limit TTLMAX, without finding MINCLOCK servers, the poll interval 2267 increases until reaching BEACON, when it starts over from the 2268 beginning. 2270 The poll() routine includes a feature that backs off the poll 2271 interval if the server becomes unreachable. If reach is nonzero, the 2272 server is reachable and unreach is set to zero; otherwise, unreach is 2273 incremented by one for each poll to the maximum UNREACH. Thereafter 2274 for each poll hpoll is increased by one, which doubles the poll 2275 interval up to the maximum MAXPOLL determined by the poll_update() 2276 routine. When the server again becomes reachable, unreach is set to 2277 zero, hpoll is reset to the tc system variable and operation resumes 2278 normally. 2280 A packet is sent by the transmit process. Some header values are 2281 copied from the peer variables left by a previous packet and others 2282 from the system variables. Figure 30 shows which values are copied 2283 to each header field. In those implementations using floating double 2284 data types for root delay and root dispersion, these must be 2285 converted to NTP short format. All other fields are either copied 2286 intact from peer and system variables or struck as a timestamp from 2287 the system clock. 2289 +-----------------------------------+ 2290 | Packet Variable <-- Variable | 2291 +-----------------------------------+ 2292 | x.leap <-- s.leap | 2293 | x.version <-- s.version | 2294 | x.mode <-- s.mode | 2295 | x.stratum <-- s.stratum | 2296 | x.poll <-- s.poll | 2297 | x.precision <-- s.precision | 2298 | x.rootdelay <-- s.rootdelay | 2299 | x.rootdisp <-- s.rootdisp | 2300 | x.refid <-- s.refid | 2301 | x.reftime <-- s.reftime | 2302 | x.org <-- p.xmt | 2303 | x.rec <-- p.dst | 2304 | x.xmt <-- clock | 2305 | x.keyid <-- p.keyid | 2306 | x.digest <-- md5 digest | 2307 +-----------------------------------+ 2309 Figure 30: xmit_packet Packet Header 2311 The poll update routine is called when a valid packet is received and 2312 immediately after a poll message has been sent. If in a burst, the 2313 poll interval is fixed at 2 s; otherwise, the host poll exponent 2314 hpoll is set to the minimum of ppoll from the last packet received 2315 and hpoll from the poll routine, but not less than MINPOLL nor 2316 greater than MAXPOLL. Thus the clock discipline can be oversampled, 2317 but not undersampled. This is necessary to preserve subnet dynamic 2318 behavior and protect against protocol errors. 2320 The poll exponent is converted to an interval which when added to the 2321 last poll time variable determines the value of the next poll time 2322 variable. Finally, the last poll time variable is set to the current 2323 seconds counter. 2325 14. Simple Network Time Protocol (SNTP) 2327 Primary servers and clients complying with a subset of NTP, called 2328 the Simple Network Time Protocol (SNTPv4) [RFC4330], do not need to 2329 implement the mitigation algorithms described in Section 9 and 2330 following sections. SNTP is intended for primary servers equipped 2331 with a single reference clock, as well as for clients with a single 2332 upstream server and no dependent clients. The fully developed NTPv4 2333 implementation is intended for secondary servers with multiple 2334 upstream servers and multiple downstream servers or clients. Other 2335 than these considerations, NTP and SNTP servers and clients are 2336 completely interoperable and can be intermixed in NTP subnets. 2338 An SNTP primary server implementing the on-wire protocol described in 2339 Section 8 has no upstream servers except a single reference clock. 2340 In principle, it is indistinguishable from an NTP primary server that 2341 has the mitigation algorithms and therefore capable of mitigating 2342 between multiple reference clocks. 2344 Upon receiving a client request, an SNTP primary server constructs 2345 and sends the reply packet as described in Figure 31. Note that the 2346 dispersion field in the packet header must be updated as described in 2347 Section 5. 2349 +-----------------------------------+ 2350 | Packet Variable <-- Variable | 2351 +-----------------------------------+ 2352 | x.leap <-- s.leap | 2353 | x.version <-- r.version | 2354 | x.mode <-- 4 | 2355 | x.stratum <-- s.stratum | 2356 | x.poll <-- r.poll | 2357 | x.precision <-- s.precision | 2358 | x.rootdelay <-- s.rootdelay | 2359 | x.rootdisp <-- s.rootdisp | 2360 | x.refid <-- s.refid | 2361 | x.reftime <-- s.reftime | 2362 | x.org <-- r.xmt | 2363 | x.rec <-- r.dst | 2364 | x.xmt <-- clock | 2365 | x.keyid <-- r.keyid | 2366 | x.digest <-- md5 digest | 2367 +-----------------------------------+ 2369 Figure 31: fast_xmit Packet Header 2371 A SNTP client implementing the on-wire protocol has a single server 2372 and no dependent clients. It can operate with any subset of the NTP 2373 on-wire protocol, the simplest approach using only the transmit 2374 timestamp of the server packet and ignoring all other fields. 2375 However, the additional complexity to implement the full on-wire 2376 protocol is minimal so that a full implementation is encouraged. 2378 15. Security Considerations 2380 NTPv4 provides an optional authentication field that utilizes the MD5 2381 algorithm. MD5, as the case for SHA-1, is derived from MD4, which 2382 has long been known to be weak. In 2004, techniques for efficiently 2383 finding collisions in MD5 were announced. A summary of the 2384 appropriateness of MD5 can be found in [ref8]. 2386 In the case of NTP as specified herein, NTP broadcast clients are 2387 vulnerable to disruption by misbehaving or hostile SNTP or NTP 2388 broadcast servers elsewhere in the Internet. Access controls and/or 2389 cryptographic authentication means should be provided for additional 2390 security in such cases. 2392 Section 8 describes a potential security concern with the replay of 2393 client requests. Following the recommendations in that section 2394 provides protection against such attacks. 2396 16. IANA Considerations 2398 UDP/TCP Port 123 was previously assigned by IANA for this protocol. 2399 The IANA has assigned the IPv4 multicast group address 224.0.1.1 and 2400 the IPv6 multicast address ending :101 for NTP. This document 2401 introduces NTP extension fields allowing for the development of 2402 future extensions to the protocol, where a particular extension is to 2403 be identified by the Field Type sub-field within the extension field. 2404 IANA is requested to establish and maintain a registry for Extension 2405 Field Types associated with this protocol, populating this registry 2406 with no initial entries. As future needs arise, new Extension Field 2407 Types may be defined. Following the policies outlined in [RFC2434], 2408 new values are to be defined by IETF consensus. 2410 The IANA is requested to create a new registry for NTP Reference 2411 Identifier codes. This should include the current codes defined in 2412 Section 7.3, and may be extended on a FCFS basis. The format of the 2413 registry is: 2415 +------+----------------------------------------------------------+ 2416 | ID | Clock Source | 2417 +------+----------------------------------------------------------+ 2418 | GOES | Geosynchronous Orbit Environment Satellite | 2419 | GPS | Global Position System | 2420 | ... | ... | 2421 +------+----------------------------------------------------------+ 2423 Figure 32: Reference Identifier Codes 2425 The IANA is requested to create a new registry for NTP Kiss-o'-Death 2426 codes. This should include the current codes defined in Section 7.4, 2427 and may be extended on a FCFS basis. The format of the registry is: 2429 +------+------------------------------------------------------------+ 2430 | Code | Meaning | 2431 +------+------------------------------------------------------------+ 2432 | ACST | The association belongs to a unicast server | 2433 | AUTH | Server authentication failed | 2434 | ... | ... | 2435 +------+------------------------------------------------------------+ 2437 Figure 33: Kiss Codes 2439 For both Reference Identifiers and Kiss-o'-Death codes, IANA is 2440 requested to never assign a code beginning with the character "X", as 2441 this is reserved for experimentation and development. 2443 17. Acknowledgements 2445 The editors would like to thank Karen O'Donoghue, Brian Haberman, 2446 Greg Dowd, Mark Elliot, Harlan Stenn, Yaakov Stein, Stewart Bryant, 2447 and Danny Mayer for technical reviews and specific text contributions 2448 to this document. 2450 18. References 2452 18.1. Informative References 2454 [CGPM] Bureau International des Poids et Mesures, "Comptes Rendus 2455 de la 15e CGPM", 1976. 2457 [ITU-R_TF.460] 2458 International Telecommunications Union, "ITU-R TF.460 2459 Standard-frequency and time-signal emissions", 2460 February 2002. 2462 [RFC1305] Mills, D., "Network Time Protocol (Version 3) 2463 Specification, Implementation", RFC 1305, March 1992. 2465 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an 2466 IANA Considerations Section in RFCs", BCP 26, RFC 2434, 2467 October 1998. 2469 [RFC4330] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 2470 for IPv4, IPv6 and OSI", RFC 4330, January 2006. 2472 [ref3] Mills, D.L., "The Autokey security architecture, protocol 2473 and algorithms. Electrical and Computer Engineering 2474 Technical Report 06-1-1", NDSS , January 2006. 2476 [ref6] Marzullo and S. Owicki, "Maintaining the time in a 2477 distributed system.", ACM Operating Systems Review 19 , 2478 July 1985. 2480 [ref7] Mills, D.L., "Computer Network Time Synchronization - the 2481 Network Time Protocol. CRC Press, 304pp.", 2006. 2483 [ref8] Bellovin, S. and E. Rescorla, Proceedings of the 13th 2484 annual ISOC Network and Distributed System Security 2485 Symposium, "Deploying a new Hash Algorithm", 2486 February 2006. 2488 [ref9] Mills, D.L., Electrical and Computer Engineering Technical 2489 Report 06-6-1, NDSS, June 2006., "Network Time Protocol 2490 Version 4 Reference and Implementation Guide.", 2006. 2492 18.2. Normative References 2494 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 2495 August 1980. 2497 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 2498 September 1981. 2500 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 2501 RFC 793, September 1981. 2503 [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, 2504 April 1992. 2506 [RFC1345] Simonsen, K., "Character Mnemonics and Character Sets", 2507 RFC 1345, June 1992. 2509 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2510 Requirement Levels", BCP 14, RFC 2119, March 1997. 2512 Appendix A. Code Skeleton 2514 This appendix is intended to describe the protocol and algorithms of 2515 an implementation in a general way using what is called a code 2516 skeleton program. This consists of a set of definitions, structures 2517 and code fragments which illustrate the protocol operations without 2518 the complexities of an actual implementation of the protocol. This 2519 program is not an executable and is not designed to run in the 2520 ordinary sense. It is designed to be compiled only in order to 2521 verify consistent variable and type usage. 2523 Most of the features of the reference implementation are included 2524 here, with the following exceptions: There are no provisions for 2525 reference clocks or public key (Autokey) cryptography. There is no 2526 huff-n'-puff filter, anti-clockhop hysteresis or monitoring 2527 provisions. Many of the values that can be tinkered in the reference 2528 implementation are assumed constants here. There are only minimal 2529 provisions for the kiss-o'death packet and no responding code. 2531 The program is not intended to be fast or compact, just to 2532 demonstrate the algorithms with sufficient fidelity to understand how 2533 they work. The code skeleton consists of eight segments, a header 2534 segment included by each of the other segments, plus a code segment 2535 for the main program, kernel I/O and system clock interfaces, and 2536 peer, system, clock_adjust and poll processes. These are presented 2537 in order below along with definitions and variables specific to each 2538 process. 2540 A.1. Global Definitions 2542 A.1.1. Definitions, Constants, Parameters 2544 #include /* avoids complaints about sqrt() */ 2545 #include /* for gettimeofday() and friends */ 2546 #include /* for malloc() and friends */ 2548 /* 2549 * Data types 2550 * 2551 * This program assumes the int data type is 32 bits and the long data 2552 * type is 64 bits. The native data type used in most calculations is 2553 * floating double. The data types used in some packet header fields 2554 * require conversion to and from this representation. Some header 2555 * fields involve partitioning an octet, here represented by individual 2556 * octets. 2557 * 2558 * The 64-bit NTP timestamp format used in timestamp calculations is 2559 * unsigned seconds and fraction with the decimal point to the left of 2560 * bit 32. The only operation permitted with these values is 2561 * subtraction, yielding a signed 31-bit difference. The 32-bit NTP 2562 * short format used in delay and dispersion calculations is seconds and 2563 * fraction with the decimal point to the left of bit 16. The only 2564 * operations permitted with these values are addition and 2565 * multiplication by a constant. 2566 * 2567 * The IPv4 address is 32 bits, while the IPv6 address is 128 bits. The 2568 * message digest field is 128 bits as constructed by the MD5 algorithm. 2569 * The precision and poll interval fields are signed log2 seconds. 2570 */ 2572 typedef unsigned long tstamp; /* NTP timestamp format */ 2573 typedef unsigned int tdist; /* NTP short format */ 2574 typedef unsigned long ipaddr; /* IPv4 or IPv6 address */ 2575 typedef unsigned long digest; /* md5 digest */ 2576 typedef signed char s_char; /* precision and poll interval (log2) */ 2578 /* 2579 * Timestamp conversion macroni 2580 */ 2581 #define FRIC 65536. /* 2^16 as a double */ 2582 #define D2FP(r) ((tdist)((r) * FRIC)) /* NTP short */ 2583 #define FP2D(r) ((double)(r) / FRIC) 2585 #define FRAC 4294967296. /* 2^32 as a double */ 2586 #define D2LFP(a) ((tstamp)((a) * FRAC)) /* NTP timestamp */ 2587 #define LFP2D(a) ((double)(a) / FRAC) 2588 #define U2LFP(a) ((a).tv_sec + JAN_1970 << 32 + (unsigned long) \ 2589 ((a).tv_usec / 1e6 * FRAC)) 2591 /* 2592 * Arithmetic conversions 2593 */ 2594 #define LOG2D(a) ((a) < 0 ? 1. / (1L << -(a)) : \ 2595 1L << (a)) /* poll, etc. */ 2596 #define SQUARE(x) (x * x) 2597 #define SQRT(x) (sqrt(x)) 2599 /* 2600 * Global constants. Some of these might be converted to variables 2601 * which can be tinkered by configuration or computed on-fly. For 2602 * instance, the reference implementation computes PRECISION on-fly and 2603 * provides performance tuning for the defines marked with % below. 2604 */ 2605 #define VERSION 4 /* version number */ 2606 #define MINDISP .01 /* % minimum dispersion (s) */ 2607 #define MAXDISP 16 /* maximum dispersion (s) */ 2608 #define MAXDIST 1 /* % distance threshold (s) */ 2609 #define NOSYNC 0x3 /* leap unsync */ 2610 #define MAXSTRAT 16 /* maximum stratum (infinity metric) */ 2611 #define MINPOLL 6 /* % minimum poll interval (64 s)*/ 2612 #define MAXPOLL 17 /* % maximum poll interval (36.4 h) */ 2613 #define MINCLOCK 3 /* minimum manycast survivors */ 2614 #define MAXCLOCK 10 /* maximum manycast candidates */ 2615 #define TTLMAX 8 /* max ttl manycast */ 2616 #define BEACON 15 /* max interval between beacons */ 2618 #define PHI 15e-6 /* % frequency tolerance (15 PPM) */ 2619 #define NSTAGE 8 /* clock register stages */ 2620 #define NMAX 50 /* maximum number of peers */ 2621 #define NSANE 1 /* % minimum intersection survivors */ 2622 #define NMIN 3 /* % minimum cluster survivors */ 2624 /* 2625 * Global return values 2626 */ 2627 #define TRUE 1 /* boolean true */ 2628 #define FALSE 0 /* boolean false */ 2629 #define NULL 0 /* empty pointer */ 2631 /* 2632 * Local clock process return codes 2633 */ 2634 #define IGNORE 0 /* ignore */ 2635 #define SLEW 1 /* slew adjustment */ 2636 #define STEP 2 /* step adjustment */ 2637 #define PANIC 3 /* panic - no adjustment */ 2639 /* 2640 * System flags 2641 */ 2642 #define S_FLAGS 0 /* any system flags */ 2643 #define S_BCSTENAB 0x1 /* enable broadcast client */ 2645 /* 2646 * Peer flags 2647 */ 2648 #define P_FLAGS 0 /* any peer flags */ 2649 #define P_EPHEM 0x01 /* association is ephemeral */ 2650 #define P_BURST 0x02 /* burst enable */ 2651 #define P_IBURST 0x04 /* intial burst enable */ 2652 #define P_NOTRUST 0x08 /* authenticated access */ 2653 #define P_NOPEER 0x10 /* authenticated mobilization */ 2654 #define P_MANY 0x20 /* manycast client */ 2656 /* 2657 * Authentication codes 2658 */ 2659 #define A_NONE 0 /* no authentication */ 2660 #define A_OK 1 /* authentication OK */ 2661 #define A_ERROR 2 /* authentication error */ 2662 #define A_CRYPTO 3 /* crypto-NAK */ 2664 /* 2665 * Association state codes 2666 */ 2667 #define X_INIT 0 /* initialization */ 2668 #define X_STALE 1 /* timeout */ 2669 #define X_STEP 2 /* time step */ 2670 #define X_ERROR 3 /* authentication error */ 2671 #define X_CRYPTO 4 /* crypto-NAK received */ 2672 #define X_NKEY 5 /* untrusted key */ 2674 /* 2675 * Protocol mode definitionss 2676 */ 2677 #define M_RSVD 0 /* reserved */ 2678 #define M_SACT 1 /* symmetric active */ 2679 #define M_PASV 2 /* symmetric passive */ 2680 #define M_CLNT 3 /* client */ 2681 #define M_SERV 4 /* server */ 2682 #define M_BCST 5 /* broadcast server */ 2683 #define M_BCLN 6 /* broadcast client */ 2685 /* 2686 * Clock state definitions 2687 */ 2688 #define NSET 0 /* clock never set */ 2689 #define FSET 1 /* frequency set from file */ 2690 #define SPIK 2 /* spike detected */ 2691 #define FREQ 3 /* frequency mode */ 2692 #define SYNC 4 /* clock synchronized */ 2694 A.1.2. Packet Data Structures 2696 /* 2697 * The receive and transmit packets may contain an optional message 2698 * authentication code (MAC) consisting of a key identifier (keyid) and 2699 * message digest (mac). NTPv4 supports optional extension fields which 2700 * are inserted after the the header and before the MAC, but these are 2701 * not described here. 2702 * 2703 * Receive packet 2704 * 2705 * Note the dst timestamp is not part of the packet itself. It is 2706 * captured upon arrival and returned in the receive buffer along with 2707 * the buffer length and data. Note that some of the char fields are 2708 * packed in the actual header, but the details are omitted here. 2709 */ 2710 struct r { 2711 ipaddr srcaddr; /* source (remote) address */ 2712 ipaddr dstaddr; /* destination (local) address */ 2713 char version; /* version number */ 2714 char leap; /* leap indicator */ 2715 char mode; /* mode */ 2716 char stratum; /* stratum */ 2717 char poll; /* poll interval */ 2718 s_char precision; /* precision */ 2719 tdist rootdelay; /* root delay */ 2720 tdist rootdisp; /* root dispersion */ 2721 char refid; /* reference ID */ 2722 tstamp reftime; /* reference time */ 2723 tstamp org; /* origin timestamp */ 2724 tstamp rec; /* receive timestamp */ 2725 tstamp xmt; /* transmit timestamp */ 2726 int keyid; /* key ID */ 2727 digest mac; /* message digest */ 2728 tstamp dst; /* destination timestamp */ 2729 } r; 2731 /* 2732 * Transmit packet 2733 */ 2734 struct x { 2735 ipaddr dstaddr; /* source (local) address */ 2736 ipaddr srcaddr; /* destination (remote) address */ 2737 char version; /* version number */ 2738 char leap; /* leap indicator */ 2739 char mode; /* mode */ 2740 char stratum; /* stratum */ 2741 char poll; /* poll interval */ 2742 s_char precision; /* precision */ 2743 tdist rootdelay; /* root delay */ 2744 tdist rootdisp; /* root dispersion */ 2745 char refid; /* reference ID */ 2746 tstamp reftime; /* reference time */ 2747 tstamp org; /* origin timestamp */ 2748 tstamp rec; /* receive timestamp */ 2749 tstamp xmt; /* transmit timestamp */ 2750 int keyid; /* key ID */ 2751 digest mac; /* message digest */ 2752 } x; 2754 A.1.3. Association Data Structures 2756 /* 2757 * Filter stage structure. Note the t member in this and other 2758 * structures refers to process time, not real time. Process time 2759 * increments by one second for every elapsed second of real time. 2760 */ 2761 struct f { 2762 tstamp t; /* update time */ 2763 double offset; /* clock ofset */ 2764 double delay; /* roundtrip delay */ 2765 double disp; /* dispersion */ 2766 } f; 2768 /* 2769 * Association structure. This is shared between the peer process and 2770 * poll process. 2771 */ 2772 struct p { 2774 /* 2775 * Variables set by configuration 2776 */ 2777 ipaddr srcaddr; /* source (remote) address */ 2778 ipaddr dstaddr; /* destination (local) address */ 2779 char version; /* version number */ 2780 char hmode; /* host mode */ 2781 int keyid; /* key identifier */ 2782 int flags; /* option flags */ 2784 /* 2785 * Variables set by received packet 2786 */ 2787 char leap; /* leap indicator */ 2788 char pmode; /* peer mode */ 2789 char stratum; /* stratum */ 2790 char ppoll; /* peer poll interval */ 2791 double rootdelay; /* root delay */ 2792 double rootdisp; /* root dispersion */ 2793 char refid; /* reference ID */ 2794 tstamp reftime; /* reference time */ 2795 #define begin_clear org /* beginning of clear area */ 2796 tstamp org; /* originate timestamp */ 2797 tstamp rec; /* receive timestamp */ 2798 tstamp xmt; /* transmit timestamp */ 2800 /* 2801 * Computed data 2802 */ 2803 double t; /* update time */ 2804 struct f f[NSTAGE]; /* clock filter */ 2805 double offset; /* peer offset */ 2806 double delay; /* peer delay */ 2807 double disp; /* peer dispersion */ 2808 double jitter; /* RMS jitter */ 2810 /* 2811 * Poll process variables 2812 */ 2813 char hpoll; /* host poll interval */ 2814 int burst; /* burst counter */ 2815 int reach; /* reach register */ 2816 int ttl; /* ttl (manycast) */ 2817 #define end_clear unreach /* end of clear area */ 2818 int unreach; /* unreach counter */ 2819 int outdate; /* last poll time */ 2820 int nextdate; /* next poll time */ 2821 } p; 2823 A.1.4. System Data Structures 2825 /* 2826 * Chime list. This is used by the intersection algorithm. 2827 */ 2828 struct m { /* m is for Marzullo */ 2829 struct p *p; /* peer structure pointer */ 2830 int type; /* high +1, mid 0, low -1 */ 2831 double edge; /* correctness interval edge */ 2832 } m; 2834 /* 2835 * Survivor list. This is used by the clustering algorithm. 2836 */ 2837 struct v { 2838 struct p *p; /* peer structure pointer */ 2839 double metric; /* sort metric */ 2840 } v; 2842 /* 2843 * System structure 2844 */ 2845 struct s { 2846 tstamp t; /* update time */ 2847 char leap; /* leap indicator */ 2848 char stratum; /* stratum */ 2849 char poll; /* poll interval */ 2850 char precision; /* precision */ 2851 double rootdelay; /* root delay */ 2852 double rootdisp; /* root dispersion */ 2853 char refid; /* reference ID */ 2854 tstamp reftime; /* reference time */ 2855 struct m m[NMAX]; /* chime list */ 2856 struct v v[NMAX]; /* survivor list */ 2857 struct p *p; /* association ID */ 2858 double offset; /* combined offset */ 2859 double jitter; /* combined jitter */ 2860 int flags; /* option flags */ 2861 int n; /* number of survivors */ 2862 } s; 2864 A.1.5. Local Clock Data Structures 2866 /* 2867 * Local clock structure 2868 */ 2869 struct c { 2870 tstamp t; /* update time */ 2871 int state; /* current state */ 2872 double offset; /* current offset */ 2873 double last; /* previous offset */ 2874 int count; /* jiggle counter */ 2875 double freq; /* frequency */ 2876 double jitter; /* RMS jitter */ 2877 double wander; /* RMS wander */ 2878 } c; 2880 A.1.6. Function Prototypes 2882 /* 2883 * Peer process 2884 */ 2885 void receive(struct r *); /* receive packet */ 2886 void packet(struct p *, struct r *); /* process packet */ 2887 void clock_filter(struct p *, double, double, double); /* filter */ 2888 double root_dist(struct p *); /* calculate root distance */ 2889 int fit(struct p *); /* determine fitness of server */ 2890 void clear(struct p *, int); /* clear association */ 2891 int access(struct r *); /* determine access restrictions */ 2893 /* 2894 * System process 2895 */ 2896 int main(); /* main program */ 2897 void clock_select(); /* find the best clocks */ 2898 void clock_update(struct p *); /* update the system clock */ 2899 void clock_combine(); /* combine the offsets */ 2901 /* 2902 * Local clock process 2903 */ 2904 int local_clock(struct p *, double); /* clock discipline */ 2905 void rstclock(int, double, double); /* clock state transition */ 2907 /* 2908 * Clock adjust process 2909 */ 2910 void clock_adjust(); /* one-second timer process */ 2911 /* 2912 * Poll process 2913 */ 2914 void poll(struct p *); /* poll process */ 2915 void poll_update(struct p *, int); /* update the poll interval */ 2916 void peer_xmit(struct p *); /* transmit a packet */ 2917 void fast_xmit(struct r *, int, int); /* transmit a reply packet */ 2919 /* 2920 * Utility routines 2921 */ 2922 digest md5(int); /* generate a message digest */ 2923 struct p *mobilize(ipaddr, ipaddr, int, int, int, int); /* mobilize */ 2924 struct p *find_assoc(struct r *); /* search the association table */ 2926 /* 2927 * Kernel interface 2928 */ 2929 struct r *recv_packet(); /* wait for packet */ 2930 void xmit_packet(struct x *); /* send packet */ 2931 void step_time(double); /* step time */ 2932 void adjust_time(double); /* adjust (slew) time */ 2933 tstamp get_time(); /* read time */ 2935 A.2. Main Program and Utility Routines 2937 /* 2938 * Definitions 2939 */ 2940 #define PRECISION -18 /* precision (log2 s) */ 2941 #define IPADDR 0 /* any IP address */ 2942 #define MODE 0 /* any NTP mode */ 2943 #define KEYID 0 /* any key identifier */ 2945 /* 2946 * main() - main program 2947 */ 2948 int 2949 main() 2950 { 2951 struct p *p; /* peer structure pointer */ 2952 struct r *r; /* receive packet pointer */ 2954 /* 2955 * Read command line options and initialize system variables. 2956 * The reference implementation measures the precision specific 2957 * to each machine by measuring the clock increments to read the 2958 * system clock. 2960 */ 2961 memset(&s, sizeof(s), 0); 2962 s.leap = NOSYNC; 2963 s.stratum = MAXSTRAT; 2964 s.poll = MINPOLL; 2965 s.precision = PRECISION; 2966 s.p = NULL; 2968 /* 2969 * Initialize local clock variables 2970 */ 2971 memset(&c, sizeof(c), 0); 2972 if (/* frequency file */ 0) { 2973 c.freq = /* freq */ 0; 2974 rstclock(FSET, 0, 0); 2975 } else { 2976 rstclock(NSET, 0, 0); 2977 } 2978 c.jitter = LOG2D(s.precision); 2980 /* 2981 * Read the configuration file and mobilize persistent 2982 * associations with specified addresses, version, mode, key ID 2983 * and flags. 2984 */ 2985 while (/* mobilize configurated associations */ 0) { 2986 p = mobilize(IPADDR, IPADDR, VERSION, MODE, KEYID, 2987 P_FLAGS); 2988 } 2990 /* 2991 * Start the system timer, which ticks once per second. Then 2992 * read packets as they arrive, strike receive timestamp and 2993 * call the receive() routine. 2994 */ 2995 while (0) { 2996 r = recv_packet(); 2997 r->dst = get_time(); 2998 receive(r); 2999 } 3000 } 3002 /* 3003 * mobilize() - mobilize and initialize an association 3004 */ 3005 struct p 3006 *mobilize( 3007 ipaddr srcaddr, /* IP source address */ 3008 ipaddr dstaddr, /* IP destination address */ 3009 int version, /* version */ 3010 int mode, /* host mode */ 3011 int keyid, /* key identifier */ 3012 int flags /* peer flags */ 3013 ) 3014 { 3015 struct p *p; /* peer process pointer */ 3017 /* 3018 * Allocate and initialize association memory 3019 */ 3020 p = malloc(sizeof(struct p)); 3021 p->srcaddr = srcaddr; 3022 p->dstaddr = dstaddr; 3023 p->version = version; 3024 p->hmode = mode; 3025 p->keyid = keyid; 3026 p->hpoll = MINPOLL; 3027 clear(p, X_INIT); 3028 p->flags == flags; 3029 return (p); 3030 } 3032 /* 3033 * find_assoc() - find a matching association 3034 */ 3035 struct p /* peer structure pointer or NULL */ 3036 *find_assoc( 3037 struct r *r /* receive packet pointer */ 3038 ) 3039 { 3040 struct p *p; /* dummy peer structure pointer */ 3042 /* 3043 * Search association table for matching source 3044 * address, source port and mode. 3045 */ 3046 while (/* all associations */ 0) { 3047 if (r->srcaddr == p->srcaddr && r->mode == p->hmode) 3048 return(p); 3049 } 3050 return (NULL); 3051 } 3053 /* 3054 * md5() - compute message digest 3055 */ 3057 digest 3058 md5( 3059 int keyid /* key identifier */ 3060 ) 3061 { 3062 /* 3063 * Compute a keyed cryptographic message digest. The key 3064 * identifier is associated with a key in the local key cache. 3065 * The key is prepended to the packet header and extension fields 3066 * and the result hashed by the MD5 algorithm as described in 3067 * RFC-1321. Return a MAC consisting of the 32-bit key ID 3068 * concatenated with the 128-bit digest. 3069 */ 3070 return (/* MD5 digest */ 0); 3071 } 3073 A.3. Kernel Input/Output Interface 3075 /* 3076 * Kernel interface to transmit and receive packets. Details are 3077 * deliberately vague and depend on the operating system. 3078 * 3079 * recv_packet - receive packet from network 3080 */ 3081 struct r /* receive packet pointer*/ 3082 *recv_packet() { 3083 return (/* receive packet r */ 0); 3084 } 3086 /* 3087 * xmit_packet - transmit packet to network 3088 */ 3089 void 3090 xmit_packet( 3091 struct x *x /* transmit packet pointer */ 3092 ) 3093 { 3094 /* send packet x */ 3095 } 3097 A.4. Kernel System Clock Interface 3099 /* 3100 * System clock utility functions 3101 * 3102 * There are three time formats: native (Unix), NTP and floating double. 3103 * The get_time() routine returns the time in NTP long format. The Unix 3104 * routines expect arguments as a structure of two signed 32-bit words 3105 * in seconds and microseconds (timeval) or nanoseconds (timespec). The 3106 * step_time() and adjust_time() routines expect signed arguments in 3107 * floating double. The simplified code shown here is for illustration 3108 * only and has not been verified. 3109 */ 3110 #define JAN_1970 2208988800UL /* 1970 - 1900 in seconds */ 3112 /* 3113 * get_time - read system time and convert to NTP format 3114 */ 3115 tstamp 3116 get_time() 3117 { 3118 struct timeval unix_time; 3120 /* 3121 * There are only two calls on this routine in the program. One 3122 * when a packet arrives from the network and the other when a 3123 * packet is placed on the send queue. Call the kernel time of 3124 * day routine (such as gettimeofday()) and convert to NTP 3125 * format. 3126 */ 3127 gettimeofday(&unix_time, NULL); 3128 return (U2LFP(unix_time)); 3129 } 3131 /* 3132 * step_time() - step system time to given offset valuet 3133 */ 3134 void 3135 step_time( 3136 double offset /* clock offset */ 3137 ) 3138 { 3139 struct timeval unix_time; 3140 tstamp ntp_time; 3142 /* 3143 * Convert from double to native format (signed) and add to the 3144 * current time. Note the addition is done in native format to 3145 * avoid overflow or loss of precision. 3146 */ 3147 gettimeofday(&unix_time, NULL); 3148 ntp_time = D2LFP(offset) + U2LFP(unix_time);; 3149 unix_time.tv_sec = ntp_time >> 32; 3150 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3151 32) / FRAC * 1e6); 3152 settimeofday(&unix_time, NULL); 3153 } 3155 /* 3156 * adjust_time() - slew system clock to given offset value 3157 */ 3158 void 3159 adjust_time( 3160 double offset /* clock offset */ 3161 ) 3162 { 3163 struct timeval unix_time; 3164 tstamp ntp_time; 3166 /* 3167 * Convert from double to native format (signed) and add to the 3168 * current time. 3169 */ 3170 ntp_time = D2LFP(offset); 3171 unix_time.tv_sec = ntp_time >> 32; 3172 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3173 32) / FRAC * 1e6); 3174 adjtime(&unix_time, NULL); 3175 } 3177 A.5. Peer Process 3179 /* 3180 * A crypto-NAK packet includes the NTP header followed by a MAC 3181 * consisting only of the key identifier with value zero. It tells the 3182 * receiver that a prior request could not be properly authenticated, 3183 * but the NTP header fields are correct. 3184 * 3185 * A kiss-o'-death packet is an NTP header with leap 0x3 (NOSYNC) and 3186 * stratum 16 (MAXSTRAT. It tells the receiver that something drastic 3187 * has happened, as revealed by the kiss code in the refid field. The 3188 * NTP header fields may or may not be correct. 3189 */ 3190 /* 3191 * Peer process parameters and constants 3192 */ 3193 #define SGATE 3 /* spike gate (clock filter */ 3194 #define BDELAY .004 /* broadcast delay (s) */ 3196 /* 3197 * Dispatch codes 3198 */ 3199 #define ERR -1 /* error */ 3200 #define DSCRD 0 /* discard packet */ 3201 #define PROC 1 /* process packet */ 3202 #define BCST 2 /* broadcast packet */ 3203 #define FXMIT 3 /* client packet */ 3204 #define NEWPS 4 /* new symmetric passive client */ 3205 #define NEWBC 5 /* new broadcast client */ 3207 /* 3208 * Dispatch matrix 3209 * active passv client server bcast */ 3210 int table[7][5] = { 3211 /* nopeer */ { NEWPS, DSCRD, FXMIT, MANY, NEWBC }, 3212 /* active */ { PROC, PROC, DSCRD, DSCRD, DSCRD }, 3213 /* passv */ { PROC, ERR, DSCRD, DSCRD, DSCRD }, 3214 /* client */ { DSCRD, DSCRD, DSCRD, PROC, DSCRD }, 3215 /* server */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3216 /* bcast */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3217 /* bclient */ { DSCRD, DSCRD, DSCRD, DSCRD, PROC} 3218 }; 3220 /* 3221 * Miscellaneous macroni 3222 * 3223 * This macro defines the authentication state. If x is 0, 3224 * authentication is optional, otherwise it is required. 3225 */ 3226 #define AUTH(x, y) ((x) ? (y) == A_OK : (y) == A_OK || \ 3227 (y) == A_NONE) 3229 /* 3230 * These are used by the clear() routine 3231 */ 3232 #define BEGIN_CLEAR(p) ((char *)&((p)->begin_clear)) 3233 #define END_CLEAR(p) ((char *)&((p)->end_clear)) 3234 #define LEN_CLEAR (END_CLEAR((struct p *)0) - \ 3235 BEGIN_CLEAR((struct p *)0)) 3237 A.5.1. receive() 3239 /* 3240 * receive() - receive packet and decode modes 3241 */ 3242 void 3243 receive( 3244 struct r *r /* receive packet pointer */ 3245 ) 3247 { 3248 struct p *p; /* peer structure pointer 3249 int auth; /* authentication code */ 3250 int has_mac; /* size of MAC */ 3251 int synch; /* synchronized switch */ 3252 int auth; /* authentication code */ 3254 /* 3255 * Check access control lists. The intent here is to implement a 3256 * whitelist of those IP addresses specifically accepted and/or 3257 * a blacklist of those IP addresses specifically rejected. 3258 * There could be different lists for authenticated clients and 3259 * unauthenticated clients. 3260 */ 3261 if (!access(r)) 3262 return; /* access denied */ 3264 /* 3265 * The version must not be in the future. Format checks include 3266 * packet length, MAC length and extension field lengths, if 3267 * present. 3268 */ 3269 if (r->version > VERSION /* or format error */) 3270 return; /* format error */ 3272 /* 3273 * Authentication is conditioned by two switches which can be 3274 * specified on a per-client basis. 3275 * 3276 * P_NOPEER do not mobilize an association unless 3277 * authenticated 3278 * P_NOTRUST do not allow access unless authenticated 3279 * (implies P_NOPEER) 3280 * 3281 * There are four outcomes: 3282 * 3283 * A_NONE the packet has no MAC 3284 * A_OK the packet has a MAC and authentication 3285 * succeeds 3286 * A_ERROR the packet has a MAC and authentication fails 3287 * A_CRYPTO crypto-NAK. The MAC has four octets only. 3288 * 3289 * Note: The AUTH(x, y) macro is used to filter outcomes. If x 3290 * is zero, acceptable outcomes of y are NONE and OK. If x is 3291 * one, the only acceptable outcome of y is OK. 3292 */ 3293 has_mac = /* length of MAC field */ 0; 3294 if (has_mac == 0) { 3295 auth = A_NONE; /* not required */ 3296 } else if (has_mac == 4) { 3297 auth == A_CRYPTO; /* crypto-NAK */ 3298 } else { 3299 if (r->mac != md5(r->keyid)) 3300 auth = A_ERROR; /* auth error */ 3301 else 3302 auth = A_OK; /* auth OK */ 3303 } 3305 /* 3306 * Find association and dispatch code. If there is no 3307 * association to match, the value of p->hmode is assumed NULL. 3308 */ 3309 p = find_assoc(r); 3310 switch(table[p->hmode][r->mode]) { 3312 /* 3313 * Client packet and no association. Send server reply without 3314 * saving state. 3315 */ 3316 case FXMIT: 3318 /* 3319 * If unicast destination address, send server packet. 3320 * If authentication fails, send a crypto-NAK packet. 3321 /* 3322 if (/* not multicast dstaddr */0) { 3323 if (AUTH(p->flags & P_NOTRUST, auth)) 3324 fast_xmit(r, M_SERV, auth); 3325 else if (auth == A_ERROR) 3326 fast_xmit(r, M_SERV, A_CRYPTO); 3327 return; /* M_SERV packet sent */ 3328 } 3330 /* 3331 * This must be manycast. Do not respond if we are not 3332 * synchronized or if our stratum is above the 3333 * manycaster. 3334 */ 3335 if (s.leap == NOSYNC || s.stratum > r->stratum) 3336 return; 3338 /* 3339 * Respond only if authentication is ok. Note that the 3340 * unicast address is used, not the multicast. 3341 */ 3342 if (AUTH(p->flags & P_NOTRUST, auth)) 3343 fast_xmit(r, M_SERV, auth); 3344 return; 3346 /* 3347 * New manycast client ephemeral association. It is mobilized in 3348 * the same version as in the packet. If authentication fails, 3349 * ignore the packet. Verify the server packet by comparing the 3350 * r->org timestamp in the packet with the p->xmt timestamp in 3351 * the multicast client association. If they match, the server 3352 * packet is authentic. Details omitted. 3353 */ 3354 case MANY: 3355 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3356 return; /* authentication error */ 3358 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_CLNT, 3359 r->keyid, P_EPHEM); 3360 break; 3362 /* 3363 * New symmetric passive association. It is mobilized in the same 3364 * version as in the packet. If authentication fails, send a 3365 * crypto-NAK packet. If restrict no-moblize, send a symmetric 3366 * active packet instead. 3367 */ 3368 case NEWPS: 3369 if (!AUTH(p->flags & P_NOTRUST, auth)) { 3370 if (auth == A_ERROR) 3371 fast_xmit(r, M_SACT, A_CRYPTO); 3372 return; /* crypto-NAK packet sent */ 3373 } 3374 if (!AUTH(p->flags & P_NOPEER, auth)) { 3375 fast_xmit(r, M_SACT, auth); 3376 return; /* M_SACT packet sent */ 3377 } 3378 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_PASV, 3379 r->keyid, P_EPHEM); 3380 break; 3382 /* 3383 * New broadcast client association. It is mobilized in the same 3384 * version as in the packet. If authentication fails, ignore the 3385 * packet. Note this code does not support the initial volley 3386 * feature in the reference implementation. 3387 */ 3388 case NEWBC: 3389 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3390 return; /* authentication error */ 3392 if (!(s.flags & S_BCSTENAB)) 3393 return; /* broadcast not enabled */ 3395 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_BCLN, 3396 r->keyid, P_EPHEM); 3397 break; /* processing continues */ 3399 /* 3400 * Process packet. Placeholdler only. 3401 */ 3402 case PROC: 3403 break; /* processing continues */ 3405 /* 3406 * Invalid mode combination. We get here only in case of 3407 * ephemeral associations, so the correct action is simply to 3408 * toss it. 3409 */ 3410 case ERR: 3411 clear(p, X_ERROR); 3412 return; /* invalid mode combination */ 3414 /* 3415 * No match; just discard the packet. 3416 */ 3417 case DSCRD: 3418 return; /* orphan abandoned */ 3419 } 3421 /* 3422 * Next comes a rigorous schedule of timestamp checking. If the 3423 * transmit timestamp is zero, the server is horribly broken. 3424 */ 3425 if (r->xmt == 0) 3426 return; /* invalid timestamp */ 3428 /* 3429 * If the transmit timestamp duplicates a previous one, the 3430 * packet is a replay. 3431 */ 3432 if (r->xmt == p->xmt) 3433 return; /* duplicate packet */ 3435 /* 3436 * If this is a broadcast mode packet, skip further checking. 3437 * If the origin timestamp is zero, the sender has not yet heard 3438 * from us. Otherwise, if the origin timestamp does not match 3439 * the transmit timestamp, the packet is bogus. 3441 */ 3442 synch = TRUE; 3443 if (r->mode != M_BCST) { 3444 if (r->org == 0) 3445 synch = FALSE; /* unsynchronized */ 3447 else if (r->org != p->xmt) 3448 synch = FALSE; /* bogus packet */ 3449 } 3451 /* 3452 * Update the origin and destination timestamps. If 3453 * unsynchronized or bogus, abandon ship. 3454 */ 3455 p->org = r->xmt; 3456 p->rec = r->dst; 3457 if (!synch) 3458 return; /* unsynch */ 3460 /* 3461 * The timestamps are valid and the receive packet matches the 3462 * last one sent. If the packet is a crypto-NAK, the server 3463 * might have just changed keys. We demobilize the association 3464 * and wait for better times. 3465 */ 3466 if (auth == A_CRYPTO) { 3467 clear(p, X_CRYPTO); 3468 return; /* crypto-NAK */ 3469 } 3471 /* 3472 * If the association is authenticated, the key ID is nonzero 3473 * and received packets must be authenticated. This is designed 3474 * to avoid a bait-and-switch attack, which was possible in past 3475 * versions. 3476 */ 3477 if (!AUTH(p->keyid || (p->flags & P_NOTRUST), auth)) 3478 return; /* bad auth */ 3480 /* 3481 * Everything possible has been done to validate the timestamps 3482 * and prevent bad guys from disrupting the protocol or 3483 * injecting bogus data. Earn some revenue. 3484 */ 3485 packet(p, r); 3486 } 3488 A.5.1.1. packet() 3489 /* 3490 * packet() - process packet and compute offset, delay and 3491 * dispersion. 3492 */ 3493 void 3494 packet( 3495 struct p *p, /* peer structure pointer */ 3496 struct r *r /* receive packet pointer */ 3497 ) 3498 { 3499 double offset; /* sample offsset */ 3500 double delay; /* sample delay */ 3501 double disp; /* sample dispersion */ 3503 /* 3504 * By golly the packet is valid. Light up the remaining header 3505 * fields. Note that we map stratum 0 (unspecified) to MAXSTRAT 3506 * to make stratum comparisons simpler and to provide a natural 3507 * interface for radio clock drivers that operate for 3508 * convenience at stratum 0. 3509 */ 3510 p->leap = r->leap; 3511 if (r->stratum == 0) 3512 p->stratum = MAXSTRAT; 3513 else 3514 p->stratum = r->stratum; 3515 p->pmode = r->mode; 3516 p->ppoll = r->poll; 3517 p->rootdelay = FP2D(r->rootdelay); 3518 p->rootdisp = FP2D(r->rootdisp); 3519 p->refid = r->refid; 3520 p->reftime = r->reftime; 3522 /* 3523 * Verify the server is synchronized with valid stratum and 3524 * reference time not later than the transmit time. 3525 */ 3526 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3527 return; /* unsynchronized */ 3529 /* 3530 * Verify valid root distance. 3531 */ 3532 if (r->rootdelay / 2 + r->rootdisp >= MAXDISP || p->reftime > 3533 r->xmt) 3534 return; /* invalid header values */ 3536 poll_update(p, p->hpoll); 3537 p->reach |= 1; 3539 /* 3540 * Calculate offset, delay and dispersion, then pass to the 3541 * clock filter. Note carefully the implied processing. The 3542 * first-order difference is done directly in 64-bit arithmetic, 3543 * then the result is converted to floating double. All further 3544 * processing is in floating double arithmetic with rounding 3545 * done by the hardware. This is necessary in order to avoid 3546 * overflow and preseve precision. 3547 * 3548 * The delay calculation is a special case. In cases where the 3549 * server and client clocks are running at different rates and 3550 * with very fast networks, the delay can appear negative. In 3551 * order to avoid violating the Principle of Least Astonishment, 3552 * the delay is clamped not less than the system precision. 3553 */ 3554 if (p->pmode == M_BCST) { 3555 offset = LFP2D(r->xmt - r->dst); 3556 delay = BDELAY; 3557 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3558 2 * BDELAY; 3559 } else { 3560 offset = (LFP2D(r->rec - r->org) + LFP2D(r->dst - 3561 r->xmt)) / 2; 3562 delay = max(LFP2D(r->dst - r->org) - LFP2D(r->rec - 3563 r->xmt), LOG2D(s.precision)); 3564 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3565 LFP2D(r->dst - r->org); 3566 } 3567 clock_filter(p, offset, delay, disp); 3568 } 3570 A.5.2. clock_filter() 3572 /* 3573 * clock_filter(p, offset, delay, dispersion) - select the best from the 3574 * latest eight delay/offset samples. 3575 */ 3576 void 3577 clock_filter( 3578 struct p *p, /* peer structure pointer */ 3579 double offset, /* clock offset */ 3580 double delay, /* roundtrip delay */ 3581 double disp /* dispersion */ 3582 ) 3583 { 3584 struct f f[NSTAGE]; /* sorted list */ 3585 double dtemp; 3586 int i; 3588 /* 3589 * The clock filter contents consist of eight tuples (offset, 3590 * delay, dispersion, time). Shift each tuple to the left, 3591 * discarding the leftmost one. As each tuple is shifted, 3592 * increase the dispersion since the last filter update. At the 3593 * same time, copy each tuple to a temporary list. After this, 3594 * place the (offset, delay, disp, time) in the vacated 3595 * rightmost tuple. 3596 */ 3597 for (i = 1; i < NSTAGE; i++) { 3598 p->f[i] = p->f[i - 1]; 3599 p->f[i].disp += PHI * (c.t - p->t); 3600 f[i] = p->f[i]; 3601 } 3602 p->f[0].t = c.t; 3603 p->f[0].offset = offset; 3604 p->f[0].delay = delay; 3605 p->f[0].disp = disp; 3606 f[0] = p->f[0]; 3608 /* 3609 * Sort the temporary list of tuples by increasing f[].delay. 3610 * The first entry on the sorted list represents the best 3611 * sample, but it might be old. 3612 */ 3613 dtemp = p->offset; 3614 p->offset = f[0].offset; 3615 p->delay = f[0].delay; 3616 for (i = 0; i < NSTAGE; i++) { 3617 p->disp += f[i].disp / (2 ^ (i + 1)); 3618 p->jitter += SQUARE(f[i].offset - f[0].offset); 3619 } 3620 p->jitter = max(SQRT(p->jitter), LOG2D(s.precision)); 3622 /* 3623 * Prime directive: use a sample only once and never a sample 3624 * older than the latest one, but anything goes before first 3625 * synchronized. 3626 */ 3627 if (f[0].t - p->t <= 0 && s.leap != NOSYNC) 3628 return; 3630 /* 3631 * Popcorn spike suppressor. Compare the difference between the 3632 * last and current offsets to the current jitter. If greater 3633 * than SGATE (3) and if the interval since the last offset is 3634 * less than twice the system poll interval, dump the spike. 3635 * Otherwise, and if not in a burst, shake out the truechimers. 3636 */ 3637 if (fabs(p->offset - dtemp) > SGATE * p->jitter && (f[0].t - 3638 p->t) < 2 * s.poll) 3639 return; 3641 p->t = f[0].t; 3642 if (p->burst == 0) 3643 clock_select(); 3644 return; 3645 } 3647 /* 3648 * root_dist() - calculate root distance 3649 */ 3650 double 3651 root_dist( 3652 struct p *p /* peer structure pointer */ 3653 ) 3654 { 3655 /* 3656 * The root synchronization distance is the maximum error due to 3657 * all causes of the local clock relative to the primary server. 3658 * It is defined as half the total delay plus total dispersion 3659 * plus peer jitter. 3660 */ 3661 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3662 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3663 } 3665 /* 3666 * fit() - test if association p is acceptable for synchronization 3667 */ 3668 int 3669 fit( 3670 struct p *p /* peer structure pointer */ 3671 ) 3672 { 3673 /* 3674 * A stratum error occurs if (1) the server has never been 3675 * synchronized, (2) the server stratum is invalid. 3676 */ 3677 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3678 return (FALSE); 3680 /* 3681 * A distance error occurs if the root distance exceeds the 3682 * distance threshold plus an increment equal to one poll 3683 * interval. 3684 */ 3685 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 3686 return (FALSE); 3688 /* 3689 * A loop error occurs if the remote peer is synchronized to the 3690 * local peer or the remote peer is synchronized to the current 3691 * system peer. Note this is the behavior for IPv4; for IPv6 the 3692 * MD5 hash is used instead. 3693 */ 3694 if (p->refid == p->dstaddr || p->refid == s.refid) 3695 return (FALSE); 3697 /* 3698 * An unreachable error occurs if the server is unreachable. 3699 */ 3700 if (p->reach == 0) 3701 return (FALSE); 3703 return (TRUE); 3704 } 3706 /* 3707 * clear() - reinitialize for persistent association, demobilize 3708 * for ephemeral association. 3709 */ 3710 void 3711 clear( 3712 struct p *p, /* peer structure pointer */ 3713 int kiss /* kiss code */ 3714 ) 3715 { 3716 int i; 3718 /* 3719 * The first thing to do is return all resources to the bank. 3720 * Typical resources are not detailed here, but they include 3721 * dynamically allocated structures for keys, certificates, etc. 3722 * If an ephemeral association and not initialization, return 3723 * the association memory as well. 3724 */ 3725 /* return resources */ 3726 if (s.p == p) 3727 s.p = NULL; 3728 if (kiss != X_INIT && (p->flags & P_EPHEM)) { 3729 free(p); 3730 return; 3731 } 3733 /* 3734 * Initialize the association fields for general reset. 3735 */ 3736 memset(BEGIN_CLEAR(p), LEN_CLEAR, 0); 3737 p->leap = NOSYNC; 3738 p->stratum = MAXSTRAT; 3739 p->ppoll = MAXPOLL; 3740 p->hpoll = MINPOLL; 3741 p->disp = MAXDISP; 3742 p->jitter = LOG2D(s.precision); 3743 p->refid = kiss; 3744 for (i = 0; i < NSTAGE; i++) 3745 p->f[i].disp = MAXDISP; 3747 /* 3748 * Randomize the first poll just in case thousands of broadcast 3749 * clients have just been stirred up after a long absence of the 3750 * broadcast server. 3751 */ 3752 p->outdate = p->t = c.t; 3753 p->nextdate = p->outdate + (random() & ((1 << MINPOLL) - 1)); 3754 } 3756 A.5.3. fast_xmit() 3758 /* 3759 * fast_xmit() - transmit a reply packet for receive packet r 3760 */ 3761 void 3762 fast_xmit( 3763 struct r *r, /* receive packet pointer */ 3764 int mode, /* association mode */ 3765 int auth /* authentication code */ 3766 ) 3767 { 3768 struct x x; 3770 /* 3771 * Initialize header and transmit timestamp. Note that the 3772 * transmit version is copied from the receive version. This is 3773 * for backward compatibility. 3774 */ 3775 x.version = r->version; 3776 x.srcaddr = r->dstaddr; 3777 x.dstaddr = r->srcaddr; 3778 x.leap = s.leap; 3779 x.mode = mode; 3780 if (s.stratum == MAXSTRAT) 3781 x.stratum = 0; 3782 else 3783 x.stratum = s.stratum; 3784 x.poll = r->poll; 3785 x.precision = s.precision; 3786 x.rootdelay = D2FP(s.rootdelay); 3787 x.rootdisp = D2FP(s.rootdisp); 3788 x.refid = s.refid; 3789 x.reftime = s.reftime; 3790 x.org = r->xmt; 3791 x.rec = r->dst; 3792 x.xmt = get_time(); 3794 /* 3795 * If the authentication code is A.NONE, include only the 3796 * header; if A.CRYPTO, send a crypto-NAK; if A.OK, send a valid 3797 * MAC. Use the key ID in the received packet and the key in the 3798 * local key cache. 3799 */ 3800 if (auth != A_NONE) { 3801 if (auth == A_CRYPTO) { 3802 x.keyid = 0; 3803 } else { 3804 x.keyid = r->keyid; 3805 x.mac = md5(x.keyid); 3806 } 3807 } 3808 xmit_packet(&x); 3809 } 3810 A.5.4. access() 3812 /* 3813 * access() - determine access restrictions 3814 */ 3815 int 3816 access( 3817 struct r *r /* receive packet pointer */ 3818 ) 3819 { 3820 /* 3821 * The access control list is an ordered set of tuples 3822 * consisting of an address, mask and restrict word containing 3823 * defined bits. The list is searched for the first match on the 3824 * source address (r->srcaddr) and the associated restrict word 3825 * is returned. 3826 */ 3827 return (/* access bits */ 0); 3828 } 3830 A.5.5. System Process 3832 A.5.5.1. clock_select() 3834 /* 3835 * clock_select() - find the best clocks 3836 */ 3837 void 3838 clock_select() { 3839 struct p *p, *osys; /* peer structure pointers */ 3840 double low, high; /* correctness interval extents */ 3841 int allow, found, chime; /* used by intersection algorithm */ 3842 int n, i, j; 3844 /* 3845 * We first cull the falsetickers from the server population, 3846 * leaving only the truechimers. The correctness interval for 3847 * association p is the interval from offset - root_dist() to 3848 * offset + root_dist(). The object of the game is to find a 3849 * majority clique; that is, an intersection of correctness 3850 * intervals numbering more than half the server population. 3851 * 3852 * First construct the chime list of tuples (p, type, edge) as 3853 * shown below, then sort the list by edge from lowest to 3854 * highest. 3855 */ 3856 osys = s.p; 3857 s.p = NULL; 3858 n = 0; 3859 while (fit(p)) { 3860 s.m[n].p = p; 3861 s.m[n].type = +1; 3862 s.m[n].edge = p->offset + root_dist(p); 3863 n++; 3864 s.m[n].p = p; 3865 s.m[n].type = 0; 3866 s.m[n].edge = p->offset; 3867 n++; 3868 s.m[n].p = p; 3869 s.m[n].type = -1; 3870 s.m[n].edge = p->offset - root_dist(p); 3871 n++; 3872 } 3874 /* 3875 * Find the largest contiguous intersection of correctness 3876 * intervals. Allow is the number of allowed falsetickers; found 3877 * is the number of midpoints. Note that the edge values are 3878 * limited to the range +-(2 ^ 30) < +-2e9 by the timestamp 3879 * calculations. 3880 */ 3881 low = 2e9; high = -2e9; 3882 for (allow = 0; 2 * allow < n; allow++) { 3884 /* 3885 * Scan the chime list from lowest to highest to find 3886 * the lower endpoint. 3887 */ 3888 found = 0; 3889 chime = 0; 3890 for (i = 0; i < n; i++) { 3891 chime -= s.m[i].type; 3892 if (chime >= n - found) { 3893 low = s.m[i].edge; 3894 break; 3895 } 3896 if (s.m[i].type == 0) 3897 found++; 3898 } 3900 /* 3901 * Scan the chime list from highest to lowest to find 3902 * the upper endpoint. 3903 */ 3904 chime = 0; 3905 for (i = n - 1; i >= 0; i--) { 3906 chime += s.m[i].type; 3907 if (chime >= n - found) { 3908 high = s.m[i].edge; 3909 break; 3910 } 3911 if (s.m[i].type == 0) 3912 found++; 3913 } 3915 /* 3916 * If the number of midpoints is greater than the number 3917 * of allowed falsetickers, the intersection contains at 3918 * least one truechimer with no midpoint. If so, 3919 * increment the number of allowed falsetickers and go 3920 * around again. If not and the intersection is 3921 * nonempty, declare success. 3922 */ 3923 if (found > allow) 3924 continue; 3926 if (high > low) 3927 break; 3928 } 3930 /* 3931 * Clustering algorithm. Construct a list of survivors (p, 3932 * metric) from the chime list, where metric is dominated first 3933 * by stratum and then by root distance. All other things being 3934 * equal, this is the order of preference. 3935 */ 3936 s.n = 0; 3937 for (i = 0; i < n; i++) { 3938 if (s.m[i].edge < low || s.m[i].edge > high) 3939 continue; 3941 p = s.m[i].p; 3942 s.v[n].p = p; 3943 s.v[n].metric = MAXDIST * p->stratum + root_dist(p); 3944 s.n++; 3945 } 3947 /* 3948 * There must be at least NSANE survivors to satisfy the 3949 * correctness assertions. Ordinarily, the Byzantine criteria 3950 * require four, survivors, but for the demonstration here, one 3951 * is acceptable. 3952 */ 3953 if (s.n < NSANE) 3954 return; 3956 /* 3957 * For each association p in turn, calculate the selection 3958 * jitter p->sjitter as the square root of the sum of squares 3959 * (p->offset - q->offset) over all q associations. The idea is 3960 * to repeatedly discard the survivor with maximum selection 3961 * jitter until a termination condition is met. 3962 */ 3963 while (1) { 3964 struct p *p, *q, *qmax; /* peer structure pointers */ 3965 double max, min, dtemp; 3967 max = -2e9; min = 2e9; 3968 for (i = 0; i < s.n; i++) { 3969 p = s.v[i].p; 3970 if (p->jitter < min) 3971 min = p->jitter; 3972 dtemp = 0; 3973 for (j = 0; j < n; j++) { 3974 q = s.v[j].p; 3975 dtemp += SQUARE(p->offset - q->offset); 3976 } 3977 dtemp = SQRT(dtemp); 3978 if (dtemp > max) { 3979 max = dtemp; 3980 qmax = q; 3981 } 3982 } 3984 /* 3985 * If the maximum selection jitter is less than the 3986 * minimum peer jitter, then tossing out more survivors 3987 * will not lower the minimum peer jitter, so we might 3988 * as well stop. To make sure a few survivors are left 3989 * for the clustering algorithm to chew on, we also stop 3990 * if the number of survivors is less than or equal to 3991 * NMIN (3). 3992 */ 3993 if (max < min || n <= NMIN) 3994 break; 3996 /* 3997 * Delete survivor qmax from the list and go around 3998 * again. 3999 */ 4000 s.n--; 4001 } 4002 /* 4003 * Pick the best clock. If the old system peer is on the list 4004 * and at the same stratum as the first survivor on the list, 4005 * then don't do a clock hop. Otherwise, select the first 4006 * survivor on the list as the new system peer. 4007 */ 4008 if (osys->stratum == s.v[0].p->stratum) 4009 s.p = osys; 4010 else 4011 s.p = s.v[0].p; 4012 clock_update(s.p); 4013 } 4015 A.5.5.2. root_dist() 4017 /* 4018 * root_dist() - calculate root distance 4019 */ 4020 double 4021 root_dist( 4022 struct p *p /* peer structure pointer */ 4023 ) 4024 { 4026 /* 4027 * The root synchronization distance is the maximum error due to 4028 * all causes of the local clock relative to the primary server. 4029 * It is defined as half the total delay plus total dispersion 4030 * plus peer jitter. 4031 */ 4032 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 4033 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 4034 } 4035 A.5.5.3. accept() 4037 /* 4038 * accept() - test if association p is acceptable for synchronization 4039 */ 4040 int 4041 accept( 4042 struct p *p /* peer structure pointer */ 4043 ) 4044 { 4045 /* 4046 * A stratum error occurs if (1) the server has never been 4047 * synchronized, (2) the server stratum is invalid. 4048 */ 4049 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 4050 return (FALSE); 4052 /* 4053 * A distance error occurs if the root distance exceeds the 4054 * distance threshold plus an increment equal to one poll 4055 * interval. 4056 */ 4057 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 4058 return (FALSE); 4060 /* 4061 * A loop error occurs if the remote peer is synchronized to the 4062 * local peer or the remote peer is synchronized to the current 4063 * system peer. Note this is the behavior for IPv4; for IPv6 the 4064 * MD5 hash is used instead. 4065 */ 4066 if (p->refid == p->dstaddr || p->refid == s.refid) 4067 return (FALSE); 4069 /* 4070 * An unreachable error occurs if the server is unreachable. 4071 */ 4072 if (p->reach == 0) 4073 return (FALSE); 4075 return (TRUE); 4076 } 4078 A.5.5.4. clock_update() 4080 /* 4081 * clock_update() - update the system clock 4082 */ 4084 void 4085 clock_update( 4086 struct p *p /* peer structure pointer */ 4087 ) 4088 { 4089 double dtemp; 4091 /* 4092 * If this is an old update, for instance as the result of a 4093 * system peer change, avoid it. We never use an old sample or 4094 * the same sample twice. 4095 * 4096 if (s.t >= p->t) 4097 return; 4099 /* 4100 * Combine the survivor offsets and update the system clock; the 4101 * local_clock() routine will tell us the good or bad news. 4102 */ 4103 s.t = p->t; 4104 clock_combine(); 4105 switch (local_clock(p, s.offset)) { 4107 /* 4108 * The offset is too large and probably bogus. Complain to the 4109 * system log and order the operator to set the clock manually 4110 * within PANIC range. The reference implementation includes a 4111 * command line option to disable this check and to change the 4112 * panic threshold from the default 1000 s as required. 4113 */ 4114 case PANIC: 4115 exit (0); 4117 /* 4118 * The offset is more than the step threshold (0.125 s by 4119 * default). After a step, all associations now have 4120 * inconsistent time valurs, so they are reset and started 4121 * fresh. The step threshold can be changed in the reference 4122 * implementation in order to lessen the chance the clock might 4123 * be stepped backwards. However, there may be serious 4124 * consequences, as noted in the white papers at the NTP project 4125 * site. 4126 */ 4127 case STEP: 4128 while (/* all associations */ 0) 4129 clear(p, X_STEP); 4130 s.stratum = MAXSTRAT; 4131 s.poll = MINPOLL; 4132 break; 4134 /* 4135 * The offset was less than the step threshold, which is the 4136 * normal case. Update the system variables from the peer 4137 * variables. The lower clamp on the dispersion increase is to 4138 * avoid timing loops and clockhopping when highly precise 4139 * sources are in play. The clamp can be changed from the 4140 * default .01 s in the reference implementation. 4141 */ 4142 case SLEW: 4143 s.leap = p->leap; 4144 s.stratum = p->stratum + 1; 4145 s.refid = p->refid; 4146 s.reftime = p->reftime; 4147 s.rootdelay = p->rootdelay + p->delay; 4148 dtemp = SQRT(SQUARE(p->jitter) + SQUARE(s.jitter)); 4149 dtemp += max(p->disp + PHI * (c.t - p->t) + 4150 fabs(p->offset), MINDISP); 4151 s.rootdisp = p->rootdisp + dtemp; 4152 break; 4154 /* 4155 * Some samples are discarded while, for instance, a direct 4156 * frequency measurement is being made. 4157 */ 4158 case IGNORE: 4159 break; 4160 } 4161 } 4162 A.5.5.5. clock_combine() 4164 /* 4165 * clock_combine() - combine offsets 4166 */ 4167 void 4168 clock_combine() 4169 { 4170 struct p *p; /* peer structure pointer */ 4171 double x, y, z, w; 4172 int i; 4174 /* 4175 * Combine the offsets of the clustering algorithm survivors 4176 * using a weighted average with weight determined by the root 4177 * distance. Compute the selection jitter as the weighted RMS 4178 * difference between the first survivor and the remaining 4179 * survivors. In some cases the inherent clock jitter can be 4180 * reduced by not using this algorithm, especially when frequent 4181 * clockhopping is involved. The reference implementation can be 4182 * configured to avoid this algorithm by designating a preferred 4183 * peer. 4184 */ 4185 y = z = w = 0; 4186 for (i = 0; s.v[i].p != NULL; i++) { 4187 p = s.v[i].p; 4188 x = root_dist(p); 4189 y += 1 / x; 4190 z += p->offset / x; 4191 w += SQUARE(p->offset - s.v[0].p->offset) / x; 4192 } 4193 s.offset = z / y; 4194 s.jitter = SQRT(w / y); 4195 } 4197 A.5.5.6. local_clock() 4199 /* 4200 * Clock discipline parameters and constants 4201 */ 4202 #define STEPT .128 /* step threshold (s) */ 4203 #define WATCH 900 /* stepout threshold (s) */ 4204 #define PANICT 1000 /* panic threshold (s) */ 4205 #define PLL 65536 /* PLL loop gain */ 4206 #define FLL MAXPOLL + 1 /* FLL loop gain */ 4207 #define AVG 4 /* parameter averaging constant */ 4208 #define ALLAN 1500 /* compromise Allan intercept (s) */ 4209 #define LIMIT 30 /* poll-adjust threshold */ 4210 #define MAXFREQ 500e-6 /* frequency tolerance (500 PPM) */ 4211 #define PGATE 4 /* poll-adjust gate */ 4213 /* 4214 * local_clock() - discipline the local clock 4215 */ 4216 int /* return code */ 4217 local_clock( 4218 struct p *p, /* peer structure pointer */ 4219 double offset /* clock offset from combine() */ 4220 ) 4221 { 4222 int state; /* clock discipline state */ 4223 double freq; /* frequency */ 4224 double mu; /* interval since last update */ 4225 int rval; 4226 double etemp, dtemp; 4228 /* 4229 * If the offset is too large, give up and go home. 4230 */ 4231 if (fabs(offset) > PANICT) 4232 return (PANIC); 4234 /* 4235 * Clock state machine transition function. This is where the 4236 * action is and defines how the system reacts to large time 4237 * and frequency errors. There are two main regimes: when the 4238 * offset exceeds the step threshold and when it does not. 4239 */ 4240 rval = SLEW; 4241 mu = p->t - s.t; 4242 freq = 0; 4243 if (fabs(offset) > STEPT) { 4244 switch (c.state) { 4246 /* 4247 * In S_SYNC state we ignore the first outlyer amd 4248 * switch to S_SPIK state. 4249 */ 4250 case SYNC: 4251 state = SPIK; 4252 return (rval); 4254 /* 4255 * In S_FREQ state we ignore outlyers and inlyers. At 4256 * the first outlyer after the stepout threshold, 4257 * compute the apparent frequency correction and step 4258 * the time. 4259 */ 4260 case FREQ: 4261 if (mu < WATCH) 4262 return (IGNORE); 4264 freq = (offset - c.offset) / mu; 4265 /* fall through to S_SPIK */ 4267 /* 4268 * In S_SPIK state we ignore succeeding outlyers until 4269 * either an inlyer is found or the stepout threshold is 4270 * exceeded. 4271 */ 4272 case SPIK: 4273 if (mu < WATCH) 4274 return (IGNORE); 4276 /* fall through to default */ 4278 /* 4279 * We get here by default in S_NSET and S_FSET states 4280 * and from above in S_FREQ state. Step the time and 4281 * clamp down the poll interval. 4282 * 4283 * In S_NSET state an initial frequency correction is 4284 * not available, usually because the frequency file has 4285 * not yet been written. Since the time is outside the 4286 * capture range, the clock is stepped. The frequency 4287 * will be set directly following the stepout interval. 4288 * 4289 * In S_FSET state the initial frequency has been set 4290 * from the frequency file. Since the time is outside 4291 * the capture range, the clock is stepped immediately, 4292 * rather than after the stepout interval. Guys get 4293 * nervous if it takes 17 minutes to set the clock for 4294 * the first time. 4295 * 4296 * In S_SPIK state the stepout threshold has expired and 4297 * the phase is still above the step threshold. Note 4298 * that a single spike greater than the step threshold 4299 * is always suppressed, even at the longer poll 4300 * intervals. 4301 */ 4302 default: 4304 /* 4305 * This is the kernel set time function, usually 4306 * implemented by the Unix settimeofday() system 4307 * call. 4308 */ 4309 step_time(offset); 4310 c.count = 0; 4311 s.poll = MINPOLL; 4312 rval = STEP; 4313 if (state == NSET) { 4314 rstclock(FREQ, p->t, 0); 4315 return (rval); 4316 } 4317 break; 4318 } 4319 rstclock(SYNC, p->t, 0); 4320 } else { 4322 /* 4323 * Compute the clock jitter as the RMS of exponentially 4324 * weighted offset differences. This is used by the 4325 * poll-adjust code. 4326 */ 4327 etemp = SQUARE(c.jitter); 4328 dtemp = SQUARE(max(fabs(offset - c.last), 4329 LOG2D(s.precision))); 4330 c.jitter = SQRT(etemp + (dtemp - etemp) / AVG); 4331 switch (c.state) { 4333 /* 4334 * In S_NSET state this is the first update received and 4335 * the frequency has not been initialized. The first 4336 * thing to do is directly measure the oscillator 4337 * frequency. 4338 */ 4339 case NSET: 4340 rstclock(FREQ, p->t, offset); 4341 return (IGNORE); 4343 /* 4344 * In S_FSET state this is the first update and the 4345 * frequency has been initialized. Adjust the phase, but 4346 * don't adjust the frequency until the next update. 4347 */ 4348 case FSET: 4349 rstclock(SYNC, p->t, offset); 4350 break; 4352 /* 4353 * In S_FREQ state ignore updates until the stepout 4354 * threshold. After that, correct the phase and 4355 * frequency and switch to S_SYNC state. 4356 */ 4357 case FREQ: 4358 if (c.t - s.t < WATCH) 4359 return (IGNORE); 4361 freq = (offset - c.offset) / mu; 4362 break; 4364 /* 4365 * We get here by default in S_SYNC and S_SPIK states. 4366 * Here we compute the frequency update due to PLL and 4367 * FLL contributions. 4368 */ 4369 default: 4371 /* 4372 * The FLL and PLL frequency gain constants 4373 * depend on the poll interval and Allan 4374 * intercept. The FLL is not used below one-half 4375 * the Allan intercept. Above that the loop gain 4376 * increases in steps to 1 / AVG. 4377 */ 4378 if (LOG2D(s.poll) > ALLAN / 2) { 4379 etemp = FLL - s.poll; 4380 if (etemp < AVG) 4381 etemp = AVG; 4382 freq += (offset - c.offset) / (max(mu, 4383 ALLAN) * etemp); 4384 } 4386 /* 4387 * For the PLL the integration interval 4388 * (numerator) is the minimum of the update 4389 * interval and poll interval. This allows 4390 * oversampling, but not undersampling. 4391 */ 4392 etemp = min(mu, LOG2D(s.poll)); 4393 dtemp = 4 * PLL * LOG2D(s.poll); 4394 freq += offset * etemp / (dtemp * dtemp); 4395 rstclock(SYNC, p->t, offset); 4396 break; 4397 } 4398 } 4400 /* 4401 * Calculate the new frequency and frequency stability (wander). 4403 * Compute the clock wander as the RMS of exponentially weighted 4404 * frequency differences. This is not used directly, but can, 4405 * along with the jitter, be a highly useful monitoring and 4406 * debugging tool 4407 */ 4408 freq += c.freq; 4409 c.freq = max(min(MAXFREQ, freq), -MAXFREQ); 4410 etemp = SQUARE(c.wander); 4411 dtemp = SQUARE(freq); 4412 c.wander = SQRT(etemp + (dtemp - etemp) / AVG); 4414 /* 4415 * Here we adjust the poll interval by comparing the current 4416 * offset with the clock jitter. If the offset is less than the 4417 * clock jitter times a constant, then the averaging interval is 4418 * increased, otherwise it is decreased. A bit of hysteresis 4419 * helps calm the dance. Works best using burst mode. 4420 */ 4421 if (fabs(c.offset) < PGATE * c.jitter) { 4422 c.count += s.poll; 4423 if (c.count > LIMIT) { 4424 c.count = LIMIT; 4425 if (s.poll < MAXPOLL) { 4426 c.count = 0; 4427 s.poll++; 4428 } 4429 } 4430 } else { 4431 c.count -= s.poll << 1; 4432 if (c.count < -LIMIT) { 4433 c.count = -LIMIT; 4434 if (s.poll > MINPOLL) { 4435 c.count = 0; 4436 s.poll--; 4437 } 4438 } 4439 } 4440 return (rval); 4441 } 4442 A.5.5.7. rstclock() 4444 /* 4445 * rstclock() - clock state machine 4446 */ 4447 void 4448 rstclock( 4449 int state, /* new state */ 4450 double offset, /* new offset */ 4451 double t /* new update time */ 4452 ) 4453 { 4454 /* 4455 * Enter new state and set state variables. Note we use the time 4456 * of the last clock filter sample, which must be earlier than 4457 * the current time. 4458 */ 4459 c.state = state; 4460 c.last = c.offset = offset; 4461 s.t = t; 4462 } 4464 A.5.6. Clock Adjust Process 4466 A.5.6.1. clock_adjust() 4468 /* 4469 * clock_adjust() - runs at one-second intervals 4470 */ 4471 void 4472 clock_adjust() { 4473 double dtemp; 4475 /* 4476 * Update the process time c.t. Also increase the dispersion 4477 * since the last update. In contrast to NTPv3, NTPv4 does not 4478 * declare unsynchronized after one day, since the dispersion 4479 * threshold serves this function. When the dispersion exceeds 4480 * MAXDIST (1 s), the server is considered unfit for 4481 * synchronization. 4482 */ 4483 c.t++; 4484 s.rootdisp += PHI; 4486 /* 4487 * Implement the phase and frequency adjustments. The gain 4488 * factor (denominator) is not allowed to increase beyond the 4489 * Allan intercept. It doesn't make sense to average phase noise 4490 * beyond this point and it helps to damp residual offset at the 4491 * longer poll intervals. 4492 */ 4493 dtemp = c.offset / (PLL * min(LOG2D(s.poll), ALLAN)); 4494 c.offset -= dtemp; 4496 /* 4497 * This is the kernel adjust time function, usually implemented 4498 * by the Unix adjtime() system call. 4499 */ 4500 adjust_time(c.freq + dtemp); 4502 /* 4503 * Peer timer. Call the poll() routine when the poll timer 4504 * expires. 4505 */ 4506 while (/* all associations */ 0) { 4507 struct p *p; /* dummy peer structure pointer */ 4509 if (c.t >= p->nextdate) 4510 poll(p); 4511 } 4513 /* 4514 * Once per hour write the clock frequency to a file 4515 */ 4516 if (c.t % 3600 == 3599) 4517 /* write c.freq to file */ 0; 4518 } 4520 A.5.7. Poll Process 4522 /* 4523 * Poll process parameters and constants 4524 */ 4525 #define UNREACH 12 /* unreach counter threshold */ 4526 #define BCOUNT 8 /* packets in a burst */ 4527 #define BTIME 2 /* burst interval (s) */ 4529 A.5.7.1. poll() 4531 /* 4532 * poll() - determine when to send a packet for association p-> 4533 */ 4534 void 4535 poll( 4536 struct p *p /* peer structure pointer */ 4537 ) 4539 { 4540 int hpoll; 4541 int oreach; 4543 /* 4544 * This routine is called when the current time c.t catches up 4545 * to the next poll time p->nextdate. The value p->outdate is 4546 * the last time this routine was executed. The poll_update() 4547 * routine determines the next execution time p->nextdate. 4548 * 4549 * If broadcasting, just do it, but only if we are synchronized. 4550 */ 4551 hpoll = p->hpoll; 4552 if (p->hmode == M_BCST) { 4553 p->outdate = c.t; 4554 if (s.p != NULL) 4555 peer_xmit(p); 4556 poll_update(p, hpoll); 4557 return; 4558 } 4560 /* 4561 * If manycasting, start with ttl = 1. The ttl is increased by 4562 * one for each poll until MAXCLOCK servers have been found or 4563 * ttl reaches TTLMAX. If reaching MAXCLOCK, stop polling until 4564 * the number of servers falls below MINCLOCK, then start all 4565 * over. 4566 */ 4567 if (p->hmode == M_CLNT && p->flags & P_MANY) { 4568 p->outdate = c.t; 4569 if (p->unreach > BEACON) { 4570 p->unreach = 0; 4571 p->ttl = 1; 4572 peer_xmit(p); 4573 } else if (s.n < MINCLOCK) { 4574 if (p->ttl < TTLMAX) 4575 p->ttl++; 4576 peer_xmit(p); 4577 } 4578 p->unreach++; 4579 poll_update(p, hpoll); 4580 return; 4581 } 4582 if (p->burst == 0) { 4584 /* 4585 * We are not in a burst. Shift the reachability 4586 * register to the left. Hopefully, some time before the 4587 * next poll a packet will arrive and set the rightmost 4588 * bit. 4589 */ 4590 oreach = p->reach; 4591 p->outdate = c.t; 4592 p->reach << 1; 4593 if (!(p->reach & 0x7)) 4594 clock_filter(p, 0, 0, MAXDISP); 4595 if (!p->reach) { 4597 /* 4598 * The server is unreachable, so bump the 4599 * unreach counter. If the unreach threshold has 4600 * been reached, double the poll interval to 4601 * minimize wasted network traffic. Send a burst 4602 * only if enabled and the unreach threshold has 4603 * not been reached. 4604 */ 4605 if (p->flags & P_IBURST && p->unreach == 0) { 4606 p->burst = BCOUNT; 4607 } else if (p->unreach < UNREACH) 4608 p->unreach++; 4609 else 4610 hpoll++; 4611 p->unreach++; 4612 } else { 4614 /* 4615 * The server is reachable. Set the poll 4616 * interval to the system poll interval. Send a 4617 * burst only if enabled and the peer is fit. 4618 */ 4619 p->unreach = 0; 4620 hpoll = s.poll; 4621 if (p->flags & P_BURST && fit(p)) 4622 p->burst = BCOUNT; 4623 } 4624 } else { 4626 /* 4627 * If in a burst, count it down. When the reply comes 4628 * back the clock_filter() routine will call 4629 * clock_select() to process the results of the burst. 4630 */ 4631 p->burst--; 4632 } 4634 /* 4635 * Do not transmit if in broadcast client mode. 4636 */ 4637 if (p->hmode != M_BCLN) 4638 peer_xmit(p); 4639 poll_update(p, hpoll); 4640 } 4642 A.5.7.2. poll_update() 4643 /* 4644 * poll_update() - update the poll interval for association p 4645 * 4646 * Note: This routine is called by both the packet() and poll() routine. 4647 * Since the packet() routine is executed when a network packet arrives 4648 * and the poll() routine is executed as the result of timeout, a 4649 * potential race can occur, possibly causing an incorrect interval for 4650 * the next poll. This is considered so unlikely as to be negligible. 4651 */ 4652 void 4653 poll_update( 4654 struct p *p, /* peer structure pointer */ 4655 int poll /* poll interval (log2 s) */ 4656 ) 4657 { 4658 /* 4659 * This routine is called by both the poll() and packet() 4660 * routines to determine the next poll time. If within a burst 4661 * the poll interval is two seconds. Otherwise, it is the 4662 * minimum of the host poll interval and peer poll interval, but 4663 * not greater than MAXPOLL and not less than MINPOLL. The 4664 * design insures that a longer interval can be preempted by a 4665 * shorter one if required for rapid response. 4666 */ 4667 p->hpoll = max(min(MAXPOLL, poll), MINPOLL); 4668 if (p->burst > 0) { 4669 if (p->nextdate != c.t) 4670 return; 4671 else 4672 p->nextdate += BTIME; 4673 } else { 4675 /* 4676 * While not shown here, the reference implementation 4677 * randomizes the poll interval by a small factor. 4678 */ 4679 p->nextdate = p->outdate + (1 << max(min(p->ppoll, 4680 p->hpoll), MINPOLL)); 4681 } 4683 /* 4684 * It might happen that the due time has already passed. If so, 4685 * make it one second in the future. 4686 */ 4687 if (p->nextdate <= c.t) 4688 p->nextdate = c.t + 1; 4689 } 4690 A.5.7.3. peer_xmit() 4691 /* 4692 * transmit() - transmit a packet for association p 4693 */ 4694 void 4695 peer_xmit( 4696 struct p *p /* peer structure pointer */ 4697 ) 4698 { 4699 struct x x; /* transmit packet */ 4701 /* 4702 * Initialize header and transmit timestamp 4703 */ 4704 x.srcaddr = p->dstaddr; 4705 x.dstaddr = p->srcaddr; 4706 x.leap = s.leap; 4707 x.version = p->version; 4708 x.mode = p->hmode; 4709 if (s.stratum == MAXSTRAT) 4710 x.stratum = 0; 4711 else 4712 x.stratum = s.stratum; 4713 x.poll = p->hpoll; 4714 x.precision = s.precision; 4715 x.rootdelay = D2FP(s.rootdelay); 4716 x.rootdisp = D2FP(s.rootdisp); 4717 x.refid = s.refid; 4718 x.reftime = s.reftime; 4719 x.org = p->org; 4720 x.rec = p->rec; 4721 x.xmt = get_time(); 4722 p->xmt = x.xmt; 4724 /* 4725 * If the key ID is nonzero, send a valid MAC using the key ID 4726 * of the association and the key in the local key cache. If 4727 * something breaks, like a missing trusted key, don't send the 4728 * packet; just reset the association and stop until the problem 4729 * is fixed. 4730 */ 4731 if (p->keyid) 4732 if (/* p->keyid invalid */ 0) { 4733 clear(p, X_NKEY); 4734 return; 4735 } 4736 x.mac = md5(p->keyid); 4737 xmit_packet(&x); 4738 } 4739 Authors' Addresses 4741 Jack Burbank (editor) 4742 The Johns Hopkins University Applied Physics Laboratory 4743 11100 Johns Hopkins Road 4744 Laurel, MD 20723-6099 4745 US 4747 Phone: +1 443 778 7127 4748 Email: jack.burbank@jhuapl.edu 4750 William Kasch (editor) 4751 The Johns Hopkins University Applied Physics Laboratory 4752 11100 Johns Hopkins Road 4753 Laurel, MD 20723-6099 4754 US 4756 Phone: +1 443 778 7463 4757 Email: william.kasch@jhuapl.edu 4759 Jim Martin (editor) 4760 Woven Systems 4761 2455 Augustine Drive, Suite 211 4762 Santa Clara, CA 95054 4763 US 4765 Phone: +1 408 654-8900 x131 4766 Email: jim@wovensystems.com 4768 Dr. David L. Mills 4769 University of Delaware 4770 Newark, DE 19716 4771 US 4773 Phone: +1 302 831 8247 4774 Email: mills@udel.edu 4776 Full Copyright Statement 4778 Copyright (C) The IETF Trust (2008). 4780 This document is subject to the rights, licenses and restrictions 4781 contained in BCP 78, and except as set forth therein, the authors 4782 retain all their rights. 4784 This document and the information contained herein are provided on an 4785 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4786 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4787 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4788 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4789 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4790 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4792 Intellectual Property 4794 The IETF takes no position regarding the validity or scope of any 4795 Intellectual Property Rights or other rights that might be claimed to 4796 pertain to the implementation or use of the technology described in 4797 this document or the extent to which any license under such rights 4798 might or might not be available; nor does it represent that it has 4799 made any independent effort to identify any such rights. Information 4800 on the procedures with respect to rights in RFC documents can be 4801 found in BCP 78 and BCP 79. 4803 Copies of IPR disclosures made to the IETF Secretariat and any 4804 assurances of licenses to be made available, or the result of an 4805 attempt made to obtain a general license or permission for the use of 4806 such proprietary rights by implementers or users of this 4807 specification can be obtained from the IETF on-line IPR repository at 4808 http://www.ietf.org/ipr. 4810 The IETF invites any interested party to bring to its attention any 4811 copyrights, patents or patent applications, or other proprietary 4812 rights that may cover technology that may be required to implement 4813 this standard. Please address the information to the IETF at 4814 ietf-ipr@ietf.org.