Network Working Group                                    N. Rozen-Schiff
Internet-Draft                                                  D. Dolev
Intended status: Informational            Hebrew University of Jerusalem
Expires: March 7, 2021                                        T. Mizrahi
                                        Huawei Network.IO Innovation Lab
                                                             M. Schapira
                                          Hebrew University of Jerusalem
                                                       September 3, 2020


A Secure Selection and Filtering Mechanism for the Network Time Protocol
                               Version 4
                       draft-ietf-ntp-chronos-01

Abstract

   The Network Time Protocol version 4 (NTPv4), as defined in RFC 5905,
   is the mechanism used by NTP clients to synchronize with NTP servers
   across the Internet. This document specifies an extension to the
   NTPv4 client, named Chronos, which is used as a "watchdog" alongside
   NTPv4, and provides improved security against time shifting attacks.
   Chronos involves changes to the NTP client's system process only and
   is backwards compatible with NTPv4 servers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 7, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
     2.1. Terminology
     2.2. Terms and Abbreviations
     2.3. Notations
   3. Extension to the NTP System Process
     3.1. Chronos' System Process
   4. Chronos' Pseudocode
   5. Precision vs. Security
   6. Chronos' Threat Model and Security Guarantees
     6.1. Security Analysis Overview
   7. Acknowledgements
   8. IANA Considerations
   9. References
     9.1. Normative References
     9.2. Informative References
   Authors' Addresses

1. Introduction

   NTPv4, as defined in RFC 5905 [RFC5905], is vulnerable to time
   shifting attacks, in which the attacker's goal is to shift the local
   time at an NTP client. See [Chronos_paper] for details.
   Time shifting attacks on NTP are possible even if all NTP
   communications are encrypted and authenticated. This document
   introduces an improved system process that incorporates an algorithm
   called Chronos. Chronos is backwards compatible with NTPv4 and
   serves as an NTPv4 client's "watchdog" for time shifting attacks. An
   NTP client that runs Chronos is interoperable with
   [RFC5905]-compatible NTPv4 servers.

   Chronos is a background mechanism that continuously maintains a
   virtual "Chronos" clock update and compares it to NTPv4's clock
   update. When the gap between the two updates exceeds a certain
   threshold (specified in Section 6), this is interpreted as the client
   experiencing a time shifting attack. In this case, Chronos is used
   to update the client's clock, and NTPv4 is operated in the background
   until the gap between the NTPv4 and Chronos updates is again below
   this threshold, at which point NTPv4 is considered safe to use again.

   Because Chronos operates in the background, the client clock's
   precision and accuracy are exactly those of NTPv4 as long as the
   client is not experiencing a time-shifting attack. When under attack,
   Chronos prevents the clock from being shifted by the attacker, thus
   still preserving high accuracy and precision (as discussed in
   Section 6).

   Chronos achieves accurate synchronization even in the presence of
   powerful attackers who are in direct control of a large number of NTP
   servers: up to 1/3 of the servers in the pool (where the pool may
   consist of hundreds or even thousands of servers). NTPv4 chooses a
   small subset of the NTP server pool (e.g., 4 servers) and
   periodically queries this subset of servers. Thus, even if only 1/3
   of the servers in the pool are compromised, the small subset that is
   used by NTPv4 may consist of a majority of faulty servers.
   Conversely, Chronos constantly updates the set of servers it queries;
   in each poll interval Chronos randomly chooses a different subset of
   servers from the pool. Thus, even if an attack is not detected in a
   given poll interval, Chronos is bound to detect the attack within a
   relatively small number of poll intervals.

   A Chronos client iteratively "crowdsources" time queries across NTP
   servers and applies a provably secure algorithm for eliminating
   "suspicious" responses and for averaging over the remaining
   responses. Chronos is carefully engineered to minimize communication
   overhead so as to avoid overloading NTP servers. Chronos' security
   was evaluated both theoretically and experimentally with a prototype
   implementation. These evaluation results indicate that in order to
   successfully shift time at a Chronos client by over 100ms from the
   UTC, even a powerful man-in-the-middle attacker requires over 20
   years of effort in expectation. The full paper is available at
   [Chronos_paper].

   Chronos introduces a watchdog mechanism that is added to the client's
   system process and maintains a virtual clock value that is used as a
   reference for detecting attacks. The virtual clock value computation
   differs from the current NTPv4 in two key aspects. First, a Chronos
   client relies on a large number of NTP servers, from which only a few
   servers to synchronize with are periodically chosen at random, in
   order to avoid overloading the servers. Second, the selection
   algorithm of the virtual clock uses an approximate agreement
   technique to remove outliers, thus limiting the attacker's ability to
   contaminate the "time samples" (offsets) derived from the queried NTP
   servers. These two elements of Chronos' design provide provable
   security guarantees against both man-in-the-middle attackers and
   attackers capable of compromising a large number of NTP servers.

2. Conventions Used in This Document

2.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2. Terms and Abbreviations

   NTPv4              Network Time Protocol version 4 [RFC5905].

   Selection process  Clock filter algorithm and system process
                      [RFC5905].

2.3. Notations

   The following notation is used to describe the Chronos algorithm.

   +----------+--------------------------------------------------------+
   | Notation | Meaning                                                |
   +----------+--------------------------------------------------------+
   | n        | The number of candidate servers in the pool that       |
   |          | Chronos can query (potentially hundreds)               |
   | m        | The number of servers that NTPv4 queries in each poll  |
   |          | interval (up to tens)                                  |
   | w        | An upper bound on the distance of the local time from  |
   |          | the UTC at any NTP server with an accurate clock       |
   |          | (termed "truechimer" in [RFC5905])                     |
   | Cest     | The client's estimate of the time that has passed      |
   |          | since its last synchronization to the server pool      |
   |          | (sec)                                                  |
   | B        | An upper bound on the client's time estimation error   |
   |          | (ms/sec)                                               |
   | ERR      | An upper bound on the client's error in its estimate   |
   |          | of the time passed since the last update, equal to     |
   |          | B*Cest (ms)                                            |
   | K        | Panic trigger                                          |
   | tc       | The current time (sec), as indicated by the virtual    |
   |          | clock value that is computed by Chronos                |
   +----------+--------------------------------------------------------+

                       Table 1: Chronos Notations

3. Extension to the NTP System Process

   A client that runs Chronos as a watchdog uses NTPv4 as in [RFC5905]
   and, in the background, runs a modification of the elements of the
   system process described in Sections 11.2.1 and 11.2.2 of [RFC5905]
   (namely, the Selection Algorithm and the Cluster Algorithm). The
   conventional NTPv4 protocol periodically queries m servers in each
   poll interval. In parallel, the Chronos watchdog periodically queries
   a (variable) set of m servers in each Chronos poll interval.
   Specifically, in Chronos, after executing the clock filter algorithm
   as defined in Section 10 of [RFC5905], the client discards outliers
   by executing the procedure described in this section and the next.
   Then, the NTPv4 Combine Algorithm is used for computing the system
   peer offset, as specified in Section 11.2.3 of [RFC5905]. In each
   poll interval the Chronos virtual clock value is compared with the
   NTPv4 clock value, and if the difference exceeds a predetermined
   value, an attack is detected.

3.1. Chronos' System Process

   The first time the Chronos system process is executed, calibration
   is needed. The calibration process generates a local pool of servers
   the client can synchronize with, consisting of n servers (up to
   hundreds). To this end, the NTP client executes the peer process and
   clock filter algorithm as in Sections 9 and 10 of [RFC5905]
   (respectively), on an hourly basis, for 24 consecutive hours, and
   generates the union of all received NTP servers' IP addresses.
   Importantly, this process can also be executed in the background
   periodically, at long intervals (e.g., every few weeks or months).

   In each Chronos poll interval the Chronos system process randomly
   chooses a set of m servers (where n is in the order of hundreds and
   m in the order of tens) out of the local pool of n servers.
   Then, out of the time samples received from this chosen subset of
   servers, the third of the samples with the lowest offset values and
   the third of the samples with the highest offset values are
   discarded.

   Chronos checks that the following two conditions hold for the
   remaining samples:

   o  The maximal distance between every two time samples does not
      exceed 2w.

   o  The average value of the remaining samples is at distance at most
      ERR+2w from the client's local clock (as computed by Chronos).

   Here, w and ERR are as described in Table 1. Note that the magnitude
   of ERR is approximately LAMBDA, as defined in [RFC5905].

   In the event that both of these conditions are satisfied, the average
   of the remaining samples is the "final offset". Otherwise, the client
   waits for a randomly chosen fraction of the poll interval, after
   which Chronos samples a new subset of servers in the exact same
   manner. This way, Chronos client queries are better spread across the
   poll interval in case of a DoS attack on the NTP servers. This
   resampling process continues in subsequent Chronos poll intervals
   until the two conditions are both satisfied or the number of times
   the servers are re-sampled exceeds a "Panic Trigger" (K in Table 1),
   in which case Chronos enters "Panic Mode". Note that it is
   configurable whether the client allows panic mode or not.

   In panic mode, Chronos queries all the servers in the local server
   pool, orders the collected time samples from lowest to highest and
   eliminates the bottom third and the top third of the samples. The
   client then averages over the remaining samples, and sets this
   average to be the new "final offset".

   As in [RFC5905], the final offset is passed on to the clock
   discipline algorithm for the purpose of steering the Chronos virtual
   clock to the correct time. The Chronos virtual clock is then
   compared to the NTPv4 clock as part of the watchdog process.
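
   The filtering step described above can be illustrated with a short
   Python sketch. It is not part of the specification and makes several
   assumptions: all values are expressed in seconds, each discarded
   third is rounded down, both conditions are tested with "at most" as
   worded in the two bullets, and the function names (bi_sided_trim,
   chronos_round, panic_mode_average) are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def bi_sided_trim(samples: Sequence[float]) -> List[float]:
    """Discard the lowest third and the highest third of the samples."""
    ordered = sorted(samples)
    d = len(ordered) // 3  # assumption: each discarded third is rounded down
    return ordered[d:len(ordered) - d]


def chronos_round(samples: Sequence[float], tc: float,
                  w: float, err: float) -> Optional[float]:
    """Return the average of the surviving samples, or None to resample.

    All arguments are in seconds (an assumption made for this sketch).
    """
    survivors = bi_sided_trim(samples)
    if not survivors:
        return None  # no samples were gathered in this round
    avg = mean(survivors)
    # Condition 1: surviving samples are within 2w of each other.
    close_together = max(survivors) - min(survivors) <= 2 * w
    # Condition 2: their average is within ERR + 2w of the Chronos clock.
    close_to_clock = abs(avg - tc) <= err + 2 * w
    return avg if close_together and close_to_clock else None


def panic_mode_average(all_samples: Sequence[float]) -> float:
    """Panic mode: trim and average samples from every server in the pool."""
    return mean(bi_sided_trim(all_samples))
```

   A caller would invoke chronos_round once per sampled subset and fall
   back to panic_mode_average only after K unsuccessful rounds, and
   only if panic mode is enabled.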

   According to empirical observations (presented in [Chronos_paper]),
   setting w to be around 25 milliseconds provides both high time
   accuracy and good security. Moreover, empirical analyses showed
   that, on average, approximately 83% of the servers' clocks are at
   most w away from the UTC, and within 2w from each other, satisfying
   the first condition of Chronos' system process.

4. Chronos' Pseudocode

   The pseudocode for Chronos' Time Sampling Scheme, which is invoked in
   each Chronos poll interval, is as follows:

      counter := 0
      While counter < K do
         // gather samples from (tens of) randomly chosen servers
         S := sample(m)
         // trim the lowest third and the highest third of the values
         T := bi-sided-trim(S,1/3)
         if (max(T) - min(T) <= 2w) and (|avg(T) - tc| < ERR + 2w) Then
            return avg(T)
         end
         counter++
         sleep(rand(0,1) * poll interval)
      end
      // panic mode
      S := sample(n)
      // trim the bottom and top thirds
      T := bi-sided-trim(S,1/3)
      return avg(T)

5. Precision vs. Security

   Since NTPv4 updates the clock so long as time-shifting attacks are
   not detected, the precision and accuracy of a Chronos client are the
   same as NTPv4 when not under attack. When under attack, Chronos,
   which changes the list of the sampled servers more frequently than
   NTPv4 [Chronos_paper] and does not use some of the filters in
   NTPv4's system process, can potentially be less precise (though
   provably more accurate and secure than NTPv4, which is vulnerable to
   time-shifting attacks [RFC5905]).

   However, our experimental and empirical analyses of Chronos revealed
   that Chronos and NTPv4 exhibit the same level of precision and
   accuracy when not under attack, with Chronos maintaining this level
   even in the presence of time-shifting attacks.

6. Chronos' Threat Model and Security Guarantees

   As explained above, Chronos repeatedly gathers time samples from
   small subsets of a large local pool of NTP servers. The following
   form of a man-in-the-middle (MitM) Byzantine attacker is considered:
   the MitM attacker is assumed to control a subset of the servers in
   the local pool of servers and is capable of determining precisely the
   values of the time samples gathered by the Chronos client from these
   NTP servers. The threat model thus encompasses a broad spectrum of
   MitM attackers, ranging from fairly weak (yet dangerous) MitM
   attackers only capable of delaying and dropping packets to extremely
   powerful MitM attackers who are in control of (even authenticated)
   NTP servers. MitM attackers captured by this framework might be, for
   example, (1) in direct control of a fraction of the NTP servers
   (e.g., by exploiting a software vulnerability), (2) an ISP (or other
   Autonomous-System-level attacker) on the default BGP paths from the
   NTP client to a fraction of the available servers, (3) a nation state
   with authority over the owners of NTP servers in its jurisdiction, or
   (4) an attacker capable of hijacking (e.g., through DNS cache
   poisoning or BGP prefix hijacking) traffic to some of the available
   NTP servers. The details of the specific attack scenario are
   abstracted by reasoning about MitM attackers in terms of the fraction
   of servers with respect to which the attacker has MitM capabilities.

   Chronos detects time-shifting attacks by constantly monitoring
   NTPv4's offset and the offset computed by Chronos, as explained
   above, and checking whether the gap between them exceeds a certain
   threshold (10ms by default).
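
   The watchdog comparison can be sketched in Python as follows. This
   is only an illustration under stated assumptions: offsets are given
   in seconds, the 10ms default is taken from the text above, and the
   function name select_clock_update and its return convention are
   hypothetical rather than part of NTPv4 or Chronos.

```python
from typing import Tuple

DEFAULT_THRESHOLD = 0.010  # 10ms default threshold, expressed in seconds


def select_clock_update(ntp_offset: float, chronos_offset: float,
                        threshold: float = DEFAULT_THRESHOLD
                        ) -> Tuple[str, float]:
    """Pick which offset should steer the client's clock.

    If the gap between the NTPv4 offset and the Chronos offset exceeds
    the threshold, an attack is assumed and the Chronos offset is used;
    otherwise the NTPv4 offset is used as usual.
    """
    if abs(ntp_offset - chronos_offset) > threshold:
        return ("chronos", chronos_offset)
    return ("ntpv4", ntp_offset)
```

   For example, select_clock_update(0.150, 0.003) selects the Chronos
   offset, since the 147ms gap exceeds the threshold, while
   select_clock_update(0.002, 0.004) keeps the NTPv4 offset.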

   Analytical results (in [Chronos_paper]) indicate that in order to
   succeed in shifting time at a Chronos client by even a small amount
   (e.g., 100ms), even a powerful man-in-the-middle attacker requires
   many years of effort (e.g., over 20 years in expectation). See a
   brief overview of Chronos' security analysis below.

   Notably, Chronos provides protection from MitM attacks that cannot be
   achieved by cryptographic authentication protocols since even with
   such measures in place an attacker can still influence time by
   dropping/delaying packets. However, adding an authentication and
   crypto-based security layer to Chronos will enhance its security
   guarantees and enable the detection of various spoofing and
   modification attacks.

   Chronos' security analysis is briefly described next.

6.1. Security Analysis Overview

   Time-samples that are at most w away from the UTC are considered
   "good", whereas other samples are considered "malicious". Two
   scenarios are considered:

   o  Less than 2/3 of the queried servers are under the attacker's
      control.

   o  The attacker controls more than 2/3 of the queried servers.

   The first scenario, where there are more than 1/3 good samples,
   consists of two sub-cases: (i) there is at least one good sample in
   the set of samples not eliminated by Chronos (that is, in the middle
   third of samples), and (ii) there are no good samples in the
   remaining set of samples. In the first of these two cases (at least
   one good sample in the set of samples was not eliminated by Chronos),
   the other remaining samples, including those provided by the
   attacker, must be close to a good sample (for otherwise, the first
   condition of Chronos' system process in Section 3.1 is violated and a
   new set of servers is chosen). This implies that the average of the
   remaining samples must be close to the UTC.
   In the second case (there are no good samples in the set of
   remaining samples), since more than a third of the initial samples
   were good, both the (discarded) third lowest-value samples and the
   (discarded) third highest-value samples must each contain a good
   sample. Hence, all the remaining samples are bounded from both above
   and below by good samples, and so is their average value, implying
   that this value is close to the UTC [RFC5905].

   In the second scenario, where the attacker controls more than 2/3 of
   the queried servers, the worst possibility for the client is that all
   remaining samples are malicious (i.e., more than w away from the
   UTC). However, as proved in [Chronos_paper], the probability of this
   scenario is extremely low even if the attacker controls a large
   fraction (e.g., 1/4) of the servers in the local pool. The
   probability that the attacker repeatedly succeeds in realising this
   scenario decays exponentially, rendering the probability of a
   significant time shift negligible. See [Chronos_paper] for details.

   Beyond evaluating the probability of an attacker successfully
   shifting time at the client's clock, we also evaluated the
   probability that the attacker succeeds in launching a DoS attack on
   the servers by causing many clients to enter panic mode (and so query
   all the servers in their local pools). This probability too is
   negligible even for an attacker in control of a large number of
   servers in clients' local server pools. See [Chronos_paper] for
   details.

   Further details about Chronos' threat model and security guarantees
   can be found in [Chronos_paper].

7. Acknowledgements

   The authors would like to thank Erik Kline, Miroslav Lichvar, Danny
   Mayer, Karen O'Donoghue, Dieter Sibold, Yaakov J. Stein, and
   Harlan Stenn for valuable contributions to this document and helpful
   discussions and comments.

8. IANA Considerations

   This memo includes no request to IANA.

9. References

9.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
              "Network Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010.

9.2. Informative References

   [Chronos_paper]
              Deutsch, O., Schiff, N., Dolev, D., and M. Schapira,
              "Preventing (Network) Time Travel with Chronos", 2018.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [roughtime]
              Patton, C., "Roughtime: Securing Time with Digital
              Signatures", 2018.

Authors' Addresses

   Neta Rozen-Schiff
   Hebrew University of Jerusalem
   Jerusalem
   Israel

   Phone: +972 2 549 4599
   Email: neta.r.schiff@gmail.com


   Danny Dolev
   Hebrew University of Jerusalem
   Jerusalem
   Israel

   Phone: +972 2 549 4588
   Email: danny.dolev@mail.huji.ac.il


   Tal Mizrahi
   Huawei Network.IO Innovation Lab
   Israel

   Email: tal.mizrahi.phd@gmail.com


   Michael Schapira
   Hebrew University of Jerusalem
   Jerusalem
   Israel

   Phone: +972 2 549 4570
   Email: schapiram@huji.ac.il