Secure Inter-Domain Routing                                    K. Sriram
Internet-Draft                                             D. Montgomery
Intended status: Informational                                   US NIST
Expires: April 21, 2017                                 October 18, 2016

  Design Discussion and Comparison of Protection Mechanisms for Replay
              Attack and Withdrawal Suppression in BGPsec
          draft-sriram-replay-protection-design-discussion-07

Abstract

   In the context of BGPsec, a withdrawal suppression occurs when an
   adversary AS suppresses a prefix withdrawal with the intention of
   continuing to attract traffic for that prefix based on a previous
   (signed and valid) BGPsec announcement that was earlier propagated.
   Subsequently, if the adversary AS has a BGPsec session reset with a
   neighboring BGPsec speaker and, when the session is restored, the AS
   replays said previous BGPsec announcement (even though it was
   withdrawn), then such a replay action is called a replay attack. The
   BGPsec protocol should incorporate a method for protection from
   Replay Attack and Withdrawal Suppression (RAWS), at least to control
   the window of exposure. This informational document provides design
   discussion and comparison of multiple alternative RAWS protection
   mechanisms, weighing their pros and cons. This is meant to be a
   companion document to the standards track
   I-D.ietf-sidr-bgpsec-rollover that will specify a method to be used
   with BGPsec for RAWS protection.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Description and Scenarios of Replay Attacks and Withdrawal
       Suppression
   3.  Classification of Solutions
   4.  Expiration Time Method
   5.  Key Rollover Method
     5.1.  Periodic Key Rollover Method
     5.2.  Event-driven Key Rollover Method
       5.2.1.  EKR-A: EKR where Update Expiry is Enforced by CRL
       5.2.2.  EKR-B: EKR where Update Expiry is Enforced by
               NotAfter Time
       5.2.3.  EKR with Separate Key for Each Incoming-Outgoing
               Peering-Pair
   6.  Summary of Pros and Cons
   7.  Summary and Conclusions
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Authors' Addresses

1.  Introduction

   In BGP or BGPsec, prefix or route withdrawals happen, and a
   withdrawal can be explicit (i.e. the route is simply withdrawn) or
   implicit (i.e. a new route announcement replaces the previous one).
   In the context of BGPsec, a withdrawal suppression occurs when an
   adversary AS suppresses a prefix withdrawal with the intention of
   continuing to attract traffic for that prefix based on a previous
   (signed and valid) BGPsec announcement that was earlier propagated.
   Subsequently, if the adversary AS has a BGPsec session reset with a
   neighboring BGPsec speaker and, when the session is restored, the AS
   replays said previous BGPsec announcement (even though it was
   withdrawn), then such a replay action is called a replay attack. The
   BGPsec protocol [I-D.ietf-sidr-bgpsec-protocol] requires a method for
   protection from Replay Attack and Withdrawal Suppression (RAWS), at
   least to control the window of exposure (see Sections 4.3 and 4.4 of
   [RFC7353]).

   In this informational document, we provide design discussion and
   comparison of various RAWS protection mechanisms that may be used in
   conjunction with the BGPsec protocol. This is meant to be a
   companion document to the standards track document
   [I-D.ietf-sidr-bgpsec-rollover] that will specify a method to be used
   with BGPsec for RAWS protection. Here we consider four alternative
   mechanisms: one based on the explicit Expiration Time approach and
   three variants based on the Key Rollover approach. We provide a
   detailed comparison among these mechanisms, weighing their pros and
   cons. This document is meant to help inform the decision process
   leading to an exact description for the mechanism to be finalized and
   formally specified in [I-D.ietf-sidr-bgpsec-rollover].

2.  Description and Scenarios of Replay Attacks and Withdrawal
    Suppression

   The following are examples of various forms of replay attack and
   withdrawal suppression (RAWS):

   Example 1: AS1 has AS2 and AS3 as eBGPsec peers. At time x, AS1 had
   announced a prefix (P) to AS2 and AS3. At a later time (x+d), AS1
   sends a Withdraw for prefix P to AS2. AS2 suppresses the Withdraw
   (it does not send any explicit or implicit Withdraw to its peers).
   AS2 continues to attract some of the data for prefix P by pretending
   to still have a valid (signed) route for P. In effect, AS2 can
   conduct a Denial of Service (DoS) attack on a server located at
   prefix P. (See slide #15 in [RAWS-discussion] for an illustration.)

   Example 2: AS1 has AS2 and AS3 as eBGPsec peers. AS2 and AS3 are
   also eBGPsec peers. At time x, AS1 announced a prefix P to AS2 and
   AS3. AS3 also propagates to AS2 its route (via AS1) for prefix P.
   At a later time (x+d), AS1 discontinues its peering with AS2. AS2
   should propagate an alternate longer path via AS3 for prefix P and
   thus implicitly withdraw the route via AS1. However, AS2 suppresses
   it. AS2 can thus cause some traffic destined for prefix P to flow
   via itself. This enables AS2 to eavesdrop on the data without
   causing a DoS attack. Alternatively, AS2 may choose to mount a DoS
   attack on hosts in prefix P. (See slide #16 in [RAWS-discussion] for
   an illustration.)

   Example 3: AS1 has AS2 and AS3 as eBGPsec peers. AS2 and AS3 are
   also eBGPsec peers. At time x, AS1 announced a prefix P to AS2
   without prepending (Update: AS1{pCount=1} P) but announced the same
   prefix to AS3 with prepending (Update: AS1{pCount=2} P). Thus AS1
   had preferred its ingress data traffic for prefix P to come in via
   AS2.
   At a later time (x+d), AS1 switches its ingress data path
   preference to AS3 over AS2, announcing prefix P to AS3 without
   prepending (Update: AS1{pCount=1} P) and to AS2 with prepending
   (Update: AS1{pCount=2} P). AS2 suppresses the new prepended path
   announcement (it does not send any new update about P to its peers).
   Thus AS2 continues to attract more of AS1's ingress data traffic and
   generates more revenue for itself at the expense of AS1. (See slide
   #17 in [RAWS-discussion] for an illustration.)

   As illustrated above, the mechanisms and motivations for RAWS may
   differ.

   In the context of the examples mentioned above, a requirement for
   RAWS protection can be stated as follows. An update that AS1 sends
   to AS2 at time x should expire at time x+w. This capability would
   allow other ASes to detect actions by AS2 to suppress the Withdraw or
   replay the update from AS1 for prefix P after time x+w. This limits
   the RAWS vulnerability window. (Note: If no peering or policy change
   affecting prefix P occurs during the vulnerability window, then a
   typical solution would include a method for extending the validity
   period of the route(s) beyond x+w.) We will later discuss what a
   reasonable window size, w, should be.

   The obvious downside of any mechanism that supports this capability
   is that it will require AS1 to send a new update before time x+w, and
   this update will need to propagate via all the paths that the
   original update traversed. Thus more update traffic will result than
   if the RAWS protection mechanism were not employed, and this traffic
   will require cryptographic processing by all of the routers along the
   paths. Thus the creation of a mechanism to counter RAWS attacks
   potentially introduces a new opportunity for DoS attacks against
   eBGPsec routers.

3.  Classification of Solutions

   Mechanisms for RAWS protection can be classified into two broad
   categories as follows:

   o  Expiration Time (ET) Method: This method uses an explicit
      Expiration Time field in the BGPsec update. (Note: An explicit
      Expire Time field was included in an earlier version of the BGPsec
      protocol specification [draft-ietf-sidr-bgpsec-protocol-01].)

   o  Key Rollover (KR) Method: In this method, the update expiration is
      enforced by a key rollover. The router transitions to a new
      certificate with a new pair of keys, and the previous router
      certificate either expires or is revoked.

   The Key Rollover method can be further divided into the following
   subcategories:

   o  Periodic Key Rollover (PKR): Key rollovers happen at periodic
      intervals.

   o  Event-driven Key Rollover (EKR): Key rollovers happen only when
      peering or policy change events occur.

      *  EKR-A: EKR where expiry of previous update is enforced by CRL.

      *  EKR-B: EKR where expiry of previous update is controlled by
         NotAfter time (router certificate is not revoked at the time
         when the event happens).

   In Section 4, Section 5, and Section 6 we describe the various
   methods listed above, and discuss their pros and cons.

4.  Expiration Time Method

   The details of the Expiration Time (ET) method are as follows:

   o  Explicit Expiration Time is used for origin's signature.

   o  Expiration Time field is required in the BGPsec update.

   o  Periodic re-origination (beaconing) of prefixes is performed by
      origin ASes. The value in the ET field in the update is extended
      at beaconing time, and thereby the update is refreshed. Every
      prefix in the Internet is re-originated and propagates through the
      Internet once every 'beacon' interval.

   o  These beacons are distributed actions by prefix owners and are
      intended to be jittered in time to reduce burstiness.
      The beacon interval can be different at each originating AS.

   o  Beacon interval granularity: TBD but preferably in fairly coarse
      units (days). It is important to limit the ability of each AS to
      specify a short beacon interval, to prevent an AS from using this
      mechanism to cause BGPsec to thrash.

   Discussion of Pros and Cons:

   Pro: This method is easy on transit routers. In the event of peering
   or policy change, BGPsec with the ET method behaves the same way as
   BGP-4 in terms of which prefix routes are propagated. That is, the
   router re-evaluates best paths factoring in peering or policy
   changes, and propagates only those prefix routes that have a change
   in best path. In other words, there is no necessity for a transit
   BGPsec router to re-propagate and refresh prefixes on all peering
   links. This is because prefix updates are refreshed anyway once
   every beacon interval by all prefix originators. There is low
   steady-state traffic associated with beaconing (see Figure on slide
   #8 in [RAWS-discussion]), but there are no huge bursts or spikes in
   workload due to peering or policy change events at transit routers.

   Con: An equipment vendor could facilitate unnecessarily frequent
   beaconing if an ISP urges and pays for it (a "dollar attack"). This
   possibility is mitigated by having a well thought-out granularity for
   ET, for example, setting the unit for advertising ET to one day
   (rather than one minute).

   Con: A change in the on-the-wire BGPsec protocol would be needed if
   the unit (granularity) of the ET field needs to be changed.

5.  Key Rollover Method

   The Key Rollover (KR) method has three variations as outlined in
   Section 3. Those will be discussed later in this section.
   The following features are common to all variants of the KR method:

   o  In the KR method, it is best if the BGPsec router has two pairs of
      certificates as follows: A pair of origination certificates
      (current and next) for signing prefixes being originated by the AS
      of the router, and a pair of transit certificates (current and
      next) for signing transit prefixes.

   o  Note: If a BGPsec router only originates prefixes (i.e. has no
      transit prefixes), then it needs to maintain only a pair of
      origination certificates and need not maintain the extra pair of
      transit certificates. (This would be the case for the vast
      majority of ASes, since most are stubs.)

   o  The three KR methods differ in how the rollover of certificates
      (or keys) is done:

      *  Certificate rollovers are Periodic vs. Event-driven.

      *  In the Event-driven method, the expiry of the old update is (A)
         Enforced by CRL vs. (B) Controlled by NotAfter time.

      *  In (A), the certificate's NotAfter field is set to a very large
         value and a CRL is issued to revoke the certificate when
         necessary. In (B), the NotAfter field is set to a permissible
         vulnerability window time, and a CRL to revoke the certificate
         is not required.

   Discussion of Pros and Cons (common to all Key Rollover methods):

   Pro: The KR method functions by manipulating the RPKI objects
   (certificates, keys, NotAfter field in certificate, etc.) to refresh
   updates or to cause expiry of previously propagated updates. Unlike
   the ET method, it does not rely on any explicit field in the update.
   Hence, an advantage of the KR method over the ET method is that in
   case any parameters need to change or if the method itself is
   modified, then there is no impact on the BGPsec protocol on the wire.

   Con: The KR method increases the number of objects in the RPKI
   repository system, by requiring at least two certificates for every
   transit AS.
   It also introduces additional churn in the global RPKI as these
   certificates expire (or are revoked) and are replaced.

   Con: There is also added update churn. The amount of update churn
   varies depending on the type of KR method used (see Section 5.1 and
   Section 5.2).

   We will now describe and discuss in detail the variants of the KR
   method.

5.1.  Periodic Key Rollover Method

   The details of the Periodic Key Rollover (PKR) method are as follows.

   o  Router's origination certificate's NotAfter time is used
      effectively as expiration time for origin's signature.

   o  Each origination router re-originates (i.e. beacons) before
      NotAfter time of the current origination certificate. Beaconing
      is periodic re-origination of prefixes by origin ASes.

   o  At beaconing time, the next origination certificate becomes the
      new current certificate, and the new update is signed with the
      private key of this new current certificate and re-originated.

   o  A new 'next' origination certificate is created and propagated at
      or before beaconing time. This can also be done with a good lead
      time. In practice, multiple 'next' certificates for each router
      could be propagated and kept in the RPKI repositories. They must
      have contiguous or slightly overlapping validity periods.

   o  Every prefix in the Internet is re-originated and propagates
      through the Internet once every 'beacon' interval.

   o  The re-originations or beacons are distributed actions by prefix
      owners and jittered in time by design to reduce burstiness. The
      beacon interval can be different at different originating ASes.

   o  Beacon (or re-origination) interval granularity: TBD but
      preferably in fairly coarse units (days).

   o  Transit certificates can have large NotAfter time (e.g., whatever
      duration is required normally for key maintenance).

   o  When a peering or policy change event occurs at a transit router,
      the router does not perform any reactive key rollover. The router
      re-evaluates best paths factoring in peering or policy changes,
      and propagates only those prefix routes that have a change in best
      path (similar to BGP-4). There is no necessity for the BGPsec
      router to re-propagate and refresh prefixes on all peering links.
      This is because prefix updates are refreshed anyway once every
      re-origination (i.e. beaconing) interval by all prefix
      originators.

   Discussion of Pros and Cons:

   Several of the same pros/cons of the Expiration Time method also
   apply here for the PKR method.

   Pro: The main pro for the PKR method is the same as that for the
   Expiration Time (ET) method, namely being easy on transit routers as
   discussed in Section 4. Just as in the ET method, there is low
   steady-state traffic associated with periodic re-originations (i.e.
   beaconing) (see Figure on slide #8 in [RAWS-discussion]), but there
   are no huge bursts or spikes in workload due to peering or policy
   change events at transit routers. (See comparisons with the EKR
   methods in Section 5.2.)

   Pro: The common pro discussed previously for all KR methods, namely,
   not requiring change of protocol on the wire when a parameter change
   occurs (e.g., change of beacon interval units) is naturally
   applicable here.

   Con: Churn in the RPKI is of concern. Every BGPsec router renews and
   propagates its 'next' origination certificate once in every beacon
   (i.e. re-origination) interval.

5.2.  Event-driven Key Rollover Method

   The common details of the Event-driven Key Rollover (EKR) methods are
   as follows.

   o  Key rollover is reactive to events (not periodic).

   o  If a peering or policy change event involves only prefixes being
      originated at the AS of the router, then the router rolls only the
      origination key.

   o  If a peering change event involves transit prefixes at the AS of
      the router, then the router rolls its transit key as well as the
      origination key. Both keys are rolled because any peering
      relationship change also requires refresh of prefixes originated
      by the router.

   o  If a key rollover takes place, then a corresponding (origination
      or transit) new 'next' certificate is propagated in RPKI.

   Discussion of Pros and Cons:

   Pro: As long as no triggering events occur, there is no added update
   churn in BGPsec.

   Con: Whenever the transit key is rolled, there is a storm of BGPsec
   updates at routers in transit ASes. For example, consider a BGPsec
   capable transit AS5 that is connected to four BGPsec non-stub
   customers (AS1, AS2, AS3, AS4). Assume each AS has a single BGPsec
   router in it. AS1 through AS4 each receives an almost full table
   (approximately 600K signed prefix updates) from AS5. Assume also
   that AS1 and its customers together originate 100 prefixes in total;
   likewise for AS2, AS3 and AS4. Now consider that an event occurs
   whereby the peering between AS1 and AS5 is discontinued. As a result
   of this event, in the EKR method, the AS5 router signs and
   re-propagates approximately 3x600K = 1.8 Million signed prefix
   updates to AS2, AS3 and AS4 combined. In addition, it also sends
   4x100 = 400 Withdraws, which are negligible. In comparison, in the
   PKR method, reacting to the same event, the BGPsec router at AS5
   sends only 4x100 = 400 Withdraws and signs/re-propagates ZERO prefix
   updates. (An illustration can be found in slide #9 in
   [RAWS-discussion]. Also, additional peering change scenarios and
   quantitative comparisons can be found in slides #10 and #11 in
   [RAWS-discussion].)
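
   As a rough check on the arithmetic in the example above, the
   following Python sketch recomputes the event-time workload at AS5
   under the EKR and PKR methods. The function and parameter names are
   illustrative assumptions and are not part of any BGPsec
   specification; the withdraw count simply reuses the 4x100 factor
   quoted in the text.

```python
# Illustrative sketch only: the names and structure are assumptions made
# for this example; the numbers are the ones quoted in the text above.

def event_workload(method, full_table, remaining_peers,
                   prefixes_per_cone, withdraw_factor):
    """Return (signed_updates, withdraws) sent by the transit router
    when one customer peering is discontinued."""
    # Withdraw count as stated in the text (withdraw_factor x prefixes).
    withdraws = withdraw_factor * prefixes_per_cone
    if method == "EKR":
        # The transit key is rolled, so the (almost) full table is
        # re-signed and re-propagated to every remaining peer.
        signed_updates = remaining_peers * full_table
    elif method == "PKR":
        # No reactive key rollover: nothing is re-signed for this event.
        signed_updates = 0
    else:
        raise ValueError(f"unknown method: {method}")
    return signed_updates, withdraws


params = dict(full_table=600_000, remaining_peers=3,
              prefixes_per_cone=100, withdraw_factor=4)
ekr = event_workload("EKR", **params)
pkr = event_workload("PKR", **params)
print("EKR (signed updates, withdraws):", ekr)
print("PKR (signed updates, withdraws):", pkr)
```

   Under these assumptions the sketch reproduces the figures in the
   text: roughly 1.8 million re-signed updates for EKR versus none for
   PKR, with the same 400 Withdraws in both cases.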

   It remains to be seen through measurement and modeling how the impact
   of such large bursts of workload in the EKR method at the time of
   event occurrence can be managed in route processors, e.g., by
   jittering and throttling the workload.

5.2.1.  EKR-A: EKR where Update Expiry is Enforced by CRL

   EKR-A builds on the common principles as described for EKR above in
   Section 5.2. The additional details of EKR-A operation are as
   follows:

   o  NotAfter time of origination and transit certificates is set to a
      large value (e.g., one year or whatever period is needed for
      normal key maintenance).

   o  Whenever a key rollover (for origination or transit) occurs, a
      CRL is propagated for the certificate that was used until that
      time. So the old update expires (due to invalid state) only when
      the CRL propagates and reaches each relying router.

   o  This method relies on end-to-end CRL propagation through the RPKI
      system to enforce expiry of a previous update whenever the need
      arises.

   o  The CRL either propagates all the way to the relying router, or
      the RPKI cache server of the router receives the CRL and then
      sends a withdrawal of the {AS, SKI, Pub Key} tuple to the router.
      Either way, the CRL must in effect propagate all the way to the
      relying router.

   o  Thus the attack vulnerability window with the EKR-A method is
      governed by the end-to-end CRL propagation time.

   Discussion of Pros and Cons:

   The following pro and con for the EKR-A method are in addition to the
   common pros and cons listed above for the KR and EKR methods
   (Section 5 and Section 5.2).

   Pro: EKR-A has much less RPKI churn than PKR or EKR-B (see
   Section 5.2.2).

   Con: The router needs to receive a CRL or a withdraw of the {AS, SKI,
   Pub Key} tuple in order to know an update has expired.
   Hence, the RAWS vulnerability window is determined by the CRL
   propagation time, which can vary widely from one relying router to
   another router that may be in a different region. It is anticipated
   that this would be no worse than 24 hours, but this needs to be
   confirmed by measurements in an operational or emulated RPKI system
   [rpki-delay].

5.2.2.  EKR-B: EKR where Update Expiry is Enforced by NotAfter Time

   EKR-B builds on the common principles as described for EKR above in
   Section 5.2. The additional details of EKR-B operation are as
   follows:

   o  NotAfter time of current origination and transit certificates is
      set to a value determined by the desired vulnerability window
      (approximately one day).

   o  Update expiry is controlled by NotAfter time (router certificate
      is not revoked at the time when the event happens).

   o  If no triggering event occurs to cause origination key rollover
      within a pre-set time (NotAfter), then new origination (current
      and next) certificates are issued only to extend the NotAfter time
      but the corresponding key pairs and SKIs remain unchanged.

   o  The same applies to the transit (current and next) certificates
      and keys.

   o  A previous update automatically becomes invalid at the earliest
      NotAfter time of the certificates used in the signatures unless
      each of those certificates' NotAfter time has been extended.

   o  Changes in certificates to extend their NotAfter time need not
      propagate end-to-end (all the way to the relying routers); they
      may propagate only up to the RPKI cache server of the relying
      router. The RPKI cache server would send a withdraw for an {AS,
      SKI, Pub Key} tuple to a relying router if the NotAfter time of
      the certificate has passed.

   o  Changes in certificates to advance NotAfter time can be scheduled
      and propagated (in RPKI) reasonably well in advance.
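
   The expiry rule stated in the bullets above can be sketched in Python
   as follows. This is a minimal illustration that assumes a
   hypothetical in-memory map from SKI to NotAfter time, standing in
   for the state an RPKI cache server might track. The names
   (update_expiry, is_update_valid, and the sample SKIs) are invented
   for this sketch and do not come from the BGPsec or RPKI
   specifications.

```python
from datetime import datetime, timedelta, timezone

def update_expiry(path_skis, not_after_by_ski):
    """An update expires at the earliest NotAfter time among the
    certificates whose keys were used in its signatures."""
    return min(not_after_by_ski[ski] for ski in path_skis)

def is_update_valid(path_skis, not_after_by_ski, now):
    """True while every certificate on the signed path is unexpired."""
    return now <= update_expiry(path_skis, not_after_by_ski)

t0 = datetime(2016, 10, 18, tzinfo=timezone.utc)
window = timedelta(days=1)  # assumed vulnerability window (about a day)

# Hypothetical SKIs for an origin AS and one transit AS on the path.
not_after = {"ski-origin": t0 + window, "ski-transit": t0 + window}
path = ["ski-origin", "ski-transit"]

valid_before = is_update_valid(path, not_after, t0 + timedelta(hours=12))
expired_after = not is_update_valid(path, not_after,
                                    t0 + timedelta(hours=36))

# Re-issuing only the transit certificate (same key pair and SKI, later
# NotAfter) is not enough: the origin certificate still bounds expiry.
not_after["ski-transit"] = t0 + 2 * window
still_expired = not is_update_valid(path, not_after,
                                    t0 + timedelta(hours=36))

# Once every certificate on the path is extended, the update stays valid.
not_after["ski-origin"] = t0 + 2 * window
valid_again = is_update_valid(path, not_after, t0 + timedelta(hours=36))

print(valid_before, expired_after, still_expired, valid_again)
```

   The sketch highlights why, in EKR-B, each certificate used in the
   signatures must have its NotAfter time extended for a previously
   propagated update to remain valid.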

   Discussion of Pros and Cons:

   The following pros and con for EKR-B are in addition to the common
   pros and cons listed above for the KR and EKR methods (Section 5 and
   Section 5.2).

   Pro: Update expiration is automatic if the NotAfter time of any of
   the certificates used to validate the update has not been extended.
   So the RAWS vulnerability window is predictable and not influenced by
   the RPKI end-to-end propagation time.

   Pro: Routers do not get any RPKI updates from the RPKI cache server
   when a certificate changes but the corresponding key pair and SKI
   remain unchanged. Routers do not receive NotAfter time from their
   RPKI cache server. There is no need for it. Instead, the RPKI cache
   server keeps track of NotAfter time, and provides to routers only
   valid {AS, SKI, Pub Key} tuples. This saves some RPKI state
   maintenance workload at the routers.

   Con: EKR-B has much more RPKI churn than EKR-A because both
   origination and transit certificates need to be reissued periodically
   to extend their validity time (even in the absence of any peering or
   policy change events).

5.2.3.  EKR with Separate Key for Each Incoming-Outgoing Peering-Pair

   This is a placeholder section where we mention another variant of
   the EKR method. This idea has not been considered or vetted by the
   SIDR WG yet. So we only mention it here briefly.

   As noted earlier, the EKR methods considered so far generate a huge
   spike in workload whenever the transit key rollover takes place. One
   way to reduce that workload is to have a separate signing key for
   each incoming-outgoing peering pair. For example, consider a BGPsec
   router in AS4 that has peers in AS1, AS2, and AS3. The router will
   hold six signing keys, one each corresponding to (AS1, AS2), (AS2,
   AS1), (AS1, AS3), (AS3, AS1), (AS2, AS3), and (AS3, AS2)
   peering-pairs.
   Note that the directionality of peering is included here and is
   necessary. The key corresponding to (AS-i, AS-j) would only be used
   to sign updates received from AS-i and being forwarded to AS-j. In
   the general case, when the BGPsec router has n peers, the number of
   transit keys will be n(n-1). Since there would be a Current and a
   Next key (for rollover), the number of transit keys held in the
   router for signing will actually be 2n(n-1). When a peering or
   policy change occurs, the router would roll over only those specific
   keys that correspond to the peering-pairs over which the prefix
   updates are affected. In the above example, suppose a policy change
   between AS4 and AS1 causes AS4 to prepend prefixes sent to AS1
   (pCount changed from 1 to 2). Then AS4 would do key rollover only
   for (AS2, AS1) and (AS3, AS1) peering-pairs, and not for any of the
   others. This would substantially reduce the quantity of prefix
   updates that are signed and re-propagated. In general, when peering
   or policy changes occur, this method will reduce the number of prefix
   updates to be re-propagated to exactly the same as that with normal
   BGP. That means that this method would also be on par with the ET
   and PKR methods in terms of update churn when a peering or policy
   change takes place. The downside of this method is that the router
   needs to maintain 2n(n-1) key pairs if it has n BGPsec peers.

   Detailed discussion and comparison of this method with other methods
   can be provided in a later version of this document if the idea picks
   up interest in the WG.

6.  Summary of Pros and Cons

   Table 1 below summarizes the pros and cons for the various RAWS
   protection methods. This summary follows from the discussion above
   in Section 4 and Section 5.

+------------+---------------------------+----------------------------+
| Method     | Pros                      | Cons                       |
+------------+---------------------------+----------------------------+
| Expiration | 1. The background load    | 1. Prefix owner can abuse  |
| Time (ET)  | due to beaconing is low   | by beaconing too           |
|            | and not bursty.           | frequently.                |
|            | ---                       | ---                        |
|            | 2. Transit AS does NOT    | 2. Any change to the units |
|            | have a huge spike in      | (granularity) of ET field  |
|            | workload even when a      | entails a change to on-    |
|            | peering or policy change  | the-wire BGPsec protocol.  |
|            | happens at that AS.       |                            |
|            | Beaconing facilitates     |                            |
|            | this.                     |                            |
|            | ---                       | ---                        |
|            | 3. Does not add to RPKI   |                            |
|            | churn.                    |                            |
| ---------- | ------------------------- | -------------------------- |
| Periodic   | 1. The background load    | 1. Prefix owner can abuse  |
| Key        | due to beaconing is low   | by beaconing (i.e. re-     |
| Rollover   | and not bursty.           | originating) too           |
| (PKR)      |                           | frequently.                |
|            | ---                       | ---                        |
|            | 2. Transit AS does NOT    | 2. Adds to RPKI churn. A   |
|            | have a huge spike in      | pair of certificates       |
|            | workload even when a      | (current and next) for     |
|            | peering change happens at | each origination router    |
|            | that AS. Beaconing (i.e.  | is rolled once every       |
|            | periodic re-origination)  | beacon (i.e. re-           |
|            | facilitates this.         | origination) interval.     |
|            |                           | Significantly more RPKI    |
|            |                           | churn than that with EKR-A |
|            |                           | or EKR-B methods.          |
|            | ---                       | ---                        |
|            | 3. If the periodic re-    |                            |
|            | origination (i.e.         |                            |
|            | beaconing) interval units |                            |
|            | change, BGPsec protocol   |                            |
|            | on the wire remains       |                            |
|            | unaffected.               |                            |
|            | ---                       | ---                        |
|            | 4. Changes in the method  |                            |
|            | (while still based on Key |                            |
|            | Rollover) can be          |                            |
|            | accommodated without      |                            |
|            | requiring any change to   |                            |
|            | on-the-wire BGPsec        |                            |
|            | protocol.                 |                            |
| ---------- | ------------------------- | -------------------------- |
| Event-     | 1. No update churn for    | 1. Whenever the transit    |
| driven     | long periods when no      | key is rolled (in response |
| Key        | peering or policy changes | to a peering or policy     |
| Rollover   | occur.                    | change event), there is a  |
| Type A     |                           | storm of BGPsec updates,   |
| (EKR-A)    |                           | especially at routers in   |
|            |                           | large transit ASes.        |
|            | ---                       | ---                        |
|            | 2. The added churn in     | 2. The RAWS vulnerability  |
|            | RPKI is much lower than   | window is dependent on     |
|            | that in the EKR-B method. | end-to-end CRL             |
|            |                           | propagation. It may vary   |
|            |                           | significantly from one     |
|            |                           | relying router to another  |
|            |                           | that may be in different   |
|            |                           | regions.                   |
|            | ---                       | ---                        |
|            | 3. Same as Pro #4 for the |                            |
|            | PKR method.               |                            |
| ---------- | ------------------------- | -------------------------- |
| Event-     | 1. Same as Pro #1 for the | 1. Same as Con #1 for the  |
| driven     | EKR-A method.             | EKR-A method.              |
| Key        |                           |                            |
| Rollover   |                           |                            |
| Type B     |                           |                            |
| (EKR-B)    |                           |                            |
|            | ---                       | ---                        |
|            | 2. The RAWS vulnerability | 2. The added churn in RPKI |
|            | window is enforced by     | is much higher than that   |
|            | NotAfter time in          | in the EKR-A method.       |
|            | certificates and is       |                            |
|            | therefore predictable.    |                            |
|            | ---                       | ---                        |
|            | 3. Same as Pro #4 for the |                            |
|            | PKR method.               |                            |
+------------+---------------------------+----------------------------+

                 Table 1: Summary of Pros and Cons

7.  Summary and Conclusions

   We have attempted to provide insights into the operation of multiple
   alternative methods for RAWS protection.
   It is hoped that the SIDR WG will use the analysis presented here as
   input for deciding on a mechanism for protection from RAWS.  Once
   that decision is made, the chosen mechanism would be included in the
   standards track document [I-D.ietf-sidr-bgpsec-rollover].

   Some important considerations for the decision are as follows:

   1.  The Expiration Time (ET) method is best (on par with the PKR
       method) in terms of preventing huge update workloads during
       peering and policy change events at transit routers with several
       peers.  It adds no RPKI churn.  But the ET method has the
       disadvantage of requiring an on-the-wire protocol change if some
       parameters (e.g., the units of the beacon interval) change.

   2.  The Periodic Key Rollover (PKR) method operates the same way as
       the ET method for preventing huge update workloads during peering
       and policy change events at transit routers with several peers.
       It does not have the disadvantage of requiring an on-the-wire
       protocol change if some parameters (e.g., the units of
       beaconing/re-origination periodicity) change.  But it has the
       downside of added RPKI churn.

   3.  The Event-driven Key Rollover (EKR-A and EKR-B) methods have
       significantly less RPKI churn than the PKR method.  They also
       have no BGPsec update churn during long quiet periods when no
       peering or policy change events occur.  But they suffer the
       drawback of creating huge update workloads during peering and
       policy change events at transit routers with several peers.
       Whether this workload can be jittered or flow controlled to
       spread it over time without raising convergence delay concerns
       needs further study.

   4.  The EKR-A method relies on end-to-end CRL propagation through the
       RPKI system to enforce expiry of a previous update when needed.
       By contrast, in the EKR-B method the update expiry is controlled
       by the NotAfter time of the certificates used in update
       signatures.  In the EKR-B method, a previous update automatically
       becomes invalid at the earliest NotAfter time of the certificates
       used in the signatures unless each of those certificates'
       NotAfter time has been extended.  Also, in the EKR-B method,
       changes in certificates to extend their NotAfter time need not
       propagate end-to-end (all the way to the relying routers); they
       may propagate only up to the RPKI cache server of the relying
       router (see Section 5.2.2).  The changes in certificates to
       extend the NotAfter time can be scheduled and propagated (in the
       RPKI) reasonably well in advance.

   5.  Besides being out-of-band relative to the BGPsec protocol on the
       wire, another advantage of the Key Rollover approach is that
       once the basics of the mechanism are implemented, there may be
       flexibility to implement PKR, EKR-A, or EKR-B on top of it.  It
       may also be possible to switch from one method to another (within
       this class) if necessary based on operational experience; this
       transition would not require any change to the on-the-wire
       BGPsec protocol.

8.  Acknowledgements

   The authors would like to thank Steve Kent for extensive review and
   many useful suggestions on an earlier version of this document.
   Thanks are also due to Roque Gagliano and Brian Weis for helpful
   discussions.  Further, we are thankful to Oliver Borchert and Okhee
   Kim for comments and suggestions.

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   This memo requires no security considerations of its own since it is
   targeted to be an informational RFC in support of
   [I-D.ietf-sidr-bgpsec-rollover] and [I-D.ietf-sidr-bgpsec-protocol].
   The reader is therefore directed to the security considerations
   provided in those documents.

11.  Informative References

   [I-D.ietf-sidr-bgpsec-protocol]
              Lepinski, M. and K. Sriram, "BGPsec Protocol
              Specification", draft-ietf-sidr-bgpsec-protocol-18 (work
              in progress), August 2016.

   [I-D.ietf-sidr-bgpsec-rollover]
              Gagliano, R., Patel, K., and B. Weis, "BGPsec Router
              Certificate Rollover", draft-ietf-sidr-bgpsec-rollover-05
              (work in progress), March 2016.

   [RAWS-discussion]
              Sriram, K. and D. Montgomery, "Discussion of Key Rollover
              Mechanisms for Replay-Attack Protection", Presented
              at IETF-85 SIDR WG Meeting, November 2012.

   [RFC7353]  Bellovin, S., Bush, R., and D. Ward, "Security
              Requirements for BGP Path Validation", RFC 7353,
              DOI 10.17487/RFC7353, August 2014.

   [rpki-delay]
              Kent, S. and K. Sriram, "RPKI rsync Download Delay
              Modeling", Presented at IETF-86 SIDR WG Meeting, March
              2013.

Authors' Addresses

   Kotikalapudi Sriram
   US NIST

   Email: ksriram@nist.gov

   Doug Montgomery
   US NIST

   Email: dougm@nist.gov