
2	Secure Inter-Domain Routing                                    K. Sriram
3	Internet-Draft                                             D. Montgomery
4	Intended status: Informational                                   US NIST
5	Expires: March 27, 2014                               September 23, 2013

7	Design Discussion and Comparison of Replay-Attack Protection Mechanisms
8	                               for BGPSEC
9	          draft-sriram-replay-protection-design-discussion-02

11	Abstract

13	   The BGPSEC protocol requires a method for protection from replay
14	   attacks, at least to control the window of exposure.  In the context
15	   of BGPSEC, a replay attack occurs when an adversary suppresses a
16	   prefix withdrawal (implicit or explicit) or replays a previously
17	   received BGPSEC announcement for a prefix that has since been
18	   withdrawn.  This informational document provides design discussion
19	   and comparison of multiple alternative replay-attack protection
20	   mechanisms weighing their pros and cons.  It is meant to be a
21	   companion document to the standards track I-D.-ietf-sidr-bgpsec-
22	   rollover that will specify a method to be used with BGPSEC for
23	   replay-attack protection.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on March 27, 2014.

42	Copyright Notice

44	   Copyright (c) 2013 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  Definition of Replay Attack
61	   3.  Classification of Solutions
62	   4.  Expire Time Method
63	   5.  Key Rollover Method
64	     5.1.  Periodic Key Rollover Method
65	     5.2.  Event-driven Key Rollover Method
66	       5.2.1.  EKR-A: EKR where Update Expiry is Enforced by CRL
67	       5.2.2.  EKR-B: EKR where Update Expiry is Enforced by
68	               NotValidAfter Time
69	       5.2.3.  EKR with Separate Key for Each Incoming-Outgoing
70	               Peering-Pair
71	   6.  Summary of Pros and Cons
72	   7.  Summary and Conclusions
73	   8.  Acknowledgements
74	   9.  IANA Considerations
75	   10. Security Considerations
76	   11. Informative References
77	   Authors' Addresses

79	1.  Introduction

81	   The BGPSEC protocol [bgpsec-protocol] requires a method for
82	   protection from replay attacks, at least to control the window of
83	   exposure [bgpsec-reqs].  In the context of BGPSEC, a replay attack
84	   occurs when an adversary suppresses a prefix withdrawal or replays a
85	   previously received BGPSEC announcement for a prefix that has since
86	   been withdrawn.

88	   In this informational document, we provide design discussion and
89	   comparison of various replay-attack protection mechanisms that may be
90	   used in conjunction with the BGPSEC protocol.  It is meant to be a
91	   companion document to the standards track document [bgpsec-rollover]
92	   that will specify a method to be used with BGPSEC for replay-attack
93	   protection.  Here we consider four alternative mechanisms - one based
94	   on the explicit Expire Time approach and three different variants
95	   based on the Key Rollover approach.  We provide a detailed comparison
96	   between these mechanisms weighing their pros and cons.  This document
97	   is meant to help inform the decision process leading to an exact
98	   description for the mechanism to be finalized and formally specified
99	   in [bgpsec-rollover].

101	2.  Definition of Replay Attack

103	   In the context of BGPSEC, a replay attack occurs when an adversary
104	   suppresses a prefix withdrawal (implicit or explicit).  A replay
105	   attack occurs also when the adversary replays a previously received
106	   BGPSEC announcement for a prefix that has since been withdrawn.  In
107	   the rest of this document, we will refer to either of these two
108	   situations as a replay attack.  The following are examples of replay
109	   attacks:

111	   Example 1: AS1 has AS2 and AS3 as eBGPSEC peers.  At time x, AS1 had
112	   announced a prefix P to AS2 and AS3.  At a later time x+d, AS1 sends
113	   a Withdraw for prefix P to AS2.  AS2 suppresses the Withdraw (does
114	   not send to its peers any explicit or implicit Withdraw).  AS2
115	   continues to attract some of the data for prefix P towards itself by
116	   pretending to still have a signed and valid route for P. In effect,
117	   AS2 can conduct a DOS attack on a server located at AS1 at prefix P.
118	   (See slide #15 in [replay-discussion] for an illustration.)

120	   Example 2: AS1 has AS2 and AS3 as eBGPSEC peers.  AS2 and AS3 are
121	   also eBGPSEC peers.  At time x, AS1 had announced a prefix P to AS2
122	   and AS3.  AS3 also propagates to AS2 its route (via AS1) for prefix
123	   P. At a later time x+d, AS1 discontinues its peering with AS2.  AS2
124	   should propagate an alternate longer path via AS3 for prefix P and
125	   thus send an implicit Withdraw.  However, AS2 suppresses it.  AS2 can
126	   thus make a significant part of traffic destined for prefix P to flow
127	   via itself and eavesdrop on the data but not cause a DOS attack.
128	   (See slide #16 in [replay-discussion] for an illustration.)

130	   Example 3: AS1 has AS2 and AS3 as eBGPSEC peers.  AS2 and AS3 are
131	   also eBGPSEC peers.  At time x, AS1 had announced a prefix P to AS2
132	   without prepending (Update: AS1{pCount=1} P) but announced the same
133	   prefix to AS3 with prepending (Update: AS1{pCount=2} P).  Thus AS1
134	   had preferred its ingress data traffic for prefix P to come in via
135	   AS2.  At a later time x+d, AS1 switches ingress data path preference
136	   to AS3 over AS2 - announces prefix P without prepending (Update:
137	   AS1{pCount=1} P) to AS3 and with prepending (Update: AS1{pCount=2} P)
138	   to AS2.  AS2 suppresses the new prepended path announcement (does not
139	   send to its peers any new update about P).  Thus AS2 carries more of
140	   AS1's ingress data traffic and generates more revenue for itself at
141	   the expense of AS1.  (See slide #17 in [replay-discussion] for an
142	   illustration.)

143	   Thus the scenarios and motivations for replay attacks may differ as
144	   illustrated by the examples above.

146	   A requirement for replay-attack protection can be stated as follows.
147	   The update that AS1 sent to AS2 at time x should expire at time x+w.
148	   That means, AS2 can suppress the Withdraw or possibly replay the
149	   update from AS1 for prefix P until at most x+w. This limits the
150	   replay vulnerability window.  (Note: If no peering or policy change
151	   affecting prefix P occurs during the vulnerability window, then a
152	   typical solution would include a method for extending the validity
153	   period of the route(s) beyond x+w.)
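
   The requirement above can be illustrated with a minimal sketch.
   The function and parameter names below are illustrative assumptions
   and are not taken from any BGPSEC specification; times are plain
   numbers in arbitrary units, and the update is assumed to be valid
   up to and including x+w.

```python
def update_expiry(x: float, w: float) -> float:
    """Time at which an update (re)announced at time x stops being valid."""
    return x + w


def replay_possible(now: float, x: float, w: float) -> bool:
    """True while a suppressed Withdraw or a replayed update is still accepted."""
    return now <= update_expiry(x, w)


def exposure(x: float, withdraw_time: float, w: float) -> float:
    """How long an adversary can exploit a Withdraw issued at withdraw_time.

    x is the time of the most recent announcement or refresh.  The
    exposure is the validity that remains when the Withdraw is issued,
    and is never negative.
    """
    return max(0.0, update_expiry(x, w) - withdraw_time)
```

   For example, with x = 0 and w = 10, a Withdraw issued at time 3 can
   be suppressed for at most 7 time units.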

155	3.  Classification of Solutions

157	   Mechanisms for replay-attack protection can be classified into two
158	   broad categories as follows:

160	   o  Expire Time (ET) Method: This method uses an explicit expire time
161	      field in the BGPSEC update.

163	   o  Key Rollover (KR) Method: In this method, the update expiry is
164	      enforced by a key rollover.  The router rolls over to a new
165	      signing cert with a new pair of keys, and the previous router
166	      cert either expires or is revoked.

168	   The Key Rollover method can be further divided into the following
169	   subcategories:

171	   o  Periodic Key Rollover (PKR): Key rollovers happen at periodic
172	      intervals.

174	   o  Event-driven Key Rollover (EKR): Key rollovers happen only when
175	      peering or policy change events occur.

177	      *  EKR-A: EKR where expiry of previous update is enforced by CRL.

179	      *  EKR-B: EKR where expiry of previous update is controlled by
180	         NotValidAfter time.

182	   In Section 4, Section 5, and Section 6 we describe the various
183	   methods listed above, and discuss their pros and cons.

185	4.  Expire Time Method

187	   The details of the Expire Time (ET) method are as follows:

189	   o  Explicit Expire Time is used for origin's signature.

191	   o  Expire Time field is required in the BGPSEC update.

193	   o  Periodic re-origination (beaconing) of prefixes is performed by
194	      origin ASes.  The value in the ET field in the update is extended
195	      at beaconing time, and thereby the update is refreshed.  Every
196	      prefix in the Internet is re-originated and propagates through the
197	      Internet once every 'beacon' interval.

199	   o  These beacons are distributed actions by prefix owners and
200	      jittered in time by design to reduce burstiness.  The beacon
201	      interval can be different at different originating ASes.

203	   o  Beacon interval granularity: TBD but preferably in fairly coarse
204	      units (days).
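
   The beaconing behavior listed above can be sketched as follows.
   This is only an illustration under stated assumptions: the function
   names, the use of seconds as the time unit, and the idea that the
   validity period is chosen longer than the beacon interval plus
   jitter (so that a route does not lapse between beacons) are not
   specified in this document.

```python
import random

SECONDS_PER_DAY = 86_400


def next_beacon_time(last_beacon: int, interval_days: int,
                     max_jitter_s: int, rng: random.Random) -> int:
    """Schedule the next re-origination, jittered to reduce burstiness."""
    jitter = rng.randint(-max_jitter_s, max_jitter_s)
    return last_beacon + interval_days * SECONDS_PER_DAY + jitter


def expire_time(beacon_time: int, validity_days: int) -> int:
    """Expire Time value carried in the update originated at beacon_time."""
    return beacon_time + validity_days * SECONDS_PER_DAY


def covers_next_beacon(beacon_time: int, validity_days: int,
                       interval_days: int, max_jitter_s: int) -> bool:
    """Check that the route stays valid until the latest possible next beacon."""
    latest_next = beacon_time + interval_days * SECONDS_PER_DAY + max_jitter_s
    return expire_time(beacon_time, validity_days) >= latest_next
```

   Each originating AS could call next_beacon_time with its own
   interval, which matches the statement that the beacon interval can
   differ between originating ASes.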

206	   Discussion of Pros and Cons:

208	   Pro: This method is easy on transit routers.  In the event of peering
209	   or policy change, BGPSEC with the ET method behaves the same way as
210	   BGP-4 in terms of which prefix routes are propagated.  That is, the
211	   router re-evaluates best paths factoring in peering or policy
212	   changes, and propagates only those prefix routes that have a change
213	   in best path.  In other words, there is no necessity for the BGPSEC
214	   router to re-propagate and refresh prefixes on all peering links.
215	   This is because prefix updates are refreshed anyway once every beacon
216	   interval by all prefix originators.  There is low steady-state
217	   traffic associated with beaconing (see Figure on slide #8 in
218	   [replay-discussion]), but there are no huge bursts or spikes in
219	   workload due to peering or policy change events at transit routers.

221	   Con: An equipment vendor could facilitate unnecessarily frequent
222	   beaconing if an ISP urges and pays for it (a "dollar attack").  This
223	   possibility is mitigated by choosing a well thought-out granularity
224	   for ET, for example, a unit of one day (rather than one minute).

226	   Con: A change in on-the-wire BGPSEC protocol would be needed in case
227	   the unit of the ET field (granularity) needs to be changed.

229	5.  Key Rollover Method

231	   The Key Rollover (KR) method has three variations as outlined in
232	   Section 3.  These are discussed later in this section.  The
233	   following features are common to all variants of the KR method:

235	   o  In the KR method, it is best if the BGPSEC router has two pairs of
236	      certs as follows: A pair of origination certs (current and next)
237	      for signing prefixes being originated by the AS of the router, and
238	      a pair of transit certs (current and next) for signing transit
239	      prefixes.

241	   o  Note: If a BGPSEC router only originates prefixes (i.e., has no
242	      transit prefixes), then it needs to maintain only a pair of
243	      origination certs and need not maintain the extra pair of transit
244	      certs.

246	   o  The three KR methods differ in how the rollover of certs (or keys)
247	      is done:

249	      *  Cert rollovers are Periodic vs. Event-driven.

251	      *  In the Event-driven method, the expiry of old update is (A)
252	         Enforced by CRL vs. (B) Controlled by NotValidAfter time.

254	      *  In (A), the cert's NotValidAfter field is set to a very large
255	         value and a CRL is issued to revoke the cert when necessary.
256	         In (B), the NotValidAfter field is set to the permissible
257	         vulnerability window and no CRL is needed to revoke the cert.

259	   Discussion of Pros and Cons (common to all Key Rollover methods):

261	   Pro: The KR method functions by manipulating the RPKI objects (certs,
262	   keys, NotValidAfter field in cert, etc.) to refresh updates or to
263	   cause expiry of previously propagated updates.  Unlike the ET method,
264	   it does not rely on any explicit field in the update.  Hence, an
265	   advantage of the KR method over the ET method is that in case any
266	   parameters need to change or if the method itself is modified, then
267	   there is no impact on the BGPSEC protocol on the wire.

269	   Con: The KR method introduces additional churn in the global RPKI
270	   system.

272	   Con: There is also added update churn.  The amount of update churn
273	   varies depending on the type of KR method used (see Section 5.1 and
274	   Section 5.2).

276	   We will now describe and discuss in detail the variants of the KR
277	   method.

279	5.1.  Periodic Key Rollover Method

281	   The details of the Periodic Key Rollover (PKR) method are as follows:

283	   o  Router's origination cert's NotValidAfter time is used as the
284	      implicit expire time for origin's signature.

286	   o  Each origination router re-originates (i.e., beacons) before
287	      NotValidAfter time of the current cert.  Beaconing is periodic re-
288	      origination of prefixes by origin ASes.

290	   o  At beaconing time, next cert becomes the new current cert, and
291	      update is signed with the private key of this new current cert and
292	      re-originated.

294	   o  A new 'next' cert is created and propagated at beaconing time.
295	      This can also be done with a good lead time.  In practice,
296	      multiple 'next' certs can be kept in the pipeline.  They must have
297	      contiguous or slightly overlapping validity periods.

299	   o  Every prefix in the Internet is re-originated and propagates
300	      through the Internet once every 'beacon' interval.

302	   o  The re-originations or beacons are distributed actions by prefix
303	      owners and jittered in time by design to reduce burstiness.  The
304	      beacon interval can be different at different originating ASes.

306	   o  Beacon (or re-origination) interval granularity: TBD but
307	      preferably in fairly coarse units (days).

309	   o  Transit certs can have a very large NotValidAfter time (~years).

311	   o  When a peering or policy change event occurs at a transit router,
312	      the router (i.e. BGPSEC router with PKR) does not perform any key
313	      rollover.  The router re-evaluates best paths factoring in peering
314	      or policy changes, and propagates only those prefix routes that
315	      have a change in best path (similar to BGP-4).  There is no
316	      necessity for the BGPSEC router to re-propagate and refresh
317	      prefixes on all peering links.  This is because prefix updates are
318	      refreshed anyway once every re-origination (i.e. beaconing)
319	      interval by all prefix originators.
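
   The current/next cert handling at beaconing time can be sketched as
   below.  The class and function names, the integer time units, and
   the "serial" label are hypothetical and chosen only for
   illustration; the sketch assumes a pipeline of at least two certs
   (one 'current' and one or more 'next') with slightly overlapping
   validity periods.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginationCert:
    serial: int            # hypothetical label for the cert and its key pair
    not_valid_before: int
    not_valid_after: int   # implicit expire time for the origin's signature


def new_cert(serial: int, period: int, overlap: int) -> OriginationCert:
    """Cert number 'serial' covers one beacon period plus a small overlap."""
    begin = serial * period
    return OriginationCert(serial, begin, begin + period + overlap)


def make_pipeline(depth: int, period: int, overlap: int) -> deque:
    """Build a pipeline holding 'current' followed by the 'next' certs."""
    return deque(new_cert(i, period, overlap) for i in range(depth))


def beacon(pipeline: deque, period: int, overlap: int) -> OriginationCert:
    """Promote 'next' to 'current', append a new 'next', return the signer."""
    pipeline.popleft()                      # the old current cert is retired
    pipeline.append(new_cert(pipeline[-1].serial + 1, period, overlap))
    return pipeline[0]                      # cert whose key signs the beacon
```

   Calling beacon once per re-origination interval keeps the pipeline
   depth constant, so a 'next' cert is always available with lead time.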

321	   Discussion of Pros and Cons:

323	   Several of the same pros/cons of the Expire Time method also apply
324	   here for the PKR method.

326	   Pro: The main pro for the PKR method is the same as that for the
327	   Expire Time (ET) method.  That is, being easy on transit routers as
328	   discussed in Section 4.  Just as in the ET method, there is low
329	   steady-state traffic associated with periodic re-originations (i.e.
330	   beaconing) (see Figure on slide #8 in [replay-discussion]), but there
331	   are no huge bursts or spikes in workload due to peering or policy
332	   change events at transit routers.  (See comparisons with the EKR
333	   methods in Section 5.2.)

335	   Pro: The pro discussed above for the KR method regarding parameter
336	   changes (e.g., beacon interval units) not requiring change of
337	   protocol on the wire is naturally applicable here.

339	   Con: Churn in the RPKI is of concern.  Every BGPSEC router rolls two
340	   origination certs (current and next) once in every beacon (i.e., re-
341	   origination) interval.

343	5.2.  Event-driven Key Rollover Method

345	   The common details of the Event-driven Key Rollover (EKR) methods are
346	   as follows:

348	   o  Key rollover is reactive to events (not periodic).

350	   o  If a peering or policy change event involves only prefixes being
351	      originated at the AS of the router, then the router rolls only the
352	      origination key.

354	   o  If a peering change event involves transit prefixes at the AS of
355	      the router, then the router rolls the transit key as well as the
356	      origination key.

358	   o  If a key rollover takes place, then a corresponding (origination
359	      or transit) new 'next' cert is propagated in RPKI.

361	   Discussion of Pros and Cons:

363	   Pro: As long as no triggering events occur, there is no added update
364	   churn in BGPSEC.

366	   Con: Whenever the transit key is rolled, there is a storm of BGPSEC
367	   updates at routers in transit ASes.  For example, consider BGPSEC
368	   capable transit AS5 that is connected to four BGPSEC non-stub
369	   customers (AS1, AS2, AS3, AS4).  Assume each AS has a single BGPSEC
370	   router.  AS1 through AS4 each receive an almost full table (400K
371	   signed prefix updates) from AS5.  Assume also that AS1 and its
372	   customers together originate 100 prefixes in total; likewise for AS2,
373	   AS3 and AS4.  Now consider that an event occurs whereby the peering
374	   between AS1 and AS5 is discontinued.  As a result of this event, in
375	   the EKR method, the AS5 router signs and re-propagates approximately
376	   3x400K = 1.2 Million signed prefix updates to AS2, AS3 and AS4
377	   combined.  In addition, it also sends 4x100 = 400 Withdraws, which
378	   are negligible.  In comparison, in the PKR method, following the same
379	   event, the router at AS5 sends only 4x100 = 400 Withdraws and signs/
380	   re-propagates ZERO prefix updates.  (An illustration can be found in
381	   slide #9 in [replay-discussion].  Also, additional peering change
382	   scenarios and quantitative comparisons can be found in slides #10 and
383	   #11 in [replay-discussion].)
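
   The arithmetic in the example above can be reproduced with a short
   sketch.  The constant and function names are illustrative
   assumptions; the numbers (400K-prefix table, 100 prefixes per
   customer, four customers) and the 4x100 Withdraw count are taken
   directly from the example.

```python
FULL_TABLE = 400_000           # signed prefix updates sent to each customer
PREFIXES_PER_CUSTOMER = 100    # prefixes originated by each customer AS
CUSTOMERS = 4                  # AS1 through AS4


def resigned_updates(method: str, remaining_customers: int) -> int:
    """Prefix updates AS5 re-signs after one customer peering is dropped."""
    if method == "EKR":
        # The transit key is rolled, so the table is re-signed and
        # re-propagated to every remaining customer.
        return remaining_customers * FULL_TABLE
    if method == "PKR":
        # No transit key rollover; only Withdraws are sent.
        return 0
    raise ValueError(f"unknown method: {method}")


remaining = CUSTOMERS - 1                         # AS2, AS3 and AS4
ekr_load = resigned_updates("EKR", remaining)     # 3 x 400K
pkr_load = resigned_updates("PKR", remaining)     # zero re-signed updates
withdraws = 4 * PREFIXES_PER_CUSTOMER             # 4x100, as in the example
```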

385	   It remains to be seen through measurement and modeling how the impact
386	   of such large bursts of workload in the EKR method at the time of
387	   event occurrence can be managed in route processors, e.g., by
388	   jittering and throttling the workload.

390	5.2.1.  EKR-A: EKR where Update Expiry is Enforced by CRL

392	   EKR-A builds on the common principles as described for EKR above in
393	   Section 5.2.  The additional details of EKR-A operation are as
394	   follows:

396	   o  NotValidAfter time of origination and transit certs is set to a
397	      large value (~year).

399	   o  Whenever a key rollover (for origination or transit) occurs, a
400	      CRL is propagated for the old cert.  So the old update expires
401	      (due to invalid state) only when the CRL propagates and reaches
402	      the relying router.

404	   o  This method relies on end-to-end CRL propagation through the RPKI
405	      system to enforce expiry of a previous update whenever the need
406	      arises.

408	   o  The cert CRL either propagates all the way to the relying router,
409	      or the RPKI cache server of the router receives the CRL and then
410	      sends a withdrawal of the {AS, SKI, Pub Key} tuple to the router.
411	      Either way, the CRL must in effect propagate all the way to the
412	      relying router.

414	   o  Thus the attack vulnerability window with the EKR-A method is
415	      governed by the end-to-end CRL propagation time.
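
   A minimal sketch of the EKR-A behavior at a relying router follows.
   The class and method names are hypothetical, and the router state
   is reduced to a set of valid {AS, SKI, Pub Key} tuples; the point
   illustrated is only that an old update stays valid until the
   revocation actually reaches the router.

```python
from typing import List, NamedTuple, Set


class RouterKey(NamedTuple):
    asn: int
    ski: str
    pub_key: str


class RelyingRouter:
    def __init__(self) -> None:
        self.valid_keys: Set[RouterKey] = set()

    def add_key(self, key: RouterKey) -> None:
        """Learn a valid {AS, SKI, Pub Key} tuple from the RPKI cache."""
        self.valid_keys.add(key)

    def on_revocation(self, key: RouterKey) -> None:
        """A CRL, or a cache-server withdraw of the tuple, has arrived."""
        self.valid_keys.discard(key)

    def update_is_valid(self, signer_keys: List[RouterKey]) -> bool:
        """An update is valid only if every signing key is still valid."""
        return all(key in self.valid_keys for key in signer_keys)
```

   Until on_revocation is called for the old key, update_is_valid
   keeps returning True for an update signed with it, which is why the
   vulnerability window equals the end-to-end CRL propagation time.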

417	   Discussion of Pros and Cons:

419	   The following pro and con for the EKR-A method are in addition to the
420	   common pros and cons listed above for the KR and EKR methods
421	   (Section 5 and Section 5.2).

423	   Pro: EKR-A has much less RPKI churn than PKR or EKR-B (see
424	   Section 5.2.2).

426	   Con: A router needs to receive a CRL or a withdraw of the {AS, SKI,
427	   Pub Key} tuple in order to know that an update has expired.  Hence,
428	   the replay-attack vulnerability window is determined by the CRL
429	   propagation time, which can vary widely between relying routers
430	   located in different regions.  It is anticipated that this would be
431	   no worse than 24 hours, but this needs to be confirmed by
432	   measurements in operational or emulated RPKI systems
433	   [rpki-delay].

435	5.2.2.  EKR-B: EKR where Update Expiry is Enforced by NotValidAfter Time

437	   EKR-B builds on the common principles as described for EKR above in
438	   Section 5.2.  The additional details of EKR-B operation are as
439	   follows:

441	   o  NotValidAfter time of current origination and transit certs is set
442	      to a value determined by the desired vulnerability window (~day).

444	   o  Update expiry is controlled by NotValidAfter time and CRL is not
445	      sent for the old cert when key rollover happens.

447	   o  If no triggering event occurs to cause origination key rollover
448	      within a pre-set time (NotValidAfter), then new origination
449	      (current and next) certs are issued only to extend the
450	      NotValidAfter time but the corresponding key pairs and SKIs remain
451	      unchanged.

453	   o  A previous update automatically becomes invalid at the earliest
454	      NotValidAfter time of the certs used in the signatures unless each
455	      of those certs' NotValidAfter time has been extended.

457	   o  Likewise for the transit (current and next) certs and keys.

459	   o  Changes in certs to extend their NotValidAfter time need not
460	      propagate end-to-end (all the way to the relying routers); they
461	      may propagate only up to the RPKI cache server of the relying
462	      router.  RPKI cache server would send a withdraw for an {AS, SKI,
463	      Pub Key} tuple to a relying router if the NotValidAfter time of
464	      the cert has passed.

466	   o  The changes in certs to advance NotValidAfter time can be
467	      scheduled and propagated in RPKI well in advance.
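
   The expiry rule for EKR-B can be sketched as below.  The type and
   function names are illustrative assumptions, and times are plain
   integers; the sketch shows that an update expires at the earliest
   NotValidAfter time among its signing certs, that reissuing a cert
   extends that time without changing the SKI, and that a cache server
   hands routers only the tuples that are still valid.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Cert:
    asn: int
    ski: str
    not_valid_after: int


def update_expiry(signing_certs: Iterable[Cert]) -> int:
    """An update expires at the earliest NotValidAfter of its signers."""
    return min(cert.not_valid_after for cert in signing_certs)


def extend(cert: Cert, new_not_valid_after: int) -> Cert:
    """Reissue a cert with a later NotValidAfter; key pair and SKI unchanged."""
    return replace(cert, not_valid_after=new_not_valid_after)


def cache_valid_tuples(certs: Iterable[Cert], now: int) -> Set[Tuple[int, str]]:
    """Tuples the cache server would still provide to routers at time now."""
    return {(c.asn, c.ski) for c in certs if now <= c.not_valid_after}
```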

469	   Discussion of Pros and Cons:

471	   The following pro and con for EKR-B are in addition to the common
472	   pros and cons listed above for the KR and EKR methods (Section 5 and
473	   Section 5.2).

475	   Pro: Update expiry is automatic in case the NotValidAfter time of any
476	   of the certs used to sign the update has not been extended.  So the
477	   replay-attack vulnerability window is predictable and not influenced
478	   by the RPKI end-to-end propagation time.

480	   Pro: Routers do not get any RPKI updates from the RPKI cache server
481	   when cert changes but the key pair and SKI remain unchanged.  Routers
482	   do not receive NotValidAfter time from their RPKI cache server.
483	   There is no need for it.  Instead, the RPKI cache server keeps track
484	   of NotValidAfter time, and provides to routers only valid {AS, SKI,
485	   Pub Key} tuples.  This saves some RPKI state maintenance workload at
486	   the routers.

488	   Con: EKR-B has much more RPKI churn than EKR-A because both
489	   origination and transit certs need to be reissued periodically to
490	   extend their validity time (in the absence of any events).

492	5.2.3.  EKR with Separate Key for Each Incoming-Outgoing Peering-Pair

494	   This is a placeholder section where we mention another variant of
495	   the EKR method.  This idea has not been considered or vetted by the
496	   SIDR WG yet.  So we only mention it here briefly.

498	   As noted earlier, the EKR methods considered so far generate a huge
499	   spike in workload whenever the transit key rollover takes place at a
500	   router.  One way to reduce that workload is to have a separate
501	   signing key for each incoming-outgoing peering pair.  For example,
502	   consider a BGPSEC router in AS4 that has peers in AS1, AS2, and AS3.
503	   The router will hold six signing keys, one each corresponding to
504	   (AS1, AS2), (AS2, AS1), (AS1, AS3), (AS3, AS1), (AS2, AS3), and (AS3,
505	   AS2) peering-pairs.  Note that the directionality of peering is
506	   included here and is necessary.  The key corresponding to (AS-i,
507	   AS-j) would only be used to sign updates received from AS-i and being
508	   forwarded to AS-j.  In the general case, when the BGPSEC router has n
509	   peers, the number of transit keys will be n(n-1).  Since there would
510	   be a Current and a Next key (for rollover), the number of transit
511	   keys held in the router for signing will actually be 2n(n-1).  When a
512	   peering or policy change occurs, the router would roll over only the
513	   specific keys that correspond to the peering-pairs over which the
514	   prefix updates are affected.  In the above example, suppose a policy
515	   change between AS4 and AS1 causes AS4 to prepend prefixes sent to AS1
516	   (pCount changed from 1 to 2).  Then AS4 would do key rollover only
517	   for (AS2, AS1) and (AS3, AS1) peering-pairs, and not for any of the
518	   others.  This would substantially reduce the quantity of prefix
519	   updates that are signed and re-propagated.  In general, when peering
520	   or policy changes occur, this method will reduce the number of prefix
521	   updates to be re-propagated to exactly the same as that with normal
522	   BGP.  That means that this method would also be on par with the ET
523	   and PKR methods in terms of update churn when a peering or policy
524	   change takes place.  The downside of this method is that the router
525	   needs to maintain 2n(n-1) key pairs if it has n BGPSEC peers.
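
   The key counts and the rollover selection in the example above can
   be checked with a small sketch.  The function names are
   hypothetical, and pairs_to_roll covers only the case described in
   the example, where a policy change affects updates sent to one
   peer.

```python
from itertools import permutations
from typing import List, Tuple


def peering_pairs(peers: List[str]) -> List[Tuple[str, str]]:
    """All directed (incoming, outgoing) peering-pairs: n(n-1) of them."""
    return list(permutations(peers, 2))


def key_count(n: int) -> int:
    """Current plus Next key for every directed peering-pair: 2n(n-1)."""
    return 2 * n * (n - 1)


def pairs_to_roll(peers: List[str], outgoing_peer: str) -> List[Tuple[str, str]]:
    """Pairs whose keys roll when updates sent to outgoing_peer are affected."""
    return [(i, j) for (i, j) in peering_pairs(peers) if j == outgoing_peer]
```

   With peers AS1, AS2 and AS3, peering_pairs yields the six pairs
   listed above, key_count(3) gives 12, and a change toward AS1
   selects only (AS2, AS1) and (AS3, AS1).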

527	   Detailed discussion and comparison of this method with other methods
528	   can be provided in a later version of this document if the idea picks
529	   up interest in the WG.

531	6.  Summary of Pros and Cons

533	   Table 1 below summarizes the pros and cons for the various replay-
534	   attack protection methods.  This summary follows from the discussion
535	   above in Section 4 and Section 5.

537	   +----------+---------------------------+----------------------------+
538	   | Method   | Pros                      | Cons                       |
539	   +----------+---------------------------+----------------------------+
540	   | Expire   | 1. The background load    | 1. Prefix owner can abuse  |
541	   | Time     | due to beaconing is low   | by beaconing too           |
542	   | (ET)     | and not bursty.           | frequently.                |
543	   |          | ---                       | ---                        |
544	   |          | 2. Transit AS does NOT    | 2. Any change to the units |
545	   |          | have a huge spike in      | (granularity) of ET field  |
546	   |          | workload even when a      | entails a change to on-    |
547	   |          | peering or policy change  | the-wire BGPSEC protocol.  |
548	   |          | happens at that AS.       |                            |
549	   |          | Beaconing facilitates     |                            |
550	   |          | this.                     |                            |
551	   |          | ---                       | ---                        |
552	   |          | 3. Does not add to RPKI   |                            |
553	   |          | churn.                    |                            |
554	   | -------- | ------------------------- | -------------------------- |
555	   | Periodic | 1. The background load    | 1. Prefix owner can abuse  |
556	   | Key      | due to beaconing is low   | by beaconing (i.e. re-     |
557	   | Rollover | and not bursty.           | originating) too           |
558	   | (PKR)    |                           | frequently.                |
559	   |          | ---                       | ---                        |
560	   |          | 2. Transit AS does NOT    | 2. Adds to RPKI churn. A   |
561	   |          | have a huge spike in      | pair of certs (current and |
562	   |          | workload even when a      | next) for each origination |
563	   |          | peering change happens at | router are rolled once     |
564	   |          | that AS. Beaconing (i.e.  | every beacon (i.e. re-     |
565	   |          | periodic re-origination)  | origination) interval.     |
566	   |          | facilitates this.         | Significantly more RPKI    |
567	   |          |                           | churn than that with EKR-A |
568	   |          |                           | or EKR-B methods.          |
569	   |          | ---                       | ---                        |
570	   |          | 3. If the periodic re-    |                            |
571	   |          | origination (i.e.,        |                            |
572	   |          | beaconing) interval units |                            |
573	   |          | change, BGPSEC protocol   |                            |
574	   |          | on the wire remains       |                            |
575	   |          | unaffected.               |                            |
576	   |          | ---                       | ---                        |
577	   |          | 4. Changes in the method  |                            |
578	   |          | (while still based on Key |                            |
579	   |          | Rollover) can be          |                            |
580	   |          | accommodated without      |                            |
581	   |          | requiring any change to   |                            |
582	   |          | on-the-wire BGPSEC        |                            |
583	   |          | protocol.                 |                            |
584	   | -------- | ------------------------- | -------------------------- |
585	   | Event    | 1. No update churn for    | 1. Whenever the transit    |
586	   | driven   | long periods when no      | key is rolled (in response |
587	   | Key      | peering or policy changes | to a peering or policy     |
588	   | Rollover | occur.                    | change event), there is a  |
589	   | Type A   |                           | storm of BGPSEC updates,   |
590	   | (EKR-A)  |                           | especially at routers in   |
591	   |          |                           | large transit ASes.        |
592	   |          | ---                       | ---                        |
593	   |          | 2. The added churn in     | 2. The replay-attack       |
594	   |          | RPKI is much lower than   | vulnerability window is    |
595	   |          | that in the EKR-B method. | dependent on end-to-end    |
596	   |          |                           | CRL propagation. It may    |
597	   |          |                           | vary significantly from    |
598	   |          |                           | one relying router to      |
599	   |          |                           | another that may be in     |
600	   |          |                           | different regions.         |
601	   |          | ---                       | ---                        |
602	   |          | 3. Same as Pro #4 for the |                            |
603	   |          | PKR method.               |                            |
604	   | -------- | ------------------------- | -------------------------- |
605	   | Event    | 1. Same as Pro #1 for the | 1. Same as Con #1 for the  |
606	   | driven   | EKR-A method.             | EKR-A method.              |
607	   | Key      |                           |                            |
608	   | Rollover |                           |                            |
609	   | Type B   |                           |                            |
610	   | (EKR-B)  |                           |                            |
611	   |          | ---                       | ---                        |
612	   |          | 2. The replay-attack      | 2. The added churn in RPKI |
613	   |          | vulnerability window is   | is much higher than that   |
614	   |          | enforced by NotValidAfter | in the EKR-A method.       |
615	   |          | time in certs and is      |                            |
616	   |          | therefore predictable.    |                            |
617	   |          | ---                       | ---                        |
618	   |          | 3. Same as Pro #4 for the |                            |
619	   |          | PKR method.               |                            |
620	   +----------+---------------------------+----------------------------+

622	                    Table 1: Summary of Pros and Cons

624	7.  Summary and Conclusions

626	   We have attempted to provide insights into the operation of multiple
627	   alternative methods for replay-attack protection.  It is hoped that
628	   the SIDR WG will take the insights and trade-offs presented here as
629	   input for deciding on the choice of a mechanism for protection from
630	   replay attacks.  Once that decision is made, the chosen mechanism
631	   would be included in the standards track document [bgpsec-rollover].

633	   Some important considerations for making that decision are listed
634	   below:

636	   1.  The Expire Time (ET) method is, along with the PKR method, the
637	       best at preventing huge update workloads during peering and
638	       policy change events at transit routers with several peers.  It
639	       adds no RPKI churn.  But the ET method has the disadvantage
640	       of requiring an on-the-wire protocol change if some parameters
641	       (e.g., the units of beacon interval) change.

643	   2.  The Periodic Key Rollover (PKR) method operates the same way as
644	       the ET method for preventing huge update workloads during peering
645	       and policy change events at transit routers with several peers.
646	       It does not have the disadvantage of requiring an on-the-wire
647	       protocol change if some parameters (e.g., the units of beaconing/
648	       re-origination periodicity) change.  But it has the downside of
649	       added RPKI churn.

651	   3.  The Event-driven Key Rollover (EKR-A and EKR-B) methods have
652	       significantly less RPKI churn than the PKR method.  They also
653	       have no BGPSEC update churn during long quiet periods when no
654	       peering or policy change events occur.  But they suffer the
655	       drawback of creating huge update workloads during peering and
656	       policy change events at transit routers with several peers.
657	       Whether this workload can be jittered or flow controlled to
658	       spread it over time without raising convergence delay concerns
659	       needs further study.

661	   4.  The EKR-A method relies on end-to-end CRL propagation through the
662	       RPKI system to enforce expiry of a previous update when needed.
663	       By contrast, in the EKR-B method the update expiry is controlled
664	       by the NotValidAfter time of the certs used in update signatures.
665	       In EKR-B, a previous update automatically becomes invalid at the
666	       earliest NotValidAfter time of the certs used in its signatures
667	       unless each of those certs' NotValidAfter time has been extended.
668	       Changes in certs to extend their NotValidAfter time need not
669	       propagate end-to-end (all the way to the relying routers); they
670	       may propagate only up to the RPKI cache server of the relying
671	       router (see Section 5.2.2).  These changes can be scheduled and
672	       propagated in RPKI well in advance.

675	   5.  Besides being out-of-band relative to the BGPSEC protocol on the
676	       wire, the Key Rollover approach has a further advantage: once
677	       the basics of the mechanism are implemented, there may be
678	       flexibility to implement PKR, EKR-A or EKR-B on top of it.  It
679	       may also be possible to switch from one method to another (within
680	       this class) if necessary based on operational experience; this
681	       transition would not require any change to the on-the-wire
682	       BGPSEC protocol.
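
   The EKR-B expiry rule in consideration 4 can be illustrated with a
   minimal sketch.  It assumes each signature on an update is
   represented only by the NotValidAfter time of its cert; the function
   names (update_expiry, is_update_valid, utc) and the dates are
   hypothetical and are not taken from [bgpsec-rollover] or
   [bgpsec-protocol].  It models only the "earliest NotValidAfter"
   rule, not signature verification or RPKI propagation.

```python
from datetime import datetime, timezone


def update_expiry(not_valid_after_times):
    """Earliest NotValidAfter among the certs used to sign one update."""
    if not not_valid_after_times:
        raise ValueError("an update must carry at least one signature")
    return min(not_valid_after_times)


def is_update_valid(not_valid_after_times, now):
    """True while every signing cert is still within its validity period."""
    return now <= update_expiry(not_valid_after_times)


def utc(year, month, day):
    """Helper (assumption of this sketch): midnight UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical NotValidAfter times for the three certs on an update's path.
path_certs = [utc(2013, 10, 1), utc(2013, 9, 28), utc(2013, 10, 5)]
print(update_expiry(path_certs))

# Extending only the earliest-expiring cert moves the update's expiry to
# the next-earliest NotValidAfter time, not to the extended value.
extended = [utc(2013, 10, 1), utc(2013, 10, 10), utc(2013, 10, 5)]
print(update_expiry(extended))
```

   In this sketch the expiry of an update is the minimum over its
   certs, so keeping an update valid for longer requires extending
   every cert whose NotValidAfter time falls before the desired expiry.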

684	8.  Acknowledgements

686	   The authors would like to thank Roque Gagliano, Brian Weis and Steve
687	   Kent for helpful discussions.  Further, we are thankful to fellow
688	   NIST BGP team members for comments and suggestions.

690	9.  IANA Considerations

692	   This memo includes no request to IANA.

694	10.  Security Considerations

696	   This memo requires no security considerations of its own since it is
697	   targeted to be an informational RFC in support of [bgpsec-rollover]
698	   and [bgpsec-protocol].  The reader is therefore directed to the
699	   security considerations provided in those documents.

701	11.  Informative References

703	   [bgpsec-protocol]
704	              Lepinski (Ed.), M., "BGPSEC Protocol Specification", Work
705	              in Progress, February 2013, <http://datatracker.ietf.org/
706	              doc/draft-ietf-sidr-bgpsec-protocol/>.

708	   [bgpsec-reqs]
709	              Bellovin, S., Bush, R., and D. Ward, "Security
710	              Requirements for BGP Path Validation", Work in Progress,
711	              April 2013, <http://datatracker.ietf.org/doc/draft-ietf-
712	              sidr-bgpsec-reqs/>.

714	   [bgpsec-rollover]
715	              Gagliano, R., Patel, K., and B. Weis, "BGPSEC router key
716	              rollover as an alternative to beaconing", Work in
717	              Progress, April 2013, <http://datatracker.ietf.org/doc/
718	              draft-ietf-sidr-bgpsec-rollover/>.

720	   [replay-discussion]
721	              Sriram, K. and D. Montgomery, "Discussion of Key Rollover
722	              Mechanisms for Replay-Attack Protection", Presented at
723	              IETF-85 SIDR WG Meeting, November 2012, <http://
724	              www.ietf.org/proceedings/85/slides/slides-85-sidr-4.pdf>.

726	   [rpki-delay]
727	              Kent, S. and K. Sriram, "RPKI rsync Download Delay
728	              Modeling", Presented at IETF-86 SIDR WG Meeting, March
729	              2013, <http://www.ietf.org/proceedings/86/slides/
730	              slides-86-sidr-1.pdf>.

732	Authors' Addresses

734	   Kotikalapudi Sriram
735	   US NIST

737	   Email: ksriram@nist.gov

739	   Doug Montgomery
740	   US NIST

742	   Email: dougm@nist.gov