Network Working Group                                          J. Uttaro
Internet-Draft                                                      AT&T
Intended status: Standards Track                              A. Simpson
Expires: September 10, 2012                               Alcatel-Lucent
                                                               R. Shakir
                                                                     C&W
                                                             C. Filsfils
                                                            P. Mohapatra
                                                           Cisco Systems
                                                             B. Decraene
                                                          France Telecom
                                                              J. Scudder
                                                              Y. Rekhter
                                                        Juniper Networks
                                                           March 9, 2012


                            BGP Persistence
                  draft-uttaro-idr-bgp-persistence-01

Abstract

   For certain AFI/SAFI combinations, it is desirable that a BGP speaker
   be able to retain routing state learned over a session that has
   terminated.  By maintaining that routing state, forwarding may be
   preserved.  This technique works effectively as long as the AFI/SAFI
   is primarily used to realize services that do not depend on
   exchanging BGP routing state with peers or customers.  Depending on
   the amount and frequency of route exchange, there may be exceptions
   where the technique still applies.  In general, BGP tightly couples
   the viability of a session and the routing state learned over it.
   This is driven by the history of the protocol and its application in
   the Internet as a vehicle to exchange routing state between
   administrative authorities.  This document addresses new services
   whose requirements for persistence diverge from the Internet routing
   point of view.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  BGP Graceful Restart and BGP persistence target different
           use cases
     1.2.  Requirements Language
   2.  Communities
     2.1.  DO_NOT_PERSIST
     2.2.  STALE
   3.  Configuration (Persistence Timer and DO_NOT_PERSIST Community)
     3.1.  Settings for Different Applications
   4.  Operation
     4.1.  BGP session failure
       4.1.1.  Attaching the STALE Community Value and Propagation of
               Paths
       4.1.2.  Lower route preference
     4.2.  Forwarding
     4.3.  BGP session re-establishment
   5.  Deployment Considerations
   6.  Applications
     6.1.  Persistence in L2VPN (VPLS/VPWS)
     6.2.  Persistence in L3VPN
   7.  Interactions between GR and Persistence
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Changes / Author Notes
   Authors' Addresses

1.  Introduction

   In certain scenarios, a BGP speaker may maintain forwarding in spite
   of BGP session termination.
   Currently, all routing state learned between two speakers is flushed
   upon either normal or abnormal session termination [RFC4271].  Some
   techniques are useful for maintaining routing when a session
   terminates, such as BGP Graceful Restart [RFC4724] for abnormal
   termination, or increasing timers, but they do not change the
   fundamental problem.  BGP persistence works effectively as long as
   the correct delivery of a service can be decoupled from the viability
   of the session, even though that delivery uses the routing state
   learned over the session.  This document proposes a modification to
   BGP's behavior that enables persistence of BGP-learned routing state
   in spite of normal or abnormal session termination.

1.1.  BGP Graceful Restart and BGP persistence target different use
      cases

   BGP Graceful Restart, as defined in [RFC4724], addresses the case of
   a control plane restart.

   As such, the fundamental assumption is that the control plane comes
   back quickly (e.g., within minutes) and that the failure does not
   need to be advertised in the network, thus avoiding churn.  Hence,
   there is an opportunity to recover locally from a control-plane-only
   failure without affecting the whole network.  In the worst case,
   where the assumption turns out to be wrong and the forwarding plane
   has failed as well as the control plane, traffic may be black-holed,
   but only for the relatively short duration initially assumed (e.g.,
   minutes).  In terms of technical specification, this translates
   into: a short timer, no change to the attributes of stale routes, and
   the need to exchange information with the BGP peer (e.g., the ability
   to preserve forwarding, whether forwarding state was preserved).

   BGP Persistence targets a different use case: a catastrophic failure
   where the BGP control plane may remain down for a longer time (e.g.,
   hours).
   In such a case, if alternate paths are available, they should be
   used, as they are kept up to date.  But if no alternate path is
   available, it is considered better to use stale routes than no routes
   at all.  In terms of technical specification, this translates into:
   a long timer, defined per AFI/SAFI so that different AFI/SAFIs may
   use different values; the need to lower the preference of stale
   routes; and no need to exchange information with the BGP peer.

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Communities

   This memo defines two new BGP communities [RFC1997]: one indicates
   whether a path is allowed to persist, and the other whether that path
   is live or stale.

2.1.  DO_NOT_PERSIST

   This memo defines a new BGP community, DO_NOT_PERSIST, with value TBD
   (to be assigned by IANA).  Attaching the DO_NOT_PERSIST community
   SHOULD be controlled by configuration, and this functionality SHOULD
   be disabled by default.

2.2.  STALE

   This memo defines a new BGP community, STALE, with value TBD (to be
   assigned by IANA).  Attaching the STALE community is limited to paths
   that do not currently carry the DO_NOT_PERSIST community.

3.  Configuration (Persistence Timer and DO_NOT_PERSIST Community)

   Persistence is configured on a per-session and per-AFI/SAFI basis.
   Through an inbound BGP policy that selectively sets the
   DO_NOT_PERSIST community, the persistence behavior can also be
   controlled on a per-route basis.  A speaker configures the ability to
   persist independently of its peer; there is no negotiation between
   the peers.  A timer must be configured indicating how long to retain
   stale state from a peer whose session is no longer viable.
   This timer is designated the persist-timer.  A speaker may also
   attach the DO_NOT_PERSIST community value to indicate that a path to
   a route should not persist.

3.1.  Settings for Different Applications

   The setting of the persist-timer should be based upon the field of
   use.  BGP is used in many different applications, each of which
   brings a unique requirement for retaining state.  The following is
   not meant as a comprehensive listing, but suggests timer settings for
   a subset of AFI/SAFIs.

   L2VPN  This AFI/SAFI requires the exchange of routing state in order
      to establish PWs to realize a VPLS VPN or a VPWS PW.  This AFI/SAFI
      does not require exchanging routing state with a customer, and no
      eBGP session is established.  The persist-timer should be set to a
      large value, on the order of days up to infinity.

   L3VPN  This AFI/SAFI requires the exchange of routing state to create
      a private VPN.  This AFI/SAFI requires exchanging state with
      customers via eBGP and is dynamic.  The SP needs to consider the
      possibility that stale state may not reflect the latest route
      updates and therefore may be incorrect from the customer
      perspective.  The persist-timer should be set to a large value, on
      the order of hours to a few days.  This is built upon the notion
      that some incorrectness is preferable to a large outage.

4.  Operation

4.1.  BGP session failure

   When a session failure has occurred, a BGP speaker with persistence
   enabled SHOULD retain the BGP routes learned over that session,
   unless they carry the DO_NOT_PERSIST community, and propagate updated
   paths to downstream speakers indicating that these paths are now
   stale.

   There is no restriction on whether the session is internal or
   external.

4.1.1.  Attaching the STALE Community Value and Propagation of Paths

   The following rules must be followed:

   o  Identify paths learned over a failed session that do not have the
      DO_NOT_PERSIST community value attached.

   o  For those paths, attach the STALE community value, lower their
      preference, and propagate the updated paths to peers.

   o  For paths learned over the failed session that have the
      DO_NOT_PERSIST community attached, follow normal BGP rules: remove
      the routes from the RIB and generate withdrawals to all peers for
      those paths.

4.1.2.  Lower route preference

   As STALE routes are no longer dynamically updated, they should be
   used only as a last resort.  Hence, when comparing paths for a
   prefix, a non-STALE path should be preferred over a STALE path.  If
   all paths are marked STALE, their relative (pre-STALE) preference
   should be kept.  The following mechanism is proposed to achieve these
   goals.

   To lower the preference of STALE routes within the Autonomous
   System, the LOCAL_PREF of the routes marked as STALE SHOULD be
   decreased by a configured value.  If the result of the subtraction is
   negative, the LOCAL_PREF SHOULD be set to 0.

   Optionally, a configured BGP cost community may be attached.  In this
   case, as described in [I-D.ietf-idr-custom-decision], the operator
   needs to make sure that all routers comply with
   [I-D.ietf-idr-custom-decision] in order to avoid potential forwarding
   loops.  It is also expected that the LOCAL_PREF would then not be
   decreased (i.e., the configured value would be 0).

   To lower the preference of STALE routes across Autonomous Systems,
   ASBRs in other ASes that are configured with BGP Persistence MAY
   lower the preference of paths received with the STALE community over
   an eBGP session.  Lowering the preference within their AS is
   performed as described above for the iBGP case.  Note that if an ASBR
   is not persistence capable, the operator can implement this behavior
   by configuring a BGP policy.

4.2.  Forwarding

   As per BGP rules, BGP MUST check that the BGP Next Hop is viable.
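
   The session-failure handling of Sections 4.1.1 and 4.1.2, together
   with the Next Hop viability check above, can be illustrated with a
   minimal Python sketch.  It is non-normative and rests on several
   assumptions: the Path record and the function names are hypothetical;
   the DO_NOT_PERSIST and STALE communities are represented by
   placeholder strings because their values are still TBD; only
   LOCAL_PREF is modeled from the BGP decision process; and the
   next_hop_viable callback stands in for whatever IGP, layer 1, layer 2,
   or BFD mechanism the operator relies on.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, List, Optional, Tuple

# Hypothetical placeholders: the real community values are TBD (IANA).
DO_NOT_PERSIST = "DO_NOT_PERSIST"
STALE = "STALE"


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path in the RIB."""
    prefix: str
    peer: str
    next_hop: str
    local_pref: int
    communities: FrozenSet[str] = frozenset()


def on_session_failure(rib: List[Path], failed_peer: str,
                       depref: int) -> Tuple[List[Path], List[Path]]:
    """Apply the Section 4.1.1 rules to paths learned from failed_peer.

    Returns (kept, withdrawn).  DO_NOT_PERSIST paths follow normal BGP
    rules and are withdrawn; all other paths from the failed peer get
    the STALE community and a LOCAL_PREF lowered by the configured
    depref value, clamped at 0 (Section 4.1.2).
    """
    kept: List[Path] = []
    withdrawn: List[Path] = []
    for p in rib:
        if p.peer != failed_peer:
            kept.append(p)                  # unaffected, still live
        elif DO_NOT_PERSIST in p.communities:
            withdrawn.append(p)             # remove from RIB, withdraw
        else:
            kept.append(replace(            # re-advertised as STALE
                p,
                communities=p.communities | {STALE},
                local_pref=max(p.local_pref - depref, 0),
            ))
    return kept, withdrawn


def best_path(rib: List[Path], prefix: str,
              next_hop_viable: Callable[[str], bool]) -> Optional[Path]:
    """Pick the highest-LOCAL_PREF path whose Next Hop is viable.

    The peer name is an arbitrary stand-in for the remaining BGP
    tie-breakers, used only to make the choice deterministic.
    """
    candidates = [p for p in rib
                  if p.prefix == prefix and next_hop_viable(p.next_hop)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.local_pref, p.peer))


# Example: the session to rr1 fails and the configured decrement is 50.
rib = [
    Path("10.0.0.0/8", "rr1", "192.0.2.1", 100),
    Path("10.0.0.0/8", "rr2", "192.0.2.2", 90),
    Path("10.1.0.0/16", "rr1", "192.0.2.1", 100,
         frozenset({DO_NOT_PERSIST})),
]
rib, withdrawn = on_session_failure(rib, "rr1", depref=50)
chosen = best_path(rib, "10.0.0.0/8", lambda nh: True)
```

   In the example, the stale rr1 path (LOCAL_PREF lowered from 100 to
   50) loses to the live rr2 path (LOCAL_PREF 90), and the rr1 path
   carrying DO_NOT_PERSIST is withdrawn.  A non-STALE path is only
   preferred if the configured decrement exceeds the LOCAL_PREF gap
   between the paths, and because the subtraction is clamped at 0, a
   large decrement can map several stale paths to the same LOCAL_PREF,
   so keeping their relative (pre-STALE) preference depends on the
   value chosen.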

   Since the BGP session is down during persistence, the network
   operator SHOULD make sure that BGP is able to check Next Hop
   liveness.  For routes learned over an iBGP session, the IGP should be
   able to provide this.  For routes learned over an eBGP session, the
   liveness of the Next Hop may be checked using a layer 1 (e.g.,
   light), layer 2 (e.g., Ethernet OAM), or layer 3 (e.g., BFD)
   mechanism.

   When the forwarding plane is updated with a new next hop, a
   make-before-break strategy SHOULD be employed.  Such a routing change
   may happen when the BGP session has failed, the nominal path has
   been de-preferenced, and an alternate path has been selected; or
   when the BGP session is re-established and the nominal path is
   selected again.

4.3.  BGP session re-establishment

   When a failed persistent BGP session is re-established, the
   Receiving Speaker MUST replace the stale routes with the routing
   updates received from the peer.  Once the End-of-RIB marker for an
   address family is received from the peer, the Receiving Speaker MUST
   immediately remove any paths from the peer that are still marked as
   stale for that address family.

   If the End-of-RIB marker is not received before a configurable timer
   expires, the Receiving Speaker MUST immediately remove any paths from
   the peer that are still marked as stale.

5.  Deployment Considerations

   BGP Persistence as described in this document is useful within a
   single autonomous system or across autonomous systems.

   If [I-D.ietf-idr-custom-decision] is used to lower the preference of
   the STALE paths, the operator needs to make sure that all routers
   comply with [I-D.ietf-idr-custom-decision].  Otherwise, forwarding
   loops may form.

   When persistence is enabled on a BGP session, the network operator
   SHOULD make sure that, while the BGP session is down, BGP has a way
   to evaluate whether the BGP Next Hop is viable and reachable.
   For routes learned over an iBGP session, the IGP should be able to
   advertise the reachability of the next hop.  For routes learned over
   an eBGP session, the liveness of the Next Hop needs to be checked,
   for example using a layer 1 (e.g., light), layer 2 (e.g., Ethernet
   OAM), or layer 3 (e.g., BFD) mechanism.

6.  Applications

   This technique may be useful in a wide array of applications where
   routing state is either fairly static or localized within a routing
   context.  Two applications that come immediately to mind are L2VPN
   and L3VPN.

6.1.  Persistence in L2VPN (VPLS/VPWS)

   VPLS/VPWS VPNs use BGP to exchange routing state between two PEs.
   This exchange allows for the creation of a PW within a VPN context
   between those PEs.  By definition, L2VPN does not exchange any
   routing state with customers via BGP.  BGP persistence is very useful
   here, as the state is quite constant: state is only exchanged when a
   PW endpoint is provisioned or deleted, or when a speaker reboots.

   Referring to Figure 1, PE1 and PE2 have advertised BGP routing state
   in order to create PWs between PE1 and PE2.  The RRs are only
   responsible for reflecting this state between the PEs.  The use of a
   unique RD makes every path unique from the RRs' perspective.

   Assume that both RRs experience a catastrophic failure.

   Case 1 - All BGP speakers are persistence capable.

   The PWs created between PE1 and PE2 persist.  Forwarding is
   uninterrupted.

   Case 2 - PE1 and the RRs are persistence capable; PE2 is not.

   In this case, the path advertised by PE2 via the RRs persists at
   PE1, so the PW from PE1 to PE2 is not torn down.  PE2 will remove the
   path from PE1 and tear down the PW from PE2 to PE1.  The effect is
   that MAC state learned at PE2 is valid, as the PW is still valid.
   MAC state learned at PE1 is removed, as the PW is no longer valid.
   Eventually, the MAC destinations at PE1 that are recursed to the
   still-valid PW towards PE2 will time out.

   Assume instead that the RRs are operational but the iBGP sessions are
   torn down.

   Case 3 - All BGP speakers are persistence capable.

   The PWs created between PE1 and PE2 persist.  Forwarding is
   uninterrupted.

               VPNA                 VPNA
                PW+++++++++++++++++++PW

      CE1-------PE1--------RR1-------PE2------CE2
                 |                    |
                 |                    |
                 ----------RR2---------

                 <--iBGP---><---iBGP-->

                                Figure 1

6.2.  Persistence in L3VPN

                    --------RR1--------
                   / A               C \
      CE1 ----- PE1 --Forwarding Path-- PE2 ---- CE2
                   \ B               D /
                    ------- RR2 -------

                                Figure 2

   Currently, in a Layer 3 VPN topology, when a route reflector device
   fails, all routing information propagated via BGP through it is
   purged from the routing database.  In this case, forwarding is
   interrupted because of the lack of signalling information, rather
   than because of an outage of the forwarding path between the PE
   devices.  With the addition of BGP persistence, a complete service
   outage can be avoided.

   The topology shown in Figure 2 is a simple L3VPN topology consisting
   of two customer edge (CE) devices, two provider edge (PE) devices,
   and two route reflector (RR) devices.  In this [RFC4364] VPN
   topology, BGP sessions exist from PE1 to both RR1 and RR2 (sessions
   A and B), and from PE2 to both RR1 and RR2 (sessions C and D), in
   order to propagate the VPN topology.

   Case 1: No BGP speakers are persistence capable:

   o  In this scenario, during a simultaneous failure of RR1 and RR2
      (which are extremely likely to share route reflector clients), both
      PE1 and PE2 remove all routing information for the VPN from their
      RIBs, and hence a complete service outage is experienced.

   o  Where either sessions A and B, or sessions C and D, fail
      simultaneously, routing information from PE1 (in the case of A
      and B) or PE2 (in the case of C and D) is withdrawn, and a partial
      service topology exists.

   o  Both of the states described reflect a service outage even though
      the forwarding path between the PE devices is not interrupted.

   Case 2: All BGP speakers are persistence capable:

   o  PE1 continues to forward utilising the label information received
      from PE2 via the working forwarding path for the duration of the
      persist-timer (and vice versa).

   o  This holds regardless of which session(s) fail.  In the worst case,
      where sessions A, B, C, and D fail simultaneously, the network
      continues to operate in the state it was in at the time of the
      failure.

   Case 3: PE1, RR1, and RR2 are persistence capable; PE2 is not.

   o  During a failure of BGP session A or B, PE1 will continue to
      forward utilising the routing information received from the RRs
      for PE2 for the duration of the persist-timer.  PE2 will continue
      to forward utilising the routing information received from the
      RRs, again for the duration of the persist-timer.

   o  If either BGP session C or D fails, all routes will be withdrawn
      by the RRs towards PE1, since these routes are not valid to be
      persisted by the RRs.  As a result, the routes advertised by CE2
      into the VPN will be withdrawn.

   o  Where the worst case failure occurs (i.e., sessions A, B, C, and D
      fail), the routes advertised by CE1 into the VPN will be
      persistently advertised by the RR devices, whereas those
      advertised by CE2 will be withdrawn.  Clearly, in the example
      shown in the figure, this results in a service outage, but where
      multiple PE devices exist within a topology, service is maintained
      for the subset of CEs attached to PE devices supporting the
      persistence capability.

   In a Layer 3 VPN deployment, it should be noted that routing
   information is less static than in many Layer 2 VPNs, since typically
   multiple routes exist within the topology, rather than an individual
   MAC address or egress interface per CE device on the PE device.  As
   such, during persistence the L3VPN operates with the routing
   databases in the 'core' of the network reflecting those at the time
   of failure.  Should any path between the PE and CE devices
   re-converge, this will result in invalid routing information if the
   egress PE device does not hold alternate routing information for the
   prefixes undergoing such re-convergence.  Where each PE maintains
   multiple paths to each egress prefix (where an alternate path is
   available), it is expected that the egress PE will forward packets
   towards an alternative egress PE for the prefix in question when the
   topology is no longer valid.

   The lack of convergence within a Layer 3 topology during the
   persistent state SHOULD be considered, since it may adversely affect
   services; however, the assumption is that a degraded service is
   preferable to a complete service outage during a large-scale BGP
   control plane failure.

7.  Interactions between GR and Persistence

   BGP Graceful Restart and BGP Persistence can be enabled
   independently.

   o  If only BGP Graceful Restart is enabled, BGP behaves as defined in
      [RFC4724].

   o  If only BGP Persistence is enabled, BGP behaves as defined in this
      document.

   o  If both BGP Graceful Restart and BGP Persistence are enabled on a
      BGP session, their interactions need to be defined, since both
      provide a means by which routes are retained in the RIB after a
      BGP session is no longer established.  The principle is that when
      the BGP session goes down, Graceful Restart comes into play first.
      While Graceful Restart is running and retaining the routes, BGP
      Persistence has no effect; i.e., BGP routes are kept unchanged and
      are not re-advertised.  If Graceful Restart fails, BGP Persistence
      takes over to keep the routes; i.e., BGP routes are kept,
      de-preferenced, and re-advertised.

   Case a: GR succeeds and Persistence never takes effect:

   1.  BGP session failure --> GR behavior applies.

       *  Routes are marked as stale.

       *  Routes are kept unchanged (hence not re-advertised).

   2.  BGP session is re-established before the GR timer expires --> GR
       succeeds, GR behavior applies.

       1.  Routes are refreshed.

       2.  When End-of-RIB is received, routes still marked as stale are
           removed.

       3.  If routes have changed, they are updated in the FIB and
           re-advertised to peers as per regular BGP.

   Case b: GR fails and Persistence takes effect:

   1.  BGP session failure --> GR behavior applies.

       *  Routes are marked as stale.

       *  Routes are kept unchanged (hence not re-advertised).

   2.  The GR timer expires --> GR behavior ends, Persistence behavior
       applies.

       1.  GR stale routes are marked as Persistence stale (the STALE
           community is attached) and their preference is lowered.

       2.  As a result, regular BGP best path computation runs and
           possibly selects alternate routes.

           +  If routes have changed, they are updated in the FIB.

           +  Updated routes are advertised to peers as needed.

   3.  The session now runs in persistence mode as defined in this
       document.

   It is expected that, in general, the persist-timer SHOULD be set to a
   value greater than that of the Graceful Restart timer.

8.  Security Considerations

   The security implications of the persistence mechanism defined in
   this document are akin to those incurred by the maintenance of stale
   routing information within a network.  This is particularly relevant
   when considering the maintenance of routing information that is
   utilised for service segregation, such as MPLS label entries.

   For MPLS VPN services, the effectiveness of the traffic isolation
   between VPNs relies on the correctness of the MPLS labels between
   ingress and egress PEs.  In particular, when an egress PE withdraws a
   label L1 allocated to a VPN1 route, this label MUST NOT be assigned
   to a VPN route of a different VPN until all ingress PEs have stopped
   using the old VPN1 route with L1.

   Such a corner case may happen today if the propagation of VPN routes
   by BGP messages between PEs takes more time than the label
   re-allocation delay on a PE.  Given that worst-case BGP propagation
   time can generally be bounded to a few minutes (e.g., 2 to 5), the
   security breach will not occur if PEs are designed not to reallocate
   a previously used and withdrawn label for a few minutes.

   The problem is made worse with BGP GR between PEs, as VPN routes can
   be kept stale for a longer period of time (e.g., 20 minutes).

   This is further aggravated by the BGP persistence extension proposed
   in this document, as VPN routes can be kept stale for a much longer
   period of time (e.g., 2 hours, 1 day).

   Therefore, to avoid VPN breaches, before enabling BGP persistence,
   SPs need to check how fast a given label can be reused by a PE,
   taking into account:

   o  The load of BGP route churn on a PE (in terms of the number of VPN
      labels advertised and the churn rate).

   o  The label allocation policy on the PE, possibly depending upon the
      size of the VPN label pool (which can be restricted by hardware
      considerations or other MPLS usages), the label allocation scheme
      (e.g., per route or per VRF/CE), and the re-allocation policy
      (e.g., least recently used label).

   Note that [RFC4781], which defines the Graceful Restart Mechanism for
   BGP with MPLS, is also applicable to BGP Persistence.
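
   One way a PE could bound label reuse, as discussed above, is to hold
   each withdrawn label out of the free pool for a configurable hold-down
   before reallocating it.  The Python sketch below is illustrative only
   and is not specified by this document: the LabelPool class and its
   parameters are hypothetical, as is the policy of handing out
   never-used labels first and then the least recently released label
   once its hold-down has expired.  The example hold-down, a 2-hour
   persist-timer plus a 5-minute propagation bound, is an assumption
   built from the example values in this section.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class LabelPool:
    """Hypothetical VPN label allocator with a reuse hold-down.

    A released (withdrawn) label is quarantined for hold_down seconds
    before it can be allocated again.  Times are passed in explicitly
    and are assumed to be non-decreasing.
    """

    def __init__(self, first: int, last: int, hold_down: float) -> None:
        self.never_used: Deque[int] = deque(range(first, last + 1))
        # (release_time, label) pairs, oldest release first.
        self.released: Deque[Tuple[float, int]] = deque()
        self.hold_down = hold_down

    def release(self, label: int, now: float) -> None:
        """Record that the route using `label` was withdrawn at `now`."""
        self.released.append((now, label))

    def allocate(self, now: float) -> Optional[int]:
        """Return a label that is safe to assign, or None if none is."""
        if self.never_used:
            return self.never_used.popleft()
        # Least recently released label, only once its hold-down expired.
        if self.released and now - self.released[0][0] >= self.hold_down:
            return self.released.popleft()[1]
        return None


# Example values (assumed): 2-hour persist-timer, 5-minute BGP
# propagation bound.  The pool starts at 16 because MPLS labels 0-15
# are reserved.
PERSIST_TIMER = 2 * 3600
PROPAGATION_BOUND = 5 * 60
pool = LabelPool(16, 17, hold_down=PERSIST_TIMER + PROPAGATION_BOUND)
```

   With these values, a label withdrawn at time t cannot be reassigned
   before t + 7500 seconds.  The intent is that the hold-down covers the
   whole period during which ingress PEs may still use a stale route
   with that label; actual label allocation behavior is implementation
   and hardware specific.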

   In addition to these considerations, the persistence mechanism
   described in this document is considered difficult to exploit
   maliciously: in order to inject packets into a topology, an attacker
   would need to engineer a specific persistence state between two PE
   devices while also engineering label reallocation to occur in a
   manner that results in the two topologies overlapping.  Such
   allocation is particularly difficult to engineer, since it is
   typically an internal mechanism of an LSR.

9.  IANA Considerations

   IANA shall assign community values from the BGP Well-known
   Communities registry for the DO_NOT_PERSIST and STALE communities.

10.  Acknowledgements

   We would like to acknowledge Roberto Fragassi (Alcatel-Lucent), John
   Medamana (AT&T), Han Nguyen (AT&T), Jeffrey Haas (Juniper), Nabil
   Bitar (Verizon), and Nicolai Leymann (DT) for their contributions to
   this document.

11.  References

11.1.  Normative References

   [I-D.ietf-idr-custom-decision]
              White, R. and A. Retana, "BGP Custom Decision Process",
              draft-ietf-idr-custom-decision-00 (work in progress),
              November 2011.

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              January 2007.

11.2.  Informative References

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4781]  Rekhter, Y. and R. Aggarwal, "Graceful Restart Mechanism
              for BGP with MPLS", RFC 4781, January 2007.

Appendix A.  Changes / Author Notes

   [RFC Editor: Please remove this section before publication.]

   Changes -01

   o  PERSIST community removed.

   o  Use of LOCAL_PREF or cost community to lower the preference of the
      path within an AS.  Between ASes, the STALE community is used to
      convey the information.

   o  Deployment considerations section enhanced.

   o  Introduction explains why GR and persistence are different and
      target different needs.

   o  Security section refers to RFC 4781.

   o  New section describing the interaction between GR and Persistence.

Authors' Addresses

   James Uttaro
   AT&T
   200 S. Laurel Avenue
   Middletown, NJ  07748
   USA

   Email: ju1738@att.com


   Adam Simpson
   Alcatel-Lucent
   600 March Road
   Ottawa, Ontario  K2K 2E6
   Canada

   Email: adam.simpson@alcatel-lucent.com


   Rob Shakir
   Cable&Wireless Worldwide
   London
   UK

   Email: rjs@cw.net
   URI:   http://www.cw.com/


   Clarence Filsfils
   Cisco Systems
   Brussels  1000
   BE

   Email: cf@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: pmohapat@cisco.com


   Bruno Decraene
   France Telecom
   38-40 Rue de General Leclerc
   92794 Issy Moulineaux cedex 9
   France

   Email: bruno.decraene@orange.com


   John Scudder
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Email: jgs@juniper.net


   Yakov Rekhter
   Juniper Networks

   Email: yakov@juniper.net