idnits 2.17.1

draft-ietf-idr-restart-09.txt:
  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 10
     longer pages, the longest (page 4) being 76 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 10 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)
  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1771 (ref. 'BGP-4') (Obsoleted by RFC
     4271)

  ** Obsolete normative reference: RFC 2858 (ref. 'BGP-MP') (Obsoleted by RFC
     4760)

  -- Unexpected draft version: The latest known version of
     draft-ietf-idr-rfc2842bis is -01, but you're referring to -02.

  ** Obsolete normative reference: RFC 2385 (ref. 'BGP-AUTH') (Obsoleted by
     RFC 5925)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-AFI'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-SAFI'

     Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group               Srihari R. Sangli (Procket Networks)
Internet Draft                          Yakov Rekhter (Juniper Networks)
Expiration Date: October 2004            Rex Fernando (Procket Networks)
                                         John G. Scudder (Cisco Systems)
                                            Enke Chen (Redback Networks)

                   Graceful Restart Mechanism for BGP

                     draft-ietf-idr-restart-09.txt

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2. Abstract

   This document proposes a mechanism for BGP that would help minimize
   the negative effects on routing caused by BGP restart. An End-of-RIB
   marker is specified and can be used to convey routing convergence
   information.  A new BGP capability, termed "Graceful Restart
   Capability", is defined which would allow a BGP speaker to express
   its ability to preserve forwarding state during BGP restart. Finally,
   procedures are outlined for temporarily retaining routing information
   across a TCP transport reset.

   The mechanisms described in this document are applicable to all
   routers, both those with the ability to preserve forwarding state
   during BGP restart and those without (although the latter need to
   implement only a subset of the mechanisms described in this
   document).

3. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].

4. Introduction

   Usually when BGP on a router restarts, all the BGP peers detect that
   the session went down, and then came up. This "down/up" transition
   results in a "routing flap" and causes BGP route re-computation,
   generation of BGP routing updates, and flapping of the forwarding
   tables. It could spread across multiple routing domains. Such routing
   flaps may create transient forwarding blackholes and/or transient
   forwarding loops. They also consume resources on the control plane of
   the routers affected by the flap.
   As such they are detrimental to the
   overall network performance.

   This document proposes a mechanism for BGP that would help minimize
   the negative effects on routing caused by BGP restart. An End-of-RIB
   marker is specified and can be used to convey routing convergence
   information.  A new BGP capability, termed "Graceful Restart
   Capability", is defined which would allow a BGP speaker to express
   its ability to preserve forwarding state during BGP restart. Finally,
   procedures are outlined for temporarily retaining routing information
   across a TCP transport reset.

5. Marker for End-of-RIB

   An UPDATE message with no reachable NLRI and empty withdrawn NLRI is
   specified as the End-Of-RIB Marker that can be used by a BGP speaker
   to indicate to its peer the completion of the initial routing update
   after the session is established. For IPv4 unicast address family,
   the End-Of-RIB Marker is an UPDATE message with the minimum length
   [BGP-4]. For any other address family, it is an UPDATE message that
   contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no
   withdrawn routes for that <AFI, SAFI>.

   Although the End-of-RIB Marker is specified for the purpose of BGP
   graceful restart, it is noted that the generation of such a marker
   upon completion of the initial update would be useful for routing
   convergence in general, and thus the practice is recommended.

   In addition, it would be beneficial for routing convergence if a BGP
   speaker can indicate to its peer up-front that it will generate the
   End-Of-RIB marker, regardless of its ability to preserve its
   forwarding state during BGP restart. This can be accomplished using
   the Graceful Restart Capability described in the next section.

6. Graceful Restart Capability

   The Graceful Restart Capability is a new BGP capability [BGP-CAP]
   that can be used by a BGP speaker to indicate its ability to preserve
   its forwarding state during BGP restart.
   It can also be used to
   convey to its peer its intention of generating the End-Of-RIB marker
   upon the completion of its initial routing updates.

   This capability is defined as follows:

      Capability code: 64

      Capability length: variable

      Capability value: Consists of the "Restart Flags" field, "Restart
      Time" field, and zero or more of the tuples <AFI, SAFI, Flags for
      address family> as follows:

      +--------------------------------------------------+
      | Restart Flags (4 bits)                           |
      +--------------------------------------------------+
      | Restart Time in seconds (12 bits)                |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | ...                                              |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+

   The use and meaning of the fields are as follows:

      Restart Flags:

         This field contains bit flags related to restart.

             0 1 2 3
            +-+-+-+-+
            |R|Resv.|
            +-+-+-+-+

         The most significant bit is defined as the Restart State (R)
         bit which can be used to avoid possible deadlock caused by
         waiting for the End-of-RIB marker when multiple BGP speakers
         peering with each other restart. When set (value 1), this bit
         indicates that the BGP speaker has restarted, and its peer
         SHOULD NOT wait for the End-of-RIB marker from the speaker
         before advertising routing information to the speaker.

         The remaining bits are reserved, and SHOULD be set to zero by
         the sender and ignored by the receiver.

      Restart Time:

         This is the estimated time (in seconds) it will take for the
         BGP session to be re-established after a restart. This can be
         used to speed up routing convergence by its peer in case that
         the BGP speaker does not come back after a restart.

      Address Family Identifier (AFI):

         This field carries the identity of the Network Layer protocol
         for which the Graceful Restart support is advertised. Presently
         defined values for this field are specified in [IANA-AFI].

      Subsequent Address Family Identifier (SAFI):

         This field provides additional information about the type of
         the Network Layer Reachability Information carried in the
         attribute. Presently defined values for this field are
         specified in [IANA-SAFI].

      Flags for Address Family:

         This field contains bit flags for the <AFI, SAFI>.

             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |F|   Reserved  |
            +-+-+-+-+-+-+-+-+

         The most significant bit is defined as the Forwarding State (F)
         bit which can be used to indicate if the forwarding state for
         the <AFI, SAFI> has indeed been preserved during the previous
         BGP restart. When set (value 1), the bit indicates that the
         forwarding state has been preserved.

         The remaining bits are reserved, and SHOULD be set to zero by
         the sender and ignored by the receiver.

   When a sender of this capability doesn't include any <AFI, SAFI> in
   the capability, it means that the sender is not capable of preserving
   its forwarding state during BGP restart, but supports procedures for
   the Receiving Speaker (as defined in Section 7.2 of this document).
   In that case the value of the "Restart Time" field advertised by the
   sender is irrelevant.

   A BGP speaker SHOULD NOT include more than one instance of the
   Graceful Restart Capability in the capability advertisement
   [BGP-CAP].
   If more than one instance of the Graceful Restart Capability
   is carried in the capability advertisement, the receiver of the
   advertisement SHOULD ignore all but the last instance of the Graceful
   Restart Capability.

   Including <AFI=IPv4, SAFI=unicast> in the Graceful Restart
   Capability doesn't imply that the IPv4 unicast routing information
   should be carried by using the BGP Multiprotocol extensions [BGP-MP]
   - it could be carried in the NLRI field of the BGP UPDATE message.

7. Operation

   A BGP speaker MAY advertise the Graceful Restart Capability for an
   address family to its peer if it has the ability to preserve its
   forwarding state for the address family when BGP restarts. In
   addition, even if the speaker does not have the ability to preserve
   its forwarding state for any address family during BGP restart, it is
   still recommended that the speaker advertise the Graceful Restart
   Capability to its peer (as mentioned before this is done by not
   including any <AFI, SAFI> in the advertised capability). There are
   two reasons for doing this. First, to indicate its intention of
   generating the End-of-RIB marker upon the completion of its initial
   routing updates, as doing this would be useful for routing
   convergence in general. Second, to indicate its support for a peer
   which wishes to perform a graceful restart.

   The End-of-RIB marker SHOULD be sent by a BGP speaker to its peer
   once it completes the initial routing update (including the case when
   there is no update to send) for an address family after the BGP
   session is established.

   It is noted that the normal BGP procedures MUST be followed when the
   TCP session terminates due to the sending or receiving of a BGP
   NOTIFICATION message.

   In general the Restart Time SHOULD NOT be greater than the HOLDTIME
   carried in the OPEN.
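
   As an illustration of the capability value layout in Section 6 and
   of the End-of-RIB marker in Section 5, the following Python sketch
   packs and unpacks the capability value and builds the marker. It is
   a minimal sketch under stated assumptions, not a definitive
   implementation: the function names are hypothetical, the all-ones
   16-octet header Marker assumes no authentication is in use, and the
   MP_UNREACH_NLRI attribute flags (0x80) and type code (15) are taken
   from [BGP-MP].

```python
import struct

R_BIT = 0x8000             # Restart State: top bit of the 4-bit Restart Flags
F_BIT = 0x80               # Forwarding State: top bit of the per-family flags
MAX_RESTART_TIME = 0x0FFF  # Restart Time is a 12-bit field


def encode_gr_capability(restarted, restart_time, families):
    """Build the capability value.

    `families` is a list of (afi, safi, forwarding_preserved) tuples;
    an empty list advertises support for Receiving Speaker procedures only.
    """
    if not 0 <= restart_time <= MAX_RESTART_TIME:
        raise ValueError("Restart Time must fit in 12 bits")
    head = (R_BIT if restarted else 0) | restart_time
    out = struct.pack("!H", head)
    for afi, safi, preserved in families:
        out += struct.pack("!HBB", afi, safi, F_BIT if preserved else 0)
    return out


def decode_gr_capability(value):
    """Return (restarted, restart_time, [(afi, safi, preserved), ...])."""
    if len(value) < 2 or (len(value) - 2) % 4 != 0:
        raise ValueError("malformed Graceful Restart Capability value")
    (head,) = struct.unpack_from("!H", value, 0)
    families = []
    for offset in range(2, len(value), 4):
        afi, safi, flags = struct.unpack_from("!HBB", value, offset)
        families.append((afi, safi, bool(flags & F_BIT)))
    return bool(head & R_BIT), head & MAX_RESTART_TIME, families


def end_of_rib(afi=1, safi=1):
    """Build an End-of-RIB UPDATE for one address family."""
    if (afi, safi) == (1, 1):
        # IPv4 unicast: minimum-length UPDATE, no withdrawn routes,
        # no path attributes.
        body = struct.pack("!HH", 0, 0)
    else:
        # Other families: only an MP_UNREACH_NLRI attribute that carries
        # the AFI and SAFI and no withdrawn routes.
        attr = struct.pack("!BBBHB", 0x80, 15, 3, afi, safi)
        body = struct.pack("!HH", 0, len(attr)) + attr
    # Header: 16-octet Marker (assumed all ones), Length, Type 2 (UPDATE).
    return b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 2) + body
```

   For example, encode_gr_capability(True, 120, [(1, 1, True)]) yields
   the six octets 80 78 00 01 01 80: the R bit set, a Restart Time of
   120 seconds, and IPv4 unicast with the F bit set.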

   In the following sections, "Restarting Speaker" refers to a router
   whose BGP has restarted, and "Receiving Speaker" refers to a router
   that peers with the restarting speaker.

   Consider that the Graceful Restart Capability for an address family
   is advertised by the Restarting Speaker, and is understood by the
   Receiving Speaker, and a BGP session between them is established.
   The following sections detail the procedures that SHALL be followed
   by the Restarting Speaker as well as the Receiving Speaker once the
   Restarting Speaker restarts.

7.1. Procedures for the Restarting Speaker

   When the Restarting Speaker restarts, it SHOULD retain, if
   possible, the forwarding state for the BGP routes in the Loc-RIB, and
   SHALL mark them as stale. It SHOULD NOT differentiate between stale
   and other information during forwarding.

   To re-establish the session with its peer, the Restarting Speaker
   MUST set the "Restart State" bit in the Graceful Restart Capability
   of the OPEN message. Unless allowed via configuration, the
   "Forwarding State" bit for an address family in the capability can be
   set only if the forwarding state has indeed been preserved for that
   address family during the restart.

   Once the session between the Restarting Speaker and the Receiving
   Speaker is re-established, the Restarting Speaker will receive and
   process BGP messages from its peers. However, it SHALL defer route
   selection for an address family until it receives the End-of-RIB
   marker from all its peers (excluding the ones with the "Restart
   State" bit set in the received capability and excluding the ones
   which do not advertise the graceful restart capability). It is noted
   that prior to route selection, the speaker has no routes to advertise
   to its peers and no routes to update the forwarding state.
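
   The deferral rule above can be sketched as follows. This is an
   illustrative Python sketch only: the class and method names are
   hypothetical, peers are identified by arbitrary keys, and the
   max_defer_seconds deadline stands in for the configurable upper
   bound on deferral that Section 7.1 requires.

```python
import time


class DeferredSelection:
    """Tracks End-of-RIB arrival for one address family after a restart."""

    def __init__(self, peers, max_defer_seconds, now=time.monotonic):
        # peers maps peer_id -> (advertised_gr, restart_state_bit),
        # as learned from each peer's OPEN message.
        self._now = now
        self._deadline = now() + max_defer_seconds
        # Wait only for peers that advertised the capability and did not
        # set the Restart State bit; all other peers are excluded.
        self._waiting = {
            peer
            for peer, (advertised_gr, r_bit) in peers.items()
            if advertised_gr and not r_bit
        }

    def on_end_of_rib(self, peer):
        """Record an End-of-RIB marker received from `peer`."""
        self._waiting.discard(peer)

    def may_select(self):
        """True once route selection no longer has to be deferred."""
        return not self._waiting or self._now() >= self._deadline
```

   A caller would construct one such object per address family when
   the sessions come back up, feed it each received End-of-RIB marker,
   and run route selection once may_select() returns True.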

   In situations where both IGP and BGP have restarted, it might be
   advantageous to wait for IGP to converge before the BGP speaker
   performs route selection.

   After the BGP speaker performs route selection, the forwarding state
   of the speaker SHALL be updated and any previously marked stale
   information SHALL be removed. The Adj-RIB-Out can then be advertised
   to its peers. Once the initial update is complete for an address
   family (including the case that there is no routing update to send),
   the End-of-RIB marker SHALL be sent.

   To put an upper bound on the amount of time a router defers its route
   selection, an implementation MUST support a (configurable) timer that
   imposes this upper bound.

   If one wants to apply graceful restart only when the restart is
   planned (as opposed to both planned and unplanned restart), then one
   way to accomplish this would be to set the Forwarding State bit to 1
   after a planned restart, and to 0 in all other cases. Other
   approaches to accomplish this are outside the scope of this document.

7.2. Procedures for the Receiving Speaker

   When the Restarting Speaker restarts, the Receiving Speaker may or
   may not detect the termination of the TCP session with the Restarting
   Speaker, depending on the underlying TCP implementation, whether or
   not [BGP-AUTH] is in use, and the specific circumstances of the
   restart.  In case it does not detect the TCP reset and still
   considers the BGP session as being established, it SHALL treat the
   subsequent open connection from the peer as an indication of TCP
   reset and act accordingly (when the Graceful Restart Capability has
   been received from the peer).

   "Acting accordingly" in this context means that the previous TCP
   session SHOULD be closed, and the new one retained.  Note that this
   behavior differs from the default behavior, as specified in [BGP-4]
   section 6.8.
   Since the previous connection is considered to be
   reset, no NOTIFICATION message should be sent -- the previous TCP
   session is simply closed.

   When the Receiving Speaker detects TCP reset for a BGP session with a
   peer that has advertised the Graceful Restart Capability, it SHALL
   retain the routes received from the peer for all the address families
   that were previously received in the Graceful Restart Capability, and
   SHALL mark them as stale routing information. To deal with possible
   consecutive restarts, a route (from the peer) previously marked as
   stale SHALL be deleted. The router SHOULD NOT differentiate between
   stale and other routing information during forwarding.

   In re-establishing the session, the "Restart State" bit in the
   Graceful Restart Capability of the OPEN message sent by the Receiving
   Speaker SHALL NOT be set unless the Receiving Speaker has restarted.

   The presence and the setting of the "Forwarding State" bit for an
   address family depends upon the actual forwarding state and
   configuration.

   If the session does not get re-established within the "Restart Time"
   that the peer advertised previously, the Receiving Speaker SHALL
   delete all the stale routes from the peer that it is retaining.

   Once the session is re-established, if the "Forwarding State" bit for
   a specific address family is not set in the newly received Graceful
   Restart Capability, or if a specific address family is not included
   in the newly received Graceful Restart Capability, or if the Graceful
   Restart Capability isn't received in the re-established session at
   all, then the Receiving Speaker SHALL immediately remove all the
   stale routes from the peer that it is retaining for that address
   family.

   The Receiving Speaker SHALL send the End-of-RIB marker once it
   completes the initial update for an address family (including the
   case that it has no routes to send) to the peer.
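
   The stale-route bookkeeping described so far in this section can be
   sketched as below. It is a hedged illustration rather than a
   definitive implementation: the class, its method names, and the
   dictionary layout are hypothetical, and dropping the routes of
   address families that were not listed in the peer's capability is
   an assumption that normal BGP session-loss handling applies to them.

```python
class PeerRoutes:
    """Routes learned from one peer, grouped by (afi, safi)."""

    def __init__(self, gr_families):
        # Families listed in the peer's last Graceful Restart Capability.
        self.gr_families = set(gr_families)
        # (afi, safi) -> {prefix: is_stale}
        self.routes = {}

    def learn(self, family, prefix):
        """Install or refresh a route; a fresh route is never stale."""
        self.routes.setdefault(family, {})[prefix] = False

    def on_tcp_reset(self):
        """Session lost: retain and mark stale, per address family."""
        for family in list(self.routes):
            if family not in self.gr_families:
                # Assumption: normal BGP handling, routes are removed.
                del self.routes[family]
                continue
            # Routes already stale (consecutive restart) are deleted;
            # everything else is retained and marked stale.
            self.routes[family] = {
                prefix: True
                for prefix, stale in self.routes[family].items()
                if not stale
            }

    def on_restart_time_expired(self):
        """Session not re-established in time: delete all stale routes."""
        self._drop_stale(list(self.routes))

    def on_reestablished(self, capability):
        """`capability` is None if absent, else {(afi, safi): f_bit}."""
        advertised = capability or {}
        for family in list(self.routes):
            # F bit clear, family missing, or no capability at all.
            if not advertised.get(family, False):
                self._drop_stale([family])
        self.gr_families = set(advertised)

    def _drop_stale(self, families):
        for family in families:
            self.routes[family] = {
                prefix: stale
                for prefix, stale in self.routes[family].items()
                if not stale
            }
```

   Forwarding code would read self.routes without looking at the stale
   flag, which matches the rule that stale and other routing
   information are not differentiated during forwarding.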

   The Receiving Speaker SHALL replace the stale routes by the routing
   updates received from the peer. Once the End-of-RIB marker for an
   address family is received from the peer, it SHALL immediately remove
   any routes from the peer that are still marked as stale for that
   address family.

   To put an upper bound on the amount of time a router retains the
   stale routes, an implementation MAY support a (configurable) timer
   that imposes this upper bound.

8. Deployment Considerations

   While the procedures described in this document would help minimize
   the effect of routing flaps, it is noted that when a BGP Graceful
   Restart capable router restarts, there is a potential for
   transient routing loops or blackholes in the network if routing
   information changes before the involved routers complete routing
   updates and convergence. Also, depending on the network topology, if
   not all IBGP speakers are Graceful Restart capable, there could be an
   increased exposure to transient routing loops or blackholes when the
   Graceful Restart procedures are exercised.

   The Restart Time, the upper bound for retaining routes and the upper
   bound for deferring route selection may need to be tuned as more
   deployment experience is gained.

   Finally, it is noted that the benefits of deploying BGP Graceful
   Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP
   and IGPs would both restart) and IGPs have no similar Graceful
   Restart capability are reduced relative to the scenario where IGPs do
   have similar Graceful Restart capability.

9. Security Considerations

   Since with this proposal a new connection can cause an old one to be
   terminated, it might seem to open the door to denial of service
   attacks.  However, it is noted that unauthenticated BGP is already
   known to be vulnerable to denials of service through attacks on the
   TCP transport.
   The TCP transport is commonly protected through use
   of [BGP-AUTH].  Such authentication will equally protect against
   denials of service through spurious new connections.

   It is thus concluded that this proposal does not change the
   underlying security model (and issues) of BGP-4.

10. Acknowledgments

   The authors would like to thank Bruce Cole, Bill Fenner, Eric Gray,
   Jeffrey Haas, Alvaro Retana, Naiming Shen, Satinder Singh, David
   Ward, Shane Wright and Alex Zinin for their review and comments.

11. Normative References

   [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4
   (BGP-4)", RFC 1771, March 1995.

   [BGP-MP] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
   "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

   [BGP-CAP] Chandra, R., and J. Scudder, "Capabilities Advertisement
   with BGP-4", draft-ietf-idr-rfc2842bis-02.txt, April 2002.

   [BGP-AUTH] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
   Signature Option", RFC 2385, August 1998.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [IANA-AFI] http://www.iana.org/assignments/address-family-numbers.

   [IANA-SAFI] http://www.iana.org/assignments/safi-namespace.

12. Author Information

   Srihari R. Sangli
   Procket Networks, Inc.
   1100 Cadillac Court
   Milpitas, CA 95035
   e-mail: srihari@procket.com

   Yakov Rekhter
   Juniper Networks, Inc.
   1194 N. Mathilda Avenue
   Sunnyvale, CA 94089
   e-mail: yakov@juniper.net

   Rex Fernando
   Procket Networks, Inc.
   1100 Cadillac Court
   Milpitas, CA 95035
   e-mail: rex@procket.com

   John G. Scudder
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134
   e-mail: jgs@cisco.com

   Enke Chen
   Redback Networks, Inc.
   350 Holger Way
   San Jose, CA 95134
   e-mail: enke@redback.com