idnits 2.17.1 draft-ietf-idr-restart-01.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 8 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 9 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 195: '...l BGP procedures MUST be followed when...' RFC 2119 keyword, line 199: '...the Restart Time SHOULD NOT be greater...' 
Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 1771 (ref. 'BGP-4') (Obsoleted by RFC 4271) ** Obsolete normative reference: RFC 2283 (ref. 'BGP-MP') (Obsoleted by RFC 2858) ** Obsolete normative reference: RFC 2842 (ref. 'BGP-CAP') (Obsoleted by RFC 3392) ** Obsolete normative reference: RFC 2385 (ref. 'BGP-AUTH') (Obsoleted by RFC 5925) Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Srihari Ramachandra (Procket Networks) 2 Internet Draft Yakov Rekhter (Juniper Networks) 3 Expiration Date: January 2002 Rex Fernando (Procket Networks) 4 John G. Scudder (Cisco Systems) 5 Enke Chen (Redback Networks) 7 Graceful Restart Mechanism for BGP 9 draft-ietf-idr-restart-01.txt 11 1. Status of this Memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC2026. 
16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet-Drafts as reference 24 material or to cite them other than as ``work in progress.'' 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 2. Abstract 34 This document proposes a mechanism for BGP that would help minimize 35 the negative effects on routing caused by BGP restart. An End-of-RIB 36 marker is specified and can be used to convey routing convergence 37 information. A new BGP capability, termed "Graceful Restart 38 Capability", is defined which would allow a BGP speaker to express 39 its ability to preserve forwarding state during BGP restart. Finally, 40 procedures are outlined for temporarily retaining routing information 41 across a TCP transport reset. 43 3. Introduction 45 Usually when BGP on a router restarts, all the BGP peers detect that 46 the session went down, and then came up. This "down/up" transition 47 results in a "routing flap" and causes BGP route re-computation, 48 generation of BGP routing updates, and flapping of the forwarding tables. The 49 flap could spread across multiple routing domains. Such routing flaps may 50 create transient forwarding blackholes and/or transient forwarding 51 loops. They also consume resources on the control plane of the 52 routers affected by the flap. As such they are detrimental to the 53 overall network performance. 55 This document proposes a mechanism for BGP that would help minimize 56 the negative effects on routing caused by BGP restart. 
An End-of-RIB 57 marker is specified and can be used to convey routing convergence 58 information. A new BGP capability, termed "Graceful Restart 59 Capability", is defined which would allow a BGP speaker to express 60 its ability to preserve forwarding state during BGP restart. Finally, 61 procedures are outlined for temporarily retaining routing information 62 across a TCP transport reset. 64 4. Marker for End-of-RIB 66 An UPDATE message with empty withdrawn NLRI is specified as the End- 67 Of-RIB Marker that can be used by a BGP speaker to indicate to its 68 peer the completion of the initial routing update after the session 69 is established. For the IPv4 unicast address family, the End-Of-RIB 70 Marker is an UPDATE message with the minimum length [BGP-4]. For any 71 other address family, it is an UPDATE message that contains only 72 MP_UNREACH_NLRI [BGP-MP] with no withdrawn routes for that <AFI, SAFI>. 75 Although the End-of-RIB Marker is specified for the purpose of BGP 76 graceful restart, it is noted that the generation of such a marker 77 upon completion of the initial update would be useful for routing 78 convergence in general, and thus the practice is recommended. 80 In addition, it would be beneficial for routing convergence if a BGP 81 speaker can indicate to its peer up-front that it will generate the 82 End-Of-RIB marker, regardless of its ability to preserve its 83 forwarding state during BGP restart. This can be accomplished using 84 the Graceful Restart Capability described in the next section. 86 5. Graceful Restart Capability 88 The Graceful Restart Capability is a new BGP capability [BGP-CAP] 89 that can be used by a BGP speaker to indicate its ability to preserve 90 its forwarding state during BGP restart. It can also be used to 91 convey to its peer its intention of generating the End-Of-RIB marker 92 upon the completion of its initial routing updates. 
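As an illustration of the End-of-RIB marker defined in Section 4, the following is a minimal sketch of how the two forms of the marker might be built. It is not part of the specification. The function name build_end_of_rib is hypothetical, and the header layout, the all-ones marker field, the UPDATE type value and the MP_UNREACH_NLRI type code are assumptions taken from [BGP-4] and [BGP-MP], not from this document.

```python
import struct

# Assumptions (not stated in this document): the 16-octet BGP header marker
# is all ones (no authentication), UPDATE is message type 2, and
# MP_UNREACH_NLRI is an optional non-transitive attribute with type code 15.
BGP_HEADER_MARKER = b"\xff" * 16
BGP_HEADER_LEN = 19
MSG_TYPE_UPDATE = 2
ATTR_FLAG_OPTIONAL = 0x80
ATTR_TYPE_MP_UNREACH_NLRI = 15


def build_end_of_rib(afi: int = 1, safi: int = 1) -> bytes:
    """Return a hypothetical End-of-RIB UPDATE for the given <AFI, SAFI>."""
    if (afi, safi) == (1, 1):
        # IPv4 unicast: minimum-length UPDATE, i.e. zero withdrawn-routes
        # length and zero total path-attribute length.
        body = struct.pack("!HH", 0, 0)
    else:
        # Any other family: a single MP_UNREACH_NLRI attribute that carries
        # only the AFI and SAFI, with no withdrawn routes after them.
        attr = struct.pack("!BBBHB", ATTR_FLAG_OPTIONAL,
                           ATTR_TYPE_MP_UNREACH_NLRI, 3, afi, safi)
        body = struct.pack("!HH", 0, len(attr)) + attr
    header = BGP_HEADER_MARKER + struct.pack(
        "!HB", BGP_HEADER_LEN + len(body), MSG_TYPE_UPDATE)
    return header + body


if __name__ == "__main__":
    print(len(build_end_of_rib()))       # IPv4 unicast marker length
    print(build_end_of_rib(2, 1).hex())  # marker for AFI 2, SAFI 1
```

A receiver could recognise the marker by comparing a received UPDATE against these byte strings, although a real implementation would more likely check the parsed fields.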
94 This capability is defined as follows: 96 Capability code: 64 98 Capability length: variable 100 Capability value: Consists of the "Restart Flags" field, 101 "Restart Time" field, and zero or more of the tuples <AFI, SAFI, Flags for address family> as follows.
104 +--------------------------------------------------+
105 | Restart Flags (4 bits)                           |
106 +--------------------------------------------------+
107 | Restart Time in seconds (12 bits)                |
108 +--------------------------------------------------+
109 | Address Family Identifier (16 bits)              |
110 +--------------------------------------------------+
111 | Subsequent Address Family Identifier (8 bits)    |
112 +--------------------------------------------------+
113 | Flags for Address Family (8 bits)                |
114 +--------------------------------------------------+
115 | ...                                              |
116 +--------------------------------------------------+
117 | Address Family Identifier (16 bits)              |
118 +--------------------------------------------------+
119 | Subsequent Address Family Identifier (8 bits)    |
120 +--------------------------------------------------+
121 | Flags for Address Family (8 bits)                |
122 +--------------------------------------------------+
124 The use and meaning of the fields are as follows: 126 Restart Flags: 128 This field contains bit flags related to restart. 130 The most significant bit is defined as the Restart State bit 131 which can be used to avoid possible deadlock caused by waiting 132 for the End-of-RIB marker when multiple BGP speakers peering 133 with each other restart. When set (value 1), this bit indicates 134 that the BGP speaker has restarted, and its peer should not wait 135 for the End-of-RIB marker from the speaker before advertising 136 routing information to the speaker. 138 The remaining bits are reserved. 140 Restart Time: 142 This is the estimated time (in seconds) it will take for the BGP 143 session to be re-established after a restart. 
This can be used to 144 speed up routing convergence by its peer in case the BGP 145 speaker does not come back after a restart. 147 Address Family Identifier (AFI): 149 This field carries the identity of the Network Layer protocol 150 for which the Graceful Restart support is advertised. Presently 151 defined values for this field are specified in RFC1700 (see 152 the Address Family Numbers section). 154 Subsequent Address Family Identifier (Sub-AFI): 156 This field provides additional information about the type of 157 the Network Layer Reachability Information carried in the 158 attribute. 160 Flags for Address Family: 162 This field contains bit flags for the <AFI, SAFI>. 164 The most significant bit is defined as the Forwarding State 165 bit which can be used to indicate if the forwarding state for 166 the <AFI, SAFI> has indeed been preserved during the previous 167 BGP restart. When set (value 1), the bit indicates that the 168 forwarding state has been preserved. 170 The remaining bits are reserved. 172 The advertisement of this capability by a BGP speaker also implies 173 that it will generate the End-of-RIB marker upon completion of its 174 initial routing update to its peer. The value of the "Restart Time" 175 field is irrelevant in the case that the capability does not carry 176 any <AFI, SAFI>. 178 6. Operation 180 A BGP speaker may advertise the Graceful Restart Capability for an 181 address family to its peer only if it has the ability to preserve its 182 forwarding state for the address family when BGP restarts. 184 Even if the speaker does not have the ability to preserve its 185 forwarding state for any address family during BGP restart, it is 186 still recommended that the speaker advertise the Graceful Restart 187 Capability to its peer to indicate its intention of generating the 188 End-of-RIB marker upon the completion of its initial routing updates. 
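The layout of the Graceful Restart Capability given in Section 5 can be sketched as a small encoder and decoder. This is an illustrative sketch only. The function names are hypothetical, and the one-octet code and one-octet length framing is an assumption taken from [BGP-CAP]. A capability with no <AFI, SAFI> tuples corresponds to the case above where the speaker only signals that it will send the End-of-RIB marker.

```python
import struct

CAPABILITY_CODE_GRACEFUL_RESTART = 64  # capability code given in Section 5
RESTART_STATE_BIT = 0x8       # most significant bit of the 4-bit Restart Flags
FORWARDING_STATE_BIT = 0x80   # most significant bit of Flags for Address Family


def encode_graceful_restart(restart_time, restarted=False, families=()):
    """Encode the capability; families is an iterable of
    (afi, safi, forwarding_state_preserved) tuples (hypothetical helper)."""
    if not 0 <= restart_time <= 0x0FFF:
        raise ValueError("Restart Time must fit in 12 bits")
    restart_flags = RESTART_STATE_BIT if restarted else 0
    # Restart Flags (4 bits) and Restart Time (12 bits) share one 16-bit word.
    value = struct.pack("!H", (restart_flags << 12) | restart_time)
    for afi, safi, preserved in families:
        af_flags = FORWARDING_STATE_BIT if preserved else 0
        value += struct.pack("!HBB", afi, safi, af_flags)
    # Assumed [BGP-CAP] framing: one-octet code, one-octet length, then value.
    return struct.pack("!BB", CAPABILITY_CODE_GRACEFUL_RESTART,
                       len(value)) + value


def decode_graceful_restart(capability):
    """Return (restart_time, restarted, families) from an encoded capability."""
    code, length = struct.unpack_from("!BB", capability, 0)
    value = capability[2:2 + length]
    if (code != CAPABILITY_CODE_GRACEFUL_RESTART or len(value) != length
            or length < 2 or (length - 2) % 4 != 0):
        raise ValueError("not a well-formed Graceful Restart Capability")
    (head,) = struct.unpack_from("!H", value, 0)
    families = []
    for offset in range(2, length, 4):
        afi, safi, af_flags = struct.unpack_from("!HBB", value, offset)
        families.append((afi, safi, bool(af_flags & FORWARDING_STATE_BIT)))
    restarted = bool((head >> 12) & RESTART_STATE_BIT)
    return head & 0x0FFF, restarted, families


if __name__ == "__main__":
    cap = encode_graceful_restart(120, restarted=True, families=[(1, 1, True)])
    print(cap.hex())
    print(decode_graceful_restart(cap))
```

Packing the flags and the time into a single 16-bit word mirrors the diagram in Section 5, where the two fields together occupy exactly two octets.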
190 The End-of-RIB marker should be sent by a BGP speaker to its peer 191 once it completes the initial routing update (including the case when 192 there is no update to send) for an address family after the BGP 193 session is established. 195 It is noted that the normal BGP procedures MUST be followed when the 196 TCP session terminates due to the sending or receiving of a BGP 197 NOTIFICATION message. 199 In general the Restart Time SHOULD NOT be greater than the HOLDTIME 200 carried in the OPEN. 202 In the following sections, "Restarting Speaker" refers to a router 203 whose BGP has restarted, and "Receiving Speaker" refers to a router 204 that peers with the restarting speaker. 206 Consider that the Graceful Restart Capability for an address family 207 is advertised by the Restarting Speaker, and is understood by the 208 Receiving Speaker, and a BGP session between them is established. 209 The following sections detail the procedures that shall be followed 210 by the Restarting Speaker as well as the Receiving Speaker once the 211 Restarting Speaker restarts. 213 6.1. Procedures for the Restarting Speaker 215 When the Restarting Speaker restarts, if possible it shall retain the 216 forwarding state for the BGP routes in the Loc-RIB, and shall mark 217 them as stale. It should not differentiate between stale and other 218 information during forwarding. 220 To re-establish the session with its peer, the Restarting Speaker 221 must set the "Restart State" bit in the Graceful Restart Capability 222 of the OPEN message. Unless allowed via configuration, the 223 "Forwarding State" bit for an address family in the capability can be 224 set only if the forwarding state has indeed been preserved for that 225 address family during the restart. 227 Once the session between the Restarting Speaker and the Receiving 228 Speaker is re-established, the Restarting Speaker will receive and 229 process BGP messages from its peers. 
However, it shall defer route 230 selection for an address family until it receives the End-of-RIB 231 marker from all its peers (excluding the ones with the "Restart 232 State" bit set in the received capability). It is noted that prior to 233 route selection, the speaker has no routes to advertise to its peers 234 and no routes to update the forwarding state. 236 In situations where both IGP and BGP have restarted, it might be 237 advantageous to wait for IGP to converge before the BGP speaker 238 performs route selection. 240 After the BGP speaker performs route selection, the forwarding state 241 of the speaker shall be updated and any previously marked stale 242 information shall be removed. The Adj-RIB-Out can then be advertised 243 to its peers. Once the initial update is complete for an address 244 family (including the case that there is no routing update to send), 245 the End-of-RIB marker shall be sent. 247 To put an upper bound on the amount of time a router defers its route 248 selection, an implementation must support a (configurable) timer that 249 imposes this upper bound. 251 6.2. Procedures for the Receiving Speaker 253 When the Restarting Speaker restarts, the Receiving Speaker may or 254 may not detect the termination of the TCP session with the Restarting 255 Speaker, depending on the underlying TCP implementation, whether or 256 not [BGP-AUTH] is in use, and the specific circumstances of the 257 restart. In case it does not detect the TCP reset and still 258 considers the BGP session as being established, it shall treat the 259 subsequent open connection from the Restarting Speaker as an 260 indication of TCP reset and act accordingly. 262 When the TCP reset is detected by the Receiving Speaker, it shall 263 retain the routes received from the Restarting Speaker for all the 264 address families that were previously received in the Graceful 265 Restart Capability, and shall mark them as stale routing information. 
266 To deal with possible consecutive restarts, a route (from the 267 Restarting Speaker) previously marked as stale shall be deleted. The 268 router should not differentiate between stale and other routing 269 information during forwarding. 271 In re-establishing the session, the "Restart State" bit in the 272 Graceful Restart Capability of the OPEN message sent by the Receiving 273 Speaker shall not be set unless the Receiving Speaker has also 274 restarted. The presence and the setting of the "Forwarding State" bit 275 for an address family depends upon the actual forwarding state and 276 configuration. 278 If the session does not get re-established within the "Restart Time" 279 that the Restarting Speaker advertised previously, the Receiving 280 Speaker shall delete all the stale routes from the Restarting Speaker 281 that it is retaining. 283 Once the session is re-established, if the "Forwarding State" bit for 284 an address family is not set in the received Graceful Restart 285 Capability, or if the capability is not received for an address 286 family, the Receiving Speaker shall immediately remove all the stale 287 routes from the Restarting Speaker that it is retaining for that 288 address family. 290 The Receiving Speaker shall send the End-of-RIB marker once it 291 completes the initial update for an address family (including the 292 case that it has no routes to send) to the Restarting Speaker. 294 The Receiving Speaker shall replace the stale routes by the routing 295 updates received from the Restarting Speaker. Once the End-of-RIB 296 marker for an address family is received from the Restarting Speaker, 297 it shall immediately remove any routes from the Restarting Speaker 298 that are still marked as stale for that address family. 300 To put an upper bound on the amount of time a router retains the 301 stale routes, an implementation may support a (configurable) timer 302 that imposes this upper bound. 304 7. 
Deployment Considerations 306 While the procedures described in this document would help minimize 307 the effect of routing flaps, it is noted, however, that when a BGP 308 Graceful-Restart capable router restarts, there is a potential for 309 transient routing loops or blackholes in the network if routing 310 information changes before the involved routers complete routing 311 updates and convergence. Also, depending on the network topology, if 312 not all IBGP speakers are Graceful-Restart capable, there could be an 313 increased exposure to transient routing loops or blackholes when the 314 Graceful-Restart procedures are exercised. 316 The Restart Time, the upper bound for retaining routes and the upper 317 bound for deferring route selection may need to be tuned as more 318 deployment experience is gained. 320 Finally, it is noted that there is little benefit deploying BGP 321 Graceful-Restart in an AS whose IGPs and BGP are tightly coupled 322 (i.e., BGP and IGPs would both restart), and IGPs have no similar 323 Graceful-Restart capability. 325 8. Security Considerations 327 Since with this proposal a new connection can cause an old one to be 328 terminated, it might seem to open the door to denial of service 329 attacks. However, it is noted that unauthenticated BGP is already 330 known to be vulnerable to denials of service through attacks on the 331 TCP transport. The TCP transport is commonly protected through use 332 of [BGP-AUTH]. Such authentication will equally protect against 333 denials of service through spurious new connections. 335 It is thus concluded that this proposal does not change the 336 underlying security model (and issues) of BGP-4. 338 9. Acknowledgments 340 The authors would like to thank Alvaro Retana, Satinder Singh, David 341 Ward, Naiming Shen and Bruce Cole for their review and comments. 343 10. References 345 [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP- 346 4)", RFC 1771, March 1995. 
348 [BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., 349 "Multiprotocol Extensions for BGP-4", RFC 2283, March 1998. 351 [BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with 352 BGP-4", RFC 2842, May 2000. 354 [BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 355 Signature Option", RFC 2385, August 1998. 357 11. Author Information 359 Srihari Ramachandra 360 Procket Networks, Inc. 361 1100 Cadillac Court 362 Milpitas, CA 95035 363 e-mail: srihari@procket.com 365 Yakov Rekhter 366 Juniper Networks, Inc. 367 1194 N. Mathilda Avenue 368 Sunnyvale, CA 94089 369 e-mail: yakov@juniper.net 371 Rex Fernando 372 Procket Networks, Inc. 373 1100 Cadillac Court 374 Milpitas, CA 95035 375 e-mail: rex@procket.com 377 John G. Scudder 378 Cisco Systems, Inc. 379 170 West Tasman Drive 380 San Jose, CA 95134 381 e-mail: jgs@cisco.com 383 Enke Chen 384 Redback Networks, Inc. 385 350 Holger Way 386 San Jose, CA 95134 387 e-mail: enke@redback.com