idnits 2.17.1

draft-ramachandra-bgp-restart-04.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 9
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 10 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 198: '...l BGP procedures MUST be followed when...'

     RFC 2119 keyword, line 202: '...the Restart Time SHOULD NOT be greater...'
  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1771 (ref. 'BGP-4') (Obsoleted by RFC
     4271)

  ** Obsolete normative reference: RFC 2283 (ref. 'BGP-MP') (Obsoleted by RFC
     2858)

  ** Obsolete normative reference: RFC 2842 (ref. 'BGP-CAP') (Obsoleted by
     RFC 3392)

  ** Obsolete normative reference: RFC 2385 (ref. 'BGP-AUTH') (Obsoleted by
     RFC 5925)

     Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Network Working Group                                Srihari Ramachandra
2    Internet Draft                                             Yakov Rekhter
3    Expiration Date: May 2001                                   Rex Fernando
4                                                             John G. Scudder
5                                                               Cisco Systems
6                                                                   Enke Chen
7                                                            Redback Networks

9                     Graceful Restart Mechanism for BGP

11                   draft-ramachandra-bgp-restart-04.txt

13   1. Status of this Memo

15      This document is an Internet-Draft and is in full conformance with
16      all provisions of Section 10 of RFC2026 except that the right to
17      produce derivative works is not granted.

19      Internet-Drafts are working documents of the Internet Engineering
20      Task Force (IETF), its areas, and its working groups. Note that
21      other groups may also distribute working documents as Internet-
22      Drafts.

24      Internet-Drafts are draft documents valid for a maximum of six months
25      and may be updated, replaced, or obsoleted by other documents at any
26      time. It is inappropriate to use Internet-Drafts as reference
27      material or to cite them other than as ``work in progress.''

29      The list of current Internet-Drafts can be accessed at
30      http://www.ietf.org/ietf/1id-abstracts.txt

32      The list of Internet-Draft Shadow Directories can be accessed at
33      http://www.ietf.org/shadow.html.

35   2. Abstract

37      This document proposes a mechanism for BGP that would help minimize
38      the negative effects on routing caused by BGP restart. An End-of-RIB
39      marker is specified and can be used to convey routing convergence
40      information. A new BGP capability, termed "Graceful Restart
41      Capability", is defined which would allow a BGP speaker to express
42      its ability to preserve forwarding state during BGP restart. Finally,
43      procedures are outlined for temporarily retaining routing information
44      across a TCP transport reset.

46   3. Introduction

48      Usually when BGP on a router restarts, all the BGP peers detect that
49      the session went down, and then came up. This "down/up" transition
50      results in a "routing flap" and causes BGP route re-computation,
51      generation of BGP routing updates, and forwarding table flaps. It
52      could spread across multiple routing domains. Such routing flaps may
53      create transient forwarding blackholes and/or transient forwarding
54      loops. They also consume resources on the control plane of the
55      routers affected by the flap. As such they are detrimental to the
56      overall network performance.

58      This document proposes a mechanism for BGP that would help minimize
59      the negative effects on routing caused by BGP restart. An End-of-RIB
60      marker is specified and can be used to convey routing convergence
61      information. A new BGP capability, termed "Graceful Restart
62      Capability", is defined which would allow a BGP speaker to express
63      its ability to preserve forwarding state during BGP restart. Finally,
64      procedures are outlined for temporarily retaining routing information
65      across a TCP transport reset.

67   4. Marker for End-of-RIB

69      An UPDATE message with empty withdrawn NLRI is specified as the End-
70      Of-RIB Marker that can be used by a BGP speaker to indicate to its
71      peer the completion of the initial routing update after the session
72      is established. For IPv4 unicast address family, the End-Of-RIB
73      Marker is an UPDATE message with the minimum length [BGP-4]. For any
74      other address family, it is an UPDATE message that contains only
75      MP_UNREACH_NLRI [BGP-MP] with no withdrawn routes for that <AFI, Sub-
76      AFI>.

78      Although the End-of-RIB Marker is specified for the purpose of BGP
79      graceful restart, it is noted that the generation of such a marker
80      upon completion of the initial update would be useful for routing
81      convergence in general, and thus the practice is recommended.

83      In addition, it would be beneficial for routing convergence if a BGP
84      speaker can indicate to its peer up-front that it will generate the
85      End-Of-RIB marker, regardless of its ability to preserve its
86      forwarding state during BGP restart. This can be accomplished using
87      the Graceful Restart Capability described in the next section.

89   5. Graceful Restart Capability

91      The Graceful Restart Capability is a new BGP capability [BGP-CAP]
92      that can be used by a BGP speaker to indicate its ability to preserve
93      its forwarding state during BGP restart. It can also be used to
94      convey to its peer its intention of generating the End-Of-RIB marker
95      upon the completion of its initial routing updates.

97      This capability is defined as follows:

99         Capability code: 64

101        Capability length: variable

103        Capability value: Consists of the "Restart Flags" field,
104        "Restart Time" field, and zero or more of the tuples <AFI, Sub-AFI,
105        Flags for Address Family> as follows.

107           +--------------------------------------------------+
108           | Restart Flags (4 bits)                           |
109           +--------------------------------------------------+
110           | Restart Time in seconds (12 bits)                |
111           +--------------------------------------------------+
112           | Address Family Identifier (16 bits)              |
113           +--------------------------------------------------+
114           | Subsequent Address Family Identifier (8 bits)    |
115           +--------------------------------------------------+
116           | Flags for Address Family (8 bits)                |
117           +--------------------------------------------------+
118           | ...                                              |
119           +--------------------------------------------------+
120           | Address Family Identifier (16 bits)              |
121           +--------------------------------------------------+
122           | Subsequent Address Family Identifier (8 bits)    |
123           +--------------------------------------------------+
124           | Flags for Address Family (8 bits)                |
125           +--------------------------------------------------+

127     The use and meaning of the fields are as follows:

129        Restart Flags:

131           This field contains bit flags related to restart.

133           The most significant bit is defined as the Restart State bit
134           which can be used to avoid possible deadlock caused by waiting
135           for the End-of-RIB marker when multiple BGP speakers peering
136           with each other restart. When set (value 1), this bit indicates
137           that the BGP speaker has restarted, and its peer should not wait
138           for the End-of-RIB marker from the speaker before advertising
139           routing information to the speaker.

141           The remaining bits are reserved.

143        Restart Time:

145           This is the estimated time (in seconds) it will take for the BGP
146           session to be re-established after a restart. This can be used to
147           speed up routing convergence by its peer in case that the BGP
148           speaker does not come back after a restart.

150        Address Family Identifier (AFI):

152           This field carries the identity of the Network Layer protocol
153           for which the Graceful Restart support is advertised. Presently
154           defined values for this field are specified in RFC1700 (see
155           the Address Family Numbers section).

157        Subsequent Address Family Identifier (Sub-AFI):

159           This field provides additional information about the type of
160           the Network Layer Reachability Information carried in the
161           attribute.

163        Flags for Address Family:

165           This field contains bit flags for the <AFI, Sub-AFI>.

167           The most significant bit is defined as the Forwarding State
168           bit which can be used to indicate if the forwarding state for
169           the <AFI, Sub-AFI> has indeed been preserved during the previous
170           BGP restart. When set (value 1), the bit indicates that the
171           forwarding state has been preserved.

173           The remaining bits are reserved.

175     The advertisement of this capability by a BGP speaker also implies
176     that it will generate the End-of-RIB marker upon completion of its
177     initial routing update to its peer. The value of the "Restart Time"
178     field is irrelevant in the case that the capability does not carry
179     any <AFI, Sub-AFI>.

181  6. Operation

183     A BGP speaker may advertise the Graceful Restart Capability for an
184     address family to its peer only if it has the ability to preserve its
185     forwarding state for the address family when BGP restarts.

187     Even if the speaker does not have the ability to preserve its
188     forwarding state for any address family during BGP restart, it is
189     still recommended that the speaker advertise the Graceful Restart
190     Capability to its peer to indicate its intention of generating the
191     End-of-RIB marker upon the completion of its initial routing updates.
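
The capability layout defined in Section 5, including the case of a speaker that advertises it with no address family tuples, can be sketched in Python as follows. This is an illustrative encoding under stated assumptions, not a normative one: the class and field names are hypothetical, the capability is assumed to be carried as a one-octet code and a one-octet length followed by the value [BGP-CAP], and the 4-bit Restart Flags are assumed to occupy the high-order bits of the first two octets, ahead of the 12-bit Restart Time.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

CAP_GRACEFUL_RESTART = 64   # capability code from Section 5
RESTART_STATE = 0x8         # most significant bit of the 4-bit Restart Flags
FORWARDING_STATE = 0x80     # most significant bit of Flags for Address Family


@dataclass
class GracefulRestartCap:
    """Hypothetical in-memory form of the Graceful Restart Capability."""
    restarted: bool = False     # Restart State bit
    restart_time: int = 0       # seconds, 12 bits
    # One (AFI, Sub-AFI, forwarding state preserved) entry per tuple.
    families: List[Tuple[int, int, bool]] = field(default_factory=list)

    def encode(self) -> bytes:
        """Encode as code, length, value (assumed [BGP-CAP] framing)."""
        if not 0 <= self.restart_time < 4096:
            raise ValueError("Restart Time must fit in 12 bits")
        flags = RESTART_STATE if self.restarted else 0
        value = struct.pack("!H", (flags << 12) | self.restart_time)
        for afi, sub_afi, preserved in self.families:
            af_flags = FORWARDING_STATE if preserved else 0
            value += struct.pack("!HBB", afi, sub_afi, af_flags)
        if len(value) > 255:
            raise ValueError("capability value exceeds one-octet length")
        return struct.pack("!BB", CAP_GRACEFUL_RESTART, len(value)) + value

    @classmethod
    def decode(cls, data: bytes) -> "GracefulRestartCap":
        """Decode one capability; raises ValueError on malformed input."""
        if len(data) < 2:
            raise ValueError("truncated capability")
        code, length = struct.unpack("!BB", data[:2])
        value = data[2:2 + length]
        if code != CAP_GRACEFUL_RESTART:
            raise ValueError("not a Graceful Restart Capability")
        if len(value) != length or length < 2 or (length - 2) % 4:
            raise ValueError("bad capability length")
        (word,) = struct.unpack("!H", value[:2])
        families = []
        for off in range(2, length, 4):
            afi, sub_afi, af_flags = struct.unpack("!HBB", value[off:off + 4])
            families.append((afi, sub_afi, bool(af_flags & FORWARDING_STATE)))
        return cls(bool((word >> 12) & RESTART_STATE), word & 0x0FFF, families)
```

Under these assumptions, a speaker that cannot preserve forwarding state but still intends to send the End-of-RIB marker would advertise GracefulRestartCap().encode(), which is the four octets 0x40 0x02 0x00 0x00.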

193     The End-of-RIB marker should be sent by a BGP speaker to its peer
194     once it completes the initial routing update (including the case when
195     there is no update to send) for an address family after the BGP
196     session is established.

198     It is noted that the normal BGP procedures MUST be followed when the
199     TCP session terminates due to the sending or receiving of a BGP
200     NOTIFICATION message.

202     In general the Restart Time SHOULD NOT be greater than the HOLDTIME
203     carried in the OPEN.

205     In the following sections, "Restarting Speaker" refers to a router
206     whose BGP has restarted, and "Receiving Speaker" refers to a router
207     that peers with the restarting speaker.

209     Consider that the Graceful Restart Capability for an address family
210     is advertised by the Restarting Speaker, and is understood by the
211     Receiving Speaker, and a BGP session between them is established.
212     The following sections detail the procedures that shall be followed
213     by the Restarting Speaker as well as the Receiving Speaker once the
214     Restarting Speaker restarts.

216  6.1. Procedures for the Restarting Speaker

218     When the Restarting Speaker restarts, if possible it shall retain the
219     forwarding state for the BGP routes in the Loc-RIB, and shall mark
220     them as stale. It should not differentiate between stale and other
221     information during forwarding.

223     To re-establish the session with its peer, the Restarting Speaker
224     must set the "Restart State" bit in the Graceful Restart Capability
225     of the OPEN message. Unless allowed via configuration, the
226     "Forwarding State" bit for an address family in the capability can be
227     set only if the forwarding state has indeed been preserved for that
228     address family during the restart.

230     Once the session between the Restarting Speaker and the Receiving
231     Speaker is re-established, the Restarting Speaker will receive and
232     process BGP messages from its peers. However, it shall defer route
233     selection for an address family until it receives the End-of-RIB
234     marker from all its peers (excluding the ones with the "Restart
235     State" bit set in the received capability). It is noted that prior to
236     route selection, the speaker has no routes to advertise to its peers
237     and no routes to update the forwarding state.

239     In situations where both IGP and BGP have restarted, it might be
240     advantageous to wait for IGP to converge before the BGP speaker
241     performs route selection.

243     After the BGP speaker performs route selection, the forwarding state
244     of the speaker shall be updated and any previously marked stale
245     information shall be removed. The Adj-RIB-Out can then be advertised
246     to its peers. Once the initial update is complete for an address
247     family (including the case that there is no routing update to send),
248     the End-of-RIB marker shall be sent.

250     To put an upper bound on the amount of time a router defers its route
251     selection, an implementation must support a (configurable) timer that
252     imposes this upper bound.

254  6.2. Procedures for the Receiving Speaker

256     When the Restarting Speaker restarts, the Receiving Speaker may or
257     may not detect the termination of the TCP session with the Restarting
258     Speaker, depending on the underlying TCP implementation, whether or
259     not [BGP-AUTH] is in use, and the specific circumstances of the
260     restart. In case it does not detect the TCP reset and still
261     considers the BGP session as being established, it shall treat the
262     subsequent open connection from the Restarting Speaker as an
263     indication of TCP reset and act accordingly.

265     When the TCP reset is detected by the Receiving Speaker, it shall
266     retain the routes received from the Restarting Speaker for all the
267     address families that were previously received in the Graceful
268     Restart Capability, and shall mark them as stale routing information.
269     To deal with possible consecutive restarts, a route (from the
270     Restarting Speaker) previously marked as stale shall be deleted. The
271     router should not differentiate between stale and other routing
272     information during forwarding.

274     In re-establishing the session, the "Restart State" bit in the
275     Graceful Restart Capability of the OPEN message sent by the Receiving
276     Speaker shall not be set unless the Receiving Speaker has also
277     restarted. The presence and the setting of the "Forwarding State" bit
278     for an address family depends upon the actual forwarding state and
279     configuration.

281     If the session does not get re-established within the "Restart Time"
282     that the Restarting Speaker advertised previously, the Receiving
283     Speaker shall delete all the stale routes from the Restarting Speaker
284     that it is retaining.

286     Once the session is re-established, if the "Forwarding State" bit for
287     an address family is not set in the received Graceful Restart
288     Capability, or if the capability is not received for an address
289     family, the Receiving Speaker shall immediately remove all the stale
290     routes from the Restarting Speaker that it is retaining for that
291     address family.

293     The Receiving Speaker shall send the End-of-RIB marker once it
294     completes the initial update for an address family (including the
295     case that it has no routes to send) to the Restarting Speaker.

297     The Receiving Speaker shall replace the stale routes by the routing
298     updates received from the Restarting Speaker. Once the End-of-RIB
299     marker for an address family is received from the Restarting Speaker,
300     it shall immediately remove any routes from the Restarting Speaker
301     that are still marked as stale for that address family.

303     To put an upper bound on the amount of time a router retains the
304     stale routes, an implementation may support a (configurable) timer
305     that imposes this upper bound.

307  7. Deployment Considerations

309     While the procedures described in this document would help minimize
310     the effect of routing flaps, it is noted, however, that when a BGP
311     Graceful-Restart capable router restarts, there is a potential for
312     transient routing loops or blackholes in the network if routing
313     information changes before the involved routers complete routing
314     updates and convergence. Also, depending on the network topology, if
315     not all IBGP speakers are Graceful-Restart capable, there could be an
316     increased exposure to transient routing loops or blackholes when the
317     Graceful-Restart procedures are exercised.

319     The Restart Time, the upper bound for retaining routes and the upper
320     bound for deferring route selection may need to be tuned as more
321     deployment experience is gained.

323     Finally, it is noted that there is little benefit deploying BGP
324     Graceful-Restart in an AS whose IGPs and BGP are tightly coupled
325     (i.e., BGP and IGPs would both restart), and IGPs have no similar
326     Graceful-Restart capability.

328  8. Security Considerations

330     Since with this proposal a new connection can cause an old one to be
331     terminated, it might seem to open the door to denial of service
332     attacks. However, it is noted that unauthenticated BGP is already
333     known to be vulnerable to denials of service through attacks on the
334     TCP transport. The TCP transport is commonly protected through use
335     of [BGP-AUTH]. Such authentication will equally protect against
336     denials of service through spurious new connections.

338     It is thus concluded that this proposal does not change the
339     underlying security model (and issues) of BGP-4.

341  9. Acknowledgments

343     The authors would like to thank Alvaro Retana, Satinder Singh, David
344     Ward, Naiming Shen and Bruce Cole for their review and comments.

346  10. References

348     [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-
349     4)", RFC 1771, March 1995.

351     [BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y.,
352     "Multiprotocol Extensions for BGP-4", RFC 2283, March 1998.

354     [BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with
355     BGP-4", RFC 2842, May 2000.

357     [BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5
358     Signature Option", RFC 2385, August 1998.

360  11. Author Information

362     Srihari Ramachandra
363     Cisco Systems, Inc.
364     170 West Tasman Drive
365     San Jose, CA 95134
366     e-mail: rsrihari@cisco.com

368     Yakov Rekhter
369     Cisco Systems, Inc.
370     170 Tasman Drive
371     San Jose, CA, 95134
372     e-mail: yakov@cisco.com

374     Rex Fernando
375     Cisco Systems, Inc.
376     170 West Tasman Drive
377     San Jose, CA 95134
378     e-mail: rex@cisco.com

380     John G. Scudder
381     Cisco Systems, Inc.
382     170 West Tasman Drive
383     San Jose, CA 95134
384     e-mail: jgs@cisco.com

386     Enke Chen
387     Redback Networks, Inc.
388     350 Holger Way
389     San Jose, CA 95134
390     e-mail: enke@redback.com