Network Working Group                  Srihari R. Sangli (Procket Networks)
Internet Draft                           Yakov Rekhter (Juniper Networks)
Expiration Date: October 2002              Rex Fernando (Procket Networks)
                                           John G. Scudder (Cisco Systems)
                                              Enke Chen (Redback Networks)

                   Graceful Restart Mechanism for BGP

                      draft-ietf-idr-restart-03.txt

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2. Abstract

   This document proposes a mechanism for BGP that would help minimize
   the negative effects on routing caused by BGP restart. An End-of-RIB
   marker is specified and can be used to convey routing convergence
   information. A new BGP capability, termed "Graceful Restart
   Capability", is defined which would allow a BGP speaker to express
   its ability to preserve forwarding state during BGP restart. Finally,
   procedures are outlined for temporarily retaining routing information
   across a TCP transport reset.

3. Introduction

   Usually when BGP on a router restarts, all the BGP peers detect that
   the session went down and then came up. This "down/up" transition
   results in a "routing flap", which causes BGP route re-computation,
   generation of BGP routing updates, and flapping of the forwarding
   tables. The flap can also propagate across multiple routing domains.
   Such routing flaps may create transient forwarding blackholes and/or
   transient forwarding loops. They also consume resources on the
   control plane of the routers affected by the flap. As such, they are
   detrimental to the overall network performance.

   This document proposes a mechanism for BGP that would help minimize
   the negative effects on routing caused by BGP restart.
   An End-of-RIB marker is specified and can be used to convey routing
   convergence information. A new BGP capability, termed "Graceful
   Restart Capability", is defined which would allow a BGP speaker to
   express its ability to preserve forwarding state during BGP restart.
   Finally, procedures are outlined for temporarily retaining routing
   information across a TCP transport reset.

4. Marker for End-of-RIB

   An UPDATE message with no reachable NLRI and empty withdrawn NLRI is
   specified as the End-of-RIB marker, which a BGP speaker can use to
   indicate to its peer the completion of the initial routing update
   after the session is established. For the IPv4 unicast address
   family, the End-of-RIB marker is an UPDATE message with the minimum
   length [BGP-4]. For any other address family, it is an UPDATE message
   that contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no
   withdrawn routes for that <AFI, SAFI>.

   Although the End-of-RIB marker is specified for the purpose of BGP
   graceful restart, generating such a marker upon completion of the
   initial update would be useful for routing convergence in general,
   and thus the practice is recommended.

   In addition, it would be beneficial for routing convergence if a BGP
   speaker could indicate to its peer up front that it will generate the
   End-of-RIB marker, regardless of its ability to preserve its
   forwarding state during BGP restart. This can be accomplished using
   the Graceful Restart Capability described in the next section.

5. Graceful Restart Capability

   The Graceful Restart Capability is a new BGP capability [BGP-CAP]
   that can be used by a BGP speaker to indicate its ability to preserve
   its forwarding state during BGP restart. It can also be used to
   convey to its peer its intention of generating the End-of-RIB marker
   upon the completion of its initial routing updates.
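   The two End-of-RIB encodings from Section 4 can be sketched in
   Python as follows. This is a minimal illustration under stated
   assumptions, not a reference implementation: it assumes the BGP-4
   message header with an all-ones 16-octet marker (no BGP-level
   authentication), UPDATE as message type 2, and MP_UNREACH_NLRI as
   path attribute type code 15 sent with only the Optional flag set, per
   [BGP-4] and [BGP-MP]. The function and constant names are
   hypothetical.

```python
import struct
from typing import Optional, Tuple

BGP_MARKER = b"\xff" * 16      # assumed all-ones header marker
BGP_TYPE_UPDATE = 2
ATTR_MP_UNREACH_NLRI = 15
ATTR_FLAG_OPTIONAL = 0x80      # optional, non-transitive
ATTR_FLAG_EXT_LEN = 0x10       # extended (2-octet) attribute length
AFI_IPV4, SAFI_UNICAST = 1, 1


def _bgp_message(msg_type: int, body: bytes) -> bytes:
    """Prefix a body with the 19-octet header: marker, length, type."""
    return BGP_MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body


def end_of_rib(afi: int = AFI_IPV4, safi: int = SAFI_UNICAST) -> bytes:
    """Build a hypothetical End-of-RIB marker for one <AFI, SAFI>."""
    if (afi, safi) == (AFI_IPV4, SAFI_UNICAST):
        # Minimum-length UPDATE: zero withdrawn-routes length and zero
        # path-attribute length, with no NLRI following.
        body = struct.pack("!HH", 0, 0)
    else:
        # MP_UNREACH_NLRI whose value holds only AFI and SAFI, i.e. it
        # withdraws nothing.
        attr = struct.pack("!BBBHB", ATTR_FLAG_OPTIONAL,
                           ATTR_MP_UNREACH_NLRI, 3, afi, safi)
        body = struct.pack("!HH", 0, len(attr)) + attr
    return _bgp_message(BGP_TYPE_UPDATE, body)


def parse_end_of_rib(msg: bytes) -> Optional[Tuple[int, int]]:
    """Return (afi, safi) if msg is an End-of-RIB marker, else None."""
    if len(msg) < 19 or msg[:16] != BGP_MARKER:
        return None
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg_type != BGP_TYPE_UPDATE or length != len(msg):
        return None
    body = msg[19:]
    if body == struct.pack("!HH", 0, 0):
        return (AFI_IPV4, SAFI_UNICAST)
    # Only the 1-octet attribute length form is recognised here.
    if len(body) == 10 and body[:4] == struct.pack("!HH", 0, 6):
        flags, code, alen, afi, safi = struct.unpack("!BBBHB", body[4:])
        if (code == ATTR_MP_UNREACH_NLRI and alen == 3
                and (flags & ATTR_FLAG_EXT_LEN) == 0):
            return (afi, safi)
    return None
```

   Under these assumptions, the IPv4 unicast marker is 23 octets long (a
   19-octet header plus two zero length fields), and the marker for any
   other <AFI, SAFI> is 29 octets.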

   This capability is defined as follows:

      Capability code: 64

      Capability length: variable

      Capability value: Consists of the "Restart Flags" field, the
      "Restart Time" field, and zero or more <AFI, SAFI, Flags for
      Address Family> tuples, as follows.

      +--------------------------------------------------+
      | Restart Flags (4 bits)                           |
      +--------------------------------------------------+
      | Restart Time in seconds (12 bits)                |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | ...                                              |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+

   The use and meaning of the fields are as follows:

      Restart Flags:

         This field contains bit flags related to restart.

         The most significant bit is defined as the Restart State bit,
         which can be used to avoid a possible deadlock caused by
         waiting for the End-of-RIB marker when multiple BGP speakers
         peering with each other restart. When set (value 1), this bit
         indicates that the BGP speaker has restarted, and its peer
         should not wait for the End-of-RIB marker from the speaker
         before advertising routing information to the speaker.

         The remaining bits are reserved.

      Restart Time:

         This is the estimated time (in seconds) it will take for the
         BGP session to be re-established after a restart.
         This can be used by its peer to speed up routing convergence
         in case the BGP speaker does not come back after a restart.

      Address Family Identifier (AFI):

         This field carries the identity of the Network Layer protocol
         for which the Graceful Restart support is advertised. Presently
         defined values for this field are specified in RFC1700 (see
         the Address Family Numbers section).

      Subsequent Address Family Identifier (SAFI):

         This field provides additional information about the type of
         the Network Layer Reachability Information for which the
         Graceful Restart support is advertised.

      Flags for Address Family:

         This field contains bit flags for the <AFI, SAFI>.

         The most significant bit is defined as the Forwarding State
         bit, which can be used to indicate whether the forwarding state
         for the <AFI, SAFI> has indeed been preserved during the
         previous BGP restart. When set (value 1), the bit indicates
         that the forwarding state has been preserved.

         The remaining bits are reserved.

   The advertisement of this capability by a BGP speaker also implies
   that it will generate the End-of-RIB marker (for all address families
   exchanged) upon completion of its initial routing update to its peer.
   The value of the "Restart Time" field is irrelevant in the case that
   the capability does not carry any <AFI, SAFI>.

6. Operation

   A BGP speaker may advertise the Graceful Restart Capability for an
   address family to its peer only if it has the ability to preserve its
   forwarding state for the address family when BGP restarts.

   Even if the speaker does not have the ability to preserve its
   forwarding state for any address family during BGP restart, it is
   still recommended that the speaker advertise the Graceful Restart
   Capability to its peer to indicate its intention of generating the
   End-of-RIB marker upon the completion of its initial routing updates.
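   The Section 5 capability layout can be made concrete with a small
   Python encoder and parser. The sketch assumes that the 4-bit Restart
   Flags and the 12-bit Restart Time together occupy one 16-bit word (as
   the field widths imply), that each <AFI, SAFI> tuple occupies four
   octets, and that the capability is carried as a code, length, value
   triple per [BGP-CAP]; the class and constant names are hypothetical.
   A speaker that preserves no forwarding state, as in the preceding
   paragraph, simply advertises the capability with no tuples.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

GR_CAPABILITY_CODE = 64
RESTART_STATE_BIT = 0x8       # MSB of the 4-bit Restart Flags field
FORWARDING_STATE_BIT = 0x80   # MSB of the per-<AFI, SAFI> flags octet


@dataclass
class GracefulRestartCap:
    restart_state: bool
    restart_time: int  # seconds; must fit in 12 bits
    # (afi, safi, forwarding_state_preserved); may be empty
    families: List[Tuple[int, int, bool]] = field(default_factory=list)

    def encode(self) -> bytes:
        """Encode as capability code, length, value."""
        if not 0 <= self.restart_time <= 0xFFF:
            raise ValueError("Restart Time must fit in 12 bits (0-4095 s)")
        flags = RESTART_STATE_BIT if self.restart_state else 0
        value = struct.pack("!H", (flags << 12) | self.restart_time)
        for afi, safi, fwd in self.families:
            af_flags = FORWARDING_STATE_BIT if fwd else 0
            value += struct.pack("!HBB", afi, safi, af_flags)
        return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value

    @classmethod
    def decode(cls, data: bytes) -> "GracefulRestartCap":
        """Parse code, length, value; reserved bits are ignored."""
        code, length = struct.unpack("!BB", data[:2])
        value = data[2:2 + length]
        if (code != GR_CAPABILITY_CODE or len(value) != length
                or length < 2 or (length - 2) % 4 != 0):
            raise ValueError("malformed Graceful Restart Capability")
        (word,) = struct.unpack("!H", value[:2])
        families = []
        for off in range(2, length, 4):
            afi, safi, af_flags = struct.unpack("!HBB", value[off:off + 4])
            families.append((afi, safi, bool(af_flags & FORWARDING_STATE_BIT)))
        restart_state = bool((word >> 12) & RESTART_STATE_BIT)
        return cls(restart_state, word & 0xFFF, families)
```

   For example, GracefulRestartCap(False, 0).encode() yields the four
   octets 64, 2, 0, 0: a capability that announces only the intention to
   send End-of-RIB markers.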

   The End-of-RIB marker should be sent by a BGP speaker to its peer
   once it completes the initial routing update (including the case when
   there is no update to send) for an address family after the BGP
   session is established.

   It is noted that the normal BGP procedures MUST be followed when the
   TCP session terminates due to the sending or receiving of a BGP
   NOTIFICATION message.

   In general, the Restart Time SHOULD NOT be greater than the Hold Time
   carried in the OPEN message.

   In the following sections, "Restarting Speaker" refers to a router
   whose BGP has restarted, and "Receiving Speaker" refers to a router
   that peers with the Restarting Speaker.

   Suppose that the Graceful Restart Capability for an address family is
   advertised by the Restarting Speaker and understood by the Receiving
   Speaker, and that a BGP session between them is established. The
   following sections detail the procedures that shall be followed by
   the Restarting Speaker as well as the Receiving Speaker once the
   Restarting Speaker restarts.

6.1. Procedures for the Restarting Speaker

   When the Restarting Speaker restarts, it shall, if possible, retain
   the forwarding state for the BGP routes in the Loc-RIB and mark them
   as stale. It should not differentiate between stale and other
   information during forwarding.

   To re-establish the session with its peer, the Restarting Speaker
   must set the "Restart State" bit in the Graceful Restart Capability
   of the OPEN message. Unless allowed via configuration, the
   "Forwarding State" bit for an address family in the capability can be
   set only if the forwarding state has indeed been preserved for that
   address family during the restart.

   Once the session between the Restarting Speaker and the Receiving
   Speaker is re-established, the Restarting Speaker will receive and
   process BGP messages from its peers.
   However, it shall defer route selection for an address family until
   it receives the End-of-RIB marker from all its peers, excluding those
   with the "Restart State" bit set in the received capability and those
   that do not advertise the Graceful Restart Capability. It is noted
   that prior to route selection, the speaker has no routes to advertise
   to its peers and no routes with which to update the forwarding state.

   In situations where both IGP and BGP have restarted, it might be
   advantageous to wait for the IGP to converge before the BGP speaker
   performs route selection.

   After the BGP speaker performs route selection, the forwarding state
   of the speaker shall be updated and any previously marked stale
   information shall be removed. The Adj-RIB-Out can then be advertised
   to its peers. Once the initial update is complete for an address
   family (including the case that there is no routing update to send),
   the End-of-RIB marker shall be sent.

   To put an upper bound on the amount of time a router defers its route
   selection, an implementation must support a (configurable) timer that
   imposes this upper bound.

6.2. Procedures for the Receiving Speaker

   When the Restarting Speaker restarts, the Receiving Speaker may or
   may not detect the termination of the TCP session with the Restarting
   Speaker, depending on the underlying TCP implementation, whether or
   not [BGP-AUTH] is in use, and the specific circumstances of the
   restart. If it does not detect the TCP reset and still considers the
   BGP session to be established, it shall treat the subsequent new
   connection from the peer as an indication of a TCP reset and act
   accordingly (provided the Graceful Restart Capability has been
   received from the peer).
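   The route selection deferral rule of Section 6.1, including its
   mandatory upper-bound timer, reduces to a small predicate. The Python
   sketch below is illustrative only: the Peer record, its field names,
   and passing elapsed time and the configured limit as plain numbers of
   seconds are assumptions, since this document leaves the timer value
   to configuration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

Family = Tuple[int, int]  # (AFI, SAFI)


@dataclass
class Peer:
    """Hypothetical per-peer view held by the Restarting Speaker."""
    name: str
    sent_gr_capability: bool         # peer advertised the GR Capability
    restart_state_bit: bool = False  # "Restart State" bit in that capability
    eor_received: Set[Family] = field(default_factory=set)


def must_wait_for(peer: Peer) -> bool:
    """Peers without the capability, or restarting themselves, are skipped."""
    return peer.sent_gr_capability and not peer.restart_state_bit


def route_selection_allowed(peers: Iterable[Peer], family: Family,
                            elapsed_s: float, defer_limit_s: float) -> bool:
    """True once the Restarting Speaker may run route selection for family."""
    if elapsed_s >= defer_limit_s:
        return True  # the configurable upper bound on deferral has expired
    return all(family in p.eor_received for p in peers if must_wait_for(p))
```

   In this sketch a peer that sets the Restart State bit never blocks
   selection, which is what keeps two simultaneously restarting speakers
   from waiting on each other's End-of-RIB markers.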

   When the Receiving Speaker detects a TCP reset for a BGP session with
   a peer that has advertised the Graceful Restart Capability, it shall
   retain the routes received from the peer for all the address families
   that were previously received in the Graceful Restart Capability, and
   shall mark them as stale routing information. To deal with possible
   consecutive restarts, any route from the peer that was already marked
   as stale shall be deleted. The router should not differentiate between
   stale and other routing information during forwarding.

   In re-establishing the session, the "Restart State" bit in the
   Graceful Restart Capability of the OPEN message sent by the Receiving
   Speaker shall not be set unless the Receiving Speaker has restarted.
   The presence and the setting of the "Forwarding State" bit for an
   address family depend upon the actual forwarding state and
   configuration.

   If the session is not re-established within the "Restart Time" that
   the peer advertised previously, the Receiving Speaker shall delete
   all the stale routes from the peer that it is retaining.

   Once the session is re-established, if the "Forwarding State" bit for
   an address family is not set in the received Graceful Restart
   Capability, or if the capability is not received for an address
   family, the Receiving Speaker shall immediately remove all the stale
   routes from the peer that it is retaining for that address family.

   The Receiving Speaker shall send the End-of-RIB marker to the peer
   once it completes the initial update for an address family (including
   the case that it has no routes to send).

   The Receiving Speaker shall replace the stale routes with the routing
   updates received from the peer. Once the End-of-RIB marker for an
   address family is received from the peer, it shall immediately remove
   any routes from the peer that are still marked as stale for that
   address family.
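   The stale-route handling of Section 6.2 can be modelled as a per-peer
   table in which every route carries a stale flag. The Python sketch
   below is a simplification under stated assumptions: routes are keyed
   by a prefix string (a hypothetical stand-in for real NLRI), path
   attributes and explicit withdrawals are omitted, and all method names
   are invented for illustration. on_session_reset covers both a
   detected TCP reset and a new connection arriving while the old
   session still appears to be established; gr_families is the set of
   <AFI, SAFI> pairs the peer listed in its previous Graceful Restart
   Capability.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Family = Tuple[int, int]  # (AFI, SAFI)
Prefix = str              # hypothetical route key, e.g. "192.0.2.0/24"


@dataclass
class PeerRoutes:
    """Routes learned from one peer: family -> {prefix: is_stale}."""
    routes: Dict[Family, Dict[Prefix, bool]] = field(default_factory=dict)

    def _drop_stale(self, fam: Family) -> None:
        table = self.routes.get(fam)
        if table is not None:
            self.routes[fam] = {p: s for p, s in table.items() if not s}

    def on_update(self, fam: Family, prefix: Prefix) -> None:
        # A fresh advertisement replaces any stale copy of the route.
        self.routes.setdefault(fam, {})[prefix] = False

    def on_session_reset(self, gr_families: Set[Family]) -> None:
        # TCP reset detected, or a new connection arrived from the peer.
        for fam in list(self.routes):
            if fam not in gr_families:
                del self.routes[fam]  # family not covered: normal BGP rules
            else:
                # Routes already stale from an earlier restart are deleted;
                # the remaining routes are retained but marked stale.
                self.routes[fam] = {
                    p: True for p, s in self.routes[fam].items() if not s}

    def on_restart_time_expired(self) -> None:
        # Session not re-established within the peer's Restart Time.
        for fam in list(self.routes):
            self._drop_stale(fam)

    def on_session_established(self, fwd_preserved: Set[Family]) -> None:
        # Families without the Forwarding State bit lose stale routes now.
        for fam in list(self.routes):
            if fam not in fwd_preserved:
                self._drop_stale(fam)

    def on_end_of_rib(self, fam: Family) -> None:
        # Anything still stale after the peer's End-of-RIB is removed.
        self._drop_stale(fam)
```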

   To put an upper bound on the amount of time a router retains the
   stale routes, an implementation may support a (configurable) timer
   that imposes this upper bound.

7. Deployment Considerations

   While the procedures described in this document would help minimize
   the effect of routing flaps, when a BGP Graceful-Restart capable
   router restarts there is a potential for transient routing loops or
   blackholes in the network if routing information changes before the
   involved routers complete routing updates and convergence. Also,
   depending on the network topology, if not all IBGP speakers are
   Graceful-Restart capable, there could be an increased exposure to
   transient routing loops or blackholes when the Graceful-Restart
   procedures are exercised.

   The Restart Time, the upper bound for retaining routes, and the upper
   bound for deferring route selection may need to be tuned as more
   deployment experience is gained.

   Finally, it is noted that there is little benefit in deploying BGP
   Graceful-Restart in an AS whose IGPs and BGP are tightly coupled
   (i.e., BGP and IGPs would both restart) and whose IGPs have no
   similar Graceful-Restart capability.

8. Security Considerations

   Since with this proposal a new connection can cause an old one to be
   terminated, it might seem to open the door to denial of service
   attacks. However, it is noted that unauthenticated BGP is already
   known to be vulnerable to denials of service through attacks on the
   TCP transport. The TCP transport is commonly protected through use
   of [BGP-AUTH]. Such authentication will equally protect against
   denials of service through spurious new connections.

   It is thus concluded that this proposal does not change the
   underlying security model (and issues) of BGP-4.

9. Acknowledgments

   The authors would like to thank Alvaro Retana, Satinder Singh, David
   Ward, Naiming Shen and Bruce Cole for their review and comments.

10. References

   [BGP-4]    Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [BGP-MP]   Bates, T., Chandra, R., Katz, D. and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 2283, March 1998.

   [BGP-CAP]  Chandra, R. and J. Scudder, "Capabilities Advertisement
              with BGP-4", RFC 2842, May 2000.

   [BGP-AUTH] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
              Signature Option", RFC 2385, August 1998.

11. Author Information

   Srihari R. Sangli
   Procket Networks, Inc.
   1100 Cadillac Court
   Milpitas, CA 95035
   e-mail: srihari@procket.com

   Yakov Rekhter
   Juniper Networks, Inc.
   1194 N. Mathilda Avenue
   Sunnyvale, CA 94089
   e-mail: yakov@juniper.net

   Rex Fernando
   Procket Networks, Inc.
   1100 Cadillac Court
   Milpitas, CA 95035
   e-mail: rex@procket.com

   John G. Scudder
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134
   e-mail: jgs@cisco.com

   Enke Chen
   Redback Networks, Inc.
   350 Holger Way
   San Jose, CA 95134
   e-mail: enke@redback.com