idnits 2.17.1

draft-ietf-grow-ops-reqs-for-bgp-error-handling-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 6, 2012) is 4340 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5881' is defined on line 1010, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2858 (Obsoleted by RFC 4760)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-grow-bgp-gshut-03

  == Outdated reference: A later version (-17) exists of
     draft-ietf-grow-bmp-06

  == Outdated reference: A later version (-10) exists of
     draft-ietf-idr-bgp-enhanced-route-refresh-01

  == Outdated reference: A later version (-16) exists of
     draft-ietf-idr-bgp-gr-notification-00

  == Outdated reference: A later version (-06) exists of
     draft-ietf-idr-enhanced-gr-00

     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                                R. Shakir
Internet-Draft                                                        BT
Intended status: Informational                              June 6, 2012
Expires: December 8, 2012

 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
           draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Abstract

   BGP-4 is utilised as a key intra- and inter-Autonomous System routing
   protocol in modern IP networks.  The failure modes as defined by the
   original protocol standards are based on a number of assumptions
   around the impact of session failure.  Numerous incidents both in the
   global Internet routing table and within Service Provider networks
   have been caused by strict handling of a single invalid UPDATE
   message causing large-scale failures in one or more Autonomous
   Systems.

   This memo describes the current use of BGP-4 within Service Provider
   networks, and outlines a set of requirements for further work to
   enhance the mechanisms available to a BGP-4 implementation when
   erroneous data is detected.  Whilst this document does not provide
   specification of any standard, it is intended as an overview of a set
   of enhancements to BGP-4 to improve the protocol's robustness to suit
   its current deployment.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 8, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of BGP-4 in Service Provider Networks
     1.2.  Overview of Operator Requirements for BGP-4 Error Handling
   2.  Errors within BGP-4 UPDATE Messages
     2.1.  Classifying BGP Errors and Expected Error Handling
       2.1.1.  Critical BGP Errors
       2.1.2.  Semantic BGP Errors
   3.  Avoiding use of NOTIFICATION
   4.  Recovering RIB Consistency
   5.  Reducing the Impact of Session Reset
   6.  Operational Toolset for Monitoring BGP
   7.  Operational Complexities Introduced by Altering RFC4271
     7.1.  Reducing the Network Impact of Session Teardown
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Author's Address

1.  Introduction

   Where BGP-4 [RFC4271] is deployed in the Internet and Service
   Provider networks, numerous incidents have been recorded due to the
   manner in which [RFC4271] specifies that errors in routing
   information should be handled.  Whilst the behaviour defined in the
   existing standards retains utility, the deployments of the protocol
   have changed within modern networks, resulting in significantly
   different demands for protocol robustness.  Whilst a number of
   Internet Drafts have been written to begin to enhance the behaviour
   of BGP-4 in terms of the handling of erroneous messages, this memo
   intends to define a set of requirements for ongoing work.  These
   requirements are considered from the perspective of a Network
   Operator, and hence this draft does not intend to define the protocol
   mechanisms by which such error handling behaviour is to be
   implemented.

1.1.  Role of BGP-4 in Service Provider Networks

   BGP was designed as an inter-Autonomous System (AS) routing protocol
   and hence many of the error handling mechanisms within the protocol
   specification are designed to be conducive to this role.  In general,
   this consideration as an inter-AS routing propagation mechanism
   results in the view that a BGP session propagates a relatively small
   amount of network-layer reachability information (NLRI) between two
   ASes.  In this case, there is an expectation of session resilience
   for those adjacencies that are key to routing continuity (for
   example, it is expected that two networks peering via BGP would
   connect multiple times in order to safeguard against equipment or
   protocol failure).
   In addition, there is some expectation of multiple paths to a
   particular NLRI being available - it would be expected that a network
   can fall back to utilising alternate, less direct, paths where a
   failure of a more direct path occurs.

   Traditional network architectures would deploy an Interior Gateway
   Protocol (IGP) to carry infrastructure and customer prefixes, with an
   Exterior Gateway Protocol (EGP) such as BGP being utilised to
   propagate these prefixes to other Autonomous Systems.  However, with
   the growth of IP-based services, this is no longer considered best
   practice.  In order to ensure that convergence is within acceptable
   time bounds, the amount of routing information carried within the IGP
   is significantly reduced - and tends to be only infrastructure
   prefixes.  iBGP is then utilised to propagate both customer and
   external prefixes within an AS.  As such, BGP has become an IGP, with
   traditional IGPs acting as a means by which to propagate the routing
   information which is required to establish a BGP session, and reach
   the egress node within the local routing domain.  This change in role
   presents different requirements for the robustness of BGP as a
   routing protocol - with the expectation of a similar level of
   robustness to that of an IGP being set.

   Along with this change in role, the nature of the IP routing
   information that is carried has changed.  BGP has become a ubiquitous
   means by which service information can be propagated between devices.
   For instance, BGP is utilised to carry routing information for
   IP/MPLS VPN services as described in [RFC4364].  Since there is an
   existing deployment of the protocol between PE devices in numerous
   networks, it has been adapted to propagate this routing information,
   as its use limits the number of routing protocols required on each
   device.
   This additional information being propagated represents a large
   change in requirement for the error handling of the protocol - where
   session failure occurs, it is likely that a complete service outage
   is experienced by at least a subset of a network's customers, even
   where the erroneous packet occurred within a different sub-topology
   or even service (a different address family for example).  For this
   reason, there is a significant demand to avoid service-affecting
   failures that may be triggered by routing information within a single
   sub-topology or service.

   Both within Internet and multi-service routing architectures, a
   number of BGP sessions propagate a large proportion of the required
   routing information for network operation.  For Internet routing,
   these are typically BGP sessions which propagate the global routing
   table to an AS - failure of these sessions may have a large impact on
   network service, based on a single erroneous update.  In a
   multi-service environment, typical deployments utilise a small number
   of core-facing BGP sessions, typically towards route reflector
   devices.  Failure of these sessions may also result in a large impact
   to network operation.  Clearly, the avoidance of conditions requiring
   these sessions to fail is of great utility to any network operator,
   and provides further motivation for the revision of the existing
   behaviour.

   Whilst the behaviour in [RFC4271] is suited to ensuring that the
   scope of BGP messages containing erroneous routing information is
   limited (by means of session reset), with the above considerations,
   it is clear that this mechanism is not suited to all deployments.  It
   should, however, be noted that the change in scope affects the
   handling only of errors occurring after BGP session establishment.
   There is no current operational requirement to amend the means by
   which error handling in session establishment, or liveness detection,
   are performed.

1.2.  Overview of Operator Requirements for BGP-4 Error Handling

   It is the intention of this document to define a set of criteria to
   which a revised error handling mechanism in BGP-4 is required to
   conform.  The motivation for the definition of these requirements can
   be summarised based on certain behaviour currently present in the
   protocol that is not deemed acceptable within current operational
   deployments, or where there is a shortfall in the tool set available
   to an operator.  These key requirements can be summarised as follows:

   o  It is unacceptable within modern deployments of the BGP-4 protocol
      that a single erroneous UPDATE packet affects prefixes that it
      does not carry.  This requirement therefore requires some
      modification to the means by which erroneous UPDATE packets are
      handled, and reacted to - with a particular focus on avoiding the
      use of the NOTIFICATION message.

   o  It is recognised that some error conditions that may occur within
      the BGP-4 protocol may not always be handled gracefully, and may
      result in conditions whereby an implementation cannot recover.  In
      these (and similar) cases, it is undesirable for an operator that
      a reset of the BGP-4 session results in interruption to the
      forwarding of packets (by means of withdrawing prefixes installed
      by BGP-4 into a device's RIB, and subsequently FIB).  To this end,
      there is a requirement to define a session reset mechanism which
      provides session re-initialisation in a non-destructive manner.

   o  Further to the requirements to provide a more robust protocol, the
      current visibility into error conditions within the BGP-4 protocol
      is extremely limited - where further modifications to this
      behaviour are to be made, complexity is likely to be added.  Thus,
      to ensure that BGP-4 is manageable, there are requirements for
      mechanisms by which the protocol can be examined and monitored.

   This document describes each of these requirements in further depth,
   along with an overview of means by which they are expected to be
   achieved.  In addition, the mechanism by which the enhancements
   meeting these requirements are to interact is discussed.

2.  Errors within BGP-4 UPDATE Messages

   Both through analysis of incidents occurring within the Internet DFZ,
   and multi-service environments utilising BGP-4 to signal service or
   routing information, a number of different classes of errors within
   BGP-4 UPDATE messages have been observed.  In order to consider the
   applicability of enhanced error handling mechanisms, it is possible
   to divide these errors into a number of sub-classes, particularly
   focusing around the location of the error within the UPDATE message.

   Where an UPDATE message is considered invalid by a BGP speaker due to
   an error within a path attribute that is not the NLRI (where the
   definition of NLRI includes reachability information encoded in the
   MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in
   [RFC4760]) it is a requirement of any enhanced error handling
   mechanism to handle the error in a manner focused on the NLRI
   contained within the message.  Since in this case, the message
   received from the remote peer is syntactically valid, it is
   considered that such an UPDATE is indicative of erroneous data within
   a path attribute.
   The current behaviour defined within the protocol implies that the
   BGP speaker from whom the message is received is now an invalid path
   for all NLRI announced via the session - which results in a
   disproportionate impact to overall network operation.  In particular
   scenarios (such as networks with centralised BGP route reflection)
   such action can result in a loss of all reachability to a network.
   In other contexts (such as the Internet DFZ), it cannot be assumed
   that the BGP speaker from whom the UPDATE message is received is
   directly responsible for the erroneous information contained within
   the message.

   Two further error cases exist within UPDATE messages, both of which
   are related to the mechanisms that are applicable to messages
   received where some difficulty exists in parsing the entire BGP
   message.  These are the case where a valid NLRI attribute can be
   extracted, and the case where such an attribute cannot be parsed.  In
   these cases, errors in the packing of attributes within a BGP message
   may have occurred.  Such errors are likely indicative of an error
   specifically caused by the remote BGP speaker.  It is, however,
   desirable to an operator that such errors are handled without
   affecting all NLRI across a BGP session.  As such, there is a key
   requirement to maximise the number of cases in which it is possible
   to extract NLRI from a BGP UPDATE message.  To this end, it is
   required that where possible the MP_REACH_NLRI and MP_UNREACH_NLRI
   attributes are utilised for encoding all NLRI (including IPv4
   Unicast), and that the relevant attribute is included as the first
   attribute of a BGP UPDATE message (as originally recommended in
   [I-D.chen-ebgp-error-handling]).  Such a change to the order of
   inclusion of this attribute maximises the number of cases in which
   NLRI can be extracted from an UPDATE.
   Where this is possible, it is again required that the error handling
   mechanisms utilised should be directly applied to the NLRI included
   in the UPDATE.

   For all cases whereby NLRI can be obtained from an UPDATE message, it
   is expected that the requirements outlined in Section 3 should be
   considered by any enhancement to the BGP-4 protocol.

   In the case that it is not possible to completely parse the NLRI
   attribute from the UPDATE message received from a peer, it is
   extremely likely that this is indicative of a serious error with
   either the process of attribute packing, or buffer usage on the
   remote BGP speaker.  In this case, clearly, it is not possible to
   apply any error handling mechanism that is limited to a specific set
   of NLRI, since an implementation has no knowledge of the NLRI
   included within the UPDATE message.  In addition, such errors are
   considered to be relatively fundamental to the operation of a BGP
   implementation, and hence may indicate a case whereby significant
   system errors have occurred.  The current BGP-4 standard results in a
   BGP speaker restarting a session with the remote BGP speaker.
   However, where such an error does occur, it is required that a
   graceful mechanism is utilised to provide a lower impact to network
   operation.  The requirements for enhancements of this nature to BGP-4
   are outlined in Section 5, with the requirements outlined therein
   focused on providing a means by which system integrity can be
   restored whilst allowing for continued network operation.

2.1.  Classifying BGP Errors and Expected Error Handling

   It is clearly advantageous for BGP-4 implementations to utilise a
   consistent set of error handling mechanisms for the different types
   of errors that are described in Section 2, and provide consistent
   nomenclature to refer to them.
   It is therefore suggested that errors that are indicative of larger
   scale failures of a BGP speaker, and hence require some error
   handling at the session level, are referred to as 'critical' errors,
   whilst those errors that are identified based on incorrect content of
   one or more attributes of a message are referred to as 'semantic'
   errors.

   The errors identified within the following sections consider only
   those errors within the specifications at the time of writing.  It is
   recommended that in the definition of future extensions to the BGP-4
   specification, the error handling behaviour (and the category within
   which errors within the extension should be considered by an
   implementation) is defined.

2.1.1.  Critical BGP Errors

   As described in this document, it is advantageous to limit the number
   of 'critical' errors that occur within the protocol.  Therefore,
   based on analysis of the processing of BGP UPDATE messages, it is
   required that 'critical' error handling behaviour is applied to:

   o  UPDATE Message Length errors - whereby the specified overall
      UPDATE message length is inconsistent with the sum of the Total
      Path Attribute and Withdrawn Routes length.  This is indicative of
      message packing failure, whereby the NLRI may not be correctly
      extracted.

   o  Errors parsing the NLRI attributes of an UPDATE message - where
      such errors occur in NLRI carried in either the IPv4-Unicast
      Advertised or Withdrawn routes, or in the MP_REACH_NLRI or
      MP_UNREACH_NLRI attributes [RFC2858], it is not possible to target
      error handling mechanisms to specific NLRI, and hence session
      level mechanisms must be utilised.

   It is expected that those requirements outlined in Section 5 are
   utilised to provide session-level handling of those errors identified
   as 'critical'.

2.1.2.  Semantic BGP Errors

   Where a BGP message is correctly formed, a number of cases exist
   whereby the contents of the UPDATE are not valid - these cases
   represent errors that can be identified as affecting specific NLRI.
   The following cases are expected to be classified as semantic errors:

   o  Zero or invalid length errors in path attributes excluding those
      containing NLRI, or where the length of all path attributes
      contained within the UPDATE does not correspond to the total path
      attributes length.  In this case, the NLRI can be correctly
      extracted, and hence acted upon.

   o  Messages where invalid data or flags are contained in a path
      attribute that does not relate to the NLRI.

   o  UPDATE messages that are missing mandatory attributes, that
      contain unrecognised non-optional attributes, or that contain
      duplicate or invalid attributes (be they unsupported or
      unexpected).

   o  Those messages where the NEXT_HOP, or MP_REACH next-hop values are
      missing, of zero length, or invalid for the relevant AFI/SAFI.

   In these cases, it is expected that these errors can be handled
   gracefully, following the requirements detailed in Section 3 and
   Section 4 of this memo.

3.  Avoiding use of NOTIFICATION

   The error handling behaviour defined in [RFC4271] is problematic due
   to the limited options that are available to an implementation.  When
   an erroneous BGP message is received, at the current time, the
   implementation must either ignore the error, or send a NOTIFICATION
   message, after which it is mandatory to terminate the BGP session.
   It is apparent that this requirement is at odds with that of protocol
   robustness.

   There is significant complexity to this requirement.  The mechanism
   defined in [I-D.chen-ebgp-error-handling] describes a means by which
   no NOTIFICATION message is generated for all cases whereby NLRI can
   be extracted from an UPDATE.
   The NLRI contained within the erroneous UPDATE message is considered
   as though the remote BGP speaker has provided an UPDATE marking it as
   withdrawn.  This limits the propagation of the invalid routing
   information, whilst also ensuring that no traffic is forwarded via a
   previously-known path that may no longer be valid.  This mechanism is
   referred to as "treat-as-withdraw".

   Whilst this behaviour results in avoiding a NOTIFICATION message,
   keeping other routing information advertised by the remote BGP
   speaker within the RIB, it may result in unreachability for a sub-set
   of the NLRI advertised by the remote speaker.  Two cases should be
   considered - that where the entry for a prefix in the Adj-RIB-In of
   the neighbour propagating an erroneous packet is utilised, and that
   where the prefix installed in the device's RIB is learnt from another
   BGP speaker.  In the former case, should the identified NLRI not be
   treated as withdrawn, the original NLRI is utilised within the global
   RIB.  However, this information is potentially now invalid (i.e. it
   no longer provides a valid forwarding path), whilst an alternate
   (valid) path may exist in another Adj-RIB-In.  By continuing to
   utilise the NLRI for which the UPDATE was considered invalid, traffic
   may be forwarded via an invalid path, resulting in routing loops, or
   black-holing.  In the second case, no impact to the forwarding of
   traffic, or global RIB, is incurred, yet where treat-as-withdraw is
   implemented, possibly stale routing information is purged from the
   Adj-RIB-In of the neighbour propagating errors.

   Whilst mechanisms such as "treat-as-withdraw" are currently
   documented, the proposals are limited in their scope - particularly
   in terms of restrictions to implementation only on eBGP sessions.
   This limitation is made based on the view that the BGP RIB must be
   consistent across an autonomous system.
   By implementing treat-as-withdraw for an iBGP session, one or more
   routers within the Autonomous System may not have reachability to a
   prefix, and hence blackholing of traffic, or routing loops, may
   occur.  It should, however, be considered whether this view is valid,
   in light of the manner in which BGP is utilised within operator
   networks.  Inconsistency in a RIB based on a single UPDATE being
   treated as withdrawn may cause an inconsistency in a single
   sub-topology (e.g.  Layer 3 VPN service), or a service not operating
   completely (in the case of an UPDATE carrying service membership
   information).  Where a NOTIFICATION and teardown is utilised this is
   destructive to all sub-topologies in all address family identifiers
   (AFIs) carried by the session in question.  Even where mechanisms
   such as multi-session BGP are utilised, a whole AFI is affected by
   such a NOTIFICATION message.  In terms of routing operation, it is
   therefore far less costly to endure a situation where a limited
   sub-set of routing information within an AS is invalid, than to
   consider all routing information as invalid based on a single
   trigger.

   It is considered that, if extended to cover iBGP, the mechanisms
   described in [I-D.chen-ebgp-error-handling] and
   [I-D.ietf-idr-optional-transitive] provide a means to avoid the
   transmission of a NOTIFICATION to a remote BGP speaker based on a
   single erroneous message, where at all possible, and hence meet this
   requirement.  The failure cases whereby NLRI cannot be extracted from
   the UPDATE message represent a case whereby the receiving system
   cannot handle the error gracefully based on this mechanism.

4.  Recovering RIB Consistency

   The recommendations described in Section 3 may result in the RIB for
   a topology within an AS being inconsistent across the AS' internal
   routers.
   Alternatively, where such mechanisms are deployed at an AS boundary,
   interconnects between two ASes may be inconsistent with each other.
   There are therefore risks of traffic blackholing, due to missing
   routing information, or forwarding loops.  Whilst this is deemed an
   acceptable compromise in the short term, clearly, it is suboptimal.
   Therefore, a requirement exists to provide mechanisms by which a BGP
   speaker is able to recover the consistency of the Adj-RIB-In for a
   particular neighbour.

   In the general case, the consistency of the BGP RIB can be recovered
   by requesting that the entire Adj-RIB-Out of a remote BGP speaker is
   re-advertised.  A mechanism to achieve this re-advertisement is
   defined within the ROUTE-REFRESH specification [RFC2918].  It is
   envisaged that by requesting a refresh of all NLRI advertised by a
   BGP speaker, any NLRI which has been withdrawn due to being contained
   within an invalid UPDATE message is re-learnt.  Where a ROUTE-REFRESH
   is used to directly perform a consistency check between the
   Adj-RIB-Out of a remote device, and the Adj-RIB-In of the local BGP
   speaker, a demarcation between the ROUTE-REFRESH, and normal UPDATE
   messages is required (in order that an "end" of the refresh can be
   used to identify any 'stale' NLRI) -
   [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which
   the ROUTE-REFRESH mechanism can be extended to meet this requirement.

   Whilst re-advertisement of the whole BGP RIB provides a means by
   which withdrawn NLRI can be re-advertised, there are some scaling
   implications that must be considered.  In the case that a
   ROUTE-REFRESH is generated, all NLRI must be re-packed into UPDATE
   messages and advertised by one speaker on the BGP session, whilst the
   other must receive all UPDATE messages, and validate the RIB's
   consistency.  Clearly, it is advantageous to avoid this work where
   possible.

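   The demarcated refresh described in this section can be read as a
   mark-and-sweep over the Adj-RIB-In.  The following Python sketch is
   illustrative only.  The class and method names (AdjRibIn,
   begin_refresh, end_refresh) are hypothetical and are not taken from
   [I-D.ietf-idr-bgp-enhanced-route-refresh] or from any
   implementation, and the sketch assumes that the start and end of the
   refresh are signalled to the receiver in some form.

```python
from dataclasses import dataclass, field


@dataclass
class AdjRibIn:
    """Hypothetical per-neighbour Adj-RIB-In, mapping prefix -> attributes."""

    routes: dict = field(default_factory=dict)
    stale: set = field(default_factory=set)
    refreshing: bool = False

    def begin_refresh(self):
        """Start-of-refresh marker seen: every known prefix becomes suspect."""
        self.stale = set(self.routes)
        self.refreshing = True

    def update(self, prefix, attrs):
        """Valid UPDATE received: (re)learn the prefix, clear its stale mark."""
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def withdraw(self, prefix):
        """Explicit withdrawal, or an erroneous UPDATE treated as withdrawn."""
        self.routes.pop(prefix, None)
        self.stale.discard(prefix)

    def end_refresh(self):
        """End-of-refresh marker seen: purge whatever was not re-advertised."""
        if not self.refreshing:
            return []
        purged = sorted(self.stale)
        for prefix in purged:
            del self.routes[prefix]
        self.stale.clear()
        self.refreshing = False
        return purged


# Example run using documentation prefixes; the next hop is arbitrary.
rib = AdjRibIn()
rib.update("192.0.2.0/25", {"next_hop": "198.51.100.1"})
rib.update("203.0.113.0/24", {"next_hop": "198.51.100.1"})

rib.begin_refresh()
rib.update("192.0.2.0/25", {"next_hop": "198.51.100.1"})    # still valid
rib.update("192.0.2.128/25", {"next_hop": "198.51.100.1"})  # re-learnt
purged = rib.end_refresh()  # 203.0.113.0/24 was never re-advertised
```

   In this run, the prefix that was previously lost (192.0.2.128/25) is
   re-learnt, and the prefix that the neighbour no longer advertises is
   identified as stale only once the end of the refresh is seen, which
   is why the demarcation is needed.
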
   It is envisaged that during routing inconsistencies caused by
   utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is
   aware that some routing information was not able to be processed -
   due to the fact that an UPDATE message was not parsed correctly.
   Since this mechanism (as discussed in Section 3) requires the local
   BGP speaker to have determined the set of NLRI for which an erroneous
   UPDATE message was received, it is possible to use a targeted
   mechanism to re-request the specific NLRI that was contained within
   the erroneous UPDATE message.  Re-requesting provides the remote BGP
   speaker with an opportunity to re-transmit the NLRI - possibly
   providing an opportunity to leverage alternative methods to build the
   UPDATE message.  Such a request requires extension to the existing
   BGP-4 protocol, in terms of specific UPDATE generation filters with a
   transient lifetime.  It is envisaged that the work within
   [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted
   elements of the Adj-RIB-In for a BGP neighbour to be recovered.

   It is of particular note for both means of recovering RIB consistency
   described that these are effective only when considering transient
   errors within an implementation - for instance, should an RFC
   interpretation error within an implementation be present, regardless
   of the number of times a specific UPDATE is generated, it is likely
   that this error condition will persist (as it may with the existing
   behaviour defined by [RFC4271]).  For this reason, there is a
   requirement to consider the means by which such consistency recovery
   mechanisms are utilised.  It is not advisable that a transient filter
   and advertisement mechanism is triggered by all error handling events
   due to the load this is likely to place on the neighbour receiving
   such a request.
Where this BGP speaker is a relatively 500 centralised device - a route reflector (as described by [RFC4456]) 501 for example - the act of generation of UPDATE messages with such 502 frequency is likely to cause disproportionate load. It is therefore 503 an operational requirement of such mechanisms that means of request 504 dampening be required by any such extension. 506 5. Reducing the Impact of Session Reset 508 Even where protocol enhancements allow errors in the BGP-4 protocol 509 to cease to trigger NOTIFICATION messages, and hence reset a BGP 510 session, it is clear that some error conditions may not be exited. 511 In particular, errors due to existing state, or memory structures, 512 associated with a specific BGP session will not be handled. It is 513 therefore important to consider how these error conditions are 514 currently handled by the protocol. It should be noted that the 515 following discussion and analysis considers only those NOTIFICATION 516 messages generated in response to errors in UPDATE messages (as 517 defined by Section 6.3 in [RFC4271]). 519 The existing NOTIFICATION behaviour triggers a reset of all elements 520 of the BGP-4 session, as described in Section 6 of [RFC4271]. It is 521 expected that session teardown requires an implementation to re- 522 initialise all structures and state required for session maintenance. 523 Clearly, there is some utility to this requirement, as error 524 conditions in BGP are, in general, exited from. However, this 525 definition is responsible for the forwarding outages within networks 526 utilising BGP for propagation of routing or service when each error 527 is experienced. The requirement described in Section 3 is intended 528 to reduce the cases whereby a NOTIFICATION is required, however, any 529 mechanism implemented as a response to this requirement by definition 530 cannot provide a session reset to the extent of that achieved by the 531 current behaviour. 

In order to address this, there is a requirement for a means by which
a BGP speaker can signal that an unhandled error condition in an
UPDATE message occurred - requiring a session reset - yet also
continue to utilise the paths advertised by the neighbour that are
currently in use within the RIB. In this case, the Adj-RIB-In
received from the neighbour is not considered invalid, despite a
NOTIFICATION, and session reset, being required. This set of
requirements is akin to those answered by the BGP Graceful Restart
mechanism described in [RFC4724]. Since the operational requirement
in this case is to provide a means to achieve a complete session
restart without disrupting the forwarding path of those prefixes in
use within a BGP speaker's RIB, it is expected that utilising a
procedure similar to the Graceful Restart mechanism meets the error
handling requirement. Responding to an error condition (repeated
or otherwise) with a message indicating that an error that cannot be
handled has occurred, forcing session reset, whilst retaining
forwarding information within the RIB, allows forwarding to all
prefixes within a system's RIB to continue during the period in which
the session restarts. It is envisaged that the additional complexity
introduced by such a mechanism can be limited by extending existing
BGP messages - one such approach is proposed in
[I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the
restart lifetime, should an error condition not be transient - for
example, should an error have occurred with the BGP process, rather
than being specific to the BGP session - the remote BGP speaker is
still detected as an invalid device for forwarding.
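
As an illustration only, the time-bounded retention described above
can be sketched as follows. This is a minimal model and not part of
any specification: the AdjRibIn class, its method names and the
120 second bound are hypothetical, and the handling of stale routes
is simplified relative to the procedures of [RFC4724].

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AdjRibIn:
    """Routes learned from one neighbour (hypothetical model, not a real API)."""

    restart_time: float                                    # bound on staleness, in seconds
    routes: Dict[str, str] = field(default_factory=dict)   # prefix -> next hop
    stale_since: Optional[float] = None                    # None while the session is up

    def on_unhandled_error(self, now: float) -> None:
        # The session is reset, but the routes are retained and marked stale.
        self.stale_since = now

    def on_session_reestablished(self) -> None:
        # The neighbour returned within the bound; routes are no longer stale.
        self.stale_since = None

    def expire_if_overdue(self, now: float) -> bool:
        # Once the bound is exceeded the neighbour is treated as failed
        # and its routes are flushed. Returns True if a flush occurred.
        if self.stale_since is not None and now - self.stale_since >= self.restart_time:
            self.routes.clear()
            self.stale_since = None
            return True
        return False

    def usable(self, prefix: str) -> bool:
        # A retained (even stale) route may still be used for forwarding.
        return prefix in self.routes
```

In this sketch forwarding entries survive the session reset, and are
flushed only if the neighbour fails to return within the bound, which
corresponds to the case in which the error condition is not transient.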

It should be noted that a protocol enhancement meeting this
requirement is not able to solve all error conditions - however, a
complete restart of the BGP and TCP session between two BGP speakers
implements an identical recovery mechanism to that which is achieved
by the existing behaviour. Where an error condition such as memory
or configuration corruption has occurred in a BGP implementation, it
is expected that a mechanism meeting this requirement continues to
detect this, by means of a bound on time for session restart to
occur. Whilst there may be some concern that packets continue
to be forwarded through a device which can be in a failure mode of
this nature for a longer period due to this requirement, the
architecture of modern IP routers should be considered. A divided
forwarding and control plane is common in many devices, as well as
process separation for software-based devices - corruption of a
specific protocol daemon does not necessarily imply forwarding is
affected. Indeed, where forwarding behaviour of a device is
affected, it is envisaged that a failure detection mechanism (be it
Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets)
will detect such a failure in almost all cases, with the symptomatic
behaviour of such a failure being an invalid UPDATE message in very
few other cases.

6. Operational Toolset for Monitoring BGP

A significant complexity that is introduced through the requirements
defined in this document is that of monitoring BGP session status for
an operator. Although the existing error handling behaviour causes a
disproportionate failure, session failure is extremely visible to
most operational personnel within a Network Operator due to both the
existing definitions of SNMP trap mechanisms for BGP and the
forwarding impact typically caused by such a failure.
By introducing
mechanisms by which errors of this nature are not as visible, this is
no longer the case. There is a requirement that where subsets of the
RIB on a device are no longer reachable from a BGP speaker, or indeed
an AS, some visibility of this situation, alongside a mechanism to
determine the cause, is available to an operator. Whilst, to some
extent, this can be solved by mandating a sub-requirement of each of
the aforementioned requirements that a BGP speaker must log where
such errors occur, and are hence handled, this does not solve all
cases. In order to clarify this requirement, the example of the
transmission of an erroneous Optional Transitive attribute can be
considered. Since, by definition, there is no requirement for all
BGP speakers to parse such an attribute, a receiving router may treat
NLRI as withdrawn based on an erroneous attribute not examined by its
neighbour. In this case, the upstream device or network, propagating
the UPDATE, has no visibility of this error. Operationally, however,
it is of interest to the upstream router operator that such invalid
information was propagated.

The requirement for logging of error conditions in transmitted BGP
messages, which are visible to only the receiver, cannot be achieved
by any existing BGP message, or capability. It is envisaged that
each erroneous event should be transmitted to the remote peer -
including the information as to the set of NLRI that were considered
invalid. Whilst with some mechanisms this is achieved by default
(for example, One-Time Prefix ORF (Outbound Route Filtering)
[I-D.zeng-one-time-prefix-orf] will transmit the set of prefixes that
are required), the operator requirement is to know which prefixes may
have been unreachable in all cases.
It is envisaged that an
extension to meet this requirement will allow for such information to
be transmitted between peers, and hence logged. Such a mechanism may
provide further utility as either a diagnostic, or logging toolset.

As such, it is possible to divide the messages that are required in
order to provide further visibility into BGP for an operator. Such a
division can be made based on both the required means of message
transmission and the criticality of each request.

o  Messages required to replace NOTIFICATION - In cases where the
   error handling mechanisms defined by [RFC4271] currently result in
   a NOTIFICATION message being generated, a number of the
   requirements detailed within this document result in this message
   being suppressed. Despite this change, the error condition's
   occurrence is still of interest to an operator in order to provide
   both monitoring and troubleshooting capabilities, since some form
   of invalid data has been received on a session. It is therefore
   considered that an implementation must both generate a message
   locally and transmit one to the remote peer, based on such a
   condition. Where such a message is transmitted to the remote
   peer, it is considered that the BGP session via which the
   erroneous UPDATE message was received should be used as transport
   to the remote peer. The information transmitted in such a message
   should be limited to that required to identify the paths which
   were considered erroneous (i.e. restricting the information to
   that which is directly relevant to a network operator in the case
   of an error condition occurring). Any delay to convergence on the
   session in question is considered to be acceptable, given the
   suboptimal nature of the reception of invalid routing information
   via a BGP session.
   Further concerns regarding such a mechanism
   relate to the load generated on the BGP speaker in question;
   however, it must be considered that in the case of an erroneous
   UPDATE being received, and the 'treat-as-withdraw' mechanism being
   utilised, where the erroneous path is removed from the Loc-RIB,
   there is likely to be a requirement to generate UPDATE messages
   withdrawing the prefix from all further BGP speakers to which the
   prefix is advertised. The load of generating such UPDATEs is
   likely to be much greater than that of transmitting error
   information via a logging message type back to the speaker from
   which it was received. It is envisaged that light-weight BGP
   message-based signalling mechanisms such as the ADVISORY message
   types detailed in [I-D.ietf-idr-operational-message] provide a
   suitable means to satisfy this requirement.

o  Additional Diagnostic Capabilities for BGP - In a number of cases,
   there is an operational requirement to further debug erroneous BGP
   UPDATE messages, along with the particulars of the state of a BGP
   speaker. For instance, where an invalid BGP UPDATE message is
   transmitted between two BGP speakers, the exact format of the
   UPDATE message is of interest to an operator, as this information
   provides a clear indication of a message considered to be
   erroneous by the BGP speaker to which it was transmitted. In this
   case, it is considered of great utility that the entire UPDATE
   message is transmitted back to the advertising speaker, in order
   to allow for further debugging to occur. Whilst such information
   is particularly useful to an operator, it clearly provides
   information that is not key to protocol operation - for this
   reason, it is expected that the additional complexity, and load,
   to which a BGP speaker is subjected may not be acceptable.
   For this reason, it is required that where
   mechanisms are developed to support this requirement, messages of
   this nature can be supported both within an existing BGP session,
   and via a dedicated separate session, be it BGP carrying messages
   such as those defined in [I-D.ietf-idr-operational-message] or a
   dedicated monitoring protocol akin to BMP described in
   [I-D.ietf-grow-bmp].

Whilst the operational requirement for such monitoring tools to allow
for visibility into BGP is clearly agreed upon, the means by which
such messages are transmitted between two BGP speakers is likely to
be dependent upon the positions of the speakers in question (for
instance, the requirements for such a protocol may differ where a
session is between two ASBRs under separate administration). The
introduction of additional message types to the BGP protocol clearly
introduces further complexity - and leaves room for further
implementation and standardisation errors that may compromise the
robustness of the BGP protocol. In addition, the queuing and
scheduling of these BGP messages must be interleaved with the
transmission of the key protocol messages - such as KEEPALIVE and
UPDATE packets. It is therefore a concern that should a large number
of messages specifically for operational visibility be transmitted,
this will delay the transmission of UPDATE packets, and hence
adversely affect the end-to-end convergence time for NLRI carried
within BGP. The operational reasons for which it is advantageous for
such messages to be in-band to a protocol should also be considered.
In particular, it should be noted that where such information is to
be transmitted between administrative boundaries a BGP session
represents an existing channel between the two ASes.
This channel is considered to be secure insofar as the routing
information, and requests sent via the session are considered to come
from a trusted source. Since error information relates to a
particular attachment, and is key to ensuring that such a session is
operating as expected, it is considered of great operational benefit
that this information is transmitted over this channel. In addition,
the overall system scalability is improved by such in-band
transmission. It is expected that erroneous information resulting in
the 'treat-as-withdraw' mechanism being utilised is relatively
infrequently transmitted between two peers (when compared to the
frequency of UPDATE message transmission). The impact of including
an additional BGP message type for such operational visibility is
relatively small from a resource utilisation perspective - additional
processing overhead is only experienced when such a message is
received. Where a separate session is maintained, particular network
elements within a service provider topology may require hundreds, or
thousands, of additional sessions for the transmission of this
information. Such a resource consumption overhead is likely to be
unacceptable to some network operators.

For the reasons explained above, it is expected that mechanisms
specified to meet the requirements for event visibility consider the
relative impacts of additional monitoring sessions, or message
inclusion in band to BGP in order not to compromise the security,
scalability and robustness of the BGP-4 protocol.

7. Operational Complexities Introduced by Altering RFC4271

The existing NOTIFICATION and subsequent teardown of a BGP session
upon encountering an error has the advantage that a consistent
approach to error handling is required of all implementations of the
BGP-4 protocol.
This is of operational advantage as it provides a
clear expectation of the behaviour of the protocol. The requirements
defined herein add further complexity to the error-handling within
BGP, and hence are liable to compromise the existing deterministic
protocol behaviour. It is therefore deemed that there is a further
requirement to define a set of recommended behaviours based on the
reception of a particular class of erroneous UPDATE message,
alongside highlighting some of the implementation complexities that
may need to be handled in the case that particular recommendations
made within this memo are deployed.

Utilising the classes of erroneous UPDATE message described in
Section 2, the recommended behaviour for a BGP-4 implementation can
be divided into two branches. Primarily, where a semantic error is
identified, an implementation is expected to utilise the
reduced-impact error handling approach, as described in Section 3.
In the case that such an approach results in known NLRI being
withdrawn from the BGP speaker's RIB, and an implementation provides
functionality such that these errors are recovered from through an
automatically triggered means, such as those described within
Section 4, some consideration of the scalability of these recovery
mechanisms is required. Clearly, there is a computational and
bandwidth overhead associated with the re-advertisement of NLRI
between two BGP speakers - due to the generation of UPDATE messages,
their transmission between the two speakers, and the parsing and
processing into the RIB required. This overhead is directly
proportional to the number of UPDATE messages that are required.
Where a semantic error is experienced, by definition the NLRI
contained within the UPDATE can be extracted.
It is therefore possible to minimise the proportion of
the RIB that is re-advertised by targeting any recovery mechanism on
the NLRI contained within the erroneous UPDATE. Such a targeted
mechanism can be achieved through a means such as One-Time ORF, or
other means of targeting UPDATE messages not discussed within this
memo. It is recommended that where available, any automatically (or
manually) triggered recovery mechanism utilises such targeted
means in preference to any whole RIB refresh mechanism (such as
ROUTE-REFRESH).

In the case that an erroneous UPDATE has been processed through a
means such as treat-as-withdraw (described within Section 3), a
recovery mechanism may be considered superfluous, if the assumption
is made that the RIB inconsistency will only be recovered from based
on a path re-convergence (or change in BGP attribute) for the
advertising BGP speaker. However, where this assumption is not
considered to provide adequate recovery behaviour, and a mechanism to
restore RIB consistency automatically is implemented, some
consideration must be made for where repeated erroneous messages
occur. In this case, in order to limit the impact to the BGP
speaker's network operation, at a pre-defined point it is recommended
that such automatic recovery mechanisms towards the BGP speaker from
which erroneous UPDATEs are repeatedly received are suppressed, and
the fact that such suppression has occurred is highlighted to an
operator. The point at which such behaviour is suppressed is to be
defined on a per-implementation basis, taking into account feedback
from the Network Operator community based on the deployment of the
recommendations described in this document.
It is expected that such
trigger points are dependent upon the mechanisms implemented for a
particular BGP-4 implementation, and the impact upon the speaker of
these means of RIB recovery.

Where critical errors are experienced, such that a session reset is
required, the mechanism discussed in Section 5 should be used.
Again, since such a mechanism results in a restart of a BGP session,
it is expected that all NLRI carried over the session is
re-advertised as it is re-established, incurring processing overhead
on both the advertising and receiving BGP speaker. In order to
minimise the consumption of control-plane computational resource on
both speakers, it is recommended that mechanisms allowing a reduced
set of BGP UPDATE messages to be re-transmitted between two speakers
are employed wherever possible - for instance through employing
mechanisms such as those described in [I-D.ietf-idr-enhanced-gr].

In the case that repeated critical errors occur, the overhead of
performing any mechanism implemented based on the requirements in
Section 5 is incurred following each erroneous UPDATE message. Since
these mechanisms are, by definition, performed automatically in
response to the erroneous message being received, similar
considerations as to the impact to the BGP speaker must be taken into
account. As such, it is expected that after a certain trigger level,
the ongoing receipt of critical errors within BGP UPDATE messages is
deemed to be indicative of a long-lasting failure, and a session no
longer considered viable. Where such a case is experienced, it is
expected that the BGP session reverts to the standard session failure
behaviour, as described in [RFC4271] and documents updating this base
standard. Where such a reversion is implemented this condition
should be flagged to a network operator.
The number of restart
attempts before the session reverts to being shut down should be
determined based on the overhead of the recovery mechanisms
implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is
implemented, the impact of session restart may be significantly
lower), and operational experience of the deployment of the
recommendations described in this document.

Since repeated erroneous UPDATE messages which experience critical
errors may be indicative of long-lasting failure modes, it is
recommended that a back-off from restarting BGP sessions experiencing
such behaviour is implemented. This back-off is not applicable to
restart behaviour through means such as those described in Section 5
since such restarts are time-bound based on the period for which the
Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when
considering BGP Graceful Restart, such restarts are time-bound by the
Restart Time described in [RFC4724]). However, following a session
reverting to being pulled down based on repeated error conditions, it
is recommended that subsequent restart attempts are subject to an
exponentially increasing interval between attempts. It is
therefore recommended that in such cases an implementation implements
the increasing values of IdleHoldTimer as described in the BGP-4 FSM
documented in [RFC4271].

7.1. Reducing the Network Impact of Session Teardown

As discussed within the preceding section, where repeated critical
UPDATE message errors are received, it is recommended that the impact
to both the advertising and receiving BGP-4 speakers be limited by
reverting to tearing down the BGP-4 session experiencing such errors.
The BGP-4 specification presented in [RFC4271] achieves such a
session shutdown by sending a NOTIFICATION message; however, this has
the net result that all downstream BGP speakers (i.e.
those to whom
the NLRI carried over the now ceased BGP session was readvertised)
must withdraw this NLRI from their RIB, and perform a best-path
selection if required. In some cases, there may be no alternate path
available, and hence a period of time for which no valid BGP
route exists. Particularly, this is very likely to occur where an
upstream BGP speaker performs a best-path selection and advertises
only a single path to its neighbours - there is a requirement for the
upstream speaker to perform a best-path selection, and re-advertise a
new set of NLRI before the downstream system is able to converge to a
new path. It should be noted that where UPDATE messages withdrawing
NLRI are not subject to the BGP session's configured
MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements
are, this may result in a BGP speaker being without a path for a
period up to the MRAI.

Clearly, it is advantageous to avoid this period of time for which
there may be no reachability for a set of NLRI, especially since the
BGP speaker terminating a particular session is doing so due to a
particular error handling policy. The graceful shutdown mechanism
detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a
BGP speaker is able to signal that a set of NLRI is to be withdrawn,
and hence allow downstream systems to pre-emptively perform a
best-path selection, and hence advertise new reachability information
in a make-before-break manner.

It is therefore envisaged that where a session is to be shut down,
based on a trigger relating to erroneous UPDATE messages being
received (be they repeated or not), the graceful shutdown
procedure is utilised, so as to reduce the forwarding impact of NLRI
received on the session being withdrawn.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

The requirements outlined in this document provide mechanisms by
which erroneous BGP messages may be responded to with limited impact
to forwarding operation. This is of benefit to the security of a BGP
speaker in general. Under the existing error handling behaviour,
where erroneous UPDATE messages are sent by a single malicious
Autonomous System or router within a network (or the Internet default
free zone - DFZ), and are then propagated to all devices within the
same routing domain, all other NLRI available over the same session
become unreachable. This behaviour may provide means by which an
Autonomous System can be isolated from required routing domains (such
as the Internet), should the relevant UPDATE messages be propagated
via specific paths. By reducing the impact of such failures, it is
envisaged that this possibility may be constrained to a specific set
of NLRI, or a specific topology.

Some mechanisms meeting the requirements specified in this document,
particularly those within Section 6, may raise further security
concerns; however, it is envisaged that these are addressed in
per-enhancement memos.

10. Acknowledgements

The author would like to thank the following network operators for
their insight, and valuable input in defining the requirements for a
variety of operational deployments of the BGP-4 protocol: Shane
Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom
Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom
Scholl and Ilya Varlashkin.

In addition, many thanks are extended to Jeff Haas, Wim Hendrickx,
Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert
Raszuk for their expertise relating to implementations of the BGP-4
protocol.

11. References

11.1. Normative References

[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D.
          Katz, "Multiprotocol Extensions for BGP-4", RFC 2858,
          June 2000.

[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
          September 2000.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
          Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
          Networks (VPNs)", RFC 4364, February 2006.

[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
          Reflection: An Alternative to Full Mesh Internal BGP
          (IBGP)", RFC 4456, April 2006.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
          Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
          January 2007.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
          "Multiprotocol Extensions for BGP-4", RFC 4760,
          January 2007.

11.2. Informational References

[I-D.chen-ebgp-error-handling]
          Chen, E., Mohapatra, P., and K. Patel, "Revised Error
          Handling for BGP Updates from External Neighbors",
          draft-chen-ebgp-error-handling-01 (work in progress),
          September 2011.

[I-D.ietf-grow-bgp-gshut]
          Francois, P., Decraene, B., Pelsser, C., Patel, K., and C.
          Filsfils, "Graceful BGP session shutdown",
          draft-ietf-grow-bgp-gshut-03 (work in progress),
          December 2011.

[I-D.ietf-grow-bmp]
          Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
          Protocol", draft-ietf-grow-bmp-06 (work in progress),
          December 2011.

[I-D.ietf-idr-bgp-enhanced-route-refresh]
          Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
          Route Refresh Capability for BGP-4",
          draft-ietf-idr-bgp-enhanced-route-refresh-01 (work in
          progress), December 2011.

[I-D.ietf-idr-bgp-gr-notification]
          Patel, K., Fernando, R., and J. Scudder, "Notification
          Message support for BGP Graceful Restart",
          draft-ietf-idr-bgp-gr-notification-00 (work in progress),
          December 2011.

[I-D.ietf-idr-enhanced-gr]
          Patel, K., Chen, E., Fernando, R., and J. Scudder,
          "Accelerated Routing Convergence for BGP Graceful
          Restart", draft-ietf-idr-enhanced-gr-00 (work in
          progress), December 2011.

[I-D.ietf-idr-operational-message]
          Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL
          Message", draft-ietf-idr-operational-message-00 (work in
          progress), March 2012.

[I-D.ietf-idr-optional-transitive]
          Scudder, J., Chen, E., Mohapatra, P., and K. Patel,
          "Revised Error Handling for BGP UPDATE Messages",
          draft-ietf-idr-optional-transitive-04 (work in progress),
          October 2011.

[I-D.zeng-one-time-prefix-orf]
          Zeng, Q. and J. Dong, "One-time Address-Prefix Based
          Outbound Route Filter for BGP-4",
          draft-zeng-one-time-prefix-orf-01 (work in progress),
          October 2010.

[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
          (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
          June 2010.

Author's Address

Rob Shakir
BT
pp C3L
BT Centre
81, Newgate Street
London EC1A 7AJ
UK

Email: rob.shakir@bt.com
URI: http://www.bt.com/