idnits 2.17.1

draft-ietf-grow-ops-reqs-for-bgp-error-handling-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 15, 2011) is 4757 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5881' is defined on line 675, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-01) exists of
     draft-chen-ebgp-error-handling-00

  == Outdated reference: A later version (-17) exists of
     draft-ietf-grow-bmp-05

  == Outdated reference: A later version (-04) exists of
     draft-ietf-idr-optional-transitive-03

     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Internet Engineering Task Force R.
Shakir 3 Internet-Draft C&W 4 Intended status: Informational April 15, 2011 5 Expires: October 17, 2011 7 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 8 draft-ietf-grow-ops-reqs-for-bgp-error-handling-00 10 Abstract 12 BGP-4 is utilised as a key intra- and inter-Autonomous System routing 13 protocol in modern IP networks. The failure modes as defined by the 14 original protocol standards are based on a number of assumptions 15 around the impact of session failure. Numerous incidents both in the 16 global Internet routing table and within Service Provider networks 17 have been caused by strict handling of a single invalid UPDATE 18 message causing large-scale failures in one or more Autonomous 19 Systems. 21 This memo describes the current use of BGP-4 within Service Provider 22 networks, and outlines a set of requirements for further work to 23 enhance the mechanisms available to a BGP-4 implementation when 24 erroneous data is detected. Whilst this document does not provide 25 specification of any standard, it is intended as an overview of a set 26 of enhancements to BGP-4 to improve the protocol's robustness to suit 27 its current deployment. 29 Status of this Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at http://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on October 17, 2011. 
46 Copyright Notice 48 Copyright (c) 2011 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (http://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License.

61 Table of Contents

63 1. Introduction
64 1.1. Role of BGP-4 in Service Provider Networks
65 1.2. Overview of Operator Requirements for BGP-4 Error 66 Handling
67 2. Avoiding use of NOTIFICATION
68 3. Recovering RIB Consistency
69 4. Reducing the Impact of Session Reset
70 5. Operational Toolset for Monitoring BGP
71 6. Operational Complexities Introduced by Altering RFC4271
72 7. IANA Considerations
73 8. Security Considerations
74 9. Acknowledgements
75 10. References
76 10.1. Normative References
77 10.2. Informational References
78 Author's Address

80 1.
Introduction 82 Where BGP-4 [RFC4271] is deployed in the Internet and Service 83 Provider networks, numerous incidents have been recorded due to the 84 manner in which [RFC4271] specifies errors in routing information 85 should be handled. Whilst the behaviour defined in the existing 86 standards retains utility, the deployments of the protocol have 87 changed within modern networks, resulting in significantly different 88 demands for protocol robustness. Whilst a number of Internet Drafts 89 have been written to begin to enhance the behaviour of BGP-4 in terms 90 of the handling of erroneous messages, this draft intends to define a 91 set of requirements for ongoing work. These requirements are 92 considered from the perspective of a Network Operator, and hence this 93 draft does not intend to define the protocol mechanisms by which such 94 error handling behaviour is to be implemented. 96 1.1. Role of BGP-4 in Service Provider Networks 98 BGP was designed as an inter-Autonomous System (AS) routing protocol 99 and hence many of the error handling mechanisms within the protocol 100 specification are designed to be conducive to this role. In general, 101 this consideration as an inter-AS routing propagation mechanism 102 results in the view that a BGP session propagates a relatively small 103 amount of network-layer reachability information (NLRI) between two 104 ASes. In this case, it is the expectation of session resilience for 105 those adjacencies that are key to routing continuity (for example, it 106 is expected that two networks peering via BGP would connect multiple 107 times in order to safeguard equipment or protocol failure). In 108 addition, there is some expectation of multiple paths to a particular 109 NLRI being available - it would be expected that a network can fall 110 back to utilising alternate, less direct, paths where a failure of a 111 more direct path occurs. 
113 Traditional network architectures would deploy an Interior Gateway 114 Protocol (IGP) to carry infrastructure and customer prefixes, with an 115 Exterior Gateway Protocol (EGP) such as BGP being utilised to 116 propagate these prefixes to other Autonomous Systems. However, with 117 the growth of IP-based services, this is no longer considered best 118 practice. In order to ensure that convergence is within acceptable 119 time bounds, the amount of routing information carried within the IGP 120 is significantly reduced - and tends to be only infrastructure 121 prefixes. iBGP is then utilised to propagate both customer and 122 external prefixes within an AS. As such, BGP has become an IGP, with 123 traditional IGPs acting as a means by which to propagate the routing 124 information which is required to establish a BGP session, and reach 125 the egress node within the local routing domain. This change in role 126 presents different requirements for the robustness of BGP as a 127 routing protocol - with the expectation of a similar level of 128 robustness to that of an IGP being set. 130 Along with this change in role, the nature of the IP routing 131 information that is carried has changed. BGP has become a ubiquitous 132 means by which service information can be propagated between devices. 133 For instance, BGP is utilised to carry routing information for IP/ 134 MPLS VPN services as described in [RFC4364]. Since there is an 135 existing deployment of the protocol between PE devices in numerous 136 networks, it has been adapted to propagate this routing information, 137 as its use limits the number of routing protocols required on each 138 device.
This additional information being propagated represents a 139 large change in requirement for the error handling of the protocol - 140 where session failure occurs, it is likely that a complete service outage 141 for at least a subset of a network's customers is experienced, even where 142 the erroneous packet occurred within a different sub-topology 143 or even service (a different address family for example). For this 144 reason, there is a significant demand to avoid service-affecting 145 failures that may be triggered by routing information within a single 146 sub-topology or service. 148 Both within Internet and multi-service routing architectures, a 149 number of BGP sessions propagate a large proportion of the required 150 routing information for network operation. For Internet routing, 151 these are typically BGP sessions which propagate the global routing 152 table to an AS - failure of these sessions may have a large impact on 153 network service, based on a single erroneous update. In a multi- 154 service environment, typical deployments utilise a small number of 155 core-facing BGP sessions, typically towards route reflector devices. 156 Failure of these sessions may also result in a large impact to 157 network operation. Clearly, the avoidance of conditions requiring 158 these sessions to fail is of great utility to any network operator, 159 and provides further motivation for the revision of the existing 160 behaviour. 162 Whilst the behaviour in [RFC4271] is suited to ensuring that BGP 163 messages with erroneous routing information are limited in scope 164 (by means of session reset), with the above considerations, it is 165 clear that this mechanism is not suited to all deployments. It 166 should, however, be noted that the change in scope affects the 167 handling only of errors occurring after BGP session establishment.
168 There is no current operational requirement to amend the means by 169 which error handling in session establishment, or liveness 170 detection, are performed. 172 1.2. Overview of Operator Requirements for BGP-4 Error Handling 174 It is the intention of this document to define a set of criteria 175 to which a revised error handling mechanism in BGP-4 is 176 required to conform. The motivation for the definition of these 177 requirements can be summarised based on certain behaviour currently 178 present in the protocol that is not deemed acceptable within current 179 operational deployments, or where there is a shortfall in the 180 toolset available to an operator. These key requirements can be 181 summarised as follows: 183 o It is unacceptable within modern deployments of the BGP-4 protocol 184 that a single erroneous UPDATE packet affects prefixes that it 185 does not carry. This therefore requires some 186 modification to the means by which erroneous UPDATE packets are 187 handled, and reacted to - with a particular focus on avoiding the 188 use of the NOTIFICATION message. 190 o It is recognised that some error conditions that occur within the 191 BGP-4 protocol may not always be handled gracefully, and may 192 result in conditions whereby an implementation cannot recover. In 193 these (and similar) cases, it is unacceptable for an operator that 194 this reset of the BGP-4 session results in interruption to 195 forwarding packets (by means of withdrawing prefixes installed by 196 BGP-4 into a device's RIB, and subsequently FIB). To this end, 197 there is a requirement to define a session reset mechanism which 198 provides session re-initialisation in a non-destructive manner.
200 o Further to the requirements to provide a more robust protocol, the 201 current visibility into error conditions within the BGP-4 protocol 202 is extremely limited - where further modifications to this 203 behaviour are to be made, complexity is likely to be added. Thus, 204 to ensure that BGP-4 is manageable, there are requirements for 205 mechanisms by which the protocol can be examined and monitored. 207 This document describes each of these requirements in further depth, 208 along with an overview of means by which they are expected to be 209 achieved. In addition, the mechanism by which the enhancements 210 meeting these requirements are to interact is discussed. 212 2. Avoiding use of NOTIFICATION 214 The error handling behaviour defined in RFC4271 is problematic due to 215 the limited options that are available to an implementation. When an 216 erroneous BGP message is received, at the current time, the 217 implementation must either ignore the error, or send a NOTIFICATION 218 message, after which it is mandatory to terminate the BGP session. 219 It is apparent that this requirement is at odds with that of protocol 220 robustness. 222 There is significant complexity to this requirement. The mechanism 223 defined in [I-D.chen-ebgp-error-handling] describes a means by which 224 no NOTIFICATION message is generated for all cases whereby NLRI can 225 be extracted from an UPDATE. The NLRI contained within the erroneous 226 UPDATE message is considered as though the remote BGP speaker has 227 provided an UPDATE marking it as withdrawn. This results in a limit 228 in the propagation of the invalid routing information, whilst also 229 ensuring that no traffic is forwarded via a previously-known path 230 that may no longer be valid. This mechanism is referred to as 231 "treat-as-withdraw". 
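The treat-as-withdraw behaviour described above can be illustrated with a minimal sketch. This is not taken from [I-D.chen-ebgp-error-handling]; the `Update` class, the `handle_update` function, and the returned status strings are hypothetical names chosen for illustration, and the Adj-RIB-In is modelled as a plain mapping from prefix to next hop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Update:
    """Hypothetical, simplified view of a received UPDATE message."""
    nlri: Optional[List[str]]       # None models NLRI that could not be extracted
    attributes_valid: bool = True   # False models an erroneous UPDATE
    next_hop: str = ""


def handle_update(adj_rib_in: Dict[str, str], update: Update) -> str:
    """Apply one UPDATE to a per-neighbour Adj-RIB-In (prefix -> next hop)."""
    if update.nlri is not None and update.attributes_valid:
        # Well-formed UPDATE: install or replace the advertised paths.
        for prefix in update.nlri:
            adj_rib_in[prefix] = update.next_hop
        return "accepted"
    if update.nlri is not None:
        # Erroneous UPDATE whose NLRI is still extractable: behave as if the
        # neighbour had withdrawn those prefixes, and keep the session up.
        for prefix in update.nlri:
            adj_rib_in.pop(prefix, None)
        return "treat-as-withdraw"
    # NLRI cannot be extracted: treat-as-withdraw cannot handle this case.
    return "session-reset"
```

In this sketch, an erroneous UPDATE for 192.0.2.0/24 removes only that prefix from the neighbour's Adj-RIB-In; every other prefix learnt over the same session is left untouched, which is the property the requirement asks for.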
233 Whilst this behaviour results in avoiding a NOTIFICATION message, 234 keeping other routing information advertised by the remote BGP 235 speaker within the RIB, it may result in unreachability for a sub-set 236 of the NLRI advertised by the remote speaker. Two cases should be 237 considered - that where the entry for a prefix in the Adj-RIB-In of 238 the neighbour propagating an erroneous packet is utilised, and that 239 where the prefix installed in the device's RIB is learnt from another 240 BGP speaker. In the former case, should the identified NLRI not be 241 treated as withdrawn, the original NLRI is utilised within the global 242 RIB. However, this information is potentially now invalid (i.e. it 243 no longer provides a valid forwarding path), whilst an alternate 244 (valid) path may exist in another Adj-RIB-In. By continuing to 245 utilise the NLRI for which the UPDATE was considered invalid, traffic 246 may be forwarded via an invalid path, resulting in routing loops, or 247 black-holing. In the second case, no impact to the forwarding of 248 traffic, or global RIB, is incurred, yet where treat-as-withdraw is 249 implemented, possibly stale routing information is purged from the 250 Adj-RIB-In of the neighbour propagating errors. 252 Whilst mechanisms such as "treat-as-withdraw" are currently 253 documented, the proposals are limited in their scope - particularly 254 in terms of restrictions to implementation only on eBGP sessions. 255 This limitation is made based on the view that the BGP RIB must be 256 consistent across an autonomous system. By implementing treat-as- 257 withdraw for an iBGP session, one or more routers within the 258 Autonomous System may not have reachability to a prefix, and hence 259 blackholing of traffic, or routing loops, may occur. It should, 260 however, be considered whether this view is valid, in light of the manner 261 in which BGP is utilised within operator networks.
Inconsistency in 262 a RIB based on a single UPDATE being treated as withdrawn may cause an 263 inconsistency in a single sub-topology (e.g. Layer 3 VPN service), 264 or a service not operating completely (in the case of an UPDATE 265 carrying service membership information). Where a NOTIFICATION and 266 teardown is utilised, this is destructive to all sub-topologies in all 267 address family identifiers (AFIs) carried by the session in question. 268 Even where mechanisms such as multi-session BGP are utilised, a whole 269 AFI is affected by such a NOTIFICATION message. In terms of routing 270 operation, it is therefore far less costly to endure a situation 271 where a limited sub-set of routing information within an AS is 272 invalid, than to consider all routing information as invalid based on 273 a single trigger. 275 It is considered that, if extended to cover iBGP, the mechanisms 276 described in [I-D.chen-ebgp-error-handling] and 277 [I-D.ietf-idr-optional-transitive] provide a means to avoid the 278 transmission of a NOTIFICATION to a remote BGP speaker based on a 279 single erroneous message, where at all possible, and hence meet this 280 requirement. The failure cases whereby NLRI cannot be extracted from 281 the UPDATE message represent a case whereby the receiving system 282 cannot handle the error gracefully based on this mechanism. 284 3. Recovering RIB Consistency 286 The recommendations described in Section 2 may result in the RIB for 287 a topology within an AS being inconsistent across the AS's internal 288 routers. Alternatively, where such mechanisms are deployed at an AS 289 boundary, interconnects between two ASes may be inconsistent with 290 each other. There are therefore risks of traffic blackholing, due to 291 missing routing information, or forwarding loops. Whilst this is 292 deemed an acceptable compromise in the short term, clearly, it is 293 suboptimal.
Therefore, a requirement exists to provide mechanisms by 294 which a BGP speaker is able to recover the consistency of the Adj- 295 RIB-In for a particular neighbour. 297 It is envisaged that during such routing inconsistencies, the local 298 BGP speaker is aware that some routing information was not able to be 299 processed - due to the fact that an UPDATE message was not parsed 300 correctly. If the 'treat-as-withdraw' mechanism described within 301 Section 2 is utilised, it is also possible for the local BGP speaker 302 to have determined the set of NLRI for which an erroneous UPDATE 303 message was received. In this scenario, by utilising targeted 304 mechanisms to re-request the specific NLRI that was unreachable, this 305 routing information can be re-transmitted from the remote BGP 306 speaker. Such a request requires extension to the existing BGP-4 307 protocol, in terms of specific UPDATE generation filters with a 308 transient lifetime. It is envisaged that the work within 309 [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted 310 elements of the Adj-RIB-In for a BGP neighbour to be recovered. 312 In addition to such cases where specific routing information is known 313 to be erroneous, the more general case where either a large amount of 314 the Adj-RIB-In is contained in UPDATE messages subject to treat-as- 315 withdraw, or the specific prefixes are unknown to the local BGP 316 speaker must be considered. In this case, there is a requirement for 317 a BGP speaker to re-request the entire RIB advertised by a remote 318 neighbour. In this case, where such re-advertisement is required, it 319 is envisaged that a ROUTE-REFRESH as per the description in [RFC2918] 320 is utilised. [I-D.keyur-bgp-enhanced-route-refresh] provides a means 321 by which the ROUTE-REFRESH mechanism can be extended in order to meet 322 this requirement. 
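The choice between the two recovery mechanisms above can be sketched as a small decision function. This is an illustrative assumption, not a procedure from any of the referenced drafts: the function name `choose_recovery`, the `peer_supports_targeted` flag, and the `max_fraction` threshold (standing in for "a large amount of the Adj-RIB-In") are all hypothetical.

```python
from typing import List, Optional, Set, Tuple


def choose_recovery(
    affected: Optional[Set[str]],
    adj_rib_in_size: int,
    peer_supports_targeted: bool,
    max_fraction: float = 0.1,
) -> Tuple[str, List[str]]:
    """Pick a recovery action for one neighbour's Adj-RIB-In.

    affected is the set of prefixes seen in erroneous UPDATEs, or None when
    the local speaker could not determine which prefixes were involved.
    """
    # Unknown prefixes, or no targeted mechanism negotiated with the peer:
    # only a re-advertisement of the entire RIB can restore consistency.
    if affected is None or not peer_supports_targeted:
        return ("route-refresh", [])
    # A large share of the Adj-RIB-In is affected: a full refresh is assumed
    # here to be simpler than many targeted requests.
    if adj_rib_in_size > 0 and len(affected) / adj_rib_in_size > max_fraction:
        return ("route-refresh", [])
    # Otherwise, re-request only the prefixes known to be affected.
    return ("targeted-request", sorted(affected))
```

With a single affected prefix out of a thousand, the sketch selects a targeted request; when the affected set is unknown, it falls back to a full ROUTE-REFRESH. The 10% default threshold is arbitrary and would need to be chosen per deployment.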
324 It is of particular note for both means of recovering RIB consistency 325 described that these are effective only when considering transient 326 errors within an implementation - for instance, should an RFC 327 interpretation error within an implementation be present, regardless 328 of the number of times a specific UPDATE is generated, it is likely 329 that this error condition will persist. For this reason, there is a 330 requirement to consider the means by which such consistency recovery 331 mechanisms are utilised. It is not advisable that a transient 332 filter and advertisement mechanism is triggered by all error handling 333 events due to the load this is likely to place on the neighbour 334 receiving such a request. Where this BGP speaker is a relatively 335 centralised device - a route reflector (as described by [RFC4456]) 336 for example - the act of generation of UPDATE messages with such 337 frequency is likely to cause disproportionate load. It is therefore 338 an operational requirement of such mechanisms that a means of request 339 dampening be provided by any such extension. 341 4. Reducing the Impact of Session Reset 343 Even where protocol enhancements allow errors in the BGP-4 protocol 344 to cease to trigger NOTIFICATION messages, and hence reset a BGP 345 session, it is clear that some error conditions may not be exited. 346 In particular, errors due to existing state, or memory structures, 347 associated with a specific BGP session will not be handled. It is 348 therefore important to consider how these error conditions are 349 currently handled by the protocol. It should be noted that the 350 following discussion and analysis considers only those NOTIFICATION 351 messages generated in response to errors in UPDATE messages (as 352 defined by Section 6.3 in [RFC4271]). 354 The existing NOTIFICATION behaviour triggers a reset of all elements 355 of the BGP-4 session, as described in Section 6 of [RFC4271].
It is 356 expected that session teardown requires an implementation to re- 357 initialise all structures and state required for session maintenance. 358 Clearly, there is some utility to this requirement, as error 359 conditions in BGP are, in general, exited from. However, this 360 definition is responsible for the forwarding outages within networks 361 utilising BGP for route propagation when each error is experienced. 362 The requirement described in Section 2 is intended to reduce the 363 cases whereby a NOTIFICATION is required, however, any mechanism 364 implemented as a response to this requirement by definition cannot 365 provide a session reset to the extent of that achieved by the current 366 behaviour. 368 In order to address this, there is a requirement for a means by which 369 a BGP speaker can signal that an unhandled error condition in an 370 UPDATE message occurred - requiring a session reset - yet also 371 continue to utilise the paths advertised by the neighbour that are 372 currently in use within the RIB. In this case, the Adj-RIB-In 373 received from the neighbour is not considered invalid, despite a 374 NOTIFICATION, and session reset, being required. This set of 375 requirements is akin to those answered by the BGP Graceful Restart 376 mechanism described in [RFC4724]. Since the operational requirement 377 in this case is to provide a means to achieve a complete session 378 restart without disrupting the forwarding path of those prefixes in 379 use within a BGP speaker's RIB, it is expected that utilising a 380 procedure similar to the Graceful Restart mechanism meets the error 381 handling requirement. By responding to an error condition (repeated 382 or otherwise) with a message indicating that an error that cannot be 383 handled has occurred, forcing session reset, whilst retaining 384 forwarding information within the RIB allows forwarding to all 385 prefixes within a system's RIB to continue, whilst the session 386 restarts. 
By placing a time bound on the restart lifetime, should an 387 error condition not be transient - for example, should an error have 388 occurred with the BGP process, rather than with a specific BGP 389 session - the remote BGP speaker is still detected as an invalid 390 device for forwarding. 392 It should, however, be noted that a protocol enhancement meeting this 393 requirement is not able to solve all error conditions - however, a 394 complete restart of the BGP and TCP session between two BGP speakers 395 implements an identical recovery mechanism to that which is achieved 396 by the existing behaviour. Where an error condition such as memory 397 or configuration corruption has occurred in a BGP implementation, it 398 is expected that a mechanism meeting this requirement continues to 399 detect this, by means of a bound on time for session restart to 400 occur. Whilst there may be some consideration that packets continue 401 to be forwarded through a device which can be in a failure mode of 402 this nature for a longer period, due to this requirement, the 403 architecture of modern IP routers should be considered. A divided 404 forwarding and control plane is common in many devices, as well as 405 process separation for software-based devices - corruption of a 406 specific protocol daemon does not necessarily imply forwarding is 407 affected. Indeed, where forwarding behaviour of a device is 408 affected, it is envisaged that a failure detection mechanism (be it 409 Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets) 410 will detect such a failure in almost all cases, with the symptomatic 411 behaviour of such a failure being an invalid UPDATE message in very 412 few other cases. 414 5. Operational Toolset for Monitoring BGP 416 A significant complexity that is introduced through the requirements 417 defined in this document is that of monitoring BGP session status for 418 an operator.
Although the existing error handling behaviour causes a 419 disproportionate failure, session failure is extremely visible to 420 most operational personnel within a Network Operator due to both 421 existing definitions of SNMP trap mechanisms for BGP, along with the 422 forwarding impact typically caused by such a failure. By introducing 423 mechanisms by which errors of this nature are not as visible, this is 424 no longer the case. There is a requirement that where subsets of the 425 RIB on a device are no longer reachable from a BGP speaker, or indeed 426 an AS, that some mechanism to determine the cause is available to an 427 operator. Whilst, to some extent, this can be solved by mandating a 428 sub-requirement of each of the aforementioned requirements that a BGP 429 speaker must log where such errors occur, and are hence handled, this 430 does not solve all cases. In order to clarify this requirement, the 431 example of the transmission of an erroneous Optional Transitive 432 attribute can be considered. Since, by definition, there is no 433 requirement for all BGP speakers to parse such an attribute, a 434 receiving router may treat NLRI as withdrawn based on an erroneous 435 attribute not examined by its neighbour. In this case, the upstream 436 device or network, propagating the UPDATE, has no visibility of this 437 error. Operationally, however, it is of interest to the upstream 438 router operator that such invalid information was propagated. 440 The requirement for logging of error conditions in transmitted BGP 441 messages, which are visible to only the receiver, cannot be achieved 442 by any existing BGP message, or capability. It is envisaged that 443 each erroneous event should be transmitted to the remote peer - 444 including the information as to the set of NLRI that were considered 445 invalid. 
Whilst with some mechanisms this is achieved by default 446 (for example, One-Time Prefix ORF (Outbound Route Filtering) 447 [I-D.zeng-one-time-prefix-orf] will transmit the set of prefixes that are 448 required), the operator requirement is to know which prefixes may 449 have been unreachable in all cases. It is envisaged that an 450 extension to meet this requirement will allow for such information to 451 be transmitted between peers, and hence logged. Such a mechanism may 452 provide further utility as either a diagnostic, or logging toolset. 454 It should be noted that numerous work items within the IETF exist at 455 the time of writing that begin to solve this requirement. Within the 456 IDR working group both [I-D.raszuk-bgp-diagnostic-message] and 457 [I-D.ietf-idr-advisory] provide mechanisms by which such information 458 can be propagated in-band to an existing BGP session. Transmitting 459 such diagnostic information in-band is considered the optimal means 460 by which to propagate details of errors present in UPDATE messages, 461 due to the fact that no additional protocols (and hence security and 462 trust concerns) must be configured between two Autonomous Systems 463 (where the errors occur at an AS boundary), and the load on each BGP 464 speaker is increased only due to an additional capability, rather 465 than an additional code base, and protocol. Clearly, any mechanism 466 implemented in-band to a BGP session is required to be relatively 467 lightweight, since the information provided over the session is an 468 enhancement to the operational visibility of the protocol, and should 469 not disrupt core protocol operations. Other, out-of-band, mechanisms 470 - such as that proposed in [I-D.ietf-grow-bmp] - are likely to provide 471 mechanisms by which further insight into BGP operation can be 472 achieved.
The fact that such a protocol is implemented independently 473 of the BGP protocol results in further flexibility to provide 474 detailed protocol data, without introducing further complexity to the 475 BGP protocol itself. 477 6. Operational Complexities Introduced by Altering RFC4271 479 The existing NOTIFICATION and subsequent teardown of a BGP session 480 upon encountering an error has the advantage that a consistent 481 approach to error handling is required of all implementations of the 482 BGP-4 protocol. This is of operational advantage, as it provides a 483 clear expectation of the behaviour of the protocol. The requirements 484 defined herein add further complexity to the error-handling within 485 BGP, and hence are liable to compromise the existing deterministic 486 protocol behaviour. It is therefore deemed that there is a further 487 requirement to provide a clear method by which an erroneous UPDATE 488 should be reacted to, in order that all protocol implementations 489 provide a consistent means by which recovery is achieved. A further 490 complexity is introduced due to the disparate nature of the work 491 items altering the BGP error handling behaviour - since all items are 492 likely to be implemented as a BGP capability [RFC5492], situations 493 are likely to occur between devices (especially those with different 494 BGP implementations), where some of the mechanisms referenced are 495 unsupported. This adds further barriers to a standard definition of 496 the BGP-4 error handling behaviour. 498 In general, the approach considered ideal upon encountering an 499 erroneous UPDATE message can be divided into two cases - those where 500 the NLRI can be determined from the message, and those where it 501 cannot be. The latter case is the simpler of the two. In this case, 502 there is a requirement for the implementation to reset the BGP 503 session, utilising the reduced-impact approach, described in 504 Section 4. 
In the case where the remote BGP speaker is in a 505 transient error condition related to specific peer data structures, 506 or state, a single instance of this behaviour is likely to exit the 507 error condition. In the case of implementation errors, it is 508 possible that the BGP session in question may enter a continuous loop 509 of being reset, with a partial RIB being held by one or more of the 510 BGP speakers due to a non-deterministic order of UPDATE propagation. 511 It is therefore a requirement that within this reduced-impact 512 procedure any subsequent UPDATE messages that would result in further 513 session resets are ignored. Whilst this results in a condition where 514 an undetermined amount of the RIB is inconsistent, partial 515 reachability is maintained. In this case, the operational toolsets 516 discussed in Section 5 are likely to provide mechanisms by which this 517 condition can be brought to the attention of the relevant operators. 518 This requirement to accept a partial RIB, which results in potentially 519 invalid traffic forwarding, is a direct result of the deployments of 520 BGP-4, as described in Section 1.1. 522 The case where NLRI can be determined from an erroneous UPDATE 523 provides further complexities. In this case, a BGP speaker is aware 524 of the sub-set of the RIB which has been identified as being 525 contained within invalid UPDATE messages. This allows a local BGP 526 speaker to re-request single prefixes, utilising a mechanism such as 527 "one-time prefix ORF". However, a similar result is achieved by re- 528 requesting the entire RIB - albeit with greater resource 529 requirements. It is therefore expected that the process of recovery 530 utilises a staged set of mechanisms to attempt to restore consistency 531 of the RIB: 533 1. Where available, a mechanism capable of requesting only the NLRI 534 determined to have been contained within an invalid UPDATE should 535 be utilised.
However, since it is possible that such an error condition can be transient in nature, it is likely that more than one request will be transmitted (assuming the first does not return a valid UPDATE message). In order to allow a deterministic process, there is a requirement to define a limit on the number of specific requests transmitted.

2. Where a specific refresh mechanism is not available, a peer should re-request the entire RIB. Again, there is a requirement to limit the number of complete RIB requests that an implementation sends, in order to provide a bound both on the expected level of load a device may experience, and on the time for which the RIB may be inconsistent.

3. Finally, a session reset should be performed, as per the reduced-impact NOTIFICATION requirement defined in Section 4. At this point, a similar challenge to that discussed above exists, should the error condition persist. In this case, as defined above, there is a requirement to ignore those UPDATE messages that continue to be erroneous.

It is envisaged that where limits are required, these will be defined on a per-memo basis, or within a further revision of the requirements described herein.

Whilst the approach described above provides a standard means by which error recovery may be handled on a per-UPDATE basis, further complexities are raised where multiple errors occur. Clearly, following this procedure causes control-plane load on both BGP speakers; for this reason, consideration is required of how repeated use of the mechanisms discussed in this document is handled. It is notable that errors may not be confined to UPDATE messages relating to a single NLRI; independent errors in multiple NLRIs may be experienced.
For this reason, it is required that an implementation rate-limits the number of error handling events sourced towards a particular neighbour. It is expected that such rate limiting, or event suppression, is achieved on a per-session basis, where state information is already held, rather than on a per-prefix basis, as it is envisaged that per-prefix behaviour presents significant scaling problems and introduces further state requirements for an implementation of the protocol. It is recommended that where a flag indicative of erroneous behaviour is implemented, the state of such a value is maintained independently of session establishment.

7. IANA Considerations

This memo includes no request to IANA.

8. Security Considerations

The requirements outlined in this document provide mechanisms by which erroneous BGP messages may be responded to with limited impact to forwarding operation. This is of benefit to the security of a BGP speaker in general. Under the existing behaviour, where erroneous UPDATE messages originated by a single malicious Autonomous System or router within a network (or the Internet default-free zone, DFZ) are propagated to all devices within the same routing domain, the resulting session resets cause all other NLRI available over the same sessions to become unreachable. This mechanism may provide a means by which an Autonomous System can be isolated from required routing domains (such as the Internet), should the relevant UPDATE messages be propagated via specific paths. By reducing the impact of such failures, it is envisaged that this possibility may be constrained to a specific set of NLRI, or a specific topology.

Some mechanisms meeting the requirements specified in this document, particularly those within Section 5, may raise further security concerns; however, it is envisaged that these are addressed in per-enhancement memos.

9.
Acknowledgements

The author would like to thank Rob Evans, David Freedman, Tom Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and Ilya Varlashkin for their review and valuable feedback.

10. References

10.1. Normative References

[I-D.chen-ebgp-error-handling] Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP Updates from External Neighbors", draft-chen-ebgp-error-handling-00 (work in progress), September 2010.

[I-D.ietf-grow-bmp] Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-05 (work in progress), December 2010.

[I-D.ietf-idr-advisory] Scholl, T., Scudder, J., Steenbergen, R., and D. Freedman, "BGP Advisory Message", draft-ietf-idr-advisory-00 (work in progress), October 2009.

[I-D.ietf-idr-optional-transitive] Scudder, J. and E. Chen, "Error Handling for Optional Transitive BGP Attributes", draft-ietf-idr-optional-transitive-03 (work in progress), September 2010.

[I-D.keyur-bgp-enhanced-route-refresh] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4", draft-keyur-bgp-enhanced-route-refresh-02 (work in progress), March 2011.

[I-D.raszuk-bgp-diagnostic-message] Raszuk, R., Chen, E., and B. Decraene, "BGP Diagnostic Message", draft-raszuk-bgp-diagnostic-message-02 (work in progress), March 2011.

[I-D.zeng-one-time-prefix-orf] Zeng, Q. and J. Dong, "One-time Address-Prefix Based Outbound Route Filter for BGP-4", draft-zeng-one-time-prefix-orf-01 (work in progress), October 2010.

[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y.
Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.

[RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, February 2009.

10.2. Informational References

[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

Author's Address

Rob Shakir
Cable&Wireless Worldwide

Email: rob.shakir@cw.com
URI: http://www.cw.com/
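
Illustrative sketch (non-normative)

The staged recovery procedure described in Section 6 can be illustrated with the following minimal sketch. It is not part of the requirements and does not describe any existing implementation. The names RecoveryState, Action, max_targeted and max_full_rib are hypothetical, and the default limits of 3 and 2 are placeholders, since this memo leaves the actual limits to be defined elsewhere. The sketch also assumes that a speaker escalates from targeted requests to a full RIB request once the targeted limit is reached, which is one possible reading of the staged approach.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    REQUEST_PREFIXES = auto()  # stage 1: re-request only the affected NLRI
    REQUEST_FULL_RIB = auto()  # stage 2: re-request the entire RIB
    RESET_SESSION = auto()     # stage 3: reduced-impact session reset
    IGNORE_UPDATE = auto()     # after a reset: ignore still-erroneous UPDATEs


@dataclass
class RecoveryState:
    """Hypothetical per-session recovery state (limits are placeholders)."""
    targeted_supported: bool   # is a targeted refresh mechanism available?
    max_targeted: int = 3      # assumed bound on stage 1 requests
    max_full_rib: int = 2      # assumed bound on stage 2 requests
    targeted_sent: int = 0
    full_rib_sent: int = 0
    reset_done: bool = False   # kept across session re-establishment

    def next_action(self, nlri_known: bool) -> Action:
        """Choose the recovery step for one more erroneous UPDATE."""
        if self.reset_done:
            # A reset has already been attempted; avoid a reset loop.
            return Action.IGNORE_UPDATE
        if nlri_known:
            if self.targeted_supported and self.targeted_sent < self.max_targeted:
                self.targeted_sent += 1
                return Action.REQUEST_PREFIXES
            if self.full_rib_sent < self.max_full_rib:
                self.full_rib_sent += 1
                return Action.REQUEST_FULL_RIB
        # NLRI unknown, or every bounded stage is exhausted.
        self.reset_done = True
        return Action.RESET_SESSION


if __name__ == "__main__":
    state = RecoveryState(targeted_supported=True, max_targeted=2, max_full_rib=1)
    for _ in range(5):
        print(state.next_action(nlri_known=True).name)
```

With the limits used in the example (two targeted requests, one full RIB request), five consecutive erroneous UPDATEs with known NLRI yield two targeted requests, one full RIB request, one session reset, and then an ignored UPDATE. Counters are held per session rather than per prefix, in line with the scaling concern raised in Section 6.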