Internet Engineering Task Force                                R. Shakir
Internet-Draft                                                       C&W
Intended status: Informational                             June 28, 2011
Expires: December 30, 2011

 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
           draft-ietf-grow-ops-reqs-for-bgp-error-handling-01

Abstract

   BGP-4 is utilised as a key intra- and inter-Autonomous System routing
   protocol in modern IP networks. The failure modes as defined by the
   original protocol standards are based on a number of assumptions
   around the impact of session failure. Numerous incidents both in the
   global Internet routing table and within Service Provider networks
   have been caused by strict handling of a single invalid UPDATE
   message causing large-scale failures in one or more Autonomous
   Systems.

   This memo describes the current use of BGP-4 within Service Provider
   networks, and outlines a set of requirements for further work to
   enhance the mechanisms available to a BGP-4 implementation when
   erroneous data is detected. Whilst this document does not provide
   specification of any standard, it is intended as an overview of a set
   of enhancements to BGP-4 to improve the protocol's robustness to suit
   its current deployment.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 30, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of BGP-4 in Service Provider Networks
     1.2.  Overview of Operator Requirements for BGP-4 Error
           Handling
   2.  Errors within BGP-4 UPDATE Messages
   3.  Avoiding use of NOTIFICATION
   4.  Recovering RIB Consistency
   5.  Reducing the Impact of Session Reset
   6.  Operational Toolset for Monitoring BGP
   7.  Operational Complexities Introduced by Altering RFC4271
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Author's Address

1.  Introduction

   Where BGP-4 [RFC4271] is deployed in the Internet and Service
   Provider networks, numerous incidents have been recorded due to the
   manner in which [RFC4271] specifies that errors in routing
   information should be handled. Whilst the behaviour defined in the
   existing standards retains utility, the deployments of the protocol
   have changed within modern networks, resulting in significantly
   different demands for protocol robustness. Whilst a number of
   Internet Drafts have been written to begin to enhance the behaviour
   of BGP-4 in terms of the handling of erroneous messages, this memo
   intends to define a set of requirements for ongoing work. These
   requirements are considered from the perspective of a Network
   Operator, and hence this draft does not intend to define the protocol
   mechanisms by which such error handling behaviour is to be
   implemented.

1.1.  Role of BGP-4 in Service Provider Networks

   BGP was designed as an inter-Autonomous System (AS) routing protocol
   and hence many of the error handling mechanisms within the protocol
   specification are designed to be conducive to this role. In general,
   this consideration as an inter-AS routing propagation mechanism
   results in the view that a BGP session propagates a relatively small
   amount of network-layer reachability information (NLRI) between two
   ASes. In this case, there is an expectation of session resilience
   for those adjacencies that are key to routing continuity (for
   example, it is expected that two networks peering via BGP would
   connect multiple times in order to safeguard against equipment or
   protocol failure). In addition, there is some expectation of
   multiple paths to a particular NLRI being available - it would be
   expected that a network can fall back to utilising alternate, less
   direct, paths where a failure of a more direct path occurs.

   Traditional network architectures would deploy an Interior Gateway
   Protocol (IGP) to carry infrastructure and customer prefixes, with an
   Exterior Gateway Protocol (EGP) such as BGP being utilised to
   propagate these prefixes to other Autonomous Systems. However, with
   the growth of IP-based services, this is no longer considered best
   practice. In order to ensure that convergence is within acceptable
   time bounds, the amount of routing information carried within the IGP
   is significantly reduced - and tends to be only infrastructure
   prefixes. iBGP is then utilised to propagate both customer and
   external prefixes within an AS. As such, BGP has effectively become
   an IGP, with traditional IGPs acting as a means by which to propagate
   the routing information which is required to establish a BGP session,
   and reach the egress node within the local routing domain. This
   change in role presents different requirements for the robustness of
   BGP as a routing protocol - with the expectation of a similar level
   of robustness to that of an IGP being set.

   Along with this change in role, the nature of the IP routing
   information that is carried has changed. BGP has become a ubiquitous
   means by which service information can be propagated between devices.
   For instance, BGP is utilised to carry routing information for IP/
   MPLS VPN services as described in [RFC4364]. Since there is an
   existing deployment of the protocol between PE devices in numerous
   networks, it has been adapted to propagate this routing information,
   as its use limits the number of routing protocols required on each
   device.
   This additional information being propagated represents a
   large change in requirement for the error handling of the protocol -
   where session failure occurs, it is likely that a complete service
   outage is experienced by at least a subset of a network's customers,
   even though the erroneous packet may have occurred within a different
   sub-topology or even service (a different address family for
   example). For this reason, there is a significant demand to avoid
   service affecting failures that may be triggered by routing
   information within a single sub-topology or service.

   Both within Internet and multi-service routing architectures, a
   number of BGP sessions propagate a large proportion of the required
   routing information for network operation. For Internet routing,
   these are typically BGP sessions which propagate the global routing
   table to an AS - failure of these sessions may have a large impact on
   network service, based on a single erroneous update. In a multi-
   service environment, typical deployments utilise a small number of
   core-facing BGP sessions, typically towards route reflector devices.
   Failure of these sessions may also result in a large impact to
   network operation. Clearly, the avoidance of conditions requiring
   these sessions to fail is of great utility to any network operator,
   and provides further motivation for the revision of the existing
   behaviour.

   Whilst the behaviour in [RFC4271] is suited to ensuring that BGP
   messages containing erroneous routing information are limited in
   scope (by means of session reset), with the above considerations, it
   is clear that this mechanism is not suited to all deployments. It
   should, however, be noted that the change in scope affects the
   handling only of errors occurring after BGP session establishment.
   There is no current operational requirement to amend the means by
   which error handling in session establishment, or liveness
   detection, are performed.

1.2.  Overview of Operator Requirements for BGP-4 Error Handling

   It is the intention of this document to define a set of criteria to
   which a revised error handling mechanism in BGP-4 is required to
   conform. The motivation for the definition of these requirements can
   be summarised based on certain behaviour currently present in the
   protocol that is not deemed acceptable within current operational
   deployments, or where there is a short-fall in the tool set available
   to an operator. These key requirements can be summarised as follows:

   o  It is unacceptable within modern deployments of the BGP-4 protocol
      that a single erroneous UPDATE packet affects prefixes that it
      does not carry. This therefore requires some modification to the
      means by which erroneous UPDATE packets are handled, and reacted
      to - with a particular focus on avoiding the use of the
      NOTIFICATION message.

   o  It is recognised that some error conditions which occur within the
      BGP-4 protocol may not always be handled gracefully, and may
      result in conditions whereby an implementation cannot recover
      without a session reset. In these (and similar) cases, it is
      unacceptable for an operator that this reset of the BGP-4 session
      results in interruption to forwarding packets (by means of
      withdrawing prefixes installed by BGP-4 into a device's RIB, and
      subsequently FIB). To this end, there is a requirement to define
      a session reset mechanism which provides session re-initialisation
      in a non-destructive manner.

   o  Further to the requirements to provide a more robust protocol, the
      current visibility into error conditions within the BGP-4 protocol
      is extremely limited - where further modifications to this
      behaviour are to be made, complexity is likely to be added. Thus,
      to ensure that BGP-4 is manageable, there are requirements for
      mechanisms by which the protocol can be examined and monitored.

   This document describes each of these requirements in further depth,
   along with an overview of means by which they are expected to be
   achieved. In addition, the manner in which the enhancements meeting
   these requirements are to interact is discussed.

2.  Errors within BGP-4 UPDATE Messages

   Both through analysis of incidents occurring within the Internet
   default-free zone (DFZ), and multi-service environments utilising
   BGP-4 to signal service or routing information, a number of different
   classes of errors within BGP-4 UPDATE messages have been observed.
   In order to consider the applicability of enhanced error handling
   mechanisms, it is possible to divide these errors into a number of
   sub-classes, particularly focusing around the location of the error
   within the UPDATE message.

   Where an UPDATE message is considered invalid by a BGP speaker due to
   an error within a path attribute that is not the NLRI (where the
   definition of NLRI includes reachability information encoded in the
   MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in
   [RFC4760]) it is a requirement of any enhanced error handling
   mechanism to handle the error in a manner focused on the NLRI
   contained within the message.
   Since in this case, the message
   received from the remote peer is syntactically valid, it is
   considered that such an UPDATE is indicative of erroneous data within
   a path attribute - as such, it cannot be assumed that the BGP speaker
   from whom the message was received is directly responsible for the
   erroneous information - and hence affecting all NLRI received via a
   specific session is disproportionate.

   Two further error cases exist within UPDATE messages, both of which
   are related to the mechanisms that are applicable to messages
   received where some difficulty exists in parsing the entire BGP
   message. These concern, respectively, messages from which a valid
   NLRI attribute can be extracted, and those from which such an
   attribute cannot be parsed. In these cases, errors in the packing of
   attributes within a BGP message may have occurred. Such errors are
   likely indicative of an error specifically caused by the remote BGP
   speaker. It is, however, desirable to an operator that such errors
   are handled without affecting all NLRI across a BGP session. As
   such, there is a key requirement to maximise the number of cases in
   which it is possible to extract NLRI from a BGP UPDATE message. To
   this end, it is required that where possible the MP_REACH_NLRI and
   MP_UNREACH_NLRI attributes are utilised for encoding all NLRI
   (including IPv4 Unicast), and that such an attribute is included as
   the first attribute of a BGP UPDATE message (as originally
   recommended in [I-D.chen-ebgp-error-handling]). Such a change to the
   order of inclusion of this attribute maximises the number of cases in
   which NLRI can be extracted from an UPDATE. Where this is possible,
   it is again required that the error handling mechanisms utilised
   should be directly applied to the NLRI included in the UPDATE.
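
   The division of errors described above can be illustrated with a
   minimal sketch. It is not taken from any implementation or from the
   referenced drafts: the function names and the outcome labels ("ok",
   "nlri-scoped", "session-level") are assumptions made for
   illustration, only the ORIGIN attribute is validated, and the
   withdrawn routes and trailing NLRI fields of the UPDATE are ignored.
   The attribute header layout and type codes follow [RFC4271] and
   [RFC4760].

```python
import struct
from typing import List, Tuple

ORIGIN = 1              # attribute type code, RFC 4271
MP_REACH_NLRI = 14      # attribute type code, RFC 4760
EXTENDED_LENGTH = 0x10  # attribute flag bit, RFC 4271

Prefix = Tuple[bytes, int]  # (prefix octets, length in bits)


def parse_prefixes(data: bytes) -> List[Prefix]:
    """Decode <length, prefix> tuples; raise ValueError if truncated."""
    prefixes, pos = [], 0
    while pos < len(data):
        bits = data[pos]
        size = (bits + 7) // 8
        if pos + 1 + size > len(data):
            raise ValueError("truncated prefix")
        prefixes.append((data[pos + 1:pos + 1 + size], bits))
        pos += 1 + size
    return prefixes


def parse_mp_reach(value: bytes) -> List[Prefix]:
    """Return the NLRI carried in an MP_REACH_NLRI attribute value."""
    if len(value) < 5:
        raise ValueError("MP_REACH_NLRI too short")
    next_hop_len = value[3]            # follows AFI (2 octets), SAFI (1)
    nlri_start = 4 + next_hop_len + 1  # skip next hop and reserved octet
    if nlri_start > len(value):
        raise ValueError("next hop overruns attribute")
    return parse_prefixes(value[nlri_start:])


def classify_update(attrs: bytes) -> Tuple[str, List[Prefix]]:
    """Walk the path attributes; return (outcome, NLRI extracted so far)."""
    nlri: List[Prefix] = []
    have_nlri = error = False
    pos = 0
    while pos < len(attrs):
        if pos + 3 > len(attrs):
            error = True
            break
        flags, type_code = attrs[pos], attrs[pos + 1]
        if flags & EXTENDED_LENGTH:
            if pos + 4 > len(attrs):
                error = True
                break
            (length,) = struct.unpack("!H", attrs[pos + 2:pos + 4])
            header = 4
        else:
            length, header = attrs[pos + 2], 3
        value = attrs[pos + header:pos + header + length]
        if len(value) < length:        # attribute overruns the message
            error = True
            break
        if type_code == MP_REACH_NLRI:
            try:
                nlri, have_nlri = parse_mp_reach(value), True
            except ValueError:
                return "session-level", []
        elif type_code == ORIGIN and value not in (b"\x00", b"\x01", b"\x02"):
            error = True               # well-framed but invalid attribute
        pos += header + length
    if not error:
        return "ok", nlri
    return ("nlri-scoped", nlri) if have_nlri else ("session-level", [])
```

   In this sketch, a framing error in an attribute that follows
   MP_REACH_NLRI still leaves the NLRI available, so the error can be
   scoped to those prefixes. The same framing error placed ahead of
   MP_REACH_NLRI prevents the walk from reaching it, which is the
   motivation for placing the attribute first.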

   For all cases whereby NLRI can be obtained from an UPDATE message, it
   is expected that the requirements outlined in Section 3 should be
   considered by any enhancement to the BGP-4 protocol.

   In the case that it is not possible to completely parse the NLRI
   attribute from the UPDATE message received from a peer, it is
   extremely likely that this is indicative of a serious error with
   either the process of attribute packing, or buffer usage on the
   remote BGP speaker. In this case, clearly, it is not possible to
   apply any error handling mechanism that is limited to a specific set
   of NLRI, since an implementation has no knowledge of the NLRI
   included within the UPDATE message. In addition, such errors are
   considered to be relatively fundamental to the operation of a BGP
   implementation, and hence may indicate a case whereby significant
   system errors have occurred. The current BGP-4 standard results in a
   BGP speaker restarting a session with the remote BGP speaker.
   However, where such an error does occur, it is required that a
   graceful mechanism is utilised to provide a lower impact to network
   operation. The requirements for enhancements of this nature to BGP-4
   are outlined in Section 5, with the requirements outlined therein
   focused on providing a means by which system integrity can be
   restored whilst allowing for continued network operation.

3.  Avoiding use of NOTIFICATION

   The error handling behaviour defined in [RFC4271] is problematic due
   to the limited options that are available to an implementation. When
   an erroneous BGP message is received, at the current time, the
   implementation must either ignore the error, or send a NOTIFICATION
   message, after which it is mandatory to terminate the BGP session.
   It is apparent that this requirement is at odds with that of protocol
   robustness.

   There is significant complexity to this requirement.
   The mechanism
   defined in [I-D.chen-ebgp-error-handling] describes a means by which
   no NOTIFICATION message is generated for all cases whereby NLRI can
   be extracted from an UPDATE. The NLRI contained within the erroneous
   UPDATE message is considered as though the remote BGP speaker has
   provided an UPDATE marking it as withdrawn. This limits the
   propagation of the invalid routing information, whilst also ensuring
   that no traffic is forwarded via a previously-known path that may no
   longer be valid. This mechanism is referred to as
   "treat-as-withdraw".

   Whilst this behaviour results in avoiding a NOTIFICATION message,
   keeping other routing information advertised by the remote BGP
   speaker within the RIB, it may result in unreachability for a sub-set
   of the NLRI advertised by the remote speaker. Two cases should be
   considered - that where the entry for a prefix in the Adj-RIB-In of
   the neighbour propagating an erroneous packet is utilised, and that
   where the prefix installed in the device's RIB is learnt from another
   BGP speaker. In the former case, should the identified NLRI not be
   treated as withdrawn, the original NLRI is utilised within the global
   RIB. However, this information is potentially now invalid (i.e. it
   no longer provides a valid forwarding path), whilst an alternate
   (valid) path may exist in another Adj-RIB-In. By continuing to
   utilise the NLRI for which the UPDATE was considered invalid, traffic
   may be forwarded via an invalid path, resulting in routing loops, or
   black-holing. In the second case, no impact to the forwarding of
   traffic, or global RIB, is incurred, yet where treat-as-withdraw is
   implemented, possibly stale routing information is purged from the
   Adj-RIB-In of the neighbour propagating errors.
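
   The effect of "treat-as-withdraw" on path selection can be
   illustrated with a small model. This is a sketch under simplifying
   assumptions rather than the procedure of
   [I-D.chen-ebgp-error-handling]: each Adj-RIB-In is reduced to a
   mapping from prefix to a single preference value, best path selection
   simply picks the highest preference, and the names `best_path` and
   `treat_as_withdraw` are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Simplified model: peer name -> prefix -> preference (higher is better).
AdjRibIn = Dict[str, Dict[str, int]]


def best_path(adj_rib_in: AdjRibIn, prefix: str) -> Optional[str]:
    """Return the peer whose path for the prefix has the highest preference."""
    candidates = [
        (routes[prefix], peer)
        for peer, routes in adj_rib_in.items()
        if prefix in routes
    ]
    return max(candidates)[1] if candidates else None


def treat_as_withdraw(
    adj_rib_in: AdjRibIn, peer: str, prefixes: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Remove only the NLRI of the erroneous UPDATE from the peer's
    Adj-RIB-In and return the new best peer for each affected prefix."""
    affected = list(prefixes)
    for prefix in affected:
        adj_rib_in.get(peer, {}).pop(prefix, None)
    return {prefix: best_path(adj_rib_in, prefix) for prefix in affected}
```

   For a prefix that is also known via another neighbour, the returned
   mapping names that neighbour, so forwarding falls back to the
   alternate path. A prefix known only via the erroring neighbour maps
   to None and becomes unreachable, while every other prefix learnt from
   that neighbour is left in place.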

   Whilst mechanisms such as "treat-as-withdraw" are currently
   documented, the proposals are limited in their scope - particularly
   in terms of restrictions to implementation only on eBGP sessions.
   This limitation is made based on the view that the BGP RIB must be
   consistent across an autonomous system. By implementing treat-as-
   withdraw for an iBGP session, one or more routers within the
   Autonomous System may not have reachability to a prefix, and hence
   blackholing of traffic, or routing loops, may occur. It should,
   however, be considered whether this view is valid, in light of the
   manner in which BGP is utilised within operator networks.
   Inconsistency in a RIB based on a single UPDATE being treated as
   withdrawn may cause an inconsistency in a single sub-topology (e.g.
   Layer 3 VPN service), or a service not operating completely (in the
   case of an UPDATE carrying service membership information). Where a
   NOTIFICATION and teardown is utilised, this is destructive to all
   sub-topologies in all address family identifiers (AFIs) carried by
   the session in question. Even where mechanisms such as multi-session
   BGP are utilised, a whole AFI is affected by such a NOTIFICATION
   message. In terms of routing operation, it is therefore far less
   costly to endure a situation where a limited sub-set of routing
   information within an AS is invalid, than to consider all routing
   information as invalid based on a single trigger.

   It is considered that, if extended to cover iBGP, the mechanisms
   described in [I-D.chen-ebgp-error-handling] and
   [I-D.ietf-idr-optional-transitive] provide a means to avoid the
   transmission of a NOTIFICATION to a remote BGP speaker based on a
   single erroneous message, where at all possible, and hence meet this
   requirement.
   The failure cases whereby NLRI cannot be extracted from
   the UPDATE message represent a case whereby the receiving system
   cannot handle the error gracefully based on this mechanism.

4.  Recovering RIB Consistency

   The recommendations described in Section 3 may result in the RIB for
   a topology within an AS being inconsistent across the AS' internal
   routers. Alternatively, where such mechanisms are deployed at an AS
   boundary, interconnects between two ASes may be inconsistent with
   each other. There are therefore risks of traffic blackholing, due to
   missing routing information, or forwarding loops. Whilst this is
   deemed an acceptable compromise in the short term, clearly, it is
   suboptimal. Therefore, a requirement exists to provide mechanisms by
   which a BGP speaker is able to recover the consistency of the Adj-
   RIB-In for a particular neighbour.

   It is envisaged that during such routing inconsistencies, the local
   BGP speaker is aware that some routing information could not be
   processed - due to the fact that an UPDATE message was not parsed
   correctly. If the 'treat-as-withdraw' mechanism described within
   Section 3 is utilised, it is also possible for the local BGP speaker
   to have determined the set of NLRI for which an erroneous UPDATE
   message was received. In this scenario, by utilising targeted
   mechanisms to re-request the specific NLRI that was unreachable, this
   routing information can be re-transmitted from the remote BGP
   speaker. Such a request requires extension to the existing BGP-4
   protocol, in terms of specific UPDATE generation filters with a
   transient lifetime. It is envisaged that the work within
   [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted
   elements of the Adj-RIB-In for a BGP neighbour to be recovered.
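
   The bookkeeping implied by a targeted re-request can be sketched as
   follows. This only illustrates the local state a speaker might keep;
   it does not model the message format of
   [I-D.zeng-one-time-prefix-orf], and the class and method names are
   assumptions introduced for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class RecoveryTracker:
    """Remember, per neighbour, the NLRI that were treated as withdrawn."""

    def __init__(self) -> None:
        self._pending: Dict[str, Set[str]] = defaultdict(set)

    def record_withdrawn(self, peer: str, prefixes: Iterable[str]) -> None:
        """Note the prefixes carried by an erroneous UPDATE from a peer."""
        self._pending[peer].update(prefixes)

    def build_request(self, peer: str) -> List[str]:
        """Return the prefixes to re-request from the peer, then forget
        them, reflecting the transient lifetime of such a filter."""
        return sorted(self._pending.pop(peer, set()))
```

   A second call to `build_request` for the same neighbour returns an
   empty list unless further errors have been recorded in the meantime,
   so the request is issued once per set of affected prefixes rather
   than repeatedly.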

   In addition to such cases where specific routing information is known
   to be erroneous, the more general case where either a large amount of
   the Adj-RIB-In is contained in UPDATE messages subject to treat-as-
   withdraw, or the specific prefixes are unknown to the local BGP
   speaker, must be considered. In this case, there is a requirement
   for a BGP speaker to re-request the entire RIB advertised by a remote
   neighbour. Where such re-advertisement is required, it is envisaged
   that a ROUTE-REFRESH as per the description in [RFC2918] is utilised.
   [I-D.keyur-bgp-enhanced-route-refresh] provides a means by which the
   ROUTE-REFRESH mechanism can be extended in order to meet this
   requirement.

   It is of particular note for both means of recovering RIB consistency
   described that these are effective only when considering transient
   errors within an implementation - for instance, should an RFC
   interpretation error within an implementation be present, regardless
   of the number of times a specific UPDATE is generated, it is likely
   that this error condition will persist. For this reason, there is a
   requirement to consider the means by which such consistency recovery
   mechanisms are utilised. It is not advisable that a transient
   filter and advertisement mechanism is triggered by all error handling
   events due to the load this is likely to place on the neighbour
   receiving such a request. Where this BGP speaker is a relatively
   centralised device - a route reflector (as described by [RFC4456])
   for example - the act of generation of UPDATE messages with such
   frequency is likely to cause disproportionate load. It is therefore
   an operational requirement of such mechanisms that a means of request
   dampening be provided by any such extension.

5.  Reducing the Impact of Session Reset

   Even where protocol enhancements allow errors in the BGP-4 protocol
   to cease to trigger NOTIFICATION messages, and hence reset a BGP
   session, it is clear that some error conditions may not be exited.
   In particular, errors due to existing state, or memory structures,
   associated with a specific BGP session will not be handled. It is
   therefore important to consider how these error conditions are
   currently handled by the protocol. It should be noted that the
   following discussion and analysis considers only those NOTIFICATION
   messages generated in response to errors in UPDATE messages (as
   defined by Section 6.3 in [RFC4271]).

   The existing NOTIFICATION behaviour triggers a reset of all elements
   of the BGP-4 session, as described in Section 6 of [RFC4271]. It is
   expected that session teardown requires an implementation to re-
   initialise all structures and state required for session maintenance.
   Clearly, there is some utility to this requirement, as error
   conditions in BGP are, in general, exited from. However, this
   definition is responsible for the forwarding outages within networks
   utilising BGP for route propagation when each error is experienced.
   The requirement described in Section 3 is intended to reduce the
   cases whereby a NOTIFICATION is required; however, any mechanism
   implemented as a response to this requirement by definition cannot
   provide a session reset to the extent of that achieved by the current
   behaviour.

   In order to address this, there is a requirement for a means by which
   a BGP speaker can signal that an unhandled error condition in an
   UPDATE message occurred - requiring a session reset - yet also
   continue to utilise the paths advertised by the neighbour that are
   currently in use within the RIB.
   In this case, the Adj-RIB-In
   received from the neighbour is not considered invalid, despite a
   NOTIFICATION, and session reset, being required. This set of
   requirements is akin to those answered by the BGP Graceful Restart
   mechanism described in [RFC4724]. Since the operational requirement
   in this case is to provide a means to achieve a complete session
   restart without disrupting the forwarding path of those prefixes in
   use within a BGP speaker's RIB, it is expected that utilising a
   procedure similar to the Graceful Restart mechanism meets the error
   handling requirement. Responding to an error condition (repeated
   or otherwise) with a message indicating that an error that cannot be
   handled has occurred, forcing session reset whilst retaining
   forwarding information within the RIB, allows forwarding to all
   prefixes within a system's RIB to continue whilst the session
   restarts. By placing a time bound on the restart lifetime, should an
   error condition not be transient - for example, should an error have
   occurred with the BGP process, rather than with the specific BGP
   session - the remote BGP speaker is still detected as an invalid
   device for forwarding.

   It should, however, be noted that a protocol enhancement meeting this
   requirement is not able to solve all error conditions - however, a
   complete restart of the BGP and TCP session between two BGP speakers
   implements an identical recovery mechanism to that which is achieved
   by the existing behaviour. Where an error condition such as memory
   or configuration corruption has occurred in a BGP implementation, it
   is expected that a mechanism meeting this requirement continues to
   detect this, by means of a bound on time for session restart to
   occur.
   Whilst there may be some concern that, due to this requirement,
   packets continue to be forwarded for a longer period through a device
   which may be in a failure mode of this nature, the architecture of
   modern IP routers should be considered. A divided
   forwarding and control plane is common in many devices, as well as
   process separation for software-based devices - corruption of a
   specific protocol daemon does not necessarily imply forwarding is
   affected. Indeed, where forwarding behaviour of a device is
   affected, it is envisaged that a failure detection mechanism (be it
   Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets)
   will detect such a failure in almost all cases, with the symptomatic
   behaviour of such a failure being an invalid UPDATE message in very
   few other cases.

6.  Operational Toolset for Monitoring BGP

   A significant complexity that is introduced through the requirements
   defined in this document is that of monitoring BGP session status for
   an operator. Although the existing error handling behaviour causes a
   disproportionate failure, session failure is extremely visible to
   most operational personnel within a Network Operator due to both
   existing definitions of SNMP trap mechanisms for BGP, along with the
   forwarding impact typically caused by such a failure. By introducing
   mechanisms by which errors of this nature are not as visible, this is
   no longer the case. There is a requirement that where subsets of the
   RIB on a device are no longer reachable from a BGP speaker, or indeed
   an AS, some mechanism to determine the cause is available to an
   operator. Whilst, to some extent, this can be solved by mandating a
   sub-requirement of each of the aforementioned requirements that a BGP
   speaker must log where such errors occur, and are hence handled, this
   does not solve all cases.
   In order to clarify this requirement, the
   example of the transmission of an erroneous Optional Transitive
   attribute can be considered. Since, by definition, there is no
   requirement for all BGP speakers to parse such an attribute, a
   receiving router may treat NLRI as withdrawn based on an erroneous
   attribute not examined by its neighbour. In this case, the upstream
   device or network, propagating the UPDATE, has no visibility of this
   error. Operationally, however, it is of interest to the upstream
   router operator that such invalid information was propagated.

   The requirement for logging of error conditions in transmitted BGP
   messages, which are visible to only the receiver, cannot be achieved
   by any existing BGP message, or capability. It is envisaged that
   each erroneous event should be transmitted to the remote peer -
   including the information as to the set of NLRI that were considered
   invalid. Whilst with some mechanisms this is achieved by default
   (for example, One-Time Prefix ORF (Outbound Route Filtering)
   [I-D.zeng-one-time-prefix-orf] will transmit the set of prefixes that
   are required), the operator requirement is to know which prefixes may
   have been unreachable in all cases. It is envisaged that an
   extension to meet this requirement will allow for such information to
   be transmitted between peers, and hence logged. Such a mechanism may
   provide further utility as either a diagnostic, or logging toolset.

   As such, it is possible to divide the messages that are required in
   order to provide further visibility into BGP for an operator. Such a
   division can be made both due to the required means of message
   transmission, alongside the criticality of each request.
o  Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the requirements detailed within this document result in this message being suppressed. Despite this change, the occurrence of the error condition is still of interest to an operator for both monitoring and troubleshooting, since some form of invalid data has been received on a session. It is therefore considered that an implementation must both generate a message locally and transmit a message to the remote peer, based on such a condition. Where such a message is transmitted to the remote peer, it is considered that the BGP session via which the erroneous UPDATE message was received is used as the transport to the remote peer. The information transmitted in such a message should be limited to that which allows identification of the paths which were considered erroneous (i.e. restricting the information to that which is directly relevant to a network operator in the case of an error condition occurring). Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question. However, it must be considered that where an erroneous UPDATE is received and the 'treat-as-withdraw' mechanism is utilised, such that the erroneous path is removed from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages withdrawing the prefix from all further BGP speakers to which the prefix is advertised.
   The load incurred in generating such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which the erroneous UPDATE was received. It is envisaged that light-weight BGP message-based signalling mechanisms such as [I-D.ietf-idr-advisory] provide a suitable means to satisfy this requirement.

o  Additional Diagnostic Capabilities for BGP - In a number of cases, there is an operational requirement to further debug erroneous BGP UPDATE messages, along with the particulars of the state of a BGP speaker. For instance, where an invalid BGP UPDATE message is transmitted between two BGP speakers, the exact format of the UPDATE message is of interest to an operator, as this information provides a clear indication of a message considered to be erroneous by the BGP speaker to which it was transmitted. In this case, it is considered of great utility that the entire UPDATE message is transmitted back to the advertising speaker, in order to allow for further debugging to occur. Whilst such information is particularly useful to an operator, it is clearly not key to protocol operation - for this reason, it is expected that the additional complexity and load to which a BGP speaker is subjected will, in some cases, not be acceptable. It is therefore required that, where mechanisms are developed to support this requirement, messages of this nature can be supported both within an existing BGP session and via a dedicated separate session, be it BGP carrying messages such as DIAGNOSTIC [I-D.raszuk-bgp-diagnostic-message] or ADVISORY [I-D.ietf-idr-advisory], or a dedicated monitoring protocol akin to BMP, described in [I-D.ietf-grow-bmp].
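As an illustration of the two classes of message described above, the following Python sketch shows one way in which the information might be structured. It is not taken from any of the referenced drafts: the ErrorReport type and the build_error_report() helper are hypothetical names introduced for this example only. The minimal form carries just a reason and the affected NLRI, whilst the diagnostic form additionally carries the entire offending UPDATE.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical report returned to the advertising peer."""
    reason: str                         # short description of the error
    nlri: Tuple[str, ...]               # prefixes considered erroneous
    raw_update: Optional[bytes] = None  # full UPDATE, diagnostic class only


def build_error_report(reason: str,
                       nlri: Sequence[str],
                       raw_update: bytes,
                       diagnostic: bool = False) -> ErrorReport:
    """Build a minimal report, or a diagnostic one when requested.

    The minimal report is restricted to what identifies the erroneous
    paths; the raw UPDATE is attached only for the diagnostic class.
    """
    # Remove duplicate prefixes whilst preserving their original order.
    unique_nlri = tuple(dict.fromkeys(nlri))
    return ErrorReport(reason=reason,
                       nlri=unique_nlri,
                       raw_update=raw_update if diagnostic else None)


if __name__ == "__main__":
    report = build_error_report("malformed attribute",
                                ["192.0.2.0/24", "198.51.100.0/24"],
                                b"\xff" * 16)
    print(report)
```

Keeping the raw UPDATE optional mirrors the distinction drawn above: the minimal report is small enough to be sent in-band on the session on which the error was received, whereas the larger diagnostic form could equally be carried on a separate session.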
Whilst the operational requirement for such monitoring tools to allow for visibility into BGP is clearly agreed upon, the means by which such messages are transmitted between two BGP speakers is likely to be dependent upon the positions of the speakers in question (for instance, the requirements for such a protocol may differ where a session is between two ASBRs under separate administration). The introduction of additional message types to the BGP protocol clearly introduces further complexity, and leaves room for further implementation and standardisation errors that may compromise the robustness of the BGP protocol. In addition, the queuing and scheduling of these BGP messages must be interleaved with the transmission of the key protocol messages, such as KEEPALIVE and UPDATE packets. It is therefore a concern that, should a large number of messages specifically for operational visibility be transmitted, this will delay the transmission of UPDATE packets, and hence adversely affect the end-to-end convergence time for NLRI carried within BGP. The operational reasons for which it is advantageous for such messages to be in-band to the protocol should also be considered. In particular, it should be noted that where such information is to be transmitted across administrative boundaries, a BGP session represents an existing channel between the two ASes. This channel is considered to be secure insofar as the routing information and requests sent via the session are considered to come from a trusted source. Since error information both relates to a particular attachment and is key to ensuring that such a session is operating as expected, it is considered of great operational benefit that this information is transmitted over this channel. In addition, the overall system scalability is improved by such in-band transmission.
It is expected that erroneous information resulting in the 'treat-as-withdraw' mechanism being utilised is transmitted relatively infrequently between two peers (when compared to the frequency of UPDATE message transmission). The impact of including an additional BGP message type for such operational visibility is relatively small from a resource utilisation perspective - additional processing overhead is only experienced when such a message is received. Where a separate session is maintained, particular network elements within a service provider topology may require hundreds, or thousands, of additional sessions for the transmission of this information. Such a resource consumption overhead is likely to be unacceptable to some network operators.

For the reasons explained above, it is expected that mechanisms specified to meet the requirements for event visibility consider the relative impacts of additional monitoring sessions and of message inclusion in-band to BGP, in order not to compromise the security, scalability and robustness of the BGP-4 protocol.

7.  Operational Complexities Introduced by Altering RFC4271

The existing NOTIFICATION and subsequent teardown of a BGP session upon encountering an error has the advantage that a consistent approach to error handling is required of all implementations of the BGP-4 protocol. This is of operational advantage, as it provides a clear expectation of the behaviour of the protocol. The requirements defined herein add further complexity to the error handling within BGP, and hence are liable to compromise the existing deterministic protocol behaviour. It is therefore deemed that there is a further requirement to provide a clear method by which an erroneous UPDATE should be reacted to, in order that all protocol implementations provide a consistent means by which recovery is achieved.
A further complexity is introduced by the disparate nature of the work items altering the BGP error handling behaviour - since all items are likely to be implemented as a BGP capability [RFC5492], situations are likely to occur between devices (especially those with different BGP implementations) where some of the mechanisms referenced are unsupported. This adds further barriers to a standard definition of the BGP-4 error handling behaviour.

In general, the approach considered ideal upon encountering an erroneous UPDATE message can be divided into two cases - those where the NLRI can be determined from the message, and those where it cannot be. The latter case is the simpler of the two. In this case, there is a requirement for the implementation to reset the BGP session, utilising the reduced-impact approach described in Section 5. In the case where the remote BGP speaker is in a transient error condition related to specific peer data structures, or state, a single instance of this behaviour is likely to clear the error condition. In the case of implementation errors, it is possible that the BGP session in question may enter a continuous loop of being reset, with a partial RIB being held by one or more of the BGP speakers due to a non-deterministic order of UPDATE propagation. It is therefore a requirement that, within this reduced-impact procedure, any subsequent UPDATE messages that would result in further session resets are ignored. Whilst this results in a condition where an undetermined amount of the RIB is inconsistent, partial reachability is maintained. In this case, the operational toolsets discussed in Section 6 are likely to provide mechanisms by which this condition can be brought to the attention of the relevant operators.
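The reset-then-ignore behaviour described above can be sketched as follows. This is a minimal illustration only: the PeerErrorState type and the handle_unparseable_update() function are hypothetical names, and the sketch assumes that the flag recording the reset is held per peer, outside the session itself, so that re-establishing the session does not clear it (otherwise the reset loop would simply resume).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerErrorState:
    """Hypothetical per-peer state that survives session resets."""
    reset_performed: bool = False  # set once a reduced-impact reset is done
    ignored_updates: int = 0       # count of UPDATEs ignored since then


def handle_unparseable_update(state: PeerErrorState, log: List[str]) -> str:
    """Decide how to react to an UPDATE whose NLRI cannot be determined.

    The first such UPDATE triggers a single reduced-impact session reset.
    Any later UPDATE that would trigger a further reset is ignored, and
    the event is logged so that an operator can be made aware of it.
    """
    if not state.reset_performed:
        state.reset_performed = True
        log.append("erroneous UPDATE: performing reduced-impact reset")
        return "reset"

    state.ignored_updates += 1
    log.append("erroneous UPDATE: ignored, RIB may be inconsistent")
    return "ignore"


if __name__ == "__main__":
    peer_state = PeerErrorState()
    events: List[str] = []
    for _ in range(3):
        print(handle_unparseable_update(peer_state, events))
```

The log entries stand in for whichever operational toolset is used to report the condition; how and when the flag is eventually cleared is left open here, as it is not defined by the text above.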
This requirement to accept a partial RIB, which potentially results in invalid traffic forwarding, is a direct result of the deployments of BGP-4 described in Section 1.1.

The case where NLRI can be determined from an erroneous UPDATE provides further complexities. In this case, a BGP speaker is aware of the subset of the RIB which has been identified as being contained within invalid UPDATE messages. This allows a local BGP speaker to re-request single prefixes, utilising a mechanism such as "one-time prefix ORF". However, a similar result is achieved by re-requesting the entire RIB, albeit with greater resource requirements. It is therefore expected that the process of recovery utilises a staged set of mechanisms to attempt to restore consistency of the RIB:

1.  Where available, a mechanism capable of requesting only the NLRI determined to have been contained within an invalid UPDATE should be utilised. However, since it is possible that such an error condition is transient in nature, it is likely that more than one request will be transmitted (assuming the first does not return a valid UPDATE message). In order to allow a deterministic process, there is a requirement for a limit on the number of specific requests transmitted to be defined.

2.  Where a specific refresh mechanism is not available, a peer should re-request the entire RIB. Again, there is a requirement to limit the number of complete RIB requests that are sent by an implementation, in order to provide a bound both on the expected level of load a device may experience, and on the time for which the RIB may be inconsistent.

3.  Finally, a session reset should be performed, as per the reduced-impact NOTIFICATION requirement defined in Section 5. At this point, a similar challenge to that discussed above exists, should the error condition persist.
    In this case, as defined above, there is a requirement to ignore those UPDATE messages that continue to be erroneous.

It is envisaged that where limits are required, these will be defined on a per-memo basis, or within a further revision of the requirements described herein.

Whilst the approach described above provides a standard means by which error recovery may be handled on a per-UPDATE basis, further complexities are raised where multiple errors occur. Clearly, following this procedure causes control-plane load on both BGP speakers - for this reason, consideration is required of how repeated use of the mechanisms discussed in this document is handled. It is notable that errors may not be confined to UPDATE messages relating to a single NLRI; independent errors in multiple NLRIs may be experienced. For this reason, it is required that an implementation rate-limits the number of error handling events sourced towards a particular neighbour. It is expected that such rate limiting, or event suppression, is achieved on a per-session basis, where state information is already held, rather than on a per-prefix basis, as it is envisaged that per-prefix behaviour presents significant scaling problems and introduces further state requirements for an implementation of the protocol. It is recommended that where a flag indicative of erroneous behaviour is implemented, the state of such a value is maintained independently of session establishment.

8.  IANA Considerations

This memo includes no request to IANA.

9.  Security Considerations

The requirements outlined in this document provide mechanisms by which erroneous BGP messages may be responded to with limited impact to forwarding operation. This is of benefit to the security of a BGP speaker in general.
Where UPDATE messages originated by a single malicious Autonomous System or router within a network (or the Internet default free zone - DFZ) are propagated to all devices within the same routing domain, all other NLRI available over the same session become unreachable. This may provide a means by which an Autonomous System can be isolated from required routing domains (such as the Internet), should the relevant UPDATE messages be propagated via specific paths. By reducing the impact of such failures, it is envisaged that this possibility may be constrained to a specific set of NLRI, or a specific topology.

Some mechanisms meeting the requirements specified in this document, particularly those within Section 6, may raise further security concerns. However, it is envisaged that these are addressed in per-enhancement memos.

10.  Acknowledgements

The author would like to thank Shane Amante, Bruno Decraene, Rob Evans, David Freedman, Tom Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and Ilya Varlashkin for their review and valuable feedback.

11.  References

11.1.  Normative References

[I-D.chen-ebgp-error-handling]
   Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP Updates from External Neighbors", draft-chen-ebgp-error-handling-00 (work in progress), September 2010.

[I-D.ietf-grow-bmp]
   Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-05 (work in progress), December 2010.

[I-D.ietf-idr-advisory]
   Scholl, T., Scudder, J., Steenbergen, R., and D. Freedman, "BGP Advisory Message", draft-ietf-idr-advisory-00 (work in progress), October 2009.

[I-D.ietf-idr-optional-transitive]
   Scudder, J. and E.
   Chen, "Error Handling for Optional Transitive BGP Attributes", draft-ietf-idr-optional-transitive-03 (work in progress), September 2010.

[I-D.keyur-bgp-enhanced-route-refresh]
   Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4", draft-keyur-bgp-enhanced-route-refresh-02 (work in progress), March 2011.

[I-D.raszuk-bgp-diagnostic-message]
   Raszuk, R., Chen, E., and B. Decraene, "BGP Diagnostic Message", draft-raszuk-bgp-diagnostic-message-02 (work in progress), March 2011.

[I-D.zeng-one-time-prefix-orf]
   Zeng, Q. and J. Dong, "One-time Address-Prefix Based Outbound Route Filter for BGP-4", draft-zeng-one-time-prefix-orf-01 (work in progress), October 2010.

[RFC2918]
   Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[RFC4271]
   Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364]
   Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4456]
   Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC4724]
   Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.

[RFC4760]
   Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

[RFC5492]
   Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, February 2009.

11.2.  Informational References

[RFC5881]
   Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

Author's Address

Rob Shakir
Cable&Wireless Worldwide
London
UK

Email: rjs@cw.net
URI:   http://www.cw.com/