Internet Engineering Task Force                                R. Shakir
Internet-Draft                                                        BT
Intended status: Informational                             July 30, 2012
Expires: January 31, 2013


   Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
           draft-ietf-grow-ops-reqs-for-bgp-error-handling-05

Abstract

   BGP-4 is utilised as a key intra- and inter-Autonomous System routing protocol in modern IP networks. The failure modes defined by the original protocol standards are based on a number of assumptions about the impact of session failure. Numerous incidents, both in the global Internet routing table and within Service Provider networks, have been caused by the strict handling of a single invalid UPDATE message, resulting in large-scale failures in one or more Autonomous Systems.

   This memo describes the current use of BGP-4 within Service Provider networks, and outlines a set of requirements for further work to enhance the mechanisms available to a BGP-4 implementation when erroneous data is detected. Whilst this document does not specify any standard, it is intended as an overview of a set of enhancements to BGP-4 that would improve the protocol's robustness to suit its current deployment.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 31, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of BGP-4 in Service Provider Networks
     1.2.  Overview of Operator Requirements for BGP-4 Error Handling
   2.  Errors within BGP-4 UPDATE Messages
     2.1.  Classifying BGP Errors and Expected Error Handling
       2.1.1.  Critical BGP Errors
       2.1.2.  Semantic BGP Errors
   3.  Avoiding use of NOTIFICATION
   4.  Recovering RIB Consistency
   5.  Reducing the Impact of Session Reset
   6.  Operational Toolset for Monitoring BGP
   7.  Operational Complexities Introduced by Altering RFC4271
     7.1.  Reducing the Network Impact of Session Teardown
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Author's Address

1.  Introduction

   Where BGP-4 [RFC4271] is deployed in the Internet and Service Provider networks, numerous incidents have been recorded due to the manner in which [RFC4271] specifies that errors in routing information should be handled. Whilst the behaviour defined in the existing standards retains utility, the deployment of the protocol has changed within modern networks, resulting in significantly different demands for protocol robustness. A number of Internet-Drafts have begun to enhance the behaviour of BGP-4 in terms of the handling of erroneous messages; this memo intends to define a set of requirements for this ongoing work. These requirements are considered from the perspective of a Network Operator, and hence this draft does not intend to define the protocol mechanisms by which such error handling behaviour is to be implemented.

1.1.  Role of BGP-4 in Service Provider Networks

   BGP was designed as an inter-Autonomous System (AS) routing protocol, and hence many of the error handling mechanisms within the protocol specification are designed to be conducive to this role. In general, considering BGP as an inter-AS routing propagation mechanism leads to the view that a BGP session propagates a relatively small amount of network-layer reachability information (NLRI) between two ASes. In this model, routing continuity is expected to depend on the resilience provided by multiple adjacencies (for example, two networks peering via BGP are expected to interconnect multiple times in order to safeguard against equipment or protocol failure).
   In addition, there is some expectation that multiple paths to a particular NLRI are available: where a more direct path fails, a network is expected to be able to fall back to alternate, less direct, paths.

   Traditional network architectures would deploy an Interior Gateway Protocol (IGP) to carry infrastructure and customer routes, with an Exterior Gateway Protocol (EGP) such as BGP being utilised to propagate these routes to other Autonomous Systems. However, with the growth of IP-based services, this is no longer considered best practice. In order to ensure that convergence is within acceptable time bounds, the amount of routing information carried within the IGP is significantly reduced, and tends to comprise only infrastructure routes. iBGP is then utilised to propagate both customer and external routes within an AS. As such, BGP has become an IGP, with traditional IGPs acting as a means by which to propagate the routing information required to establish BGP sessions and to reach the egress node within the local routing domain. This change in role presents different requirements for the robustness of BGP as a routing protocol, with operators now expecting a level of robustness similar to that of an IGP.

   Along with this change in role, the nature of the IP routing information that is carried has changed. BGP has become a ubiquitous means by which service information can be propagated between devices. For instance, BGP is utilised to carry routing information for IP/MPLS VPN services as described in [RFC4364]. Since the protocol is already deployed between PE devices in numerous networks, it has been adapted to propagate this routing information, as its use limits the number of routing protocols required on each device.
   This additional information represents a large change in the requirements for the error handling of the protocol: where session failure occurs, at least a subset of a network's customers is likely to experience a complete service outage, even though the erroneous message may relate to a different sub-topology or even a different service (a different address family, for example). For this reason, there is significant demand to avoid service-affecting failures that may be triggered by routing information within a single sub-topology or service.

   The combination of the increased number of deployments of BGP-4 as an intra-AS routing protocol, its use for the propagation of additional types of routing and service information, and the growth of IP services has resulted in a substantial increase in the volume of information carried within BGP-4. In numerous networks, individual BGP speakers hold RIBs of the order of millions of entries, with particularly high-scale points exhibited at BGP speakers performing aggregation or functions designed to improve the utilisation of network resources (e.g., route reflector hierarchies). An increase in the amount of routing information carried in BGP results in greater impact to services during failures, which is further amplified by a corresponding increase in recovery times. Following a failure, there is a substantial recovery time to learn, compute and distribute new paths, which results in a greater observed impact to affected services. This adds further weight to the requirement to avoid failures altogether or, at least, to limit their impact to the narrowest scope possible (e.g., a specific NLRI).
   Whilst it could be argued that the convergence time of BGP-4 could be reduced by deploying additional computational resources, this solution is not necessarily straightforward from an implementation or deployment perspective (e.g., scaling computational resources within a single address family is difficult). Thus, significant challenges continue to exist for operators when scaling BGP-4 deployments, and hence mechanisms which improve the scalability of BGP-4 are very important.

   Within both Internet and multi-service routing architectures, a number of BGP sessions propagate a large proportion of the routing information required for network operation. For Internet routing, these are typically the BGP sessions which propagate the global routing table to an AS; failure of these sessions, based on a single erroneous UPDATE, may have a large impact on network service. In a multi-service environment, typical deployments utilise a small number of core-facing BGP sessions, typically towards route reflector devices. Failure of these sessions may also have a large impact on network operation. Clearly, avoiding conditions that require these sessions to fail is of great utility to any network operator, and provides further motivation for revising the existing behaviour.

   Whilst the behaviour in [RFC4271] is suited to ensuring that the scope of BGP messages containing erroneous routing information is limited (by means of session reset), the above considerations make it clear that this mechanism is not suited to all deployments. It should, however, be noted that the change in scope affects only the handling of errors occurring after BGP session establishment. There is no current operational requirement to amend the means by which error handling in session establishment, or liveness detection, is performed.

1.2.  Overview of Operator Requirements for BGP-4 Error Handling

   It is the intention of this document to define a set of criteria to which a revised error handling mechanism in BGP-4 is required to conform. The motivation for these requirements can be summarised as certain behaviour currently present in the protocol that is not deemed acceptable within current operational deployments, or a shortfall in the toolset available to an operator. These key requirements can be summarised as follows:

   o  It is unacceptable within modern deployments of the BGP-4 protocol that a single erroneous UPDATE message affects routes that it does not carry. This requirement therefore necessitates some modification to the means by which erroneous UPDATE messages are handled and reacted to, with a particular focus on avoiding the use of the NOTIFICATION message.

   o  It is recognised that some error conditions occurring within the BGP-4 protocol may not always be handled gracefully, and may result in conditions from which an implementation cannot recover. In these (and similar) cases, it is undesirable for an operator that the resulting reset of the BGP-4 session interrupts packet forwarding (by withdrawing routes installed by BGP-4 into a device's RIB, and subsequently FIB). To this end, there is a requirement to define a session reset mechanism which re-initialises the session in a non-destructive manner.

   o  Further to the requirements to provide a more robust protocol, the current visibility into error conditions within the BGP-4 protocol is extremely limited, and where further modifications to this behaviour are made, complexity is likely to be added. Thus, to ensure that BGP-4 remains manageable, there is a requirement for mechanisms by which the protocol can be examined and monitored.

   This document describes each of these requirements in further depth, along with an overview of the means by which they are expected to be achieved. In addition, the manner in which the enhancements meeting these requirements are to interact is discussed.

2.  Errors within BGP-4 UPDATE Messages

   Through analysis of incidents occurring both within the Internet DFZ and within multi-service environments utilising BGP-4 to signal service or routing information, a number of different classes of errors within BGP-4 UPDATE messages have been observed. In order to consider the applicability of enhanced error handling mechanisms, it is possible to divide these errors into a number of sub-classes, focusing particularly on the location of the error within the UPDATE message.

   Where an UPDATE message is considered invalid by a BGP speaker due to an error within a path attribute that is not the NLRI (where the definition of NLRI includes reachability information encoded in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in [RFC4760]), it is a requirement of any enhanced error handling mechanism to handle the error in a manner focused on the NLRI contained within the erroneous message. Since, in this case, the message received from the remote peer is syntactically valid, such an UPDATE is considered to be indicative of erroneous data within one or more path attributes. The behaviour currently defined within the protocol implies that the BGP speaker from which the message is received is now an invalid path for all NLRI announced via the session, which results in a disproportionate impact on overall network operation. In particular scenarios (such as networks with centralised BGP route reflection), such action can result in a loss of all reachability to a network.
   In other contexts (such as the Internet DFZ), it cannot be assumed that the BGP speaker from which the UPDATE message is received is directly responsible for the erroneous information contained within the message.

   Two further error cases exist within UPDATE messages, both of which relate to messages where some difficulty exists in parsing the entire BGP message: those where a valid NLRI attribute can be extracted, and those where such an attribute cannot be parsed. In these cases, errors in the packing of attributes within a BGP message may have occurred. Such errors are likely to have been caused specifically by the remote BGP speaker. It is, however, desirable to an operator that such errors are handled without affecting all NLRI across a BGP session. As such, there is a key requirement to maximise the number of cases in which it is possible to extract NLRI from a BGP UPDATE message. To this end, it is required that, where possible, the MP_REACH_NLRI and MP_UNREACH_NLRI attributes are utilised for encoding all NLRI (including IPv4 Unicast), and that the relevant attribute is included as the first attribute of a BGP UPDATE message (as originally recommended in [I-D.chen-ebgp-error-handling]). Such a change to the order of inclusion of this attribute maximises the number of cases in which NLRI can be extracted from an UPDATE. Where this is possible, it is again required that the error handling mechanisms utilised are applied directly to the NLRI included in the UPDATE.

   For all cases in which NLRI can be obtained from an UPDATE message, it is expected that the requirements outlined in Section 3 are considered by any enhancement to the BGP-4 protocol.
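
   The following Python sketch illustrates why the position of the MP_REACH_NLRI attribute matters. It is not taken from any specification or implementation: it assumes a receiver that walks the path attributes in order (using the flags, type code, and one- or two-octet length encoding of [RFC4271]) and stops at the first attribute that cannot be parsed. The function names walk_attributes and recoverable_nlri_attrs, and the example octets, are hypothetical.

```python
from dataclasses import dataclass

# Path attribute type codes (RFC 4760) and the Extended Length flag (RFC 4271).
MP_REACH_NLRI = 14
MP_UNREACH_NLRI = 15
EXTENDED_LENGTH = 0x10  # when set, the attribute length field is two octets


@dataclass
class PathAttribute:
    flags: int
    type_code: int
    value: bytes


def walk_attributes(data: bytes) -> tuple[list[PathAttribute], bool]:
    """Parse path attributes in order, stopping at the first malformed one.

    Returns the attributes parsed so far, and True if parsing stopped early.
    """
    attrs: list[PathAttribute] = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            return attrs, True  # truncated flags/type octets
        flags, type_code = data[pos], data[pos + 1]
        len_size = 2 if flags & EXTENDED_LENGTH else 1
        if pos + 2 + len_size > len(data):
            return attrs, True  # truncated length field
        length = int.from_bytes(data[pos + 2:pos + 2 + len_size], "big")
        start = pos + 2 + len_size
        if start + length > len(data):
            return attrs, True  # attribute claims more octets than remain
        attrs.append(PathAttribute(flags, type_code, data[start:start + length]))
        pos = start + length
    return attrs, False


def recoverable_nlri_attrs(data: bytes) -> list[PathAttribute]:
    """Return the MP_REACH_NLRI / MP_UNREACH_NLRI attributes that parsed cleanly."""
    attrs, _stopped_early = walk_attributes(data)
    return [a for a in attrs if a.type_code in (MP_REACH_NLRI, MP_UNREACH_NLRI)]


# Illustrative inputs: an abbreviated MP_REACH_NLRI (AFI 1, SAFI 1 only), and an
# ORIGIN attribute whose length field claims 5 octets but which carries only 1.
MP_REACH_EXAMPLE = bytes([0x80, MP_REACH_NLRI, 3, 0x00, 0x01, 0x01])
BROKEN_ORIGIN = bytes([0x40, 1, 5, 0x00])

nlri_first = recoverable_nlri_attrs(MP_REACH_EXAMPLE + BROKEN_ORIGIN)
nlri_last = recoverable_nlri_attrs(BROKEN_ORIGIN + MP_REACH_EXAMPLE)
```

   With MP_REACH_NLRI first, it is recovered even though the following ORIGIN is truncated; with the order reversed, the oversized ORIGIN length causes the receiver to consume the MP_REACH_NLRI octets as ORIGIN data, and no NLRI attribute is recovered.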

   In the case that it is not possible to completely parse the NLRI attribute from the UPDATE message received from a peer, it is extremely likely that this indicates a serious error in either the process of attribute packing or buffer usage on the remote BGP speaker. In this case, it is clearly not possible to apply any error handling mechanism that is limited to a specific set of NLRI, since an implementation has no knowledge of the NLRI included within the UPDATE message. In addition, such errors are considered to be relatively fundamental to the operation of a BGP implementation, and hence may indicate that significant system errors have occurred. The current BGP-4 standard results in a BGP speaker restarting its session with the remote BGP speaker. However, where such an error does occur, it is required that a graceful mechanism is utilised to reduce the impact on network operation. The requirements for enhancements of this nature to BGP-4 are outlined in Section 5, and focus on providing a means by which system integrity can be restored whilst allowing for continued network operation.

2.1.  Classifying BGP Errors and Expected Error Handling

   It is clearly advantageous for BGP-4 implementations to utilise a consistent set of error handling mechanisms for the different types of errors described in Section 2, and to provide consistent nomenclature to refer to them. It is therefore suggested that errors which are indicative of larger-scale failures of a BGP speaker, and hence require some error handling at the session level, are referred to as 'critical' errors, whilst errors that are identified based on the incorrect content of one or more attributes of a message are referred to as 'semantic' errors.
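
   A minimal Python sketch of this classification, and of the dispatch it implies, is shown below. It assumes that the caller has already detected an error and can report whether the NLRI could be extracted; the length check corresponds to the UPDATE Message Length case listed in Section 2.1.1, and 'semantic' errors are handled by treat-as-withdraw (Section 3) against a plain dictionary standing in for one neighbour's Adj-RIB-In. All names and return values are hypothetical, and the "session-reset" outcome is only a placeholder for the session-level mechanisms of Section 5.

```python
from enum import Enum


class ErrorClass(Enum):
    CRITICAL = "critical"  # handled at the session level (Section 5)
    SEMANTIC = "semantic"  # handled per NLRI (Sections 3 and 4)


def update_lengths_consistent(body: bytes) -> bool:
    """Check that the Withdrawn Routes Length and Total Path Attribute Length
    fields fit within the UPDATE body (the message minus its 19-octet header)."""
    if len(body) < 4:
        return False
    withdrawn_len = int.from_bytes(body[0:2], "big")
    attr_len_pos = 2 + withdrawn_len
    if attr_len_pos + 2 > len(body):
        return False
    attr_len = int.from_bytes(body[attr_len_pos:attr_len_pos + 2], "big")
    # Whatever remains after the attributes is the (possibly empty) NLRI field.
    return attr_len_pos + 2 + attr_len <= len(body)


def classify(body: bytes, nlri_parsed: bool) -> ErrorClass:
    """Classify an UPDATE that has already been found to contain an error."""
    if not update_lengths_consistent(body) or not nlri_parsed:
        return ErrorClass.CRITICAL
    return ErrorClass.SEMANTIC


def treat_as_withdraw(adj_rib_in: dict[str, dict], prefixes: list[str]) -> list[str]:
    """Remove the UPDATE's NLRI from the Adj-RIB-In as if they were withdrawn."""
    removed = []
    for prefix in prefixes:
        if prefix in adj_rib_in:
            del adj_rib_in[prefix]
            removed.append(prefix)
    return removed


def handle_error(body: bytes, nlri_parsed: bool,
                 adj_rib_in: dict[str, dict], prefixes: list[str]) -> tuple[str, list[str]]:
    if classify(body, nlri_parsed) is ErrorClass.SEMANTIC:
        return "treat-as-withdraw", treat_as_withdraw(adj_rib_in, prefixes)
    # Placeholder: a session-level mechanism (cf. Section 5) would run here.
    return "session-reset", []


# Example: a semantic error affecting 192.0.2.0/24 leaves other routes in place.
rib = {"192.0.2.0/24": {"next_hop": "198.51.100.1"},
       "203.0.113.0/24": {"next_hop": "198.51.100.1"}}
well_formed = bytes([0, 0, 0, 0])  # no withdrawn routes, no path attributes
outcome = handle_error(well_formed, True, rib, ["192.0.2.0/24"])
```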

   The errors identified within the following sections consider only those errors within the specifications at the time of writing; it is recommended that the definition of future extensions to the BGP-4 specification also defines the error handling behaviour (and the category within which errors in the extension should be considered by an implementation).

2.1.1.  Critical BGP Errors

   As described in this document, it is advantageous to limit the number of 'critical' errors that occur within the protocol. Therefore, based on analysis of the processing of BGP UPDATE messages, it is required that 'critical' error handling behaviour is applied to:

   o  UPDATE Message Length errors, whereby the specified overall UPDATE message length is inconsistent with the sum of the Total Path Attribute Length and Withdrawn Routes Length. Such an error is indicative of a message packing failure, whereby the NLRI may not be correctly extracted.

   o  Errors parsing the NLRI attributes of an UPDATE message. Where NLRI carried in either the IPv4 Unicast Advertised or Withdrawn Routes fields, or in the MP_REACH_NLRI or MP_UNREACH_NLRI attributes [RFC2858], cannot be parsed, it is not possible to target error handling mechanisms to specific NLRI, and hence session-level mechanisms must be utilised.

   It is expected that the requirements outlined in Section 5 are utilised to provide session-level handling of those errors identified as 'critical'.

2.1.2.  Semantic BGP Errors

   Where a BGP message is correctly formed, a number of cases exist in which the contents of the UPDATE are not valid; these represent errors that can be identified as affecting specific NLRI.
   The following cases are expected to be classified as semantic errors:

   o  Zero or invalid length errors in path attributes other than those containing NLRI, or cases where the length of all path attributes contained within the UPDATE does not correspond to the Total Path Attribute Length. In these cases, the NLRI can be correctly extracted, and hence acted upon.

   o  Messages where invalid data or flags are contained in a path attribute that does not relate to the NLRI.

   o  UPDATE messages missing mandatory attributes, containing unrecognised non-optional attributes, or containing duplicate or invalid attributes (be they unsupported or unexpected).

   o  Messages where the NEXT_HOP or MP_REACH next-hop values are missing, of zero length, or invalid for the relevant AFI/SAFI.

   In these cases, it is expected that these errors can be handled gracefully, following the requirements detailed in Section 3 and Section 4 of this memo.

3.  Avoiding use of NOTIFICATION

   The error handling behaviour defined in [RFC4271] is problematic due to the limited options that are available to an implementation. When an erroneous BGP message is received, the implementation must currently either ignore the error, or send a NOTIFICATION message, after which it is mandatory to terminate the BGP session. This requirement is clearly at odds with that of protocol robustness.

   There is significant complexity to this requirement. The mechanism defined in [I-D.chen-ebgp-error-handling] describes a means by which no NOTIFICATION message is generated in any case where NLRI can be extracted from an UPDATE. The NLRI contained within the erroneous UPDATE message is treated as though the remote BGP speaker had sent an UPDATE withdrawing it.
   This limits the propagation of the invalid routing information, whilst also ensuring that no traffic is forwarded via a previously known path that may no longer be valid. This mechanism is referred to as "treat-as-withdraw".

   Whilst this behaviour avoids a NOTIFICATION message, keeping other routing information advertised by the remote BGP speaker within the RIB, it may result in unreachability for a subset of the NLRI advertised by the remote speaker. Two cases should be considered: that where the route in the Adj-RIB-In of the neighbour propagating an erroneous message is in use, and that where the route installed in the device's RIB is learnt from another BGP speaker. In the former case, should the identified NLRI not be treated as withdrawn, the original NLRI continues to be utilised within the global RIB. However, this information is potentially now invalid (i.e., it no longer provides a valid forwarding path), whilst an alternate (valid) path may exist in another Adj-RIB-In. By continuing to utilise the NLRI for which the UPDATE was considered invalid, traffic may be forwarded via an invalid path, resulting in routing loops or black-holing. In the latter case, no impact to the forwarding of traffic, or to the global RIB, is incurred; where treat-as-withdraw is implemented, possibly stale routing information is simply purged from the Adj-RIB-In of the neighbour propagating errors.

   Whilst mechanisms such as "treat-as-withdraw" are currently documented, the proposals are limited in their scope, particularly in that they restrict implementation to eBGP sessions only. This limitation is based on the view that the BGP RIB must be consistent across an autonomous system.
   By implementing treat-as-withdraw on an iBGP session, one or more routers within the Autonomous System may lack reachability to a route, and hence black-holing of traffic, or routing loops, may occur. It should, however, be considered whether this view is valid, in light of the manner in which BGP is utilised within operator networks. Inconsistency in a RIB caused by a single UPDATE being treated as withdrawn may cause an inconsistency in a single sub-topology (e.g., a Layer 3 VPN service), or cause a service not to operate completely (in the case of an UPDATE carrying service membership information). Where a NOTIFICATION and teardown are utilised, this is destructive to all sub-topologies in all address family identifiers (AFIs) carried by the session in question. Even where mechanisms such as multi-session BGP are utilised, a whole AFI is affected by such a NOTIFICATION message. In terms of routing operation, it is therefore far less costly to endure a situation where a limited subset of routing information within an AS is invalid than to consider all routing information invalid based on a single trigger.

   At the time of writing, error handling mechanisms related to optional, transitive attributes, such as [I-D.ietf-idr-optional-transitive], are restricted to handling only a subset of attribute errors, whereas the operational requirement is to expand this coverage to the widest set of errors possible (i.e., all semantic errors within UPDATE messages). Additionally, where approaches applicable to a greater number of attributes are proposed (e.g., [I-D.chen-ebgp-error-handling]), these are limited to deployment in eBGP applications only, whereas requirements also exist in intra-domain cases.
   As such, it is envisaged that, if extended to cover these cases, these mechanisms would provide a means to avoid the transmission of a NOTIFICATION message to a remote BGP speaker based on a single erroneous message wherever possible, and hence meet this requirement. Critical errors, including those in which the NLRI cannot be extracted from the UPDATE message, represent cases where the receiving system cannot handle the error gracefully using this mechanism.

4.  Recovering RIB Consistency

   The recommendations described in Section 3 may result in the RIB for a topology within an AS being inconsistent across the AS's internal routers. Alternatively, where such mechanisms are deployed at an AS boundary, interconnects between two ASes may be inconsistent with each other. There are therefore risks of traffic black-holing, due to missing routing information, or of forwarding loops. Whilst this is deemed an acceptable compromise in the short term, it is clearly suboptimal. Therefore, a requirement exists to provide mechanisms by which a BGP speaker is able to recover the consistency of the Adj-RIB-In for a particular neighbour.

   In the general case, the consistency of the BGP RIB can be recovered by requesting that the entire Adj-RIB-Out of a remote BGP speaker be re-advertised. A mechanism to achieve this re-advertisement is defined within the ROUTE-REFRESH specification [RFC2918]. It is envisaged that, by requesting a refresh of all NLRI advertised by a BGP speaker, any NLRI which has been withdrawn due to being contained within an invalid UPDATE message is re-learnt.
   Where a ROUTE-REFRESH is used to directly perform a consistency check between the Adj-RIB-Out of a remote device and the Adj-RIB-In of the local BGP speaker, a demarcation between the ROUTE-REFRESH and normal UPDATE messages is required (in order that an "end" of the refresh can be used to identify any 'stale' NLRI); [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which the ROUTE-REFRESH mechanism can be extended to meet this requirement.

   Whilst re-advertisement of the whole BGP RIB provides a means by which withdrawn NLRI can be re-advertised, there are some scaling implications that must be considered. When a ROUTE-REFRESH is generated, all NLRI must be re-packed into UPDATE messages and advertised by one speaker on the BGP session, whilst the other must receive all UPDATE messages and validate the RIB's consistency. In order to avoid this control-plane load, it is therefore a requirement to utilise targeted mechanisms where possible, rather than incurring the additional load on both the advertising and receiving speakers of building and processing UPDATEs for the entire contents of the RIB.

   It is envisaged that, during routing inconsistencies caused by utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is aware that some routing information could not be processed, since an UPDATE message was not parsed correctly. Since this mechanism (as discussed in Section 3) requires the local BGP speaker to have determined the set of NLRI for which an erroneous UPDATE message was received, it is possible to use a targeted mechanism to re-request the specific NLRI contained within the erroneous UPDATE message.
   Re-requesting the NLRI provides the remote BGP speaker with an opportunity to re-transmit it, possibly leveraging alternative methods to build the UPDATE message. Such a request requires an extension to the existing BGP-4 protocol, in terms of specific UPDATE generation filters with a transient lifetime. It is envisaged that the work within [I-D.zeng-idr-one-time-prefix-orf] provides a mechanism allowing targeted elements of the Adj-RIB-In for a BGP neighbour to be recovered.

   It is of particular note that both of the described means of recovering RIB consistency are effective only for transient errors within an implementation. For instance, should an RFC interpretation error be present within an implementation, it is likely that this error condition will persist regardless of the number of times a specific UPDATE is generated (as it may with the existing behaviour defined by [RFC4271]). For this reason, there is a requirement to consider the means by which such consistency recovery mechanisms are utilised. It is not advisable for a dynamic filter and advertisement mechanism to be triggered by every error handling event, due to the load this is likely to place on the neighbour receiving such a request. Where this BGP speaker is a relatively centralised device (a route reflector, as described by [RFC4456], for example), generating UPDATE messages with such frequency is likely to cause disproportionate load. It is therefore an operational requirement that any such extension includes a means of request dampening.
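
   To make the request-dampening requirement concrete, the following Python sketch shows one possible approach, assuming a per-neighbour exponential back-off: re-requests for specific NLRI are sent immediately when no hold-down is active and coalesced while one is, and the hold-down interval doubles (up to a cap) with each request sent, resetting once the neighbour has been quiet for a while. The class name, its parameters, and the choice of exponential back-off are assumptions made for illustration; this document does not define a dampening algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefreshDampener:
    """Hypothetical per-neighbour dampening of targeted NLRI re-requests."""
    min_interval: float = 1.0   # seconds; hold-down after a first request
    max_interval: float = 60.0  # seconds; upper bound on the hold-down
    _interval: float = field(default=0.0, init=False)
    _last_sent: Optional[float] = field(default=None, init=False)
    _pending: set[str] = field(default_factory=set, init=False)

    def request(self, prefixes: set[str], now: float) -> set[str]:
        """Queue prefixes for re-request; return those to request now (may be empty)."""
        self._pending |= prefixes
        if not self._pending:
            return set()
        if self._last_sent is not None and now - self._last_sent < self._interval:
            return set()  # still in hold-down: coalesce with later requests
        if self._last_sent is None or now - self._last_sent >= 2 * self._interval:
            self._interval = self.min_interval  # first request, or neighbour quiet
        else:
            self._interval = min(self._interval * 2, self.max_interval)
        self._last_sent = now
        to_send, self._pending = self._pending, set()
        return to_send


damper = RefreshDampener(min_interval=1.0, max_interval=4.0)
trace = [
    damper.request({"192.0.2.0/24"}, now=0.0),     # sent at once
    damper.request({"198.51.100.0/24"}, now=0.5),  # within hold-down: queued
    damper.request(set(), now=1.0),                # hold-down over: queue sent
    damper.request({"203.0.113.0/24"}, now=2.0),   # hold-down is now 2.0 s
]
```

   The quiet-period rule (twice the current interval) is arbitrary; an implementation might equally use a decaying penalty similar to that of route flap damping.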

In cases where the consistency of the Adj-RIB-In is to be restored (e.g., following the 'treat-as-withdraw' behaviour described in Section 3), and mechanisms such as those described herein are triggered, this condition should be noted to an operator by means of a specific flag, SNMP trap, or other logging mechanism. Since it identifies the subset of NLRI that are considered to be inconsistent, this information is of operational benefit and hence should be logged.

5. Reducing the Impact of Session Reset

Even where protocol enhancements mean that errors in the BGP-4 protocol no longer trigger NOTIFICATION messages (and hence a BGP session reset), it is clear that some error conditions cannot be exited in this way. In particular, errors due to existing state, or memory structures, associated with a specific BGP session will not be handled. It is therefore important to consider how these error conditions are currently handled by the protocol. It should be noted that the following discussion and analysis considers only those NOTIFICATION messages generated in response to errors in UPDATE messages (as defined by Section 6.3 of [RFC4271]).

The existing NOTIFICATION behaviour triggers a reset of all elements of the BGP-4 session, as described in Section 6 of [RFC4271]. It is expected that session teardown requires an implementation to re-initialise all structures and state required for session maintenance. Clearly, there is some utility to this requirement, as it generally allows error conditions in BGP to be exited. However, this behaviour is also responsible for forwarding outages, each time an error is experienced, within networks utilising BGP for the propagation of routing or service information.
The requirement described in Section 3 is intended to reduce the number of cases in which a NOTIFICATION is required; however, any mechanism implemented in response to this requirement cannot, by definition, reset session state to the extent achieved by the current behaviour.

In order to address this, there is a requirement for a means by which a BGP speaker can signal that an unhandled error condition has occurred in an UPDATE message, requiring a session reset, yet continue to utilise the paths advertised by the neighbour that are currently in use within the RIB. In this case, the Adj-RIB-In received from the neighbour is not considered invalid, despite a NOTIFICATION and session reset being required. This set of requirements is akin to those addressed by the BGP Graceful Restart mechanism described in [RFC4724]. Since the operational requirement in this case is to provide a means to achieve a complete session restart without disrupting the forwarding path of those routes in use within a BGP speaker's RIB, it is expected that a procedure similar to the Graceful Restart mechanism meets the error handling requirement. Responding to an error condition (repeated or otherwise) with a message that indicates that an unhandled error has occurred, and that forces a session reset whilst retaining forwarding information within the RIB, allows forwarding to all routes within a system's RIB to continue whilst the session restarts. It is envisaged that the additional complexity introduced by such a mechanism can be limited by extending existing BGP messages; one such approach is proposed in [I-D.ietf-idr-bgp-gr-notification].
By placing a time bound on the restart lifetime, the remote BGP speaker is still detected as invalid for forwarding should the error condition not be transient (for example, where an error has occurred within the BGP process itself, rather than being specific to the BGP session).

In some cases, the erroneous condition may be due to corruption of the Adj-RIB-Out on the advertising BGP speaker, rather than to the receiving speaker's state. In these cases, where existing structures are replayed whilst performing graceful restart functionality, the error condition is not necessarily resolved. Therefore, it is recommended that, during a session restart event as described within this section, the advertising speaker purge and rebuild its RIB structures, in order to resolve any corruption within them.

It should be noted that a protocol enhancement meeting this requirement is not able to solve all error conditions; however, since it performs a complete restart of the BGP and TCP session between two BGP speakers, it implements a recovery mechanism identical to that achieved by the existing behaviour. Where an error condition such as memory or configuration corruption has occurred in a BGP implementation, it is expected that a mechanism meeting this requirement continues to detect this, by means of a time bound within which the session restart must complete. Whilst there may be some concern that, due to this requirement, packets continue to be forwarded for a longer period through a device in a failure mode of this nature, the architecture of modern IP routers should be considered. Separation of the forwarding and control planes is common in many devices, as is process separation in software-based devices; corruption of a specific protocol daemon does not necessarily imply that forwarding is affected.
Indeed, where the forwarding behaviour of a device is affected, it is envisaged that a failure detection mechanism (be it Bidirectional Forwarding Detection [RFC5881], or indeed BGP KEEPALIVE messages) will detect such a failure in almost all cases, and that only in very few cases will the symptom of such a failure be an invalid UPDATE message.

6. Operational Toolset for Monitoring BGP

A significant complexity introduced through the requirements defined in this document is that of monitoring BGP session status for an operator. Although the existing error handling behaviour causes a disproportionate failure, session failure is extremely visible to most operational personnel within a Network Operator, due both to existing definitions of SNMP trap mechanisms for BGP and to the forwarding impact typically caused by such a failure. Once mechanisms are introduced by which errors of this nature are less visible, this is no longer the case. There is a requirement that, where subsets of the RIB on a device are no longer reachable from a BGP speaker, or indeed an AS, some visibility of this situation, alongside a mechanism to determine its cause, is available to an operator. Whilst, to some extent, this can be solved by adding to each of the aforementioned requirements a sub-requirement that a BGP speaker must log where such errors occur and are handled, this does not solve all cases. To clarify this requirement, consider the transmission of an erroneous Optional Transitive attribute. Since, by definition, there is no requirement for all BGP speakers to parse such an attribute, a receiving router may treat NLRI as withdrawn based on an erroneous attribute not examined by its neighbour. In this case, the upstream device or network propagating the UPDATE has no visibility of this error.
Operationally, however, it is of interest to the upstream router's operator that such invalid information was propagated.

The requirement for logging of error conditions in transmitted BGP messages, which are visible only to the receiver, cannot be met by any existing BGP message or capability. It is envisaged that each erroneous event should be transmitted to the remote peer, including information as to the set of NLRI that were considered invalid. Whilst some mechanisms achieve this by default (for example, One-Time Prefix ORF (Outbound Route Filtering) [I-D.zeng-idr-one-time-prefix-orf] will transmit the set of routes that are required), the operator requirement is to know which routes may have been unreachable in all cases. It is envisaged that an extension to meet this requirement will allow such information to be transmitted between peers, and hence logged. Such a mechanism may provide further utility as either a diagnostic or a logging toolset.

The messages required to provide an operator with further visibility into BGP can therefore be divided into two categories, based both on the required means of message transmission and on the criticality of each message.

o Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the requirements detailed within this document result in this message being suppressed. Despite this change, the error condition's occurrence is still of interest to an operator in order to provide both monitoring and troubleshooting capabilities, since some form of invalid data has been received on a session.
It is therefore considered that an implementation must generate a message upon such a condition, both locally and for transmission to the remote peer. Where such a message is transmitted to the remote peer, it is considered that the BGP session via which the erroneous UPDATE message was received should be used as the transport. The information transmitted in such a message should be minimised to that which allows identification of the paths which were considered erroneous (i.e., restricting the information to that which is directly relevant to a network operator in the case of an error condition occurring). Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question. However, it must be considered that, where an erroneous UPDATE is received and the 'treat-as-withdraw' mechanism is utilised, removing the erroneous path from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages withdrawing the route from all further BGP speakers to which the prefix is advertised. The load incurred by generating such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which it was received. It is envisaged that light-weight BGP message-based signalling mechanisms, such as the ADVISORY message types detailed in [I-D.ietf-idr-operational-message], provide a suitable means to satisfy this requirement.

o Additional Diagnostic Capabilities for BGP - In a number of cases, there is an operational requirement to further debug erroneous BGP UPDATE messages, along with the particulars of the state of a BGP speaker.
For instance, where an invalid BGP UPDATE message is transmitted between two BGP speakers, the exact format of the UPDATE message is of interest to an operator, as this information provides a clear indication of a message considered to be erroneous by the BGP speaker to which it was transmitted. In this case, it is considered of great utility that the entire UPDATE message is transmitted back to the advertising speaker, in order to allow further debugging to occur. Whilst such information is particularly useful to an operator, it is clearly not key to protocol operation; for this reason, it is expected that the additional complexity and load to which a BGP speaker would be subjected will not be acceptable in some deployments. It is therefore required that, where mechanisms are developed to support this requirement, messages of this nature can be supported both within an existing BGP session and via a dedicated separate session, be it BGP carrying messages such as those defined in [I-D.ietf-idr-operational-message] or a dedicated monitoring protocol akin to the BGP Monitoring Protocol (BMP) described in [I-D.ietf-grow-bmp].

Whilst the operational requirement for such monitoring tools to allow visibility into BGP is clearly agreed upon, the means by which such messages are transmitted between two BGP speakers is likely to depend upon the positions of the speakers in question (for instance, the requirements for such a protocol may differ where a session is between two ASBRs under separate administration). The introduction of additional message types to the BGP protocol clearly introduces further complexity, and leaves room for further implementation and standardisation errors that may compromise the robustness of the BGP protocol.
In addition, the queuing and scheduling of these BGP messages must be interleaved with the transmission of the key protocol messages, such as KEEPALIVE and UPDATE messages. It is therefore a concern that, should a large number of messages specifically for operational visibility be transmitted, this will delay the transmission of UPDATE messages, and hence adversely affect the end-to-end convergence time for NLRI carried within BGP. The operational reasons why it is advantageous for such messages to be carried in-band within the protocol should also be considered. In particular, it should be noted that, where such information is to be transmitted across administrative boundaries, a BGP session represents an existing channel between the two ASes. This channel is considered to be secure insofar as the routing information and requests sent via the session are considered to come from a trusted source. Since error information both relates to a particular attachment and is key to ensuring that such a session is operating as expected, it is considered of great operational benefit that this information is transmitted over this channel. In addition, overall system scalability is improved by such in-band transmission. It is expected that erroneous information resulting in the 'treat-as-withdraw' mechanism being utilised is transmitted between two peers relatively infrequently (when compared to the frequency of UPDATE message transmission). The impact of including an additional BGP message type for such operational visibility is therefore relatively small from a resource utilisation perspective, since additional processing overhead is only experienced when such a message is received. Where a separate session is maintained, particular network elements within a service provider topology may require hundreds, or thousands, of additional sessions for the transmission of this information.
Such a resource consumption overhead is likely to be unacceptable to some network operators.

For the reasons explained above, it is expected that mechanisms specified to meet the requirements for event visibility consider the relative impacts of additional monitoring sessions and of carrying messages in-band within BGP, in order not to compromise the security, scalability and robustness of the BGP-4 protocol.

7. Operational Complexities Introduced by Altering RFC4271

The existing NOTIFICATION and subsequent teardown of a BGP session upon encountering an error has the advantage that a consistent approach to error handling is required of all implementations of the BGP-4 protocol. This is of operational advantage, as it provides a clear expectation of the behaviour of the protocol. The requirements defined herein add further complexity to error handling within BGP, and hence are liable to compromise the existing deterministic protocol behaviour. It is therefore deemed that there is a further requirement to define a set of recommended behaviours based on the reception of a particular class of erroneous UPDATE message, alongside highlighting some of the implementation complexities that may need to be handled in the case that particular recommendations made within this memo are deployed.

Utilising the classes of erroneous UPDATE message described in Section 2, the recommended behaviour for a BGP-4 implementation can be divided into two branches. First, where a semantic error is identified, an implementation is expected to utilise the reduced-impact error handling approach, as described in Section 3.
In the case that such an approach results in known NLRI being withdrawn from the BGP speaker's RIB, and an implementation provides functionality such that these errors are recovered from through an automatically triggered means, such as those described within Section 4, some consideration of the scalability of these recovery mechanisms is required. Clearly, there is a computational and bandwidth overhead associated with the re-advertisement of NLRI between two BGP speakers, due to the generation of UPDATE messages, their transmission between the two speakers, and the parsing and processing required to install them into the RIB. This overhead is directly proportional to the number of UPDATE messages that are required. Where a semantic error is experienced, by definition the NLRI contained within the UPDATE can be extracted. It is therefore possible to minimise the proportion of the RIB that is re-advertised by targeting any recovery mechanism on the NLRI contained within the erroneous UPDATE. Such a targeted mechanism can be achieved through a means such as One-Time ORF, or other means of targeting UPDATE messages not discussed within this memo. It is recommended that, where available, any automatically (or manually) triggered recovery mechanism utilises such targeted means in preference to any whole-RIB refresh mechanism (such as ROUTE-REFRESH).

In the case that an erroneous UPDATE has been processed through a means such as treat-as-withdraw (described within Section 3), a recovery mechanism may be considered superfluous if it is assumed that the RIB inconsistency will only be recovered from upon path re-convergence (or a change in BGP attributes) at the advertising BGP speaker.
However, where this assumption is not considered to provide adequate recovery behaviour, and a mechanism to restore RIB consistency automatically is implemented, consideration must be given to the case where repeated erroneous messages occur. In this case, in order to limit the impact on the BGP speaker's network operation, it is recommended that, at a pre-defined point, automatic recovery mechanisms towards the BGP speaker from which erroneous UPDATEs are repeatedly received are suppressed, and that the fact that such suppression has occurred is highlighted to an operator. The point at which such behaviour is suppressed is to be defined on a per-implementation basis, taking into account feedback from the Network Operator community based on the deployment of the recommendations described in this document. It is expected that such trigger points depend upon the mechanisms implemented by a particular BGP-4 implementation, and the impact upon the speaker of these means of RIB recovery.

Where critical errors are experienced, such that a session reset is required, the mechanism discussed in Section 5 should be used. Again, since such a mechanism results in a restart of a BGP session, it is expected that all NLRI carried over the session are re-advertised as it is re-established, incurring processing overhead on both the advertising and receiving BGP speakers. In order to minimise the consumption of control-plane computational resource on both speakers, it is recommended that mechanisms allowing a reduced set of BGP UPDATE messages to be re-transmitted between two speakers are employed wherever possible, for instance through employing mechanisms such as those described in [I-D.ietf-idr-enhanced-gr].
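
The branches recommended so far in this section can be summarised as a per-neighbour decision procedure. The Python sketch below is illustrative only: the class and action names and the `suppress_after` threshold are hypothetical, and, as stated above, the actual trigger point is left to each implementation. It assumes each erroneous UPDATE has already been classified as carrying a semantic error (its NLRI can be extracted) or a critical error (a session reset is required), following Section 2.

```python
from enum import Enum, auto
from typing import List


class ErrorClass(Enum):
    SEMANTIC = auto()  # NLRI can be extracted from the erroneous UPDATE
    CRITICAL = auto()  # error cannot be handled without a session reset


class Action(Enum):
    WITHDRAW_AND_TARGETED_RECOVERY = auto()  # treat-as-withdraw, then targeted re-request
    WITHDRAW_ONLY = auto()                   # treat-as-withdraw, automatic recovery suppressed
    GRACEFUL_STYLE_RESTART = auto()          # Section 5: reset whilst retaining forwarding state


class RecoveryPolicy:
    """Hypothetical per-neighbour policy following the recommendations above."""

    def __init__(self, suppress_after: int = 5) -> None:
        self.suppress_after = suppress_after  # implementation-defined trigger point
        self.semantic_errors = 0
        self.recovery_suppressed = False
        self.operator_alerts: List[str] = []

    def handle(self, error: ErrorClass) -> Action:
        if error is ErrorClass.CRITICAL:
            # Where both peers support it, prefer a restart that re-sends only a
            # reduced set of UPDATEs (e.g. per [I-D.ietf-idr-enhanced-gr]).
            return Action.GRACEFUL_STYLE_RESTART
        self.semantic_errors += 1
        if self.semantic_errors > self.suppress_after:
            if not self.recovery_suppressed:
                # Highlight the suppression to an operator exactly once.
                self.recovery_suppressed = True
                self.operator_alerts.append("automatic RIB recovery suppressed for neighbour")
            return Action.WITHDRAW_ONLY
        return Action.WITHDRAW_AND_TARGETED_RECOVERY
```

For simplicity, the error counter in this sketch never decays; an implementation might age it over time, although this memo does not specify such behaviour. Handling of repeated critical errors, discussed next, is likewise omitted.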

In the case that repeated critical errors occur, the overhead of performing any mechanism implemented based on the requirements in Section 5 is incurred following each erroneous UPDATE message. Since these mechanisms are, by definition, performed automatically in response to the erroneous message being received, similar considerations as to the impact on the BGP speaker must be taken into account. As such, it is expected that, after a certain trigger level, the ongoing receipt of critical errors within BGP UPDATE messages is deemed to be indicative of a long-lasting failure, and the session is no longer considered viable. Where such a case is experienced, it is expected that the BGP session reverts to the standard session failure behaviour, as described in [RFC4271] and documents updating this base standard. Where such a reversion is implemented, this condition should be flagged to a network operator. The number of restart attempts before the session reverts to being shut down should be determined based on the overhead of the recovery mechanisms implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is implemented, the impact of session restart may be significantly lower), and operational experience of the deployment of the recommendations described in this document.

Since repeated erroneous UPDATE messages containing critical errors may be indicative of long-lasting failure modes, it is recommended that a back-off from restarting BGP sessions experiencing such behaviour is implemented. This back-off is not applicable to restarts performed through means such as those described in Section 5, since such restarts are already time-bound based on the period for which the Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when considering BGP Graceful Restart, such restarts are time-bound by the Restart Time described in [RFC4724]).
However, once a session has reverted to being torn down due to repeated error conditions, it is recommended that subsequent restart attempts are subject to an exponentially increasing interval between attempts. It is therefore recommended that, in such cases, an implementation uses the increasing values of the IdleHoldTimer as described in the BGP-4 FSM documented in [RFC4271].

7.1. Reducing the Network Impact of Session Teardown

As discussed within the preceding section, where repeated critical UPDATE message errors are received, it is recommended that the impact on both the advertising and receiving BGP-4 speakers be limited by reverting to tearing down the BGP-4 session experiencing such errors. The BGP-4 specification presented in [RFC4271] achieves such a session shutdown by sending a NOTIFICATION message; however, this has the net result that all downstream BGP speakers (i.e., those to which the routes carried over the now-ceased BGP session were re-advertised) must withdraw these routes from their RIBs, and perform a best-path selection if required. In some cases, there may be no alternate path available, and hence a period of time for which no valid BGP route exists. This is particularly likely to occur where an upstream BGP speaker performs a best-path selection and advertises only a single path to its neighbours, since the upstream speaker must then perform a best-path selection and re-advertise a new set of NLRI before the downstream system is able to converge to a new path. It should be noted that, where UPDATE messages withdrawing NLRI are not subject to the BGP session's configured MinRouteAdvertisementInterval (MRAI) [RFC4271] but re-advertisements are, this may result in a BGP speaker being without a path for a period of up to the MRAI.
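
The bound on this period can be illustrated with simple arithmetic. The following Python sketch is a simplified model, not a description of any implementation: it assumes, as in the case above, that the withdrawal is sent immediately, that the replacement path is available to the upstream speaker at the same instant, and that its advertisement must wait until the MRAI has elapsed since the previous advertisement for the same destination to that peer. The function name and parameters are hypothetical, and propagation and processing delays are ignored.

```python
def no_path_window(withdrawn_at: float, last_advertised_at: float, mrai: float) -> float:
    """Seconds for which a downstream speaker may have no path to a destination.

    Hypothetical model: the withdrawal is sent at withdrawn_at without waiting
    for the MRAI, whilst the replacement path may only be advertised once mrai
    seconds have elapsed since the previous advertisement for that destination.
    """
    if last_advertised_at > withdrawn_at:
        raise ValueError("previous advertisement must precede the withdrawal")
    readvertise_at = max(withdrawn_at, last_advertised_at + mrai)
    return readvertise_at - withdrawn_at
```

For example, with the 30-second MRAI suggested for EBGP connections in [RFC4271], a withdrawal sent 5 seconds after the previous advertisement leaves downstream speakers without a path for 25 seconds (no_path_window(100.0, 95.0, 30.0) returns 25.0); under these assumptions the window never exceeds the MRAI itself.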

Clearly, it is advantageous to avoid this period of time for which there may be no reachability for a set of routes, especially since the BGP speaker terminating a particular session is doing so due to a particular error handling policy. The graceful shutdown mechanism detailed in [I-D.ietf-grow-bgp-gshut] provides a means by which a BGP speaker is able to signal that a set of routes is about to be withdrawn, allowing downstream systems to pre-emptively perform a best-path selection and advertise new reachability information in a make-before-break manner.

It is therefore envisaged that, where a session is to be shut down based on a trigger relating to erroneous UPDATE messages being received (be they repeated or not), the graceful shutdown procedure is utilised, so as to reduce the forwarding impact of the routes received on that session being withdrawn.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

The requirements outlined in this document provide mechanisms by which erroneous BGP messages may be responded to with limited impact on forwarding operation. This is of benefit to the security of a BGP speaker in general. Under the existing behaviour, where erroneous UPDATE messages introduced by a single malicious Autonomous System or router within a network (or the Internet default-free zone (DFZ)) are propagated to all devices within the same routing domain, all other NLRI available over the affected sessions become unreachable. This may provide a means by which an Autonomous System can be isolated from required routing domains (such as the Internet), should the relevant UPDATE messages be propagated via specific paths. By reducing the impact of such failures, it is envisaged that this possibility may be constrained to a specific set of NLRI, or a specific topology.

Some mechanisms meeting the requirements specified in this document, particularly those within Section 6, may raise further security concerns; however, it is envisaged that these will be addressed in per-enhancement memos.

10. Acknowledgements

The author would like to thank the following network operators for their insight and valuable input in defining the requirements for a variety of operational deployments of the BGP-4 protocol: Shane Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and Ilya Varlashkin.

In addition, many thanks are extended to Jeff Haas, Wim Hendrickx, Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert Raszuk for their expertise relating to implementations of the BGP-4 protocol.

11. References

11.1. Normative References

[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

11.2. Informational References

[I-D.chen-ebgp-error-handling] Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP Updates from External Neighbors", draft-chen-ebgp-error-handling-01 (work in progress), September 2011.

[I-D.ietf-grow-bgp-gshut] Francois, P., Decraene, B., Pelsser, C., Patel, K., and C. Filsfils, "Graceful BGP session shutdown", draft-ietf-grow-bgp-gshut-03 (work in progress), December 2011.

[I-D.ietf-grow-bmp] Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-06 (work in progress), December 2011.

[I-D.ietf-idr-bgp-enhanced-route-refresh] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4", draft-ietf-idr-bgp-enhanced-route-refresh-02 (work in progress), June 2012.

[I-D.ietf-idr-bgp-gr-notification] Patel, K., Fernando, R., and J. Scudder, "Notification Message support for BGP Graceful Restart", draft-ietf-idr-bgp-gr-notification-00 (work in progress), December 2011.

[I-D.ietf-idr-enhanced-gr] Patel, K., Chen, E., Fernando, R., and J. Scudder, "Accelerated Routing Convergence for BGP Graceful Restart", draft-ietf-idr-enhanced-gr-01 (work in progress), June 2012.

[I-D.ietf-idr-operational-message] Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL Message", draft-ietf-idr-operational-message-00 (work in progress), March 2012.

[I-D.ietf-idr-optional-transitive] Scudder, J., Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", draft-ietf-idr-optional-transitive-04 (work in progress), October 2011.

[I-D.zeng-idr-one-time-prefix-orf] Zeng, Q., Dong, J., Heitz, J., Patel, K., Shakir, R., and Z. Huang, "One-time Address-Prefix Based Outbound Route Filter for BGP-4", draft-zeng-idr-one-time-prefix-orf-02 (work in progress), July 2012.

[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

Author's Address

Rob Shakir
BT
pp C3L
BT Centre
81, Newgate Street
London EC1A 7AJ
UK

Email: rob.shakir@bt.com
URI: http://www.bt.com/