idnits 2.17.1 draft-ietf-idr-bgp4-experience-protocol-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 300: '...by a BGP speaker MUST NOT be sent to o...' RFC 2119 keyword, line 310: '...n implementation MUST provide a mechan...' RFC 2119 keyword, line 343: '... The LOCAL_PREF MUST be sent to IBGP ...' RFC 2119 keyword, line 344: '... MUST NOT be sent to EBGP Peers. Al...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2003) is 7529 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 2119' is mentioned on line 35, but not defined == Missing Reference: 'BGP-MIB' is mentioned on line 134, but not defined == Missing Reference: 'BGP-IMPL' is mentioned on line 158, but not defined == Missing Reference: 'RFC 1965' is mentioned on line 182, but not defined ** Obsolete undefined reference: RFC 1965 (Obsoleted by RFC 3065) == Unused Reference: 'RFC 1264' is defined on line 743, but no explicit reference was found in the text == Unused Reference: 'RFC 1519' is defined on line 753, but no explicit reference was found in the text == Unused Reference: 'RFC 1657' is defined on line 760, but no explicit reference was found in the text == Unused Reference: 'RFC 1772' is defined on line 768, but no explicit reference was found in the text == Unused Reference: 'RFC 1773' is defined on line 772, but no explicit reference was found in the text == Unused Reference: 'RFC 3345' is defined on line 793, but no explicit reference was found in the text == Unused Reference: 'BGP4-IMPL' is defined on line 799, but no explicit reference was found in the text ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 1105 (Obsoleted by RFC 1163) -- Duplicate reference: RFC1105, mentioned in 'RFC 1163', was also mentioned in 'RFC 1105'. 
** Obsolete normative reference: RFC 1105 (ref. 'RFC 1163') (Obsoleted by RFC 1163) ** Obsolete normative reference: RFC 1264 (Obsoleted by RFC 4794) -- Duplicate reference: RFC1105, mentioned in 'RFC 1267', was also mentioned in 'RFC 1163'. ** Obsolete normative reference: RFC 1105 (ref. 'RFC 1267') (Obsoleted by RFC 1163) ** Obsolete normative reference: RFC 1269 (Obsoleted by RFC 4273) ** Obsolete normative reference: RFC 1519 (Obsoleted by RFC 4632) ** Obsolete normative reference: RFC 1656 (Obsoleted by RFC 1773) ** Obsolete normative reference: RFC 1657 (Obsoleted by RFC 4273) ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271) ** Downref: Normative reference to an Informational RFC: RFC 1773 ** Obsolete normative reference: RFC 1966 (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 2385 (Obsoleted by RFC 5925) ** Obsolete normative reference: RFC 2796 (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 3065 (Obsoleted by RFC 5065) ** Downref: Normative reference to an Informational RFC: RFC 3345 -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4-ANALYSIS' -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4-IMPL' -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4' -- Possible downref: Non-RFC (?) normative reference: ref. 'SBGP' Summary: 21 errors (**), 0 flaws (~~), 13 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT Danny McPherson 2 Arbor Networks 3 Keyur Patel 4 Cisco Systems 5 Category Informational 6 Expires: March 2004 September 2003 8 Experience with the BGP-4 Protocol 9 11 Status of this Document 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC2026. 
16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet-Drafts as reference 24 material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 33 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 34 document are to be interpreted as described in RFC 2119 [RFC 2119]. 36 This document is a product of an individual. Comments are solicited 37 and should be addressed to the author(s). 39 Copyright Notice 41 Copyright (C) The Internet Society (2003). All Rights Reserved. 43 Abstract 45 The purpose of this memo is to document how the requirements for 46 advancing a routing protocol from Draft Standard to full Standard 47 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 49 This report satisfies the requirement for "the second report", as 50 described in Section 6.0 of RFC 1264. In order to fulfill the 51 requirement, this report augments RFC 1773 and describes additional 52 knowledge and understanding gained in the time between when the 53 protocol was made a Draft Standard and when it was submitted for 54 Standard. 56 Table of Contents 58 1. Introduction 59 2. BGP-4 Overview 60 2.1. A Border Gateway Protocol 61 3. Management Information Base (MIB) 62 4.
Implementations 63 5. Operational Experience 64 6. TCP Awareness 65 7. Metrics 66 7.1. MULTI_EXIT_DISC (MED) 67 7.1.1. MEDs and Potatoes 68 7.1.2. Sending MEDs to BGP Peers 69 7.1.3. MED of Zero Versus No MED 70 7.1.4. MEDs and Temporal Route Selection 71 8. LOCAL_PREF 72 9. Internal BGP In Large Autonomous Systems 73 10. Internet Dynamics 74 11. BGP Routing Information Bases (RIBs) 75 12. Update Packing 76 13. Limit Rate Updates 77 13.1. Consideration of TCP Characteristics 78 14. Ordering of Path Attributes 79 15. AS_SET Sorting 80 16. Control over Version Negotiation 81 17. Security Considerations 82 17.1. TCP MD5 Signature Option 83 17.2. BGP Over IPSEC 84 17.3. Miscellaneous 85 17.4. PTOMAINE and GROW 86 17.5. Internet Routing Registries (IRRs) 87 17.6. Regional Internet Registries (RIRs) and IRRs, 88 A Bit of History 89 17.7. Acknowledgements 90 18. References
91 19. Authors' Addresses 92 20. Full Copyright Statement 94 1. Introduction 96 The purpose of this memo is to document how the requirements for 97 advancing a routing protocol from Draft Standard to full Standard 98 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 100 This report satisfies the requirement for "the second report", as 101 described in Section 6.0 of RFC 1264. In order to fulfill the 102 requirement, this report augments RFC 1773 and describes additional 103 knowledge and understanding gained in the time between when the 104 protocol was made a Draft Standard and when it was submitted for 105 Standard. 107 2. BGP-4 Overview 109 BGP is an inter-autonomous system routing protocol designed for 110 TCP/IP internets. The primary function of a BGP speaking system is 111 to exchange network reachability information with other BGP systems. 112 This network reachability information includes information on the 113 list of Autonomous Systems (ASs) that reachability information 114 traverses. This information is sufficient to construct a graph of AS 115 connectivity for this reachability from which routing loops may be 116 pruned and some policy decisions at the AS level may be enforced. 118 The initial version of the BGP protocol was published in RFC 1105. 119 Since then, BGP Versions 2, 3, and 4 have been developed and are 120 specified in [RFC 1163], [RFC 1267], and [RFC 1771], respectively. 121 Changes since BGP-4 went to Draft Standard [RFC 1771] are listed in 122 Appendix N of [BGP4]. 124 2.1. A Border Gateway Protocol 126 The initial version of BGP is defined in [RFC 1105]. BGP version 2 is defined in 127 [RFC 1163]. BGP version 3 is defined in [RFC 1267]. BGP version 4 128 is defined in [RFC 1771] and [BGP4].
Appendices A, B, C, and D of 129 [BGP4] provide summaries of the changes between each iteration of the 130 BGP specification. 132 3. Management Information Base (MIB) 134 The BGP-4 Management Information Base (MIB) has been published [BGP- 135 MIB]. The MIB was updated from previous versions documented in [RFC 136 1657] and [RFC 1269], respectively. 138 Apart from a few system variables, the BGP MIB is broken into two 139 tables: the BGP Peer Table and the BGP Received Path Attribute Table. 141 The Peer Table reflects information about BGP peer connections, such 142 as their state and current activity. The Received Path Attribute 143 Table contains all attributes received from all peers before local 144 routing policy has been applied. The actual attributes used in 145 determining a route are a subset of the received attribute table. 147 4. Implementations 149 There are numerous independent interoperable implementations of BGP 150 currently available. Although the previous version of this report 151 provided an overview of the implementations currently used in the 152 operational Internet, at this time it has been suggested that a 153 separate BGP Implementation Report [BGP-IMPL] be generated. 155 It should be noted that implementation experience with Cisco's BGP-4 156 implementation was documented as part of [RFC 1656]. 158 For all additional implementation information please reference [BGP- 159 IMPL]. 161 5. Operational Experience 163 This section discusses operational experience with BGP and BGP-4. 165 BGP has been used in the production environment since 1989, BGP-4 166 since 1993. Production use of BGP includes utilization of all 167 significant features of the protocol. The present production 168 environment, where BGP is used as the inter-autonomous system routing 169 protocol, is highly heterogeneous. In terms of the link bandwidth it 170 varies from 56 Kbps to 10 Gbps. 
In terms of the actual routers that 171 run BGP, it ranges from relatively slow general purpose 172 CPUs to very high performance RISC network processors, and includes 173 both special purpose routers and general purpose workstations 174 running various UNIX derivatives and other operating systems. 176 In terms of the actual topologies it varies from very sparse to quite 177 dense. The requirement for full-mesh IBGP topologies has been 178 largely remedied by BGP Route Reflection, Autonomous System 179 Confederations for BGP, and perhaps some mix of the two. BGP Route 180 Reflection was initially defined in [RFC 1966] and subsequently 181 updated in [RFC 2796]. Autonomous System Confederations for BGP were 182 initially defined in [RFC 1965] and subsequently updated in [RFC 183 3065]. 185 At the time of this writing, BGP-4 is used as an inter-autonomous 186 system routing protocol between all Internet-attached autonomous 187 systems, with nearly 15,000 active autonomous systems in the global 188 Internet routing table. 190 BGP is used both for the exchange of routing information between a 191 transit and a stub autonomous system, and for the exchange of routing 192 information between multiple transit autonomous systems. There is no 193 protocol distinction between sites historically considered 194 "backbones" versus "regional" or "edge" networks. 196 The full set of exterior routes that is carried by BGP is well over 197 120,000 aggregate entries, representing several times that number of 198 connected networks. The number of active paths in some service 199 provider core routers exceeds 2.5 million. Native AS_PATH lengths 200 are as long as 10 for some routes, and "padded" path lengths of 25 or 201 more ASs exist. 203 6. TCP Awareness 205 BGP employs TCP [RFC 793] as its Transport Layer protocol. As such, 206 all characteristics inherent to TCP are inherited by BGP.
208 For example, the available bandwidth may not be fully 209 realized because of TCP's slow start algorithm, slow-start restarts 210 of connections, etc. 212 7. Metrics 214 This section discusses different metrics used within the BGP 215 protocol. BGP has a separate metric parameter for IBGP and EBGP. This 216 allows policy based metrics to override the distance based metrics, 217 allowing each autonomous system to define its own independent policies 218 for both intra-AS and inter-AS routing. BGP Multi Exit Discriminator (MED) 219 is used as a metric by EBGP peers while BGP Local Preference is used 220 by IBGP peers. 222 7.1. MULTI_EXIT_DISC (MED) 224 BGP version 4 re-defined the old INTER-AS metric as a 225 MULTI_EXIT_DISC (MED). This value may be used in the tie-breaking process when 226 selecting a preferred path to a given address space, and provides BGP 227 speakers with the capability to convey to a peer AS the optimal entry 228 point into the local AS. 230 Although the MED was meant to only be used when comparing paths 231 received from different external peers in the same AS, many 232 implementations provide the capability to compare MEDs between 233 different ASs as well. 235 Though this may seem a fine idea for some configurations, care must 236 be taken when comparing MEDs between different autonomous systems. 237 BGP speakers often derive MED values by obtaining the IGP metric 238 associated with reaching a given BGP NEXT_HOP within the local AS. 239 This allows MEDs to reasonably reflect IGP topologies when 240 advertising routes to peers. While this is fine when comparing MEDs 241 between multiple paths learned from a single AS, it can result in 242 potentially bad decisions when comparing MEDs between different 243 autonomous systems.
This is most typically the case when the 244 autonomous systems use different mechanisms to derive IGP metrics, 245 BGP MEDs, or perhaps even use different IGP protocols with vastly 246 contrasting metric spaces. 248 Another MED deployment consideration involves the impact of 249 aggregation of BGP routing information on MEDs. Aggregates are often 250 generated from multiple locations in an AS in order to accommodate 251 stability, redundancy and other network design goals. When MEDs are 252 derived from IGP metrics associated with said aggregates, the MED 253 value advertised to peers can result in very suboptimal routing. 255 The MED was purposely designed to be a "weak" metric that would only 256 be used late in the best-path decision process. The BGP working 257 group wanted to ensure that any metric specified by a remote operator 258 would affect routing in a local AS only if no other preference was 259 specified. A paramount goal of the design of the MED was to ensure 260 that peers could not "shed" or "absorb" traffic for networks that 261 they advertise. 263 7.1.1. MEDs and Potatoes 265 In a situation where traffic flows between a pair of destinations, 266 each connected to two transit networks, each of the transit networks 267 has the choice of either sending the traffic to the closest peering 268 to the other transit provider or passing traffic to the peering which 269 advertises the least cost through the other provider. The former 270 method is called "hot potato routing" because like a hot potato 271 held in bare hands, whoever has it tries to get rid of it quickly. 272 Hot potato routing is accomplished by not passing the EBGP learned 273 MED into IBGP. This minimizes transit traffic for the provider 274 routing the traffic. Far less common is "cold potato routing" where 275 the transit provider uses its own transit capacity to get the 276 traffic to the point in the adjacent transit provider advertised as 277 being closest to the destination.
Cold potato routing is 278 accomplished by passing the EBGP learned MED into IBGP. 280 If one transit provider uses hot potato routing and another uses 281 cold potato, traffic between the two tends to be symmetric. 282 Depending on the business relationships, if one provider has more 283 capacity or a significantly less congested transit network, then that 284 provider may use cold potato routing. An example of widespread use 285 of cold potato routing was the NSF funded NSFNET backbone and NSF 286 funded regional networks in the mid 1990s. 288 In some cases a provider may use hot potato routing for some 289 destinations for a given peer AS and cold potato routing for others. 290 An example of this is the different treatment of commercial and 291 research traffic in the NSFNET in the mid 1990s. Then again, this 292 might best be described as 'mashed potato routing', a term which 293 reflects the complexity of router configurations in use at the time. 295 7.1.2. Sending MEDs to BGP Peers 297 [BGP4] allows MEDs received from any EBGP peers by a BGP speaker to 298 be passed to its IBGP peers. Although advertising MEDs to IBGP peers 299 is not a required behavior, it is a common default. MEDs received 300 from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP 301 peers. 303 Note that many implementations provide a mechanism to derive MED 304 values from IGP metrics in order to allow BGP MED information to 305 reflect the IGP topologies and metrics of the network when 306 propagating information to adjacent autonomous systems. 308 7.1.3. MED of Zero Versus No MED 310 An implementation MUST provide a mechanism that allows for MED to be 311 removed. Previously, implementations did not consider a missing MED 312 value to be the same as a MED of zero. A missing MED value should now be 313 treated as equal to a MED of zero.
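As a hedged illustration of the rule above, the following Python sketch compares two paths on MED alone, treating a missing MED as zero. The names effective_med and prefer_by_med and the missing_as_worst flag are hypothetical and do not come from [BGP4]; the flag models the optional behavior, offered by some implementations, of ranking a missing MED as least preferable. The sketch assumes that all earlier steps of the decision process have tied and that both paths come from the same neighboring AS.

```python
from typing import Optional

# MED is carried as a four-octet unsigned integer, so this is its largest value.
MED_MAX = 2**32 - 1


def effective_med(med: Optional[int], missing_as_worst: bool = False) -> int:
    """Return the MED to use in comparisons; None means no MED attribute."""
    if med is not None:
        return med
    # Default: a missing MED is treated as zero.
    # Optional knob (assumption): rank a missing MED as the worst value.
    return MED_MAX if missing_as_worst else 0


def prefer_by_med(med_a: Optional[int], med_b: Optional[int],
                  missing_as_worst: bool = False) -> str:
    """Return "a", "b", or "tie"; the lower effective MED is preferred."""
    a = effective_med(med_a, missing_as_worst)
    b = effective_med(med_b, missing_as_worst)
    if a < b:
        return "a"
    if b < a:
        return "b"
    return "tie"
```

With the default setting, a path without a MED ties with a path whose MED is zero and is preferred over a path whose MED is 10; with missing_as_worst set, the path without a MED loses to both.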
315 Note that many implementations provide a mechanism to explicitly 316 define a missing MED value as "worst", that is, less preferable than zero or 317 any larger value. 319 7.1.4. MEDs and Temporal Route Selection 321 Some implementations have hooks to apply temporal behavior in 322 MED-based best path selection. That is, all other things being equal up 323 to MED consideration, preference would be applied to the "oldest" 324 path, without preferring the lower MED value. The reasoning for this 325 is that "older" paths are presumably more stable, and thus more 326 preferable. However, temporal behavior in route selection results in 327 non-deterministic behavior, and as such, is often undesirable. 329 8. LOCAL_PREF 331 The LOCAL_PREF attribute was added so a network operator could easily 332 configure a policy that overrode the standard best path determination 333 mechanism without independently configuring local preference policy 334 on each router. 336 One shortcoming in the BGP-4 specification was the lack of a suggestion for a 337 default value of LOCAL_PREF to be assumed if none was provided. 338 Defaults of 0 or the maximum value each have range limitations, so a 339 common default would aid in the interoperation of multi-vendor 340 routers in the same AS (since LOCAL_PREF is a local administration 341 knob, there is no interoperability drawback across AS boundaries). 343 The LOCAL_PREF MUST be sent to IBGP Peers. The LOCAL_PREF Attribute 344 MUST NOT be sent to EBGP Peers. Although no default value for 345 LOCAL_PREF is defined, the common default value is 100. 347 Another area where more exploration is required is a method whereby 348 an originating AS may influence the best path selection process. For 349 example, a dual-connected site may select one AS as a primary transit 350 service provider and have one as a backup.
352              /---- transit B ----\
353 end-customer                      transit A----
354              \---- transit C ----/
356 In a topology where the two transit service providers connect to a 357 third provider, the real decision is performed by the third provider 358 and there is no mechanism for indicating a preference should the 359 third provider wish to respect that preference. 361 A general purpose suggestion that has been brought up is the 362 possibility of carrying an optional vector corresponding to the 363 AS-PATH where each transit AS may indicate a preference value for a 364 given route. Cooperating ASs may then choose traffic based upon 365 comparison of "interesting" portions of this vector according to 366 routing policy. 368 While protecting a given AS's routing policy is of paramount concern, 369 avoiding extensive hand configuration of routing policies needs to be 370 examined more carefully in future BGP-like protocols. 372 9. Internal BGP In Large Autonomous Systems 374 While not strictly a protocol issue, one other concern has been 375 raised by network operators who need to maintain autonomous systems 376 with a large number of peers. Each speaker peering with an external 377 router is responsible for propagating reachability and path 378 information to all other transit and border routers within that AS. 379 This is typically done by establishing internal BGP connections to 380 all transit and border routers in the local AS. 382 Note that the number of BGP peers that can be fully meshed depends on 383 a number of factors, including the number of prefixes in the routing 384 system, the stability of the system, and perhaps most importantly, 385 implementation efficiency. As a result, although it's difficult to 386 define "a large number of peers", there is always some practical 387 limit. 389 In a large AS, this leads to a full mesh of TCP connections (n * 390 (n-1) / 2 for n routers) and some method of configuring and maintaining those 391 connections.
BGP does not specify how this information is to be 392 propagated, so alternatives, such as injecting BGP routing 393 information into the local IGP, have been attempted, though this turned 394 out to be a non-practical alternative (to say the least). 396 Several alternatives to a full mesh IBGP have been defined, 397 including BGP Route Reflection [RFC 2796] and AS Confederations for BGP 398 [RFC 3065], in order to alleviate the need for "full mesh" IBGP. 400 10. Internet Dynamics 402 As discussed in [BGP4-ANALYSIS], the driving force in CPU and 403 bandwidth utilization is the dynamic nature of routing in the 404 Internet. As the net has grown, the number of route changes per 405 second has increased. 407 We automatically get some level of damping when more specific NLRI is 408 aggregated into larger blocks; however, this isn't sufficient. 409 Appendix F of [BGP4] describes damping techniques that 410 should be applied to advertisements. In future specifications of 411 BGP-like protocols, damping methods should be considered for 412 mandatory inclusion in compliant implementations. 414 BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap 415 Damping defines a mechanism to help reduce the amount of routing 416 information passed between BGP peers, and subsequently, the load on 417 these peers, without adversely affecting route convergence time for 418 relatively stable routes. 420 None of the current implementations of BGP Route Flap Damping store 421 route history by unique NLRI and AS Path although it is listed as 422 mandatory in RFC 2439. A potential result of failure to consider 423 each AS Path separately is an overly aggressive suppression of 424 destinations in a densely meshed network, with the most severe 425 consequence being suppression of a destination after a single 426 failure. Because the top tier autonomous systems in the Internet are 427 densely meshed, these adverse consequences are observed.
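To make the point about per-path history concrete, the following Python sketch keeps a decaying penalty for each unique (NLRI, AS Path) pair, so that flaps observed on one path do not suppress another path to the same destination. It is a simplified illustration only and not the complete RFC 2439 algorithm. The class names, the method names, and all numeric parameters (half-life, suppress and reuse thresholds, penalty per flap) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DampingState:
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False


class FlapDamper:
    """Keeps damping history keyed by (prefix, AS path). Illustrative only."""

    def __init__(self, half_life=900.0, suppress=3000.0, reuse=750.0,
                 penalty_per_flap=1000.0):
        # All values are hypothetical; times are in seconds.
        self.half_life = half_life
        self.suppress = suppress
        self.reuse = reuse
        self.penalty_per_flap = penalty_per_flap
        self.history = {}

    def _decay(self, st, now):
        # Exponential decay: the penalty halves every half_life seconds.
        elapsed = now - st.last_update
        if elapsed > 0:
            st.penalty *= 0.5 ** (elapsed / self.half_life)
        st.last_update = now

    def flap(self, prefix, as_path, now):
        """Record one flap; return True if this path is now suppressed."""
        key = (prefix, tuple(as_path))
        st = self.history.setdefault(key, DampingState(last_update=now))
        self._decay(st, now)
        st.penalty += self.penalty_per_flap
        if st.penalty >= self.suppress:
            st.suppressed = True
        return st.suppressed

    def is_suppressed(self, prefix, as_path, now):
        """Return True while this path's penalty has not decayed below reuse."""
        st = self.history.get((prefix, tuple(as_path)))
        if st is None:
            return False
        self._decay(st, now)
        if st.suppressed and st.penalty < self.reuse:
            st.suppressed = False
        return st.suppressed
```

Keying the history on the prefix alone would merge the penalties of all paths to a destination, which is the overly aggressive behavior described above.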
429 Route changes are announced using BGP UPDATE messages. The greatest 430 overhead in advertising UPDATE messages happens whenever route 431 changes to be announced are inefficiently packed. As discussed 432 in Section 12, announcing routing changes sharing common attributes in a 433 single BGP UPDATE message helps save considerable bandwidth and lower 434 processing overhead. 436 Persistent BGP errors may cause BGP peers to flap persistently if 437 peer dampening is not implemented. This would result in significant 438 CPU utilization. Implementors may find it useful to implement peer 439 dampening to avoid such persistent peer flapping [BGP4]. 441 11. BGP Routing Information Bases (RIBs) 443 [BGP4] states "Any local policy which results in routes being added 444 to an Adj-RIB-Out without also being added to the local BGP speaker's 445 forwarding table, is outside the scope of this document". 447 However, several well-known implementations do not confirm that 448 Loc-RIB entries were used to populate the forwarding table before 449 installing them in the Adj-RIB-Out. The most common occurrence of 450 this is when routes for a given prefix are presented by more than one 451 protocol and the preference for the BGP learned route is lower than 452 that of another protocol. As such, the route learned via the other 453 protocol is used to populate the forwarding table. 455 It may be desirable for an implementation to provide a knob that 456 permits advertisement of "inactive" BGP routes. 458 It may also be desirable for an implementation to provide a knob that 459 allows a BGP speaker to advertise BGP routes that were not selected 460 by the decision process. 462 12. Update Packing 464 Multiple unfeasible routes can be advertised in a single BGP Update 465 message. In addition, one or more feasible routes can be advertised 466 in a single Update message so long as all prefixes share a common 467 attribute set.
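The grouping rule just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a wire-format encoder: the function name pack_updates and the dictionary layout of each message are hypothetical, attribute values are assumed to be hashable, withdrawn prefixes are placed in one message of their own as one simple choice, and the maximum BGP message size is ignored.

```python
from collections import defaultdict


def pack_updates(routes, withdrawn=()):
    """Group feasible routes by identical attribute set.

    routes: iterable of (prefix, attrs), where attrs is a dict whose
            values are hashable (for example, a tuple for the AS path).
    withdrawn: iterable of prefixes that are no longer feasible.
    Returns a list of dicts, each standing in for one UPDATE message.
    """
    groups = defaultdict(list)
    for prefix, attrs in routes:
        # Sorting by attribute name gives a canonical key, so the same
        # attribute set always lands in the same group.
        key = tuple(sorted(attrs.items()))
        groups[key].append(prefix)

    updates = []
    if withdrawn:
        # All unfeasible routes can share a single message.
        updates.append({"withdrawn": list(withdrawn), "attrs": None, "nlri": []})
    for key, prefixes in groups.items():
        # One message per distinct attribute set, carrying every prefix
        # that shares it.
        updates.append({"withdrawn": [], "attrs": dict(key), "nlri": prefixes})
    return updates
```

For example, three feasible prefixes that share two distinct attribute sets yield two messages rather than three, plus one message for any withdrawals.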
469 The BGP-4 protocol permits multiple prefixes with a 470 common set of path attributes to be advertised in a single update 471 message; this is commonly referred to as "update packing". When 472 possible, update packing is recommended as it provides a mechanism 473 for more efficient behavior in a number of areas, including: 475 o Reduction in system overhead due to generation or receipt of 476 fewer Update messages. 478 o Reduction in network overhead as a result of fewer packets 479 and lower bandwidth consumption. 481 o Less frequent processing of path attributes and lookups of matching 482 sets in the AS_PATH database (if one exists). 483 Consistent ordering of the path attributes 484 allows for ease of matching in the database as there are no 485 different representations of the same data. 487 The BGP protocol suggests that withdrawal information should be 488 packed at the beginning of an Update message, followed by information 489 about more or less specific reachable routes in a single UPDATE 490 message. This helps alleviate excessive route flapping in BGP. 492 13. Limit Rate Updates 494 The BGP protocol defines different mechanisms to rate limit Update 495 advertisement. The BGP protocol defines the MinRouteAdvertisementInterval 496 parameter, which determines the minimum time that must elapse 497 between the advertisement of routes to a particular destination from 498 a single BGP speaker. This value is set on a per BGP peer basis. 500 Because BGP relies on TCP as the Transport protocol, TCP 501 can prevent transmission of data due to empty windows. As a result, 502 multiple Updates may be spaced closer together than originally queued. 503 Although this is not a common occurrence, implementations should be 504 aware of this. 506 13.1.
Consideration of TCP Characteristics 508 If a TCP receiver is processing input more slowly than the sender or 509 if the TCP connection rate is the limiting factor, a form of 510 backpressure is observed by the TCP sending application. When the 511 TCP buffer fills, the sending application will either block on the 512 write or receive an error on the write. Common errors in either 513 early implementations or an occasional naive new implementation are 514 to either set options to block on the write or set options for 515 non-blocking writes and then treat the errors due to a full buffer as 516 fatal. 518 Having recognized that full write buffers are to be expected, 519 additional implementation pitfalls exist. The application should not 520 attempt to store the TCP stream within the application itself. If 521 the receiver or the TCP connection is persistently slow, then the 522 buffer can grow until memory is exhausted. A BGP implementation must 523 send changes to all peers for which the TCP connection is not blocked 524 and must remember to send those changes to the remaining peers when 525 the connection becomes unblocked. 527 If the preferred route for a given NLRI changes multiple times while 528 writes to one or more peers are blocked, only the most recent best 529 route needs to be sent. In this way BGP is work conserving. In 530 times of extremely high route change, a higher volume of route change 531 is sent to those peers which are able to process it more quickly and 532 a lower volume of route change is sent to those peers not able to 533 process the changes as quickly. 535 For implementations which handle differing peer capacity to absorb 536 route change well, if the majority of route change is contributed by 537 a subset of unstable NLRI, the only impact on relatively stable NLRI 538 which make an isolated route change is a slower convergence, for which 539 convergence time remains bounded regardless of the amount of 540 instability. 542 14.
Ordering of Path Attributes 544 The BGP protocol suggests that BGP speakers sending multiple prefixes 545 per UPDATE message should sort and order path attributes according 546 to Type Codes. This would help their peers to quickly identify sets 547 of attributes from different update messages which are semantically 548 identical. 550 Implementers may find it useful to order path attributes according to 551 Type Code so that sets of attributes with identical semantics can be 552 more quickly identified. 554 15. AS_SET Sorting 556 AS_SETs are commonly used in BGP route aggregation. They reduce the 557 size of AS_PATH information by listing each AS number only once, 558 regardless of how many times it might appear in the process of 559 aggregation. AS_SETs are usually sorted in increasing order to 560 facilitate efficient lookups of AS numbers within them. This 561 optimization is entirely optional. 563 16. Control over Version Negotiation 565 Because BGP-4 route aggregation can't be supported by earlier 566 versions of BGP, an implementation that supports versions in addition 567 to BGP-4 should provide the version support on a per-peer basis. 569 17. Security Considerations 571 BGP provides a flexible and extendable mechanism for authentication 572 and security. The mechanism supports schemes with varying 573 degrees of complexity. BGP sessions are authenticated based on the IP 574 address of a peer. In addition, all BGP sessions are authenticated 575 based on the autonomous system number advertised by a peer. 577 Since BGP runs over TCP and IP, BGP's authentication scheme may be 578 augmented by any authentication or security mechanism provided by 579 either TCP or IP. 581 17.1. TCP MD5 Signature Option 583 [RFC 2385] defines a way in which the TCP MD5 signature option can be 584 used to validate information transmitted between two peers.
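As a rough illustration of the validation described in [RFC 2385], the sketch below computes the 16-byte digest over the items that RFC lists, in order: the TCP pseudo-header, the fixed TCP header with options excluded and the checksum taken as zero, the segment data, and the shared key. It assumes IPv4. The function name tcp_md5_digest and its parameters are hypothetical, chosen only for this example. The caller supplies segment_length, assumed here to be the TCP length used in the pseudo-header. This is a minimal sketch, not a description of any particular implementation.

```python
import hashlib
import socket
import struct

TCP_PROTOCOL_NUMBER = 6  # IP protocol number assigned to TCP


def tcp_md5_digest(src_ip, dst_ip, segment_length, tcp_header, payload, key):
    """Return an RFC 2385 style MD5 digest for one IPv4 TCP segment.

    Hypothetical helper for illustration only. `tcp_header` is the
    20-byte fixed TCP header (options excluded); `segment_length` is
    the TCP length placed in the pseudo-header, supplied by the caller.
    """
    if len(tcp_header) != 20:
        raise ValueError("expected the 20-byte fixed TCP header without options")

    # 1. Pseudo-header: source address, destination address,
    #    zero-padded protocol number, and segment length.
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, TCP_PROTOCOL_NUMBER, segment_length)
    )

    # 2. Fixed TCP header with the checksum field (bytes 16-17) zeroed.
    header_zero_checksum = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]

    # 3. Segment data, then 4. the key known to both peers.
    digest = hashlib.md5()
    for part in (pseudo_header, header_zero_checksum, payload, key):
        digest.update(part)
    return digest.digest()
```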
This 585 method prevents any third party from injecting information (e.g., a 586 TCP Reset) into the data stream, or modifying the routing information 587 carried between two BGP peers. 589 TCP MD5 is not ubiquitously deployed at the moment, especially in 590 inter-domain scenarios, largely because of key distribution issues. 591 Most key distribution mechanisms are considered to be too "heavy" at 592 this point. 594 17.2. BGP Over IPSEC 596 BGP can run over IPSEC, either in a tunnel, or in transport mode, 597 where the TCP portion of the IP packet is encrypted. This not only 598 prevents random insertion of information into the data stream between 599 two BGP peers, it also prevents an attacker from learning the data 600 which is being exchanged between the peers. 602 IPSEC does, however, offer several options for exchanging session 603 keys, which may be useful in inter-domain configurations. These 604 options are being explored in many deployments, although no 605 definitive solution has been reached on the issue of key exchange for 606 BGP in IPSEC. 608 It should be noted that since BGP runs over TCP and IP, BGP is 609 vulnerable to the same denial of service or authentication attacks 610 that are present in any other TCP based protocol. 612 17.3. Miscellaneous 614 Another issue any routing protocol faces is providing evidence of the 615 validity and authority of the routing information carried within the 616 routing system. This is currently the focus of several efforts, 617 including efforts to define the threats which can be used 618 against this routing information in BGP [draft-murphy, attack tree], 619 and efforts at developing a means to provide validation and authority 620 for routing information carried within BGP [SBGP] [soBGP].
622 In addition, the Routing Protocol Security Requirements (RPSEC) 623 working group has been chartered within the Routing Area of the IETF 624 in order to discuss and assist in addressing issues surrounding 625 routing protocol security. It is the intent that this work within 626 RPSEC will result in feedback to BGPv4 and future enhancements to the 627 protocol where appropriate. 629 17.4. PTOMAINE and GROW 631 The Prefix Taxonomy (PTOMAINE) working group, recently replaced by 632 the Global Routing Operations (GROW) working group, is chartered to 633 consider and measure the problem of routing table growth, the effects 634 of the interactions between interior and exterior routing protocols, 635 and the effect of address allocation policies and practices on the 636 global routing system. Finally, where appropriate, GROW will also 637 document the operational aspects of measurement, policy, security and 638 VPN infrastructures. 640 One item GROW is currently studying is the effect of route 641 aggregation and of the inability to aggregate over multiple provider 642 boundaries due to inadequate provider coordination. 644 It is the intent that this work within GROW will result in feedback 645 to BGPv4 and future enhancements to the protocol as necessary. 647 17.5. Internet Routing Registries (IRRs) 649 Many organizations register their routing policy and prefix 650 origination in the various distributed databases of the Internet 651 Routing Registry. These databases provide access to the information 652 using the RPSL language as defined in [RFC 2622]. While registered 653 information may be maintained and correct for certain providers, the 654 lack of timely or correct data in the various IRR databases has 655 prevented widespread use of this resource. 657 17.6. Regional Internet Registries (RIRs) and IRRs, A Bit of History 659 The NSFNET program used EGP and then BGP to provide external routing 660 information.
It was the NSF policy of offering differing pricing and 661 providing a different level of support to the Research and Education 662 (RE) networks and the Commercial (CO) networks that led to BGP's 663 initial policy requirements. CO networks were not able to use the 664 NSFNET backbone to reach other CO networks, in addition to being 665 charged more. The rationale was that commercial users of the NSFNET 666 with business with research entities should subsidize the RE 667 community. Recognition that the Internet was evolving away from a 668 hierarchical network to a mesh of peers led to changes from EGP and 669 BGP-1 that eliminated any assumptions of hierarchy. 671 Enforcement of NSF policy was accomplished through maintenance of the 672 NSF Policy Routing Database (PRDB). The PRDB not only contained each 673 network's designation as CO or RE, but also contained a list of the 674 preferred exit points to the NSFNET to reach each network. This was 675 the basis for setting what would later be called BGP LOCAL_PREF on 676 the NSFNET. Tools provided with the PRDB generated complete router 677 configurations for the NSFNET. 679 Use of the PRDB had the fortunate consequence of greatly improving 680 reliability of the NSFNET relative to peer networks of the time and 681 offering more optimal routing for those networks sufficiently 682 knowledgeable and willing to keep their entries current. 684 With the decommissioning of the NSFNET Backbone Network Service in 1995, 685 it was recognized that the PRDB should be made less single-provider 686 centric and its legacy contents plus any further updates made 687 available to any provider willing to make use of it. The European 688 networking community had long seen the PRDB as too US-centric. 689 Through Reseaux IP Europeens (RIPE) the Europeans had created an open 690 format in RIPE-181 and had been maintaining an open database used for 691 address and AS registry more than policy.
The initial conversion of 692 the PRDB was to RIPE-181 format and tools were converted to make use 693 of this format. The collection of databases was termed the Internet 694 Routing Registry, with the RIPE database and US NSF funded Routing 695 Arbitrator (RA) being the initial components of the IRR. 697 A need to extend RIPE-181 was recognized and RIPE agreed to allow the 698 extensions to be defined within the IETF in the RPS WG. The result 699 was the RPSL language. Other work products of the RPS WG provided an 700 authentication framework and means to widely distribute the database 701 in a controlled manner and synchronize the many repositories. Freely 702 available tools were provided primarily by RIPE, Merit, and ISI, the 703 most comprehensive set from ISI. The efforts of the IRR participants 704 have been severely hampered by providers unwilling to keep information 705 in the IRR up to date. The larger of these providers have been 706 vocal, claiming that the database entries, simple as they may be, are an 707 administrative burden, and some acknowledge that keeping them current provides an 708 advantage to competitors that use the IRR. The result has been an 709 erosion of the usefulness of the IRR and an increase in vulnerability 710 of the Internet to routing-based attack or accidental injection of 711 faulty routing information. 713 There have been numerous cases of accidental disruption of Internet 714 routing which were avoided by providers using the IRR but were highly 715 detrimental to non-users. As filters have had to be relaxed due to 716 the erosion of the IRR to less complete coverage, these types of 717 disruptions have continued to occur very infrequently, but have had 718 increasingly widespread impact. 720 17.7. Acknowledgements 722 We would like to thank Paul Traina and Yakov Rekhter for authoring 723 previous versions of this document and providing valuable input on 724 this update as well.
We would also like to explicitly acknowledge 725 Curtis Villamizar for providing both text and thorough reviews. 726 Thanks to Russ White, Jeffrey Haas, Sean Mentzer, Mitchell Erblich 727 and Jude Ballard for supplying their usual keen eye. 729 Finally, we'd like to thank the IDR WG for general and specific input 730 that contributed to this document. 732 18. References 734 [RFC 793] Postel, J., "Transmission Control Protocol", RFC 793, 735 September 1981. 737 [RFC 1105] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 738 BGP", RFC 1105, June 1989. 740 [RFC 1163] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 741 BGP", RFC 1163, June 1990. 743 [RFC 1264] Hinden, R., "Internet Routing Protocol Standardization 744 Criteria", RFC 1264, October 1991. 746 [RFC 1267] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 3 747 (BGP-3)", RFC 1267, October 1991. 749 [RFC 1269] Willis, S., and Burruss, J., "Definitions of Managed 750 Objects for the Border Gateway Protocol (Version 3)", 751 RFC 1269, October 1991. 753 [RFC 1519] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless 754 Inter-Domain Routing (CIDR): an Address Assignment and 755 Aggregation Strategy", RFC 1519, September 1993. 757 [RFC 1656] Traina, P., "BGP-4 Protocol Document Roadmap and 758 Implementation Experience", RFC 1656, July 1994. 760 [RFC 1657] Willis, S., Burruss, J., Chu, J., "Definitions of 761 Managed Objects for the Fourth Version of the Border 762 Gateway Protocol (BGP-4) using SMIv2", RFC 1657, July 763 1994. 765 [RFC 1771] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 766 (BGP-4)", RFC 1771, March 1995. 768 [RFC 1772] Rekhter, Y., and P. Gross, Editors, "Application of the 769 Border Gateway Protocol in the Internet", RFC 1772, March 770 1995. 772 [RFC 1773] Traina, P., "Experience with the BGP-4 protocol", RFC 773 1773, March 1995. 775 [RFC 1966] Bates, T., Chandra, R., "BGP Route Reflection: An 776 alternative to full mesh IBGP", RFC 1966, June 1996.
778 [RFC 2385] Heffernan, A., "Protection of BGP Sessions via the TCP 779 MD5 Signature Option", RFC 2385, August 1998. 781 [RFC 2439] Villamizar, C. and Chandra, R., "BGP Route Flap Damping", 782 RFC 2439, November 1998. 784 [RFC 2622] C. Alaettinoglu et al., "Routing Policy Specification 785 Language", RFC 2622, June 1999. 787 [RFC 2796] Bates, T., Chandra, R., and Chen, E., "Route Reflection - 788 An Alternative to Full Mesh IBGP", RFC 2796, April 2000. 790 [RFC 3065] Traina, P., McPherson, D., and Scudder, J., "Autonomous 791 System Confederations for BGP", RFC 3065, February 2001. 793 [RFC 3345] McPherson, D., Gill, V., Walton, D., and Retana, A., "BGP 794 Persistent Route Oscillation Condition", RFC 3345, 795 August 2002. 797 [BGP4-ANALYSIS] Work in Progress. 799 [BGP4-IMPL] Work in Progress. 801 [BGP4] Rekhter, Y., Li, T., and Hares, S., Editors, "A Border 802 Gateway Protocol 4 (BGP-4)", BGP Draft, Work in Progress. 804 [SBGP] 806 [soBGP] 808 19. Authors' Addresses 810 Danny McPherson 811 Arbor Networks 812 Email: danny@arbor.net 814 Keyur Patel 815 Cisco Systems 816 Email: keyupate@cisco.com 818 20. Full Copyright Statement 820 Copyright (C) The Internet Society (2003). All Rights Reserved. 822 This document and translations of it may be copied and furnished to 823 others, and derivative works that comment on or otherwise explain it 824 or assist in its implementation may be prepared, copied, published 825 and distributed, in whole or in part, without restriction of any 826 kind, provided that the above copyright notice and this paragraph are 827 included on all such copies and derivative works.
However, this 828 document itself may not be modified in any way, such as by removing 829 the copyright notice or references to the Internet Society or other 830 Internet organizations, except as needed for the purpose of 831 developing Internet standards in which case the procedures for 832 copyrights defined in the Internet Standards process must be 833 followed, or as required to translate it into languages other than 834 English. 836 The limited permissions granted above are perpetual and will not be 837 revoked by the Internet Society or its successors or assigns. 839 This document and the information contained herein is provided on an 840 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 841 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 842 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 843 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 844 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.