idnits 2.17.1 draft-ietf-idr-bgp4-experience-protocol-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3667, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 875. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 852. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 859. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 865. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line 882), which is fine, but *also* found old RFC 2026, Section 10.4C, paragraph 1 text on line 39. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 303: '...by a BGP speaker SHOULD NOT be sent to...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2004) is 7156 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'BGP-MIB' is mentioned on line 133, but not defined == Missing Reference: 'BGP-IMPL' is mentioned on line 157, but not defined == Missing Reference: 'RFC 1965' is mentioned on line 181, but not defined ** Obsolete undefined reference: RFC 1965 (Obsoleted by RFC 3065) == Unused Reference: 'RFC 1519' is defined on line 756, but no explicit reference was found in the text == Unused Reference: 'RFC 3345' is defined on line 775, but no explicit reference was found in the text == Unused Reference: 'BGP4-IMPL' is defined on line 782, but no explicit reference was found in the text == Unused Reference: 'RFC 1657' is defined on line 788, but no explicit reference was found in the text == Unused Reference: 'RFC 1264' is defined on line 808, but no explicit reference was found in the text == Unused Reference: 'RFC 1772' is defined on line 824, but no explicit reference was found in the text == Unused Reference: 'RFC 1773' is defined on line 828, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1519 (Obsoleted by RFC 4632) ** Obsolete normative reference: RFC 1966 (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 2385 (Obsoleted by RFC 5925) ** Obsolete normative reference: RFC 2796 (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 3065 (Obsoleted by RFC 5065) ** Downref: Normative reference to an Informational RFC: RFC 3345 -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4-ANALYSIS' -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4-IMPL' -- Possible downref: Non-RFC (?) normative reference: ref. 'BGP4' ** Obsolete normative reference: RFC 1657 (Obsoleted by RFC 4273) -- Possible downref: Non-RFC (?) normative reference: ref. 
'SBGP' ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) -- Obsolete informational reference (is this intentional?): RFC 1105 (Obsoleted by RFC 1163) -- Duplicate reference: RFC1105, mentioned in 'RFC 1163', was also mentioned in 'RFC 1105'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC 1163') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1264 (Obsoleted by RFC 4794) -- Duplicate reference: RFC1105, mentioned in 'RFC 1267', was also mentioned in 'RFC 1163'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC 1267') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1269 (Obsoleted by RFC 4273) -- Obsolete informational reference (is this intentional?): RFC 1656 (Obsoleted by RFC 1773) -- Obsolete informational reference (is this intentional?): RFC 1771 (Obsoleted by RFC 4271) Summary: 17 errors (**), 0 flaws (~~), 12 warnings (==), 20 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT Danny McPherson 2 Arbor Networks 3 Keyur Patel 4 Cisco Systems 5 Category Informational 6 Expires: March 2005 September 2004 8 Experience with the BGP-4 Protocol 9 11 Status of this Document 13 By submitting this Internet-Draft, I certify that any applicable 14 patent or other IPR claims of which I am aware have been disclosed, 15 and any of which I become aware will be disclosed, in accordance with 16 RFC 3668. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. 
It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This document is an individual submission. Comments are solicited and 35 should be addressed to the author(s). 37 Copyright Notice 39 Copyright (C) The Internet Society (2004). All Rights Reserved. 41 Abstract 43 The purpose of this memo is to document how the requirements for 44 advancing a routing protocol from Draft Standard to full Standard 45 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 47 This report satisfies the requirement for "the second report", as 48 described in Section 6.0 of RFC 1264. In order to fulfill the 49 requirement, this report augments RFC 1773 and describes additional 50 knowledge and understanding gained in the time between when the 51 protocol was made a Draft Standard and when it was submitted for 52 Standard. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 57 2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 58 2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4 59 3. Management Information Base (MIB). . . . . . . . . . . . . . . 5 60 4. Implementation Information . . . . . . . . . . . . . . . . . . 5 61 5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5 62 6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6 63 7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 64 7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7 65 7.1.1. MEDs and Potatoes. . . . . . . . . . . . . . . . . . . . 8 66 7.1.2. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8 67 7.1.3. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 9 68 7.1.4. 
MEDs and Temporal Route Selection. . . . . . . . . . . . 9 69 8. Local Preference . . . . . . . . . . . . . . . . . . . . . . . 9 70 9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10 71 10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 11 72 11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 12 73 12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 12 74 13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 13 75 13.1. Consideration of TCP Characteristics . . . . . . . . . . . 14 76 14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 14 77 15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 15 78 16. Control over Version Negotiation. . . . . . . . . . . . . . . 15 79 17. Security Considerations . . . . . . . . . . . . . . . . . . . 15 80 17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 16 81 17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 16 82 17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 17 83 18. PTOMAINE and GROW . . . . . . . . . . . . . . . . . . . . . . 17 84 19. Internet Routing Registries (IRRs). . . . . . . . . . . . . . 17 85 20. Regional Internet Registries (RIRs) and IRRs, A 86 Bit of History. . . . . . . . . . . . . . . . . . . . . . . . . . 18 87 21. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 19 88 22. References. . . . . . . . . . . . . . . . . . . . . . . . . . 20 89 22.1. Normative References . . . . . . . . . . . . . . . . . . . 20 90 22.2. Informative References . . . . . . . . . . . . . . . . . . 21 91 23. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 21 93 1. Introduction 95 The purpose of this memo is to document how the requirements for 96 advancing a routing protocol from Draft Standard to full Standard 97 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 
99 This report satisfies the requirement for "the second report", as 100 described in Section 6.0 of RFC 1264. In order to fulfill the 101 requirement, this report augments RFC 1773 and describes additional 102 knowledge and understanding gained in the time between when the 103 protocol was made a Draft Standard and when it was submitted for 104 Standard. 106 2. BGP-4 Overview 108 BGP is an inter-autonomous system routing protocol designed for 109 TCP/IP internets. The primary function of a BGP speaking system is 110 to exchange network reachability information with other BGP systems. 111 This network reachability information includes information on the 112 list of Autonomous Systems (ASs) that reachability information 113 traverses. This information is sufficient to construct a graph of AS 114 connectivity for this reachability from which routing loops may be 115 pruned and some policy decisions at the AS level may be enforced. 117 The initial version of the BGP protocol was published in RFC 1105. 118 Since then BGP Versions 2, 3, and 4 have been developed and are 119 specified in [RFC 1163], [RFC 1267], and [RFC 1771], respectively. 120 Changes since BGP-4 went to Draft Standard [RFC 1771] are listed in 121 Appendix N of [BGP4]. 123 2.1. A Border Gateway Protocol 125 The Initial Version of BGP protocol was published in [RFC 1105]. BGP 126 version 2 is defined in [RFC 1163]. BGP version 3 is defined in [RFC 127 1267]. BGP version 4 is defined in [RFC 1771] and [BGP4]. 128 Appendices A, B, C, and D of [BGP4] provide summaries of the changes 129 between each iteration of the BGP specification. 131 3. Management Information Base (MIB) 133 The BGP-4 Management Information Base (MIB) has been published [BGP- 134 MIB]. The MIB was updated from previous versions documented in [RFC 135 1657] and [RFC 1269], respectively. 137 Apart from a few system variables, the BGP MIB is broken into two 138 tables: the BGP Peer Table and the BGP Received Path Attribute Table. 
140 The Peer Table reflects information about BGP peer connections, such 141 as their state and current activity. The Received Path Attribute 142 Table contains all attributes received from all peers before local 143 routing policy has been applied. The actual attributes used in 144 determining a route are a subset of the received attribute table. 146 4. Implementation Information 148 There are numerous independent interoperable implementations of BGP 149 currently available. Although the previous version of this report 150 provided an overview of the implementations currently used in the 151 operational Internet, at this time it has been suggested that a 152 separate BGP Implementation Report [BGP-IMPL] be generated. 154 It should be noted that implementation experience with Cisco's BGP-4 155 implementation was documented as part of [RFC 1656]. 157 For all additional implementation information please reference 158 [BGP-IMPL]. 160 5. Operational Experience 162 This section discusses operational experience with BGP and BGP-4. 164 BGP has been used in the production environment since 1989, BGP-4 165 since 1993. Production use of BGP includes utilization of all 166 significant features of the protocol. The present production 167 environment, where BGP is used as the inter-autonomous system routing 168 protocol, is highly heterogeneous. In terms of link bandwidth, it 169 varies from 56 Kbps to 10 Gbps. In terms of the actual routers that 170 run BGP, it ranges from relatively slow general purpose 171 CPUs to very high performance RISC network processors, and includes 172 both special purpose routers and general purpose workstations 173 running various UNIX derivatives and other operating systems. 175 In terms of the actual topologies it varies from very sparse to quite 176 dense. 
The requirement for full-mesh IBGP topologies has been 177 largely remedied by BGP Route Reflection, Autonomous System 178 Confederations for BGP, and often some mix of the two. BGP Route 179 Reflection was initially defined in [RFC 1966] and subsequently 180 updated in [RFC 2796]. Autonomous System Confederations for BGP were 181 initially defined in [RFC 1965] and subsequently updated in [RFC 182 3065]. 184 At the time of this writing BGP-4 is used as an inter-autonomous 185 system routing protocol between all Internet-attached autonomous 186 systems, with nearly 15k active autonomous systems in the global 187 Internet routing table. 189 BGP is used both for the exchange of routing information between a 190 transit and a stub autonomous system, and for the exchange of routing 191 information between multiple transit autonomous systems. There is no 192 protocol distinction between sites historically considered 193 "backbones" versus "regional" or "edge" networks. 195 The full set of exterior routes that is carried by BGP is well over 196 134,000 aggregate entries, representing several times that number of 197 connected networks. The number of active paths in some service 198 provider core routers exceeds 2.5 million. Native AS path lengths 199 are as long as 10 for some routes, and "padded" path lengths of 25 or 200 more autonomous systems exist. 202 6. TCP Awareness 204 BGP employs TCP [RFC 793] as its Transport Layer protocol. As such, 205 all characteristics inherent to TCP are inherited by BGP. 207 For example, the available bandwidth may not be fully 208 used because of TCP's slow-start algorithm, slow-start restarts 209 of connections, and similar behavior. 211 7. Metrics 213 This section discusses different metrics used within the BGP 214 protocol. BGP has a separate metric parameter for IBGP and EBGP. 
This 215 allows policy-based metrics to override the distance-based metrics, 216 allowing each autonomous system to define its own independent policies 217 both intra-AS and inter-AS. The BGP Multi Exit Discriminator (MED) 218 is used as a metric by EBGP peers (i.e., inter-domain) while Local 219 Preference (LOCAL_PREF) is used by IBGP peers (i.e., intra-domain). 221 7.1. MULTI_EXIT_DISC (MED) 223 BGP version 4 re-defined the old INTER-AS metric as a MULTI_EXIT_DISC 224 (MED). This value may be used in the tie-breaking process when 225 selecting a preferred path to a given address space, and provides BGP 226 speakers with the capability to convey to a peer AS the optimal entry 227 point into the local AS. 229 Although the MED was meant to only be used when comparing paths 230 received from different external peers in the same AS, many 231 implementations provide the capability to compare MEDs between 232 different autonomous systems as well. 234 Though this may seem a fine idea for some configurations, care must 235 be taken when comparing MEDs between different autonomous systems. 236 BGP speakers often derive MED values by obtaining the IGP metric 237 associated with reaching a given BGP NEXT_HOP within the local AS. 238 This allows MEDs to reasonably reflect IGP topologies when 239 advertising routes to peers. While this is fine when comparing MEDs 240 between multiple paths learned from a single adjacent AS, it can 241 result in potentially bad decisions when comparing MEDs between 242 different autonomous systems. This is most typically the case when 243 the autonomous systems use different mechanisms to derive IGP 244 metrics, BGP MEDs, or perhaps even use different IGP protocols with 245 vastly contrasting metric spaces. 247 Another MED deployment consideration involves the impact of 248 aggregation of BGP routing information on MEDs. 
Aggregates are often 249 generated from multiple locations in an AS in order to accommodate 250 stability, redundancy and other network design goals. When MEDs are 251 derived from IGP metrics associated with said aggregates the MED 252 value advertised to peers can result in very suboptimal routing. 254 The MED was purposely designed to be a "weak" metric that would only 255 be used late in the best-path decision process. The BGP working 256 group was concerned that any metric specified by a remote operator 257 would only affect routing in a local AS if no other preference was 258 specified. A paramount goal of the design of the MED was to ensure 259 that peers could not "shed" or "absorb" traffic for networks that 260 they advertise. 262 7.1.1. MEDs and Potatoes 264 In a situation where traffic flows between a pair of destinations, 265 each connected to two transit networks, each of the transit networks 266 has the choice of either sending the traffic to the closest peering 267 to the other transit provider or passing traffic to the peering which 268 advertises the least cost through the other provider. The former 269 method is called "hot potato routing" because, like a hot potato held 270 in bare hands, whoever has it tries to get rid of it quickly. Hot 271 potato routing is accomplished by not passing the EBGP-learned MED 272 into IBGP. This minimizes transit traffic for the provider routing 273 the traffic. Far less common is "cold potato routing" where the 274 transit provider uses its own transit capacity to get the traffic 275 to the point in the adjacent transit provider advertised as being 276 closest to the destination. Cold potato routing is accomplished by 277 passing the EBGP-learned MED into IBGP. 279 If one transit provider uses hot potato routing and another uses cold 280 potato, traffic between the two tends to be symmetric. 
Depending on 281 the business relationships, if one provider has more capacity or a 282 significantly less congested transit network, then that provider may 283 use cold potato routing. An example of widespread use of cold potato 284 routing was the NSF funded NSFNET backbone and NSF funded regional 285 networks in the mid 1990s. 287 In some cases a provider may use hot potato routing for some 288 destinations for a given peer AS and cold potato routing for others. 289 An example of this is the different treatment of commercial and 290 research traffic in the NSFNET in the mid 1990s. Then again, this 291 might best be described as 'mashed potato routing', a term which 292 reflects the complexity of router configurations in use at the time. 294 Seemingly more intuitive references that fall outside the vegetable 295 kingdom refer to cold potato routing as "best exit routing", and hot 296 potato routing as "closest exit routing". 298 7.1.2. Sending MEDs to BGP Peers 300 [BGP4] allows MEDs received from any EBGP peers by a BGP speaker to 301 be passed to its IBGP peers. Although advertising MEDs to IBGP peers 302 is not a required behavior, it is a common default. MEDs received 303 from EBGP peers by a BGP speaker SHOULD NOT be sent to other EBGP 304 peers. 306 Note that many implementations provide a mechanism to derive MED 307 values from IGP metrics in order to allow BGP MED information to 308 reflect the IGP topologies and metrics of the network when 309 propagating information to adjacent autonomous systems. 311 7.1.3. MED of Zero Versus No MED 313 [BGP4] requires that an implementation provide a mechanism that 314 allows for MED to be removed. Previously, implementations did not 315 consider a missing MED value to be the same as a MED of zero. [BGP4] 316 now requires that a missing MED be treated as equal to a MED value of zero. 
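The treatment of a missing MED described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names effective_med and prefer_lower_med, and the missing_as_worst switch, are hypothetical and are not taken from [BGP4] or from any implementation. The switch merely models an optional knob that ranks a missing MED as least preferable instead of as zero.

```python
from typing import Optional

# MULTI_EXIT_DISC is a four-octet unsigned integer, so this is its largest value.
MED_MAX = 2**32 - 1


def effective_med(med: Optional[int], missing_as_worst: bool = False) -> int:
    """Return the value used for comparison; None models an absent MED."""
    if med is None:
        # Default behavior: a missing MED compares as zero.
        # Hypothetical knob: treat a missing MED as the worst possible value.
        return MED_MAX if missing_as_worst else 0
    return med


def prefer_lower_med(med_a: Optional[int], med_b: Optional[int],
                     missing_as_worst: bool = False) -> str:
    """Return "a" or "b" for the path with the lower effective MED (ties go to "a")."""
    a = effective_med(med_a, missing_as_worst)
    b = effective_med(med_b, missing_as_worst)
    return "a" if a <= b else "b"
```

With the default setting, a path without a MED is preferred over a path with a MED of 10; with the hypothetical knob enabled, the outcome is reversed.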
318 Note that many implementations provide a mechanism to explicitly 319 define a missing MED value as "worst" or less preferable than zero or 320 larger values. 322 7.1.4. MEDs and Temporal Route Selection 324 Some implementations have hooks to apply temporal behavior in MED-based 325 best path selection. That is, all other things being equal up 326 to MED consideration, preference would be applied to the "oldest" 327 path, without preferring the lower MED value. The reasoning for this 328 is that "older" paths are presumably more stable, and thus more 329 preferable. However, temporal behavior in route selection results in 330 non-deterministic behavior, and as such, may often be undesirable. 332 8. Local Preference 334 The LOCAL_PREF attribute was added so a network operator could easily 335 configure a policy that overrode the standard best path determination 336 mechanism without independently configuring local preference policy 337 on each router. 339 One shortcoming in the BGP-4 specification was the lack of a suggested 340 default value of LOCAL_PREF to be assumed if none was provided. 341 Defaults of 0 or the maximum value each have range limitations, so a 342 common default would aid in the interoperation of multi-vendor 343 routers in the same AS (since LOCAL_PREF is a local administration 344 attribute, there is no interoperability drawback across AS 345 boundaries). 347 [BGP4] requires that LOCAL_PREF be sent to IBGP peers and not be 348 sent to EBGP peers. Although no default value for LOCAL_PREF is 349 defined, the common default value is 100. 351 Another area where more exploration is required is a method whereby 352 an originating AS may influence the best path selection process. For 353 example, a dual-connected site may select one AS as a primary transit 354 service provider and have one as a backup. 
356                    /---- transit B ----\
357    end-customer                          transit A----
358                    /---- transit C ----\

360 In a topology where the two transit service providers connect to a 361 third provider, the real decision is performed by the third provider 362 and there is no mechanism for indicating a preference should the 363 third provider wish to respect that preference. 365 A general purpose suggestion that has been brought up is the 366 possibility of carrying an optional vector corresponding to the AS_PATH 367 where each transit AS may indicate a preference value for a 368 given route. Cooperating autonomous systems may then choose traffic 369 based upon comparison of "interesting" portions of this vector 370 according to routing policy. 372 While protecting a given autonomous system's routing policy is of 373 paramount concern, avoiding extensive hand configuration of routing 374 policies needs to be examined more carefully in future BGP-like 375 protocols. 377 9. Internal BGP In Large Autonomous Systems 379 While not strictly a protocol issue, one other concern has been 380 raised by network operators who need to maintain autonomous systems 381 with a large number of peers. Each speaker peering with an external 382 router is responsible for propagating reachability and path 383 information to all other transit and border routers within that AS. 384 This is typically done by establishing internal BGP connections to 385 all transit and border routers in the local AS. 387 Note that the number of BGP peers that can be fully meshed depends on 388 a number of factors, including the number of prefixes in the routing 389 system, the number of unique paths, the stability of the system, and perhaps 390 most importantly, implementation efficiency. As a result, although 391 it's difficult to define "a large number of peers", there is always 392 some practical limit. 
394 In a large AS, this leads to a full mesh of TCP connections (n * 395 (n-1)) and some method of configuring and maintaining those 396 connections. BGP does not specify how this information is to be 397 propagated, so alternatives, such as injecting BGP routing 398 information into the local IGP, have been attempted, though this turned 399 out to be impractical (to say the least). 401 Several alternatives to a full mesh IBGP have been defined, 402 including BGP Route Reflection [RFC 2796] and AS Confederations for BGP 403 [RFC 3065], in order to alleviate the need for "full mesh" IBGP. 405 10. Internet Dynamics 407 As discussed in [BGP4-ANALYSIS], the driving force in CPU and 408 bandwidth utilization is the dynamic nature of routing in the 409 Internet. As the Internet has grown, the rate of route changes 410 has increased. 412 We automatically get some level of damping when more specific NLRI is 413 aggregated into larger blocks; however, this isn't sufficient. 414 Appendix F of [BGP4] describes damping techniques that 415 should be applied to advertisements. In future specifications of 416 BGP-like protocols, damping methods should be considered for 417 mandatory inclusion in compliant implementations. 419 BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap 420 Damping defines a mechanism to help reduce the amount of routing 421 information passed between BGP peers, and subsequently, the load on 422 these peers, without adversely affecting route convergence time for 423 relatively stable routes. 425 None of the current implementations of BGP Route Flap Damping store 426 route history by unique NLRI and AS Path although it is listed as 427 mandatory in RFC 2439. 
A potential result of failure to consider 428 each AS Path separately is an overly aggressive suppression of 429 destinations in a densely meshed network, with the most severe 430 consequence being suppression of a destination after a single 431 failure. Because the top tier autonomous systems in the Internet are 432 densely meshed, these adverse consequences are observed. 434 Route changes are announced using BGP UPDATE messages. The greatest 435 overhead in advertising UPDATE messages happens whenever route 436 changes to be announced are inefficiently packed. As discussed in a 437 later section, announcing routing changes sharing common attributes 438 in a single BGP UPDATE message helps save considerable bandwidth and 439 lower processing overhead. 441 Persistent BGP errors may cause BGP peers to flap persistently if 442 peer dampening is not implemented. This would result in significant 443 CPU utilization. Implementors may find it useful to implement peer 444 dampening to avoid such persistent peer flapping [BGP4]. 446 11. BGP Routing Information Bases (RIBs) 448 [BGP4] states "Any local policy which results in routes being added 449 to an Adj-RIB-Out without also being added to the local BGP speaker's 450 forwarding table, is outside the scope of this document". 452 However, several well-known implementations do not confirm that Loc-RIB 453 entries were used to populate the forwarding table before 454 installing them in the Adj-RIB-Out. The most common occurrence of 455 this is when routes for a given prefix are presented by more than one 456 protocol and the preference for the BGP-learned route is lower than 457 that of another protocol. As such, the route learned via the other 458 protocol is used to populate the forwarding table. 460 It may be desirable for an implementation to provide a knob that 461 permits advertisement of "inactive" BGP routes. 
463 It may also be desirable for an implementation to provide a knob that 464 allows a BGP speaker to advertise BGP routes that were not selected 465 by the decision process. 467 12. Update Packing 469 Multiple unfeasible routes can be advertised in a single BGP Update 470 message. In addition, one or more feasible routes can be advertised 471 in a single Update message so long as all prefixes share a common 472 attribute set. 474 The BGP4 protocol permits multiple prefixes with a 475 common set of path attributes to be advertised in a single update 476 message; this is commonly referred to as "update packing". When 477 possible, update packing is recommended as it provides a mechanism 478 for more efficient behavior in a number of areas, including:
480 o Reduction in system overhead due to generation or receipt of 481 fewer Update messages.
483 o Reduction in network overhead as a result of fewer packets 484 and lower bandwidth consumption.
486 o Less frequent processing of path attributes and lookups of matching 487 sets in the AS_PATH database (if one exists). 488 Consistent ordering of the path attributes 489 eases matching in the database, since the same data never has 490 different representations.
492 The BGP protocol suggests that withdrawal information should be 493 packed at the beginning of an Update message, followed by information 494 about more or less specific reachable routes in a single UPDATE 495 message. This helps alleviate excessive route flapping in BGP. 497 13. Limit Rate Updates 499 The BGP protocol defines different mechanisms to rate limit Update 500 advertisement. The BGP protocol defines a MinRouteAdvertisementInterval 501 parameter that determines the minimum time that must elapse 502 between the advertisement of routes to a particular destination from 503 a single BGP speaker. This value is set on a per BGP peer basis. 
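The effect of MinRouteAdvertisementInterval can be illustrated with a minimal Python sketch. It is not taken from [BGP4] or from any implementation: the class name MraiLimiter, its method, and the 30-second interval used in the usage note are all assumptions chosen for illustration. One limiter instance is assumed per peer, matching the per-peer setting described above, and time is passed in explicitly so that the behavior is easy to follow.

```python
from typing import Dict


class MraiLimiter:
    """Hypothetical per-peer limiter: one instance is kept for each BGP peer."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        # Destination prefix -> time of the last advertisement to this peer.
        self._last_sent: Dict[str, float] = {}

    def may_advertise(self, prefix: str, now: float) -> bool:
        """Return True (and record the time) if the interval has elapsed."""
        last = self._last_sent.get(prefix)
        if last is not None and now - last < self.interval_s:
            # Too soon: the caller should hold the route and retry later.
            return False
        self._last_sent[prefix] = now
        return True
```

For example, with MraiLimiter(30.0), a route to 192.0.2.0/24 advertised at time 0 is held back at time 10 and allowed again at time 30, while a different destination is unaffected by that timer.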
505 Because BGP relies on TCP as its transport protocol, TCP 506 can prevent transmission of data due to empty windows. As a result, 507 multiple Updates may be spaced closer together than originally queued. 508 Although this is not a common occurrence, implementations should be 509 aware of this. 511 13.1. Consideration of TCP Characteristics 513 If a TCP receiver is processing input more slowly than the sender or 514 if the TCP connection rate is the limiting factor, a form of 515 backpressure is observed by the TCP sending application. When the 516 TCP buffer fills, the sending application will either block on the 517 write or receive an error on the write. Common errors in either 518 early implementations or an occasional naive new implementation are 519 to either set options to block on the write or set options for non-blocking 520 writes and then treat the errors due to a full buffer as 521 fatal. 523 Having recognized that full write buffers are to be expected, 524 additional implementation pitfalls exist. The application should not 525 attempt to store the TCP stream within the application itself. If 526 the receiver or the TCP connection is persistently slow, then the 527 buffer can grow until memory is exhausted. A BGP implementation is 528 required to send changes to all peers for which the TCP connection is 529 not blocked and is required to remember to send those changes to the 530 remaining peers when the connection becomes unblocked. 532 If the preferred route for a given NLRI changes multiple times while 533 writes to one or more peers are blocked, only the most recent best 534 route needs to be sent. In this way BGP is work conserving. In 535 times of extremely high route change, a higher volume of route change 536 is sent to those peers which are able to process it more quickly and 537 a lower volume of route change is sent to those peers not able to 538 process the changes as quickly. 
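The work-conserving behavior described above can be sketched as follows. This Python fragment is a simplified model, not an excerpt from any BGP implementation: the class PeerQueue, its blocked flag, and the sent list that stands in for the TCP write are all hypothetical. The point it illustrates is that pending state is kept per NLRI instead of as a stored byte stream, so memory is bounded by the number of destinations and a slow peer receives only the latest best route.

```python
from typing import Dict, List, Tuple


class PeerQueue:
    """Hypothetical per-peer state holding at most one pending route per NLRI."""

    def __init__(self) -> None:
        self.blocked = False                    # True while the TCP write would block
        self.pending: Dict[str, str] = {}       # NLRI -> most recent best route
        self.sent: List[Tuple[str, str]] = []   # stand-in for data written to TCP

    def announce(self, nlri: str, route: str) -> None:
        # Overwrite any older pending change for the same NLRI.
        self.pending[nlri] = route
        if not self.blocked:
            self.flush()

    def flush(self) -> None:
        # Write every pending change, then forget it.
        for nlri, route in self.pending.items():
            self.sent.append((nlri, route))
        self.pending.clear()

    def unblock(self) -> None:
        # Called when the connection becomes writable again.
        self.blocked = False
        self.flush()
```

A peer that is never blocked sees every intermediate change, whereas a peer blocked across three changes to the same NLRI is sent only the third once it unblocks.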
540 For implementations which handle differing peer capacity to absorb 541 route change well, if the majority of route change is contributed by 542 a subset of unstable NLRI, the only impact on relatively stable NLRI 543 which make an isolated route change is slower convergence, and the 544 convergence time remains bounded regardless of the amount of 545 instability. 547 14. Ordering of Path Attributes 549 The BGP protocol suggests that BGP speakers sending multiple prefixes 550 per UPDATE message should sort and order path attributes according 551 to Type Codes. This would help their peers to quickly identify sets 552 of attributes from different update messages which are semantically 553 different. 555 Implementers may find it useful to order path attributes according to 556 Type Code so that sets of attributes with identical semantics can be 557 more quickly identified. 559 15. AS_SET Sorting 561 AS_SETs are commonly used in BGP route aggregation. They reduce the 562 size of AS_PATH information by listing each AS number only once, 563 regardless of the number of times it might appear in the process of 564 aggregation. AS_SETs are usually sorted in increasing order to 565 facilitate efficient lookups of AS numbers within them. This 566 optimization is entirely optional. 568 16. Control over Version Negotiation 570 Because pre-BGP-4 route aggregation can't be supported by earlier 571 versions of BGP, an implementation that supports versions in addition 572 to BGP-4 should provide the version support on a per-peer basis. At 573 the time of this writing all BGP speakers on the Internet are thought 574 to be running BGP version 4. 576 17. Security Considerations 578 BGP provides a flexible and extendable mechanism for authentication 579 and security. The mechanism allows support for schemes with various 580 degrees of complexity. BGP sessions are authenticated based on the IP 581 address of a peer.
In addition, all BGP sessions are authenticated 582 based on the autonomous system number advertised by a peer. 584 Since BGP runs over TCP and IP, BGP's authentication scheme may be 585 augmented by any authentication or security mechanism provided by 586 either TCP or IP. 588 17.1. TCP MD5 Signature Option 590 [RFC 2385] defines a way in which the TCP MD5 signature option can be 591 used to validate information transmitted between two peers. This 592 method prevents any third party from injecting information (e.g., a 593 TCP Reset) into the data stream, or modifying the routing information 594 carried between two BGP peers. 596 TCP MD5 is not ubiquitously deployed at the moment, especially in 597 inter-domain scenarios, largely because of key distribution issues. 598 Most key distribution mechanisms are considered to be too "heavy" at 599 this point. 601 It was naively assumed by many for some time that in order to inject 602 a data segment or reset a TCP transport connection between two BGP 603 peers an attacker must correctly guess the exact TCP sequence number 604 (of course, in addition to source and destination ports and IP 605 addresses). However, it has recently been observed and openly 606 discussed that the malicious data only needs to fall within the TCP 607 receive window, which may be quite large, thereby significantly 608 lowering the complexity of such an attack. 610 As such, it is recommended that the TCP MD5 Signature Option be 611 employed to protect BGP from session resets and malicious data 612 injection. 614 17.2. BGP Over IPSEC 616 BGP can run over IPSEC, either in a tunnel, or in transport mode, 617 where the TCP portion of the IP packet is encrypted. This not only 618 prevents random insertion of information into the data stream between 619 two BGP peers, it also prevents an attacker from learning the data 620 which is being exchanged between the peers.
622 IPSEC does, however, offer several options for exchanging session 623 keys, which may be useful in inter-domain configurations. These 624 options are being explored in many deployments, although no 625 definitive solution has been reached on the issue of key exchange for 626 BGP in IPSEC. 628 It should be noted that since BGP runs over TCP and IP, BGP is 629 vulnerable to the same denial of service or authentication attacks 630 that are present in any other TCP-based protocol. 632 17.3. Miscellaneous 634 Another issue any routing protocol faces is providing evidence of the 635 validity and authority of the routing information carried within the 636 routing system. This is currently the focus of several efforts, 637 including efforts to define the threats which can be used 638 against this routing information in BGP [draft-murphy, attack tree], 639 and efforts at developing a means to provide validation and authority 640 for routing information carried within BGP [SBGP] [soBGP]. 642 In addition, the Routing Protocol Security Requirements (RPSEC) 643 working group has been chartered within the Routing Area of the IETF 644 in order to discuss and assist in addressing issues surrounding 645 routing protocol security. It is the intent that this work within 646 RPSEC will result in feedback to BGPv4 and future enhancements to the 647 protocol where appropriate. 649 18. PTOMAINE and GROW 651 The Prefix Taxonomy (PTOMAINE) working group, recently replaced by 652 the Global Routing Operations (GROW) working group, is chartered to 653 consider and measure the problem of routing table growth, the effects 654 of the interactions between interior and exterior routing protocols, 655 and the effect of address allocation policies and practices on the 656 global routing system. Finally, where appropriate, GROW will also 657 document the operational aspects of measurement, policy, security and 658 VPN infrastructures.
660 One item GROW is currently studying is the effect of route 661 aggregation and the inability to aggregate over multiple provider 662 boundaries due to inadequate provider coordination. 664 It is the intent that this work within GROW will result in feedback 665 to BGPv4 and future enhancements to the protocol as necessary. 667 19. Internet Routing Registries (IRRs) 669 Many organizations register their routing policy and prefix 670 origination in the various distributed databases of the Internet 671 Routing Registry. These databases provide access to the information 672 using the RPSL language as defined in [RFC 2622]. While registered 673 information may be maintained and correct for certain providers, the 674 lack of timely or correct data in the various IRR databases has 675 prevented widespread use of this resource. 677 20. Regional Internet Registries (RIRs) and IRRs, A Bit of History 679 The NSFNET program used EGP and then BGP to provide external routing 680 information. It was the NSF policy of offering differing pricing and 681 providing a different level of support to the Research and Education 682 (RE) networks and the Commercial (CO) networks that led to BGP's 683 initial policy requirements. CO networks were not able to use the 684 NSFNET backbone to reach other CO networks, in addition to being 685 charged more. The rationale was that commercial users of the NSFNET 686 doing business with research entities should subsidize the RE 687 community. Recognition that the Internet was evolving away from a 688 hierarchical network to a mesh of peers led to changes from EGP and 689 BGP-1 that eliminated any assumptions of hierarchy. 691 Enforcement of NSF policy was accomplished through maintenance of the 692 NSF Policy Routing Database (PRDB). The PRDB not only contained each 693 network's designation as CO or RE, but also contained a list of the 694 preferred exit points to the NSFNET to reach each network.
This was 695 the basis for setting what would later be called BGP LOCAL_PREF on 696 the NSFNET. Tools provided with the PRDB generated complete router 697 configurations for the NSFNET. 699 Use of the PRDB had the fortunate consequence of greatly improving 700 reliability of the NSFNET relative to peer networks of the time and 701 offering more optimal routing for those networks sufficiently 702 knowledgeable and willing to keep their entries current. 704 With the decommissioning of the NSFNET Backbone Network Service in 1995, 705 it was recognized that the PRDB should be made less single-provider 706 centric and its legacy contents plus any further updates made 707 available to any provider willing to make use of it. The European 708 networking community had long seen the PRDB as too US-centric. 709 Through Reseaux IP Europeens (RIPE) the Europeans had created an open 710 format in RIPE-181 and had been maintaining an open database used for 711 address and AS registry more than policy. The initial conversion of 712 the PRDB was to RIPE-181 format and tools were converted to make use 713 of this format. The collection of databases was termed the Internet 714 Routing Registry, with the RIPE database and US NSF funded Routing 715 Arbitrator (RA) being the initial components of the IRR. 717 A need to extend RIPE-181 was recognized and RIPE agreed to allow the 718 extensions to be defined within the IETF in the RPS WG. The result 719 was the RPSL language. Other work products of the RPS WG provided an 720 authentication framework and means to widely distribute the database 721 in a controlled manner and synchronize the many repositories. Freely 722 available tools were provided primarily by RIPE, Merit, and ISI, the 723 most comprehensive set from ISI. The efforts of the IRR participants 724 have been severely hampered by providers unwilling to keep information 725 in the IRR up to date.
The larger of these providers have been 726 vocal, claiming that the database entries, simple as they may be, are an 727 administrative burden, and some acknowledge that maintaining them provides an 728 advantage to competitors that use the IRR. The result has been an 729 erosion of the usefulness of the IRR and an increase in vulnerability 730 of the Internet to routing-based attacks or accidental injection of 731 faulty routing information. 733 There have been numerous cases of accidental disruption of Internet 734 routing which were avoided by providers using the IRR but highly 735 detrimental to non-users. As filters have had to be relaxed due to 736 the erosion of the IRR to less complete coverage, these types of 737 disruptions have continued to occur very infrequently, but have had 738 increasingly widespread impact. 740 21. Acknowledgements 742 We would like to thank Paul Traina and Yakov Rekhter for authoring 743 previous versions of this document and providing valuable input on 744 this update as well. We would also like to explicitly acknowledge 745 Curtis Villamizar for providing both text and thorough reviews. 746 Thanks to Russ White, Jeffrey Haas, Sean Mentzer, Mitchell Erblich 747 and Jude Ballard for supplying their usual keen eye. 749 Finally, we'd like to thank the IDR WG for general and specific input 750 that contributed to this document. 752 22. References 754 22.1. Normative References 756 [RFC 1519] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless 757 Inter-Domain Routing (CIDR): an Address Assignment and 758 Aggregation Strategy", RFC 1519, September 1993. 760 [RFC 1966] Bates, T., Chandra, R., "BGP Route Reflection: An 761 alternative to full mesh IBGP", RFC 1966, June 1996. 763 [RFC 2385] Heffernan, A., "Protection of BGP Sessions via the TCP 764 MD5 Signature Option", RFC 2385, August 1998. 766 [RFC 2439] Villamizar, C. and Chandra, R., "BGP Route Flap Damping", 767 RFC 2439, November 1998.
769 [RFC 2796] Bates, T., Chandra, R., and Chen, E., "Route Reflection - 770 An Alternative to Full Mesh IBGP", RFC 2796, April 2000. 772 [RFC 3065] Traina, P., McPherson, D., and Scudder, J., "Autonomous 773 System Confederations for BGP", RFC 3065, February 2001. 775 [RFC 3345] McPherson, D., Gill, V., Walton, D., and Retana, A., "BGP 776 Persistent Route Oscillation Condition", RFC 3345, 777 August 2002. 779 [BGP4-ANALYSIS] "BGP-4 Protocol Analysis", Internet-Draft, Work in 780 Progress. 782 [BGP4-IMPL] "BGP 4 Implementation Report", Internet-Draft, Work 783 in Progress. 785 [BGP4] Rekhter, Y., Li, T., and Hares, S., Editors, "A Border 786 Gateway Protocol 4 (BGP-4)", BGP Draft, Work in Progress. 788 [RFC 1657] Willis, S., Burruss, J., Chu, J., "Definitions of 789 Managed Objects for the Fourth Version of the Border 790 Gateway Protocol (BGP-4) using SMIv2", RFC 1657, July 791 1994. 793 [SBGP] "Secure BGP", Internet-Draft, Work in Progress. 795 [soBGP] "Secure Origin BGP", Internet-Draft, Work in Progress. 797 [RFC 793] Postel, J., "Transmission Control Protocol", RFC 793, 798 September 1981. 800 22.2. Informative References 802 [RFC 1105] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 803 BGP", RFC 1105, June 1989. 805 [RFC 1163] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 806 BGP", RFC 1163, June 1990. 808 [RFC 1264] Hinden, R., "Internet Routing Protocol Standardization 809 Criteria", RFC 1264, October 1991. 811 [RFC 1267] Lougheed, K., and Rekhter, Y., "Border Gateway Protocol 3 812 (BGP-3)", RFC 1267, October 1991. 814 [RFC 1269] Willis, S., and Burruss, J., "Definitions of Managed 815 Objects for the Border Gateway Protocol (Version 3)", 816 RFC 1269, October 1991. 818 [RFC 1656] Traina, P., "BGP-4 Protocol Document Roadmap and 819 Implementation Experience", RFC 1656, July 1994. 821 [RFC 1771] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 822 (BGP-4)", RFC 1771, March 1995. 824 [RFC 1772] Rekhter, Y., and P.
Gross, Editors, "Application of the 825 Border Gateway Protocol in the Internet", RFC 1772, March 826 1995. 828 [RFC 1773] Traina, P., "Experience with the BGP-4 protocol", RFC 829 1773, March 1995. 831 [RFC 2622] C. Alaettinoglu et al., "Routing Policy Specification 832 Language", RFC 2622, June 1999. 834 23. Authors' Addresses 835 Danny McPherson 836 Arbor Networks 837 Email: danny@arbor.net 839 Keyur Patel 840 Cisco Systems 841 Email: keyupate@cisco.com 843 Intellectual Property Statement 845 The IETF takes no position regarding the validity or scope of any 846 Intellectual Property Rights or other rights that might be claimed to 847 pertain to the implementation or use of the technology described in 848 this document or the extent to which any license under such rights 849 might or might not be available; nor does it represent that it has 850 made any independent effort to identify any such rights. Information 851 on the procedures with respect to rights in RFC documents can be 852 found in BCP 78 and BCP 79. 854 Copies of IPR disclosures made to the IETF Secretariat and any 855 assurances of licenses to be made available, or the result of an 856 attempt made to obtain a general license or permission for the use of 857 such proprietary rights by implementers or users of this 858 specification can be obtained from the IETF on-line IPR repository at 859 http://www.ietf.org/ipr. 861 The IETF invites any interested party to bring to its attention any 862 copyrights, patents or patent applications, or other proprietary 863 rights that may cover technology that may be required to implement 864 this standard. Please address the information to the IETF at 865 ietf-ipr@ietf.org. 
867 Disclaimer of Validity 869 This document and the information contained herein are provided on an 870 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 871 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 872 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 873 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 874 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 875 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 877 Copyright Statement 879 Copyright (C) The Internet Society (2004). This document 880 is subject to the rights, licenses and restrictions contained in 881 BCP 78, and except as set forth therein, the authors retain all 882 their rights. 884 Acknowledgment 886 Funding for the RFC Editor function is currently provided by the 887 Internet Society.