Network Working Group                                         T. Li, Ed.
Internet-Draft                                    Portola Networks, Inc.
Expires: June 25, 2006                                  R. Fernando, Ed.
                                                            Amoora, Inc.
                                                           J. Abley, Ed.
                                             Internet Systems Consortium
                                                       December 22, 2005


                     The AS_HOPCOUNT Path Attribute
                   draft-ietf-idr-as-hopcount-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 25, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document describes the AS hopcount path attribute for BGP. This
   is an optional, transitive path attribute that is designed to help
   limit the distribution of routing information in the Internet.

   By default, prefixes advertised into the BGP mesh are distributed
   freely, and if not blocked by policy will propagate globally. This
   is harmful to the scalability of the routing subsystem since
   information that only has a local effect on routing will cause state
   creation throughout the default-free zone. This attribute can be
   attached to a particular path to limit its scope to a subset of the
   Internet.

Table of Contents

   1.  Requirements notation
   2.  Introduction
   3.  Inter-Domain Traffic Engineering
     3.1.  Traffic Engineering on a Diet
     3.2.  AS_HOPCOUNT as Control
     3.3.  AS_HOPCOUNT and NO_EXPORT
   4.  Anycast Service Distribution
   5.  The AS_HOPCOUNT Attribute
     5.1.  Operations
     5.2.  Proxy Control
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Introduction

   A prefix that is injected into BGP [RFC1771] will propagate
   throughout the mesh of all BGP speakers unless it is explicitly
   blocked by policy configuration. This behavior is necessary for the
   correct operation of BGP, but has some unfortunate interactions with
   current operational procedures. Currently, it is beneficial in some
   cases to inject longer prefixes into BGP to control the flow of
   traffic headed towards a particular destination. These longer
   prefixes may be advertised in addition to an aggregate, even when the
   aggregate advertisement is sufficient for basic reachability. This
   particular application is known as "inter-domain traffic engineering"
   and is a well-known phenomenon that is contributing to growth in the
   size of the global routing table [RFC3221]. The mechanism proposed
   here allows the propagation of those longer prefixes to be limited,
   allowing some traffic engineering problems to be solved without such
   global implications.

   Another application of this mechanism is concerned with the
   distribution of services across the Internet using anycast.
   Allowing an anycast address advertisement to be limited to a subset
   of ASes in the network can help control the scope of the anycast
   service area.

3.  Inter-Domain Traffic Engineering

   To perform traffic engineering, a multi-homed site advertises its
   prefix to all of its neighbors and then also advertises more specific
   prefixes to a subset of its neighbors. The longest match lookup
   algorithm then causes traffic for the more specific prefixes to
   prefer the subset of neighbors with the more specific prefix.

   Figure 1 shows an example of traffic engineering and its impact on
   the network. The multi-homed site (A) has a primary provider (C) and
   a secondary provider (B). It has a prefix, Y, that provides
   reachability to all of A, and advertises this to both B and C. In
   addition, due to the internal topology of end-site A, it wishes that
   all incoming traffic to subset X of its site enter through provider
   B. To accomplish this, A advertises the more specific prefix, X, to
   provider B. Longest match again causes traffic to prefer X over Y if
   the destination of the traffic is within X.

   Assuming that there are no policy boundaries involved, BGP will
   propagate both of these prefixes, Y and X, throughout the entire
   AS-level topology. This includes distant providers such as H, F and
   G. Unfortunately, this adds to the amount of overhead in the routing
   subsystem. The problem to be solved is to reduce this overhead and
   thereby improve the scalability of the routing of the Internet.

    ,--------------.   ,--------------.   ,--------------.
    |    Tier 2    +---+    Tier 2    +---+    Tier 3    |
    |  Provider H  |   |  Provider E  |   |  Provider F  |
    `--------------'   `-+---------+--'   `--------------'
                        /          |
                       /           |
   ,------------------+---.   ,----+---------.   ,-------------.
   |        Tier 1        +---+    Tier 1    |   |   Tier 1    |
   |  Primary Provider C  |   |  Provider D  +---+ Provider G  |
   `--------+-----------+-'   `-------+------'   `-------------'
            |            \            |
            |Y            \           |
   ,--------+------.     ,-+----------+-----------.
   |  Multi-homed  +-----+         Tier 2         |
   |    site A     |Y,X  |  Secondary Provider B  |
   `---------------'     `------------------------'

   The longer prefix X traverses a core and then coincides with the
   less-specific, covering prefix Y.

                                Figure 1

3.1.  Traffic Engineering on a Diet

   What is needed is one or more mechanisms that an AS can use to
   distribute its more specific routing information to a subset of the
   network that exceeds its immediate neighboring ASes and yet is also
   significantly less than the global BGP mesh. The solution space for
   this is fully unbounded, as the limits that a source AS may wish to
   apply to its more specific routes could be a fairly complicated
   manifestation of its routing policies. One can imagine a policy that
   restricts more specifics to ASes that only have prime AS numbers, for
   example.

   We already have one mechanism for performing this type of function.
   The well-known BGP NO_EXPORT community [RFC1997] can be attached to
   more specific prefixes. This will cause the more specifics not to be
   advertised past the immediate neighboring AS. This is effective at
   helping to prevent more specific prefixes from becoming global, but
   it is extremely limited in that the more specific prefixes can only
   propagate to adjacent ASes.

   Referring again to our example, A can advertise X with NO_EXPORT to
   provider B. However, this will cause provider B not to advertise X to
   the remainder of the network, and providers C, D, and G will not have
   the longer prefixes and will thus send all of A's traffic via
   provider C.
   This is not what A hoped to accomplish by advertising a longer
   prefix and demonstrates why the NO_EXPORT mechanism is not
   sufficiently flexible.

   Instead of attempting to provide an infinitely flexible and
   complicated mechanism for controlling the distribution of prefixes,
   we propose a single, coarse control mechanism. This coarse mechanism
   will provide a limited amount of control but at a very low cost and
   will address most of the evils associated with performing traffic
   engineering through route distribution.

   We observe that traffic engineering via longer prefixes is only
   effective when the longer prefixes have a different next hop from the
   less specific prefix. Thus, past the point where the next hops
   become identical, the longer prefixes provide no value whatsoever.
   We also observe that most traffic ends up traversing a subset of the
   network operated by a relatively small number of large
   market-dominant providers, joined by settlement-free interconnects.
   If one looks one AS hop past this subset of the network, it is likely
   that the longer prefixes and the site aggregate are using the same
   next hop, and thus the longer prefixes have stopped providing value.

   We can see this clearly in our example. Provider F sees that both
   prefix X and prefix Y will lead all traffic through provider E. There
   is no point in F carrying and propagating the more specific prefix X.
   Similarly, providers G and H need not carry prefix X.

3.2.  AS_HOPCOUNT as Control

   To accomplish this, we propose to add information that will limit the
   radius of propagation of more specific prefixes. If we attach a
   count of the ASes that may be traversed by the more specific prefix,
   we gain much of the control that we hope to achieve. For example, if
   prefix X is advertised with hopcount 1, then only provider B has the
   information and we get an effect that is identical to NO_EXPORT.
   If prefix X is advertised with hopcount 2, then only B, C and D will
   carry it. This is an interesting compromise as traffic for X will
   now flow consistently through provider B, as desired.

   However, this is not identical to fully distributing X. Consider, for
   example, that provider E in this circumstance will not receive prefix
   X and is likely to prefer provider C for all A destinations. This
   causes traffic for X to flow from E to C to B. If provider E did have
   prefix X, it might choose to prefer provider D instead, resulting in
   a different path. This second result can be achieved by increasing
   the hopcount to 3, but this has the unfortunate effect that provider
   G would also receive prefix X.

   Thus, AS_HOPCOUNT is an extremely lightweight mechanism, and achieves
   a great deal of control. It is easy to imagine more complicated
   control mechanisms, such as IDRP [IDRP] distribution lists, but we
   currently find that the complexity of such a mechanism is simply not
   warranted.

3.3.  AS_HOPCOUNT and NO_EXPORT

   Further control can be achieved by considering the implications of
   using both AS_HOPCOUNT and NO_EXPORT simultaneously. Since NO_EXPORT
   is widely deployed and understood by almost all implementations, and
   since AS_HOPCOUNT is not deployed, we can make use of the overlap in
   their semantics to provide a powerful transition mechanism.

   Systems that receive NLRI with only the AS_HOPCOUNT attribute but
   which do not implement AS_HOPCOUNT will ignore the attribute. This
   will provide the current, existing behavior and the NLRI will
   propagate according to normal BGP rules.

   Systems that receive NLRI with both an AS_HOPCOUNT and NO_EXPORT and
   which do implement AS_HOPCOUNT will ignore the NO_EXPORT community
   and propagate the NLRI.

   Systems that receive NLRI with both an AS_HOPCOUNT and NO_EXPORT but
   which do not implement AS_HOPCOUNT will recognize and operate
   according to NO_EXPORT semantics. This will cause them not to
   forward the NLRI to other ASes.

   Thus, an AS that chooses to attach the AS_HOPCOUNT attribute can
   control how its NLRI will be processed by other ASes. If the NLRI
   should be dropped by ASes that do not support AS_HOPCOUNT, then
   NO_EXPORT can be attached. If the NLRI should propagate by default,
   then NO_EXPORT should not be attached.

4.  Anycast Service Distribution

   A growing number of services are being distributed using anycast, by
   advertising a route which covers one or more addresses for a service
   which is provided autonomously at multiple locations.

   For some services, it is useful to restrict the peak possible service
   load, to avoid overloading local connectivity or service
   infrastructure capabilities; it may be a better failure mode for
   service to be retained only for a small community of surrounding
   networks than for a single node to fail under a global load of
   queries.

   Although to some degree this policy can be accomplished through
   negotiation and judicious use of NO_EXPORT without AS_HOPCOUNT, the
   AS_HOPCOUNT attribute provides a more flexible and reliable
   mechanism.

5.  The AS_HOPCOUNT Attribute

   The AS_HOPCOUNT attribute is a transitive optional BGP path
   attribute, with Type Code XXXX. The AS_HOPCOUNT attribute has a
   fixed length of 5 octets. The first octet is an unsigned number that
   is the hopcount of the associated paths. The second through fifth
   octets are the AS number of the AS that attached the AS_HOPCOUNT
   attribute to the NLRI.

5.1.  Operations

   A BGP speaker attaching the AS_HOPCOUNT attribute to an NLRI MUST
   encode its AS number in the second through fifth octets. The
   encoding is described in [4B AS].
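   As an illustration only, the 5-octet value described in Section 5
   can be sketched in Python. The function names are hypothetical, and
   the sketch assumes that the AS number is carried as a 4-octet
   unsigned integer in network byte order; the authoritative encoding
   is the one given in [4B AS]. Only the attribute value is packed,
   since the Type Code is still unassigned.

```python
import struct
from typing import Tuple

AS_HOPCOUNT_VALUE_LEN = 5  # fixed length stated in Section 5


def encode_as_hopcount(hopcount: int, attaching_as: int) -> bytes:
    """Pack the attribute value: 1-octet hopcount, then 4-octet AS number."""
    if not 0 <= hopcount <= 0xFF:
        raise ValueError("hopcount must fit in one unsigned octet")
    if not 0 <= attaching_as <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in four octets")
    # Assumption: network byte order (big-endian) with no padding.
    return struct.pack("!BI", hopcount, attaching_as)


def decode_as_hopcount(value: bytes) -> Tuple[int, int]:
    """Return (hopcount, attaching AS number) from a 5-octet value."""
    if len(value) != AS_HOPCOUNT_VALUE_LEN:
        raise ValueError("AS_HOPCOUNT value must be exactly 5 octets")
    hopcount, attaching_as = struct.unpack("!BI", value)
    return hopcount, attaching_as
```

   For example, encode_as_hopcount(2, 64512) yields the octets
   02 00 00 FC 00, and decode_as_hopcount() recovers (2, 64512) from
   them.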
   The AS number is intended to aid debugging in the case where the
   AS_HOPCOUNT attribute is added by an AS other than the originator of
   the NLRI.

   A BGP speaker receiving a route with an associated AS_HOPCOUNT
   attribute from an EBGP neighbor MUST examine the value of the
   attribute. If the attribute value is zero, the path MUST be ignored
   without further processing. If the attribute value is non-zero, then
   the BGP speaker MAY process the path.

   When a BGP speaker propagates a route with an associated AS_HOPCOUNT
   attribute, which it has learned from another BGP speaker's UPDATE
   message, it MUST modify the route's AS_HOPCOUNT attribute based on
   the location of the BGP speaker to which the route will be sent:

   a.  When a given BGP speaker advertises the route to an internal
       peer, the advertising speaker SHALL NOT modify the AS_HOPCOUNT
       attribute associated with the route.

   b.  If the BGP speaker chooses to advertise the route to an external
       peer, then the BGP speaker MUST advertise an AS_HOPCOUNT
       attribute of one less than the value received.

   In the context of a confederation [RFC3065], all peers outside of the
   BGP speaker's Member-AS are considered external peers.

   If a BGP speaker receives a route with both the AS_HOPCOUNT attribute
   and the NO_EXPORT community, then the normal semantics of NO_EXPORT
   do not apply and the route should be processed as if NO_EXPORT were
   not present.

   BGP requires that a BGP speaker that advertises a less specific
   prefix, but not a more specific prefix that it is using, must
   advertise the less specific prefix with the ATOMIC_AGGREGATE
   attribute. BGP speakers that do not advertise a more specific prefix
   based on the AS_HOPCOUNT must comply with this rule and advertise the
   less specific prefixes with the ATOMIC_AGGREGATE attribute.
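   The receive and propagate rules of this section can be checked
   against the examples of Section 3.2 with a small Python sketch. It
   is illustrative only: the function names are hypothetical, the
   adjacency table is read off Figure 1, each AS is modeled as a single
   speaker, the originating site never re-accepts its own prefix, and
   an AS keeps only the first copy of the route that it accepts.

```python
from collections import deque

# AS-level adjacencies read off Figure 1 (an assumption of this sketch).
FIGURE_1_LINKS = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D", "E"},
    "D": {"B", "C", "E", "G"},
    "E": {"C", "D", "F", "H"},
    "F": {"E"},
    "G": {"D"},
    "H": {"E"},
}


def accept_from_ebgp(hopcount):
    """A path with AS_HOPCOUNT zero is ignored; otherwise it may be processed."""
    return hopcount != 0


def hopcount_to_advertise(received, peer_is_external):
    """Rule a: unchanged to internal peers. Rule b: one less to external peers."""
    return received - 1 if peer_is_external else received


def ases_carrying(origin, first_hops, hopcount, links=FIGURE_1_LINKS):
    """Return the ASes that accept a route sent by origin to first_hops."""
    carrying = set()
    queue = deque((peer, hopcount) for peer in sorted(first_hops))
    while queue:
        asn, received = queue.popleft()
        # Skip the originating site, ASes that already hold the route,
        # and any copy that arrives with a hopcount of zero.
        if asn == origin or asn in carrying or not accept_from_ebgp(received):
            continue
        carrying.add(asn)
        outgoing = hopcount_to_advertise(received, peer_is_external=True)
        queue.extend((peer, outgoing) for peer in sorted(links[asn]))
    return carrying
```

   With site A advertising prefix X only to provider B,
   ases_carrying("A", {"B"}, 1) gives {B}, a hopcount of 2 gives
   {B, C, D}, and a hopcount of 3 adds E and G, which matches the
   discussion in Section 3.2.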
   To help ensure compliance with this ATOMIC_AGGREGATE requirement,
   sites that choose to advertise the AS_HOPCOUNT path attribute should
   advertise the ATOMIC_AGGREGATE attribute on all less specific
   covering prefixes.

5.2.  Proxy Control

   An AS may attach the AS_HOPCOUNT attribute to a path that it has
   received from another system. This is a form of proxy aggregation
   and may result in routing behaviors that the origin of the path did
   not intend. Further, if the overlapping prefixes are not advertised
   with the ATOMIC_AGGREGATE attribute, adding the AS_HOPCOUNT attribute
   may cause defective implementations to advertise incorrect paths.
   Before adding the AS_HOPCOUNT attribute, an AS must carefully
   consider the risks and consequences outlined here.

6.  Security Considerations

   This new BGP attribute creates no new security issues. For it to be
   used, it must be attached to a BGP route. If a router is forging a
   route, then this attribute limits the extent of the damage caused by
   the forgery. If a router attaches this attribute to a route, then it
   could just as easily have used normal policy mechanisms to filter out
   the route.

7.  IANA Considerations

   IANA is hereby requested to allocate a code point from the BGP path
   attribute Type Code space for the AS_HOPCOUNT path attribute. Please
   replace 'XXXX' in the text above with the newly allocated code point
   value.

8.  Acknowledgements

   The editors would like to acknowledge that they are not the original
   initiators of this concept. Over the years, many similar proposals
   have come our way, and we had hoped that self-discipline would cause
   this type of mechanism to be unnecessary. We were overly optimistic.

   The names of those who originally proposed this are now lost to the
   mists of time. This should rightfully be their document. We would
   like to thank them for the opportunity to steward their concept to
   fruition.

9.  References

   [4B AS]    Vohra, Q. and E. Chen, "BGP support for Four-octet AS
              Number Space", Sept. 2005.

   [IDRP]     ISO/IEC, "Information Processing Systems -
              Telecommunications and Information Exchange between
              Systems - Protocol for Exchange of Inter-domain Routeing
              Information among Intermediate Systems to Support
              Forwarding of ISO 8473 PDUs", IS 10747, 1993.

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 3065, February 2001.

   [RFC3221]  Huston, G., "Commentary on Inter-Domain Routing in the
              Internet", RFC 3221, December 2001.

Authors' Addresses

   T. Li (editor)
   Portola Networks, Inc.

   Email: tony.li@tony.li


   R. Fernando (editor)
   Amoora, Inc.
   1463 Cedarmeadow Ct.
   San Jose, CA 95131
   US

   Email: rex_f@yahoo.com


   J. Abley (editor)
   Internet Systems Consortium
   950 Charter Street
   Redwood City, CA 94023
   US

   Phone: +1 650 423 1317
   Email: jabley@isc.org

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.