IDR                                                                Z. Li
Internet-Draft                                              China Mobile
Updates: 4271, 4360, 7153 (if approved)                          J. Dong
Intended status: Standards Track                     Huawei Technologies
Expires: April 29, 2018                                 October 26, 2017


                Carry congestion status in BGP community
          draft-li-idr-congestion-status-extended-community-06

Abstract

   To help a BGP receiver steer AS-outgoing traffic among the exit
   links, this document introduces a new BGP community, the congestion
   status community, which carries link bandwidth and utilization
   information, especially for the exit links of an AS.  If accepted,
   this document will update RFC4271, RFC4360 and RFC7153.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 29, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Previous Work
   4.  Solution Alternative 1: Extended Community
   5.  Solution Alternative 2: Large Community
   6.  Solution Alternative 3: Community Container
   7.  Deployment Considerations
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Knowing the congestion status (bandwidth and utilization) of the
   exit links of an AS is useful for traffic steering, especially for
   steering AS-outgoing traffic among those exit links.  Section 7 of
   [I-D.gredler-idr-bgplu-epe] explicitly specifies this requirement,
   which also arises in our operational network.

   The following figure illustrates the benefit of knowing the
   congestion status of the AS exit links.  AS A has multiple exit links
   to AS B.  Both AS A and AS B have exit links to AS C, and AS B
   provides transit service for AS A.  For cost or other reasons, AS A
   prefers to send its traffic to AS C through AS B rather than over the
   direct link between AS A and AS C.  If the exit routers of AS A,
   Router 7 and Router 8, tell their iBGP peers the congestion status of
   their exit links, those peers can in turn steer some outgoing traffic
   toward the less loaded exit link.  If AS A knows that the link
   between AS B and AS C is congested, it can apply route policies to
   move some of its traffic destined for AS C from the path through
   AS B onto the direct link to AS C.

   +-------------------------------------------+
   |                   AS C                    |
   |  +----------+               +----------+  |
   +--| Router 1 |---------------| Router 2 |--+
      +----------+               +----------+
           |                          |
           |                          |
           |                     +----------+
           |            +--------| Router 3 |----------+
           |            |        +----------+          |
           |            |             AS B             |
           |            | +----------+    +----------+ |
           |            +-| Router 4 |----| Router 5 |-+
           |              +----------+    +----------+
           |                   |               |
           |                   |               |
      +----------+        +----------+    +----------+
   +--| Router 6 |--------| Router 7 |----| Router 8 |-+
   |  +----------+        +----------+    +----------+ |
   |                       AS A                        |
   +---------------------------------------------------+

   This document introduces new BGP extensions to deliver the
   congestion status of the exit link to other BGP speakers.
   The BGP receiver can then use this community in route policies and
   thereby steer AS-outgoing traffic according to the congestion status
   of the exit links.  This mechanism can be used with both iBGP and
   eBGP.

   In this version, we provide three solution alternatives based on the
   discussions at face-to-face meetings and on the mailing list.  After
   adoption, one of them will be selected as the final solution based on
   working group consensus.

   In a network with an SDN (Software Defined Network) controller, the
   controller can use the congestion status community to steer
   AS-outgoing traffic among all the exit links from the perspective of
   the whole network.

   In a network with Route Reflectors (RRs) [RFC4456], RRs by default
   advertise only the best route for a given prefix to their clients, so
   RR clients have no opportunity to compare the congestion status of
   all the exit links.  To let RR clients learn the routes for a given
   prefix from all the exit links, RRs are RECOMMENDED to enable the
   ADD-PATH functionality [RFC7911].

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Previous Work

   In [constrained-multiple-path], authors from France Telecom also
   specified the requirement to know the congestion status of a link.

   To help a router perform unequal-cost load balancing, experts from
   Cisco introduced the Link Bandwidth Extended Community
   [link-bandwidth-community] to carry the cost to reach the external
   BGP neighbor.  The cost can be either configured per neighbor or
   derived from the bandwidth of the link that connects the router to a
   directly connected external neighbor.
   That draft was adopted by the IDR working group but expired in 2013.

   The Link Bandwidth Extended Community carries only the bandwidth of
   the exit link.  The method in this document carries the link
   bandwidth together with the link utilization.  What a BGP receiver
   needs in order to adjust its traffic steering policy is the
   up-to-date unused bandwidth of the link, which can be derived from
   the link bandwidth and the link utilization.  Since the Link
   Bandwidth Extended Community draft has expired, a BGP speaker that
   receives an UPDATE message carrying both the Link Bandwidth Extended
   Community and the Congestion Status Community SHOULD ignore the Link
   Bandwidth Extended Community and use the Congestion Status Community.

4.  Solution Alternative 1: Extended Community

   As described in [RFC4360], the extended community attribute is an
   8-octet value whose first one or two octets indicate the type of the
   attribute.  Since the congestion status community needs to be
   delivered from one AS to other ASes, and is used by BGP speakers both
   in other ASes and in the sender's own AS, it MUST be a transitive
   extended community, i.e., the T bit in the first octet MUST be zero.

   We define the congestion status community only for four-octet AS
   numbers [RFC6793], since all BGP speakers can now handle four-octet
   AS numbers, and a two-octet AS number can be mapped to a four-octet
   AS number by setting the two high-order octets of the four-octet
   field to zero, as per [RFC6793].

   The congestion status community is a sub-type allocated from the
   Transitive Four-Octet AS-Specific Extended Community Sub-Types
   defined in Section 5.2.4 of [RFC7153].  Its format is shown in
   Figure 1.
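
   As an illustration only, the following Python sketch packs and
   unpacks the 8-octet value laid out in Figure 1 and derives the
   unused bandwidth that, as noted in Section 3, a receiver actually
   needs.  The Sub-Type 0xEE (HYPOTHETICAL_SUBTYPE) is a made-up
   placeholder, because the real value is still to be assigned by IANA;
   the names CongestionStatus, encode, decode and unused_mbps are
   likewise hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Type octet from Figure 1 (Transitive Four-Octet AS-Specific). Its T bit
# (0x40, RFC 4360) is clear, so the community is transitive.
EC_TYPE = 0x02
# HYPOTHETICAL placeholder: the real Sub-Type is to be assigned by IANA.
HYPOTHETICAL_SUBTYPE = 0xEE

# Network byte order, no padding: Type, Sub-Type, Sender AS Number (4),
# Bandwidth (1), Utilization (1) -> 8 octets.
_LAYOUT = struct.Struct("!BBIBB")


@dataclass(frozen=True)
class CongestionStatus:
    sender_as: int        # 4-octet AS; 2-octet ASes are zero-extended
    bandwidth_10g: int    # units of 10 Gbps; 0 = unknown / not advertised
    utilization_pct: int  # percent; > 100 means the link is overloaded

    def unused_mbps(self) -> Optional[int]:
        """Unused capacity in Mbps, or None when the bandwidth is unknown."""
        if self.bandwidth_10g == 0:
            return None
        capacity_mbps = self.bandwidth_10g * 10_000
        return capacity_mbps * max(0, 100 - self.utilization_pct) // 100


def encode(status: CongestionStatus, subtype: int = HYPOTHETICAL_SUBTYPE) -> bytes:
    """Pack a CongestionStatus into the 8-octet extended community value."""
    if not 0 <= status.sender_as <= 0xFFFFFFFF:
        raise ValueError("sender_as must fit in 4 octets")
    if not (0 <= status.bandwidth_10g <= 0xFF and 0 <= status.utilization_pct <= 0xFF):
        raise ValueError("bandwidth and utilization must each fit in 1 octet")
    return _LAYOUT.pack(EC_TYPE, subtype, status.sender_as,
                        status.bandwidth_10g, status.utilization_pct)


def decode(value: bytes, subtype: int = HYPOTHETICAL_SUBTYPE) -> CongestionStatus:
    """Unpack an 8-octet value; reject anything that is not this community."""
    if len(value) != _LAYOUT.size:
        raise ValueError("an extended community value is 8 octets")
    ec_type, ec_subtype, sender_as, bandwidth, utilization = _LAYOUT.unpack(value)
    if ec_type != EC_TYPE or ec_subtype != subtype:
        raise ValueError("not a congestion status extended community")
    return CongestionStatus(sender_as, bandwidth, utilization)


# Usage: AS 64512 (2-octet) advertises a 40 Gbps link at 75% utilization.
status = CongestionStatus(sender_as=64512, bandwidth_10g=4, utilization_pct=75)
wire = encode(status)          # bytes 02 ee 00 00 fc 00 04 4b
assert decode(wire) == status
print(status.unused_mbps())    # 10000
```

   Because the Sender AS Number is packed as an unsigned 32-bit integer,
   a 2-octet AS number such as 64512 is zero-extended automatically,
   which matches the [RFC6793] mapping rule above.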

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type = 0x02  |   Sub-Type    |       Sender AS Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sender AS Number (cont.)    |   Bandwidth   |  Utilization  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1: Congestion status extended community

   Type: 1 octet.  This field MUST be 0x02 to indicate that this is a
      Transitive Four-Octet AS-Specific Extended Community.

   Sub-Type: 1 octet.  Indicates that this is a Congestion Status
      Extended Community.  Its value is to be assigned by IANA.

   Sender AS Number: 4 octets.  The AS number of the BGP speaker that
      generates this congestion status extended community.  If the
      generator has a 2-octet AS number, it MUST encode its AS number in
      the last (low-order) two octets and set the first (high-order) two
      octets to zero, as per [RFC6793].

   Bandwidth: 1 octet.  The bandwidth of the exit link in units of
      10 Gbps (gigabits per second).  Links with a bandwidth below
      10 Gbps are not suitable for this feature.  To reflect the practice
      that traffic is sometimes rate-limited to a capacity smaller than
      that of the physical link, the value of this field can be the
      configured capacity of the link.  The available configured
      capacity can be calculated from this field together with the
      Utilization field.  Zero means that the bandwidth is unknown or is
      not advertised to other peers.

   Utilization: 1 octet.  The utilization of the exit link in percent.
      A value greater than 100 means that the traffic offered to the
      link exceeds its capacity.  The Utilization field can be used
      together with the Bandwidth field to calculate how much more
      traffic can be steered to this exit link.

5.  Solution Alternative 2: Large Community

   As described in [RFC8092], the BGP Large Communities attribute is an
   optional transitive path attribute of variable length, consisting of
   a set of 12-octet values.  It is mainly intended to extend BGP
   Communities [RFC1997] and Extended Communities [RFC4360] so that a
   single value can accommodate at least two four-octet ASNs [RFC6793].
   As shown in Figure 2, the 12-octet BGP Large Community value has no
   type field: it consists of a namespace identifier followed by
   operator-defined data.  It is therefore not suitable for defining a
   new type for the congestion status community.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Global Administrator                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Local Data Part 1                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Local Data Part 2                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2: BGP Large Community value

   Global Administrator: A four-octet namespace identifier.

   Local Data Part 1: A four-octet operator-defined value.

   Local Data Part 2: A four-octet operator-defined value.

6.  Solution Alternative 3: Community Container

   As described in [I-D.ietf-idr-wide-bgp-communities], the BGP
   Community Container has a flexible encoding format, which can be used
   to define the congestion status community.

   A new type of BGP Community Container is defined for the congestion
   status community.  It has the same common header as other BGP
   Community Containers and uses the encoding format shown in Figure 3.
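
   A similar sketch, again hypothetical and for illustration only, packs
   the container layout of Figure 3 and the field descriptions that
   accompany it.  Three things in it are assumptions: the container
   Type 0xFFEE is a placeholder pending IANA assignment; placing the C
   and T bits in the two low-order bits of the Flags octet is read off
   Figure 3 and should be checked against
   [I-D.ietf-idr-wide-bgp-communities]; and Length counts only the 10
   octets after the 6-octet common header, based on the wording "total
   length of a given container's contents".

```python
import struct
from typing import Tuple

# HYPOTHETICAL placeholder: the real container Type is to be assigned by IANA.
HYPOTHETICAL_CONTAINER_TYPE = 0xFFEE
# Assumption: C and T are the two low-order bits of the Flags octet, as
# drawn in Figure 3; check against the Community Container specification.
FLAG_C = 0x02
FLAG_T = 0x01
BOTH_FLAGS = FLAG_C | FLAG_T

_HEADER = struct.Struct("!HBBH")  # Type, Flags, Reserved, Length (6 octets)
_BODY = struct.Struct("!IIBB")    # Sender AS, Bandwidth, Utilization, Reserved (10)


def encode_container(sender_as: int, bandwidth_mbps: int, utilization_pct: int,
                     container_type: int = HYPOTHETICAL_CONTAINER_TYPE) -> bytes:
    """Pack one congestion status container (common header + contents)."""
    if not 0 <= sender_as <= 0xFFFFFFFF:
        raise ValueError("sender_as must fit in 4 octets")
    if not 0 <= bandwidth_mbps <= 0xFFFFFFFF:
        raise ValueError("bandwidth_mbps must fit in 4 octets")
    if not 0 <= utilization_pct <= 0xFF:
        raise ValueError("utilization_pct must fit in 1 octet")
    # Both Reserved fields are originated as zero.
    body = _BODY.pack(sender_as, bandwidth_mbps, utilization_pct, 0)
    # Assumption: Length counts only the contents after the common header.
    header = _HEADER.pack(container_type, BOTH_FLAGS, 0, len(body))
    return header + body


def decode_container(data: bytes,
                     container_type: int = HYPOTHETICAL_CONTAINER_TYPE
                     ) -> Tuple[int, int, int]:
    """Return (sender_as, bandwidth_mbps, utilization_pct)."""
    if len(data) < _HEADER.size:
        raise ValueError("truncated container header")
    ctype, flags, _reserved, length = _HEADER.unpack_from(data, 0)
    if ctype != container_type:
        raise ValueError("not a congestion status container")
    if length != _BODY.size or len(data) < _HEADER.size + length:
        raise ValueError("unexpected container length")
    # Other flag bits and Reserved fields are ignored on receipt; this
    # sketch chooses to reject containers whose C or T bit is clear.
    if (flags & BOTH_FLAGS) != BOTH_FLAGS:
        raise ValueError("C and T bits must both be set")
    sender_as, bandwidth_mbps, utilization_pct, _ = _BODY.unpack_from(
        data, _HEADER.size)
    return sender_as, bandwidth_mbps, utilization_pct


# Usage: AS 65001 advertises a 100 Gbps (100000 Mbps) link at 40%.
container = encode_container(65001, 100_000, 40)
assert decode_container(container) == (65001, 100_000, 40)
print(container.hex())  # ffee0300000a0000fde9000186a02800
```

   Compared with Alternative 1, the 4-octet Bandwidth field in Mbps
   removes the 10 Gbps granularity limit, at the cost of a 16-octet
   encoding instead of 8 octets.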

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |   Flags   |C|T|   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |       Sender AS Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sender AS Number (cont.)    |           Bandwidth           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Bandwidth (cont.)       |  Utilization  |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 3: Congestion status community container

   Type: 2 octets.  Its value is to be assigned by IANA from the
      registry "BGP Community Container Types" to indicate that this is
      the Congestion Status Community.

   Flags: 1 octet.  The C and T bits MUST be set to indicate that the
      Congestion Status Community is transitive across confederation and
      AS boundaries.  The other bits in the Flags field MUST be set to
      zero when originated and SHOULD be ignored upon receipt.

   Reserved: Reserved fields are reserved for future definition; they
      MUST be set to zero when originated and SHOULD be ignored upon
      receipt.

   Length: 2 octets.  The total length of the container's contents in
      octets.

   Sender AS Number: 4 octets.  The AS number of the BGP speaker that
      generates this congestion status community.  If the generator has
      a 2-octet AS number, it MUST encode its AS number in the last
      (low-order) two octets and set the first (high-order) two octets
      to zero, as per [RFC6793].

   Bandwidth: 4 octets.  The bandwidth of the exit link in units of
      Mbps (megabits per second).  To reflect the practice that traffic
      is sometimes rate-limited to a capacity smaller than that of the
      physical link, the value of this field can be the configured
      capacity of the link.
      The available configured capacity can be calculated from this
      field together with the Utilization field.  Zero means that the
      bandwidth is unknown or is not advertised to other peers.

   Utilization: 1 octet.  The utilization of the exit link in percent.
      A value greater than 100 means that the traffic offered to the
      link exceeds its capacity.  The Utilization field can be used
      together with the Bandwidth field to calculate how much more
      traffic can be steered to this exit link.

7.  Deployment Considerations

   o  To avoid route oscillation

         The exit router SHOULD set a threshold.  When the change in
         utilization reaches the threshold, the exit router SHOULD
         generate a BGP UPDATE message with the congestion status
         community.

         Implementations SHOULD further reduce the number of BGP UPDATE
         messages triggered by link utilization changes, using a method
         similar to BGP Route Flap Damping [RFC2439].  When link
         utilization changes by small amounts that fall below the
         thresholds that would trigger a BGP UPDATE message,
         implementations SHOULD suppress the announcement and set the
         penalty value accordingly.

         To reduce update churn, when a BGP router needs to re-advertise
         a BGP path because of attribute changes, it SHOULD update its
         Congestion Status Community at the same time.  If there are N
         ASes on the path from the far-end egress BGP speaker to the
         final ingress BGP speaker, this reduces update churn, because
         the final ingress BGP speaker receives a single UPDATE
         refreshing the N communities rather than N UPDATEs, each
         refreshing one community.

   o  To avoid traffic oscillation

         Traffic oscillation means that more traffic than expected is
         attracted to the lightly utilized link, so some traffic has to
         be steered back to other links.

         Route policy is RECOMMENDED to be set at the exit router.
         With such a policy, the congestion status community is conveyed
         only for specific routes or only to specific BGP peers.

         The congestion status community can also be used in an SDN
         network.  The SDN controller uses the exit link utilization
         information to steer Internet access traffic among all the exit
         links from the perspective of the whole network.

   o  Other concerns

         To avoid forwarding loops, incremental deployment issues and
         complications in error handling, the reception of this
         community over an iBGP session SHOULD NOT influence the routing
         decision unless tunneling is used to reach the BGP Next-Hop.

8.  Security Considerations

   This document defines a new BGP community to carry the congestion
   status of the exit link.  It is up to the BGP receiver whether to
   trust the congestion status communities.  The following deployment
   models can be considered:

      The BGP receiver may choose to trust only the congestion status
      communities generated by specific ASes or containing a bandwidth
      greater than a specific value.

      You can filter the congestion status communities at the border of
      your trust/administrative domain, so that all the ones you receive
      are trusted.

      You can record the communities received over time, monitor the
      congestion (e.g., via probing), detect inconsistencies, and stop
      trusting ASes that advertise false information.

9.  IANA Considerations

   For solution alternative 1, one sub-type is requested from the
   "Transitive Four-Octet AS-Specific Extended Community Sub-Types"
   registry to indicate the Congestion Status Community defined in this
   document.

   For solution alternative 3, one community type value is requested
   from the "Registered Type 1 BGP Wide Community Community Types"
   registry to indicate the Congestion Status Community defined in this
   document.

10.  Acknowledgments

   We appreciate the constructive suggestions received from Bruno
   Decraene.  Many thanks to Rudiger Volk, Susan Hares, John Scudder and
   Randy Bush for their reviews and comments, which improved this
   document.

11.  References

11.1.  Normative References

   [I-D.ietf-idr-wide-bgp-communities]
              Raszuk, R., Haas, J., Lange, A., Decraene, B., Amante, S.,
              and P. Jakma, "BGP Community Container Attribute",
              draft-ietf-idr-wide-bgp-communities-04 (work in progress),
              March 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC7153]  Rosen, E. and Y. Rekhter, "IANA Registries for BGP
              Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
              March 2014.

   [RFC8092]  Heitz, J., Ed., Snijders, J., Ed., Patel, K., Bagdonas,
              I., and N. Hilliard, "BGP Large Communities Attribute",
              RFC 8092, DOI 10.17487/RFC8092, February 2017.

11.2.  Informative References

   [constrained-multiple-path]
              Boucadair, M. and C. Jacquenet, "Constrained Multiple BGP
              Paths", October 2010.

   [I-D.gredler-idr-bgplu-epe]
              Gredler, H., Vairavakkalai, K., R, C., Rajagopalan, B.,
              Aries, E., and L. Fang, "Egress Peer Engineering using
              BGP-LU", draft-gredler-idr-bgplu-epe-11 (work in
              progress), October 2017.

   [link-bandwidth-community]
              Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
              Extended Community", January 2013.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, DOI 10.17487/RFC2439, November
              1998.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006.

   [RFC6793]  Vohra, Q. and E. Chen, "BGP Support for Four-Octet
              Autonomous System (AS) Number Space", RFC 6793,
              DOI 10.17487/RFC6793, December 2012.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

Authors' Addresses

   Zhenqiang Li
   China Mobile
   No.32 Xuanwumenxi Ave., Xicheng District
   Beijing  100032
   P.R. China

   Email: li_zhenqiang@hotmail.com


   Jie Dong
   Huawei Technologies
   Huawei Campus, No.156 Beiqing Rd.
   Beijing  100095
   P.R. China

   Email: jie.dong@huawei.com