Network Working Group                                  C. Camilo Cardona
Internet-Draft                                        P. Pierre Francois
Intended status: Standards Track                          IMDEA Networks
Expires: January 12, 2014                                         S. Ray
                                                                K. Patel
                                                        P. Paolo Lucente
                                                           Cisco Systems
                                                            P. Mohapatra
                                                        Cumulus Networks
                                                           July 11, 2013

                            BGP Path Marking
                       draft-bgp-path-marking-00

Abstract

   The potential advertisement of non-best paths by a BGP speaker
   supporting the add-path or the best-external extensions makes it
   difficult for other BGP speakers to identify the paths that have been
   selected as best by those who advertise them. This information is
   required for proper operation of some applications. Towards that
   end, this document proposes marking the paths using extended
   communities that encode the path type.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 12, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  The BGP Path Type Community
   3.  Rules
   4.  Operational Considerations
   5.  Applications
     5.1.  Avoiding suboptimal routing in Inter-AS VPN
     5.2.  Monitoring applications
     5.3.  SDN applications
     5.4.  Selective Best-path
   6.  IANA Considerations
   7.  Security Considerations
   8.  Contributors
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   When there are multiple paths for a given address prefix, BGP chooses
   one of the paths as the "best-path" according to the best-path
   selection rules prescribed in [RFC4271] and installs the best-path in
   its forwarding table. Classically, each BGP speaker advertises only
   the best-path to its peers. So when a BGP speaker receives a path
   from one of its peers, it is assured that the path is used by the
   peer for forwarding and all other peers have received the same path
   from this peer. This leads to consistent routing in a BGP network.

   The classical advertisement rule of sending only the best-path does
   not convey the full routing state of a destination present on a BGP
   speaker to its peers.

   o  In order to improve link bandwidth utilization, most BGP
      implementations choose additional paths, that satisfy certain
      conditions, as "multi-path", and install them in the forwarding
      table. Incoming packets for that destination are load-balanced
      across the best-path and the multi-path(s).
      That is, there may be paths installed in the forwarding table
      that are not advertised to the peers.

   o  When an Autonomous System (AS) deploys a route-reflector
      ([RFC4456]) instead of using a full IBGP mesh, the BGP speakers
      receive only the route reflector's best-path and therefore lose
      information about the best-paths of other IBGP peers.

   o  If an IBGP path is chosen as the best-path by a non-route-
      reflector BGP speaker, then the best-path is not sent to its IBGP
      peers. Thus the IBGP peers learn nothing from this BGP speaker
      even though it might have other EBGP paths for that destination.

   o  Even when a BGP speaker selects an EBGP path as the best-path and
      advertises it to its peers, it may have additional EBGP paths for
      the destination. If those paths were advertised in advance, the
      peers could use them when the best-path becomes unreachable,
      resulting in faster convergence.

   There are extensions to the classical BGP advertisement rule to
   provide additional information about the routing state of a
   destination. A BGP speaker supporting the best-external
   [I-D.ietf-idr-best-external] extension sends its best external path
   to its IBGP peers when the best-path is an IBGP path. A BGP speaker
   supporting the add-path [I-D.ietf-idr-add-paths] extension advertises
   multiple paths for a given address prefix.

   With best-external or add-path extensions in use, when a BGP speaker
   receives a path from a peer, that path may not be the best-path, or
   it may not be installed in the peer's forwarding table. In some
   scenarios, knowledge of the path type - i.e., whether the path is the
   best-path, or whether the path is installed in the forwarding table -
   is essential.

   For instance, in a typical dual-homed VPN in primary-backup
   configuration, the backup path is created by advertising the best-
   external path from the backup PE with a lower LOCAL_PREF.
   However, when the customer adds a site in another AS, the LOCAL_PREF
   information does not reach that site. As a result, data traffic
   coming from that site may incorrectly be forwarded over the backup
   link instead of the primary link.

   Similarly, when an add-path-enabled speaker receives multiple paths
   from a peer, it does not know which one among those paths is the
   best-path and which ones are installed in the forwarding table. An
   external monitoring system, for example, would need that information
   to adjust the policies on the router and achieve the desired
   forwarding optimization.

   This draft proposes marking the advertised paths with an extended
   community, called the Path Type community, that encodes the path
   type. The path type provides the necessary information to the BGP
   speakers about how the path is used by the sender when add-path or
   best-external extensions are in use.

2.  The BGP Path Type Community

   The BGP Path Type Community is an IPv4 Address Specific Extended
   Community ([RFC4360]) defined as follows:

   Type Field:

      The value of the high-order octet of the extended Type Field is
      0x01, which indicates that it is transitive. The value of the
      low-order octet of the extended Type Field for this community is
      TBD.

   Value Field:

      The Value field contains two sub-fields, described below:

                        +----------------------+
                        | Router-ID (4 octets) |
                        +----------------------+
                        | Path type (2 octets) |
                        +----------------------+

   The Router-ID field contains the BGP identifier of the BGP speaker
   that adds the Path Type community to a path.

   The Path type field contains a bitfield where each bit encodes a
   specific role of the path. Multiple bits may be set when a path is
   used in multiple roles.
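
   As an illustration only, the 8-octet encoding described above can be
   sketched in Python as follows. The sub-type value 0xFF is a
   hypothetical placeholder, since this document leaves the low-order
   type octet TBD. The names PathType, encode_path_type and
   decode_path_type are likewise assumptions made for the example. The
   bit values are those listed in Table 1.

```python
import ipaddress
import struct
from enum import IntFlag
from typing import Tuple

# Hypothetical placeholder: the draft leaves the sub-type (low-order
# type octet) as TBD, so this is NOT an assigned code point.
PATH_TYPE_SUBTYPE = 0xFF
TRANSITIVE_IPV4_TYPE = 0x01  # high-order type octet given in the text


class PathType(IntFlag):
    """Bit values from Table 1; 0x0000 means 'Unknown'."""
    UNKNOWN = 0x0000
    BEST_PATH = 0x0001
    BEST_EXTERNAL = 0x0002
    MULTI_PATH = 0x0004
    BACKUP = 0x0008
    UNINSTALLED = 0x0010
    UNREACHABLE = 0x0020


def encode_path_type(router_id: str, path_type: PathType) -> bytes:
    """Pack type, sub-type, Router-ID and Path type into 8 octets."""
    return struct.pack(
        "!BB4sH",  # network byte order: 1 + 1 + 4 + 2 octets
        TRANSITIVE_IPV4_TYPE,
        PATH_TYPE_SUBTYPE,
        ipaddress.IPv4Address(router_id).packed,
        int(path_type),
    )


def decode_path_type(community: bytes) -> Tuple[str, PathType]:
    """Return (router_id, path_type) from an 8-octet community."""
    type_hi, subtype, rid, bits = struct.unpack("!BB4sH", community)
    if (type_hi, subtype) != (TRANSITIVE_IPV4_TYPE, PATH_TYPE_SUBTYPE):
        raise ValueError("not a Path Type community")
    return str(ipaddress.IPv4Address(rid)), PathType(bits)


# A best-external path that is also installed as a backup path.
wire = encode_path_type(
    "192.0.2.1", PathType.BEST_EXTERNAL | PathType.BACKUP
)
print(wire.hex())  # 01ffc0000201000a
```

   In this layout the Router-ID occupies the 4-octet Global
   Administrator field and the Path type the 2-octet Local Administrator
   field of the IPv4 Address Specific Extended Community format in
   [RFC4360].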

                     +--------+--------------------+
                     | Value  | Path type          |
                     +--------+--------------------+
                     | 0x0000 | Unknown            |
                     | 0x0001 | Best-path          |
                     | 0x0002 | Best-external path |
                     | 0x0004 | Multi-path         |
                     | 0x0008 | Backup path        |
                     | 0x0010 | Uninstalled path   |
                     | 0x0020 | Unreachable path   |
                     +--------+--------------------+

                        Table 1: Path Type Values

   The best-path is defined in [RFC4271] and the best-external path is
   defined in [I-D.ietf-idr-best-external].

   A multi-path is a path that is not the best-path but is installed in
   the forwarding table and used for forwarding packets. We use the
   convention that the best-path is not considered a multi-path.

   A backup path is installed in the forwarding table, but it is not
   used for forwarding until all multi-path(s) and the best-path become
   unreachable. Backup paths are used for fast convergence in the event
   of failures.

   All other reachable paths are marked as 'Uninstalled'.

   Lastly, all paths that are considered unreachable are marked as
   'Unreachable'. Unreachable paths may be sent only in special cases
   (such as to a monitoring application).

3.  Rules

   o  A BGP speaker MAY add the Path Type community to an originated
      path.

   o  When a BGP speaker receives a path from a peer and propagates it
      without changing the NEXT_HOP to self:

      *  If the path contained a Path Type community, it MUST be
         retained in the propagated path.

      *  If the path did not contain a Path Type community, the speaker
         MAY add a Path Type community with 'Unknown' value.

   o  When a path received from a peer is propagated after changing the
      NEXT_HOP to self:

      *  If the path did not contain a Path Type community, the Path
         Type community indicating the path role MAY be added.

      *  If the path contained a Path Type community:

         +  If data traffic entering the router for the given
            destination may be forwarded over other paths (e.g., for
            doing load balancing), then the existing Path Type community
            MUST be removed. The BGP speaker MAY add its own Path Type
            community.

         +  If data traffic entering the router for the given
            destination is forwarded only along the given path, then the
            existing Path Type community MAY be retained.

   In all cases, when a BGP speaker adds its own Path Type community, it
   sets its own router-id in the community. Note that BGP router-id
   need not be unique across ASes.

   The above rule-set prevents a route reflector from modifying the Path
   Type community set by its client (unless the route reflector is
   changing the NEXT_HOP to self).

   When a peer is capable of sending only one path for a given address
   prefix and it sends the path without any Path Type community, the
   path MAY be considered as the best-path of the peer. In all other
   cases, a path without any Path Type community SHOULD be considered to
   have an 'Unknown' Path type.

   A local policy might modify the above rules. For instance, if a
   monitoring application peers with a BGP speaker with add-path
   capability for the sole purpose of learning its paths and their
   types, then the speaker may always add its own Path Type community
   when it advertises the paths to that peer even if it does not change
   the NEXT_HOP to self. Such overriding policies should be used with
   caution if the advertised paths may impact forwarding decisions in
   the network.

4.  Operational Considerations

   If a speaker receives a path with a Path Type community with an
   invalid combination of bits (e.g., both 'Multi-path' and 'Backup'
   bits are set), the path MUST NOT be considered invalid. Such error
   cases SHOULD be logged through other means.
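
   A receiver could implement this check as sketched below. This is a
   minimal sketch under stated assumptions: the function name
   check_path_type and the table CONFLICTING_BITS are hypothetical, and
   only the combination named above ('Multi-path' together with
   'Backup') is listed as inconsistent. The result is used for logging
   only and never to discard the path.

```python
import logging
from typing import List, Tuple

log = logging.getLogger("path_type")

# Bit values from Table 1.
MULTI_PATH = 0x0004
BACKUP = 0x0008

# Assumption: only the pair named in the text is listed here; an
# implementation may treat other combinations as inconsistent too.
CONFLICTING_BITS: List[Tuple[int, int]] = [(MULTI_PATH, BACKUP)]


def check_path_type(bits: int) -> bool:
    """Return True if no listed conflict is present; log otherwise.

    The return value is informational only: callers keep treating the
    path as valid in both cases.
    """
    consistent = True
    for first, second in CONFLICTING_BITS:
        if bits & first and bits & second:
            log.warning("inconsistent Path type bits 0x%04x", bits)
            consistent = False
    return consistent


def accept_path(prefix: str, bits: int) -> Tuple[str, int]:
    """Run the check for its logging side effect, then keep the path."""
    check_path_type(bits)
    return prefix, bits
```

   Keeping the check separate from path acceptance mirrors the
   requirement that an inconsistent community does not invalidate the
   path itself.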

   An implementation SHOULD provide a configurable option for the user
   to indicate whether a path should be readvertised when its type is
   changed. If the user does not configure the option, the BGP speaker
   MUST NOT readvertise a path just to update its Path Type community
   (e.g., when a path type changes from 'Multi-path' to 'Uninstalled'
   due to a change in IGP metric).

   An implementation SHOULD provide a configurable option for removing
   Path Type communities from paths that are advertised to untrusted
   peers.

   An implementation SHOULD mark all paths for a given address prefix
   consistently. If one of the paths is marked, then all other paths
   SHOULD be marked.

   An implementation MAY modify its best-path selection algorithm to
   take path type information into account. For instance, paths with
   type 'Best-path' MAY be preferred over paths of other types.
   Similarly, paths of type 'Best-external' MAY be considered ineligible
   for being a multipath.

5.  Applications

   In this section, we illustrate some applications that benefit from
   the Path Type community proposed in this draft.

5.1.  Avoiding suboptimal routing in Inter-AS VPN

              (RD1)A/B   +---+                     +---+
               LP=200    |RR1|                     |RR2|
               +---+  ,,-+---+-..               _.-+---+-._
              ,|PE1|'            `.            /           \      (RD3)A/B
       +---+,' +---+              +---+   +---+             +---+ -> PE1 (LP=100)
   A/B |CE1|.         |   AS1     |AR1|---|AR2|     AS2     |PE3| -> PE2 (LP=100)
       +---+ \ +---+              +---+   +---+             +---+
              >|PE2|._         _,'             `.         ,'
               +---+  `-....,-'                  `--...--'
              (RD2)A/B
               LP=150

                     Figure 1: Inter-AS VPN scenario

   Figure 1 depicts an L3VPN network that spans two ASes: AS1 and AS2.
   The ASes may be connected using either Option-B or Option-C
   techniques [RFC4364]. A customer site with equipment CE1 is dual-
   homed in AS1, connected to PE1 and PE2. For prefix A/B, the customer
   prefers to use the link between CE1 and PE1.
   This routing preference is expressed by setting the LOCAL_PREF of
   the prefix advertised by PE1 to a higher value than that of the
   prefix advertised by PE2. This causes PE2 to use PE1's route as the
   best-path and its own EBGP path becomes the best-external path. PE2
   is configured to advertise its best-external path. Therefore, both
   PEs continue to advertise their own EBGP path. The provider uses
   unique route-distinguishers for its VPNs. So PE1 and PE2 advertise
   different VPN prefixes: (RD1)A/B and (RD2)A/B. Both these prefixes
   are advertised to PE3 in AS2. PE3 imports both paths to its own VPN
   with route-distinguisher RD3.

   Existing behavior:

      Since LOCAL_PREF is not sent across the AS boundary, both paths
      on PE3 have the default LOCAL_PREF of 100. As a result, the best-
      path selection on PE3 may come down to the tie-breaking steps and
      the path towards PE2, which is the best-external path, may be
      chosen. Alternatively, the path from PE2 may be chosen as the
      multipath and may be used for load-balancing. Therefore, some
      or all data traffic entering PE3 would reach CE1 via PE2, which
      is not what the customer desired.

   Behavior with Path Type Community:

      When PE2 advertises its path, it adds the best-external Path
      Type community. This community is preserved across the AS
      boundary. If option C is used, then RR1 and RR2 do not change
      the NEXT_HOP and hence the community is preserved according to
      the rule-set (Section 3). If option B is used, then the
      community reaches AR1 since RR1 does not change the NEXT_HOP.
      At AR1, (RD2)A/B has only one path and forwarding traffic
      entering AR1 from AR2 for this destination (determined by the
      outer label) would use this path. Therefore, AR1 retains the
      Path Type community set by PE2. The same applies to AR2.
      So at PE3, the path to PE2 has the best-external Path Type
      community and therefore PE3 can choose to not use this path for
      forwarding.

   If the best-path algorithm takes the Path Type community values into
   account, it eliminates the need for setting LOCAL_PREF to deprefer
   the best-external path even within a single AS. This simplifies the
   network design and management.

   Instead of using Path Type communities, it is possible to use
   policies on the border routers (AR1 and AR2 for option B, or RR1 and
   RR2 for option C) to recreate the LOCAL_PREF in AS2 (e.g., by
   matching on the RD and the prefix). However, the recreated
   LOCAL_PREF may interfere with the local policies set in AS2 (e.g., if
   there are other paths in AS2 for A/B that the customer wants to use
   as secondary paths). In addition, such policies are error-prone and
   complex to manage, especially when the customer is allowed to change
   the primary/backup relationships between PE1 and PE2 on its own. The
   standardized mechanism of Path Type community is free from such
   drawbacks.

5.2.  Monitoring applications

   A modern Service Provider (SP) network may contain thousands of BGP
   routers. For planning, proper engineering and operation of a
   backbone, it is good practice to continuously monitor the routers'
   states and perhaps keep a history. Many Network Management Systems
   (NMS) establish IBGP sessions with BGP speakers to collect the paths
   the speaker has. When the speaker supports add-path (or best-
   external), the NMS receives non-best-paths. There are also
   monitoring protocols such as BMP [I-D.ietf-grow-bmp] that similarly
   receive all paths from a speaker.

   When an NMS receives multiple paths for a destination, it is
   important for its operation to know which path is the best-path,
   which paths are installed in the forwarding table, which path is
   used as a backup, etc.
   The NMS may run the best-path algorithm on those paths on its own.
   However, its information, especially on IGP metric, local policies,
   etc., may be incomplete and hence its own calculations may not match
   the router's. Even if the NMS collected the additional information
   needed to run the best-path algorithm from the point of view of the
   router, it would have to do so for every router in the network,
   which would impose a very high computational burden on the NMS.

   When the Path Type community is in use, the router provides the
   required information directly, thus avoiding computational load on
   the NMS as well as potential discrepancies between the point of view
   of the router and that of the NMS.

5.3.  SDN applications

   Similar to the monitoring applications, a "Software Defined
   Networking" application monitors the routing state and based on it,
   may change the policies on the router, or inject additional paths, to
   influence the forwarding. When a BGP speaker supports Path Type
   communities and add-path, an SDN application can simply peer with the
   router to receive its routing state in real-time even if the router
   does not provide vendor-specific APIs for doing the same.

5.4.  Selective Best-path

   When the classical BGP advertisement rule is followed, all paths a
   BGP speaker considers for best-path are already installed in the
   forwarding table of the peer. However, when the add-path or best-
   external extensions are used, that no longer holds. If the BGP
   speakers support the Path Type communities, then the classical
   behavior can be reinstated by considering only those paths in the
   best-path algorithm that are marked as best-path or multi-path.
   Detailed discussions on the rules and benefits of such an approach
   are outside the scope of this draft.

6.  IANA Considerations

   Section 2 defines an IPv4 Address specific transitive extended
   community called the Path Type extended community. IANA is requested
   to assign a sub-type value for the Path Type extended community. The
   last 2 bytes of the value field of the Path Type extended community
   contain a bitfield that encodes the type of the advertised path.
   IANA is expected to maintain a registry for these bits. Section 2
   defines 6 of those bits. The rest of the bits are to be assigned by
   IANA using the "IETF Consensus" policy defined in [RFC2434].

7.  Security Considerations

   This document introduces no new security concerns to BGP or other
   specifications referenced in this document.

8.  Contributors

   Adam Simpson
   Alcatel-Lucent
   600 March Road
   Ottawa, Ontario K2K 2E6
   Canada
   Email: adam.simpson@alcatel-lucent.com

   Roberto Fragassi
   Alcatel-Lucent
   600 Mountain Avenue
   Murray Hill, New Jersey
   USA
   Email: roberto.fragassi@alcatel-lucent.com

9.  Acknowledgments

   We would like to thank Bruno Decraene for his feedback on this work.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
              October 1998.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

10.2.  Informative References

   [I-D.ietf-grow-bmp]
              Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
              Protocol", draft-ietf-grow-bmp-07 (work in progress),
              October 2012.

   [I-D.ietf-idr-add-paths]
              Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", draft-ietf-idr-
              add-paths-08 (work in progress), December 2012.

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
              Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-05 (work in progress),
              January 2012.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

Authors' Addresses

   Camilo Cardona
   IMDEA Networks
   Avenida del Mar Mediterraneo
   Leganes 28919
   Spain

   Email: juancamilo.cardona@imdea.org

   Pierre Francois
   IMDEA Networks
   Avenida del Mar Mediterraneo
   Leganes 28919
   Spain

   Email: pierre.francois@imdea.org

   Saikat Ray
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: sairay@cisco.com

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: keyupate@cisco.com

   Paolo Lucente
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: plucente@cisco.com

   Pradosh Mohapatra
   Cumulus Networks
   140 C. Whisman Rd.
   Mountain View, CA 94041
   USA

   Email: pmohapat@cumulusnetworks.com