BESS                                                            Z. Zhang
Internet-Draft                                          Juniper Networks
Intended status: Standards Track                               R. Raszuk
Expires: March 25, 2018                                     Bloomberg LP
                                                              D. Pacella
                                                                 Verizon
                                                                A. Gulko
                                                         Thomson Reuters
                                                      September 21, 2017


                Controller Based BGP Multicast Signaling
             draft-zzhang-bess-bgp-multicast-controller-00

Abstract

   This document specifies a way that one or more centralized
   controllers can use BGP to set up a multicast distribution tree in a
   network.  In the case of a labeled tree, the labels are assigned by
   the controllers from the controllers' local label spaces, from a
   common Segment Routing Global Block (SRGB), or from each router's
   Segment Routing Local Block (SRLB) that the controllers learn.
   In the case of a labeled unidirectional tree with label allocation
   from the common SRGB or from the controllers' local label spaces, a
   single common label can be used by all routers on the tree to send
   and receive traffic.  Since the controllers calculate the trees,
   they can use sophisticated algorithms and constraints to achieve
   traffic engineering.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 25, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
     1.1.  Introduction
     1.2.  Resilience
     1.3.  Signaling
     1.4.  Label Allocation
       1.4.1.  Using a Common per-tree Label for All Routers
       1.4.2.  Upstream-assignment from Controller's Local Label Space
   2.  Specification
     2.1.  Additional Tunnel Type for TEA
     2.2.  RPF Label Stack Sub-TLV
     2.3.  Context Label Wide Community
     2.4.  Procedures
   3.  Security Considerations
   4.  IANA Considerations
   5.  Acknowledgements
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses

1.  Overview

1.1.  Introduction

   [I-D.zzhang-bess-bgp-multicast] describes a way to use BGP as a
   replacement for PIM [RFC7761] or mLDP [RFC6388] signaling.  The
   BGP-based multicast signaling described there provides a mechanism
   for setting up both (s,g)/(*,g) multicast trees (as PIM does, but
   optionally with labels) and labeled (MPLS) multicast tunnels (as mLDP
   does).  Each router on a tree performs essentially the same
   procedures as it would perform if using PIM or mLDP, but all the
   inter-router signaling is done using BGP.

   These procedures allow the routers to set up a separate tree for each
   individual multicast (x,g) flow, where 'x' could be either 's' or
   '*', but they also allow the routers to set up trees that are used
   for more than one flow.  In the latter case, the trees are often
   referred to as "multicast tunnels" or "multipoint tunnels", and
   specifically in this document they are mLDP tunnels (except that they
   are set up with BGP signaling).  Although the procedures need not be
   restricted to mLDP tunnels, the mLDP FEC is conveniently borrowed to
   identify the tunnel.  In the rest of the document, the terms "tree"
   and "tunnel" are used interchangeably.

   The trees/tunnels are set up using the "receiver-initiated join"
   technique of PIM/mLDP, hop by hop from downstream routers towards the
   root.  The BGP messages are either sent hop by hop between downstream
   routers and their upstream neighbors, or reflected by Route
   Reflectors (RRs).

   As an alternative to each hop independently determining its upstream
   router and signaling upstream towards the root (following the
   PIM/mLDP model), the entire tree can be calculated by a centralized
   controller, and the signaling can be done entirely from the
   controller, using the same BGP messages as defined in
   [I-D.zzhang-bess-bgp-multicast].  For that, some additional
   procedures and optimizations are specified in this document.

   While it is outside the scope of this document, signaling from the
   controllers could be done via other means as well, such as NETCONF
   or any other SDN method.

1.2.  Resilience

   Each router could establish direct BGP sessions with one or more
   controllers, or it could establish BGP sessions with RRs that in turn
   peer with controllers.
   For the same tree/tunnel, each controller may
   independently calculate the tree/tunnel and signal the routers on
   the tree/tunnel using CMCAST S-PMSI/Leaf A-D routes
   [I-D.zzhang-bess-bgp-multicast].  How the tree/tunnel roots/leaves
   are discovered and how the calculation is done are outside the scope
   of this document.

   On each router, BGP route selection rules will lead to one
   controller's route for the tree/tunnel being selected as the active
   route and used for setting up forwarding state.  As long as all the
   routers on a tree/tunnel consistently pick the same controller's
   routes for the tree/tunnel, the setup should be consistent.  If the
   tree/tunnel is labeled, different controllers use different labels,
   so there is no traffic loop issue even if the routers do not
   consistently select the same controller's routes.  In the unlabeled
   case, to ensure consistency, the selection SHOULD be based solely on
   the identifier of the controller, which could be carried in an
   Address Specific Extended Community (EC).

   Another consistency issue arises when a bidirectional tree/tunnel
   needs to be re-routed.  Because re-routing is no longer triggered
   hop by hop from downstream to upstream, it is possible that the
   upstream change happens before the downstream change, causing a
   traffic loop.  In the unlabeled case, there is no good solution
   (other than the controller issuing the upstream change only after it
   gets an acknowledgement from the downstream).  In the labeled case,
   as long as a new label is used there should be no problem.

   Besides the traffic loop issue, there could be transient traffic loss
   before the forwarding state on both the upstream and the downstream
   is updated.  This could be mitigated if the upstream keeps sending
   traffic on the old path (in addition to the new path) and the
   downstream keeps accepting traffic on the old path (but not on the
   new path) for some time.
   When the downstream switches to the new path is a local
   matter - it could be data driven (e.g., after traffic arrives on the
   new path) or timer driven.

   For each tree, multiple disjoint instances could be calculated and
   signaled for live-live protection.  Different labels are used for
   different instances, so that the leaves can differentiate incoming
   traffic on different instances.  As far as transit routers are
   concerned, the instances are simply independent.  Note that the
   instances are not expected to share common transit routers (the case
   where they do is outside the scope of this revision).

1.3.  Signaling

   Each router only receives S-PMSI/Leaf A-D routes from the controllers
   but does not originate or re-advertise those routes.  The
   re-advertisement of a received route can be blocked when a
   configured import RT matches the RT of the route, which indicates
   that this router is the target and consumer of the route and hence
   the route should not be re-advertised further.  The routes include
   the outgoing forwarding information in the form of Tunnel
   Encapsulation Attributes (TEA), with optional enhancements specified
   in this document.  In the case of an unlabeled tree, the router
   infers the incoming forwarding information from the Upstream Router's
   IP Address field in the NLRI.

   Suppose that for a particular tree, there are two downstream routers
   D1 and D2 for a particular upstream router U.  A controller C may
   send two Leaf A-D routes to U, as if the two routes were originated
   by D1 and D2 but reflected by the controller.  As an alternative, in
   the case of a labeled tree, C could send just one route to U with a
   Composite Tunnel in the TEA (in this case, the Originating Router's
   Address field of the Leaf A-D route is set to the controller's
   address), and the Composite Tunnel specifies both downstreams.
   The tunnel in a TEA or Composite Tunnel is of type "MPLS
   Encapsulation" with a Label Stack Sub-TLV to encode label
   information.

   For comparison, the existing TEA as specified in
   [I-D.ietf-idr-tunnel-encaps] can include multiple tunnels, but only
   one of those is used, while with a Composite Tunnel, traffic is sent
   out of all the enclosed tunnels to reach multiple endpoints.

   Note that, in the case of labeled trees, the (x,g) or mLDP FEC
   signaling is not actually needed on transit routers, only on the
   tunnel root and leaves.  However, for consistency, the same signaling
   is used for all routers.

1.4.  Label Allocation

   In the case of labeled multicast signaled hop by hop towards the
   root, whether it is (x,g) multicast or an "mLDP" tunnel, labels are
   assigned by a downstream router and advertised to its upstream router
   (from the traffic direction point of view).  In the case of
   controller-based signaling, routers no longer originate tree join
   (S-PMSI/Leaf A-D) routes, so the controllers have to assign labels on
   behalf of routers, and there are three options for label assignment:

   o  From each router's SRLB that the controller learns

   o  From the common SRGB that the controller learns

   o  From the controller's local label space

   Assignment from each router's SRLB is no different from each router
   assigning labels from its own local label space in the hop-by-hop
   signaling case.  The assignments for one router are independent of
   the assignments for another router, even for the same tree.

   Assignment from the controller's local label space is upstream-
   assigned [RFC5331].  It is used if the controller does not learn the
   common SRGB or each router's SRLB.  Assignment from the SRGB
   [I-D.ietf-spring-segment-routing] is only meaningful if all SRGBs are
   the same and a single common label is used for all the routers on a
   tree in the case of a unidirectional tree/tunnel (Section 1.4.1).
   Otherwise, assignment from the SRLB is preferred.

   The choice of which option to use depends on many factors.  An
   operator may want to use a single common label per tree for ease of
   monitoring and debugging, but that requires explicit RPF checking
   and either SRGB or upstream-assigned labels, which may not be
   supported due to software or hardware limitations (e.g., label
   imposition/disposition limits).  In an SR network, assignment from
   the common SRGB (if a single common label per unidirectional tree is
   required) or otherwise from the SRLB is a good choice, because it
   does not require support for context label spaces.

1.4.1.  Using a Common per-tree Label for All Routers

   MPLS labels only have local significance.  For an LSP that goes
   through a series of routers, each router allocates a label
   independently, and when it forwards a labeled packet it swaps the
   incoming label (that it advertised to its upstream) for an outgoing
   label (that it received from its downstream).  Even if the incoming
   and outgoing labels happen to be the same on a particular router,
   that is just incidental.

   With Segment Routing, it is becoming common practice for all routers
   to use the same SRGB so that a SID maps to the same label on all
   routers.  This makes it easier for operators to monitor and debug
   their network.  The same concept applies to multicast trees as well -
   a common per-tree label is used by a router to receive traffic from
   its upstream neighbor and replicate traffic to all its downstream
   neighbors.

   However, a common per-tree label can only be used for unidirectional
   trees.  Additionally, it requires each router to do an explicit RPF
   check, so that only packets from its expected upstream neighbor are
   accepted.  Otherwise, traffic loops may form during topology changes,
   because the forwarding state update is no longer ordered.
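
   As an informal illustration (not part of this specification), the
   following Python sketch models a router that uses one common per-tree
   label both to receive and to replicate traffic, together with the
   explicit RPF check described above.  The names TreeState and forward
   are hypothetical, and the sending neighbor is modeled as a plain
   string; how the neighbor is actually identified in the packet is
   discussed in the text that follows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TreeState:
    """Hypothetical per-tree forwarding state on one router."""
    tree_label: int                 # common label shared by all routers on the tree
    expected_upstream: str          # the only neighbor allowed to send on this tree
    downstreams: List[str] = field(default_factory=list)


def forward(state: TreeState, from_neighbor: str, label: int) -> List[Tuple[str, int]]:
    """Return (downstream, outgoing label) pairs, or [] if the packet is dropped."""
    if label != state.tree_label:
        return []                   # not a packet for this tree
    if from_neighbor != state.expected_upstream:
        return []                   # explicit RPF check failed: wrong upstream
    # The same label is used on every hop, so there is no label swap.
    return [(d, state.tree_label) for d in state.downstreams]


state = TreeState(tree_label=1000, expected_upstream="U", downstreams=["D1", "D2"])
accepted = forward(state, "U", 1000)
dropped = forward(state, "D1", 1000)   # e.g. a stale copy during a topology change
```

   In this sketch the packet from "D1" is discarded even though it
   carries the correct tree label, which is the behavior that prevents
   a loop while forwarding state is being updated in no particular
   order.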

   Traditionally, P2MP MPLS forwarding does not require an explicit RPF
   check, as a downstream router advertises a label only to its upstream
   router and all traffic with that incoming label is presumed to be
   from the upstream router and is accepted.  When a downstream router
   switches to a different upstream router, a different label is
   advertised, so the router can determine whether traffic is from its
   expected upstream neighbor purely based on the label.  Now, with a
   single common label used by all routers on a tree to send and
   receive traffic, a router can no longer determine whether the
   traffic is from its expected neighbor based on that common tree
   label alone.  Therefore, an explicit RPF check is needed.  Instead of
   interface-based RPF checking as in the PIM case, neighbor-based RPF
   checking is used - a label identifying the upstream neighbor
   precedes the tree label, and the receiving router checks whether
   that preceding neighbor label matches its expected upstream
   neighbor.  Notice that this is similar to what is described in
   Section 9.1.1 ("Discarding Packets from Wrong PE") of [RFC6513] (an
   egress PE discards traffic sent from a wrong ingress PE).  The only
   difference is that one is used for label-based forwarding and the
   other is used for (s,g)-based forwarding.  [Note: for bidirectional
   trees, we may be able to use two labels per tree - one for upstream
   traffic and one for downstream traffic.  This needs further
   verification.]

   Both the common per-tree label and the neighbor label are allocated
   either from the common SRGB or from the controller's local label
   space.  In the latter case, an additional label identifying the
   controller's label space is needed, as described in the following
   section.

1.4.2.  Upstream-assignment from Controller's Local Label Space

   In this case, in the multicast packet's label stack, the tree label
   and the upstream neighbor label (if used, in the case of a single
   common label per tree) are preceded by a downstream-assigned
   "context label".  The context label identifies a context-specific
   label space (the controller's local label space), and the upstream-
   assigned label that follows it is looked up in that space.

   This specification requires that, in the case of upstream-assignment
   from a controller's local label space, each router D assign, for
   each controller C, a context label that identifies the upstream-
   assigned label space used by that controller.  This label, call it
   Lc-D, is communicated by D to C.

   Suppose a controller C is setting up unidirectional tree T.  It
   assigns that tree the label Lt, and assigns label Lu to identify
   router U, which is the upstream of router D on tree T.  C needs to
   tell U: "to send a packet on the given tree/tunnel, one of the things
   you have to do is push Lt onto the packet's label stack, then push
   Lu, then push Lc-D onto the packet's label stack, then unicast the
   packet to D."  Controller C also needs to inform router D of the
   correspondence between the label stack (Lc-D, Lu, Lt) and tree T.

   To achieve that, when C sends an S-PMSI/Leaf A-D route, for each
   tunnel in the TEA or in the Composite Tunnel TLV, it includes a Label
   Stack Sub-TLV [I-D.ietf-idr-tunnel-encaps], with the outer label
   being the context label Lc-D (received by the controller from the
   corresponding downstream), the next label being the upstream neighbor
   label Lu, and the inner label being the label Lt assigned by the
   controller for the tree.  The router receiving the route will use the
   label stacks to send traffic to its downstreams.
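
   The label stack that U imposes for each downstream can be sketched
   as follows.  This is an illustration only: the function name
   build_label_stack and all label values are hypothetical, and
   representing the stack as a Python list ordered from the outermost
   label to the innermost is a choice made for this sketch, not a
   statement about the Sub-TLV encoding.

```python
from typing import Dict, List, Optional


def build_label_stack(tree_label: int,
                      context_label: int,
                      upstream_neighbor_label: Optional[int] = None) -> List[int]:
    """Return the stack as [outermost, ..., innermost].

    Lt is pushed first, then Lu (only when a common per-tree label is
    used), then Lc-D, so Lc-D ends up on top of the stack.
    """
    stack: List[int] = []
    stack.insert(0, tree_label)                      # push Lt
    if upstream_neighbor_label is not None:
        stack.insert(0, upstream_neighbor_label)     # push Lu
    stack.insert(0, context_label)                   # push Lc-D
    return stack


# Hypothetical values: context labels that D1 and D2 each communicated to C.
context_labels: Dict[str, int] = {"D1": 301, "D2": 302}
Lt, Lu = 5000, 77

# One stack per downstream; only the context label differs between them.
stacks = {d: build_label_stack(Lt, lc, Lu) for d, lc in context_labels.items()}
```

   Because each downstream assigns its own context label, U needs a
   separate stack per downstream even though Lt and Lu are the same for
   all of them.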

   For C to signal the label stack with which D is expected to receive
   traffic, a Tunnel TLV in either the TEA or the Composite Tunnel in
   the Leaf A-D route sent to D is overloaded - if the remote endpoint
   of that Tunnel TLV matches the Upstream Router field in the Leaf A-D
   route, then the TLV is actually for receiving traffic from the
   upstream.  If a common tree label is used, then the TLV contains a
   variant of the Label Stack Sub-TLV, because D needs to treat the
   second innermost label as the upstream neighbor label and set up
   forwarding state accordingly for the explicit RPF check.  This
   variant is referred to as the RPF Label Stack Sub-TLV (Section 2.2).

   Note that the use of the TEA to specify downstream and upstream
   forwarding information also applies to label assignment from the
   common SRGB or from each router's SRLB, with two differences: the
   context label is not needed in the SRGB/SRLB case, and in the SRLB
   case only a Label Stack Sub-TLV with a single SRLB label is used for
   upstream and downstream forwarding information (no RPF Label Stack
   Sub-TLV is needed).

2.  Specification

2.1.  Additional Tunnel Type for TEA

   This document specifies a Composite Tunnel TLV and a TEA Tunnel TLV.
   The type codes will be assigned by IANA.

   A Tunnel Encapsulation Attribute includes Tunnel TLVs, and a router
   receiving the TEA (associated with a route) selects one of the Tunnel
   TLVs to set up forwarding state - a packet is sent out of only one of
   the tunnels.  To specify that traffic needs to be sent out of
   multiple tunnels, a Composite Tunnel TLV is used.  The value part of
   the TLV includes a list of sub-TLVs, each being a Tunnel TLV.  A
   Composite Tunnel TLV MUST NOT be a sub-TLV of a Composite Tunnel
   TLV.
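
   The difference between a plain TEA and a Composite Tunnel TLV can be
   illustrated with a small data model.  This is a sketch under stated
   assumptions, not an encoding: the class and function names are
   hypothetical, no type codes or wire format are implied, and
   "selecting one tunnel" is simplified to taking the first entry.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TunnelTlv:
    """Hypothetical model of an ordinary Tunnel TLV."""
    remote_endpoint: str
    label_stack: List[int]


@dataclass
class CompositeTunnelTlv:
    """Hypothetical model: traffic is sent out of all enclosed tunnels."""
    sub_tlvs: List[TunnelTlv]

    def __post_init__(self) -> None:
        # A Composite Tunnel TLV MUST NOT contain another Composite Tunnel TLV.
        for tlv in self.sub_tlvs:
            if isinstance(tlv, CompositeTunnelTlv):
                raise ValueError("nested Composite Tunnel TLV")


def tunnels_to_use(tea: List[Union[TunnelTlv, CompositeTunnelTlv]]) -> List[TunnelTlv]:
    """Pick one TLV from the TEA; a composite expands to all of its tunnels."""
    chosen = tea[0]                 # simplification: selection logic is out of scope here
    if isinstance(chosen, CompositeTunnelTlv):
        return list(chosen.sub_tlvs)
    return [chosen]


d1 = TunnelTlv("D1", [16001])
d2 = TunnelTlv("D2", [16002])
replicate_to = tunnels_to_use([CompositeTunnelTlv([d1, d2])])  # both tunnels
single = tunnels_to_use([d1, d2])                              # only one tunnel

try:
    CompositeTunnelTlv([CompositeTunnelTlv([d1])])  # type: ignore[list-item]
    nested_rejected = False
except ValueError:
    nested_rejected = True
```

   With two ordinary Tunnel TLVs in the TEA only one is used, whereas
   the same two tunnels wrapped in a composite are both used for
   replication.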

   Consider a Composite Tunnel TLV that includes a set of sub-TLVs
   specifying the tunnels used to send traffic to a set of endpoints.
   For a particular endpoint, there may be multiple ways to reach it,
   of which exactly one should be used.  For that purpose, a TEA Tunnel
   TLV (for lack of a better name) is used for that endpoint.  The TEA
   Tunnel TLV includes a set of sub-TLVs, each being a Tunnel TLV that
   specifies one way to reach the same endpoint.  This is similar to a
   Tunnel Encapsulation Attribute, hence the name TEA Tunnel TLV.

2.2.  RPF Label Stack Sub-TLV

   This is almost identical to the Label Stack Sub-TLV.  The only
   difference is that the second innermost label in the stack
   identifies the expected upstream neighbor, and explicit RPF checking
   needs to be set up for the tree label accordingly.

2.3.  Context Label Wide Community

   For a router to signal the context label that it assigns for a
   controller (or any label allocator that assigns labels that will be
   seen by this router), it attaches a Context Label Wide Community
   [I-D.ietf-idr-wide-bgp-communities] to the host route for its own
   address used in its BGP session towards the controllers (directly or
   via RRs).  This is a new wide community that specifies the (Label
   Allocator, Context Label) tuple, and the exact format will be
   specified in a future revision.

2.4.  Procedures

   Details to be added.  The general idea is described in Section 1.

3.  Security Considerations

   Whether this document introduces new security risks is to be
   determined.

4.  IANA Considerations

   To be added.

5.  Acknowledgements

   The authors thank Eric Rosen for his questions, suggestions, and help
   finding solutions to some issues like the neighbor-based explicit RPF
   checking.  The authors also thank Lenny Giuliano and IJsbrand
   Wijnands for their review and comments.

6.  References

6.1.  Normative References

   [I-D.ietf-idr-tunnel-encaps]
              Rosen, E., Patel, K., and G. Van de Velde, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-07
              (work in progress), July 2017.

   [I-D.ietf-idr-wide-bgp-communities]
              Raszuk, R., Haas, J., Lange, A., Decraene, B., Amante, S.,
              and P. Jakma, "BGP Community Container Attribute",
              draft-ietf-idr-wide-bgp-communities-04 (work in progress),
              March 2017.

   [I-D.zzhang-bess-bgp-multicast]
              Zhang, Z., Patel, K., Wijnands, I., and A. Gulko, "BGP
              Based Multicast", draft-zzhang-bess-bgp-multicast-01 (work
              in progress), March 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

6.2.  Informative References

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture",
              draft-ietf-spring-segment-routing-12 (work in progress),
              June 2017.

   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
              Label Assignment and Context-Specific Label Space",
              RFC 5331, DOI 10.17487/RFC5331, August 2008.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
              Thomas, "Label Distribution Protocol Extensions for Point-
              to-Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, DOI 10.17487/RFC6388, November 2011.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
              2016.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net


   Robert Raszuk
   Bloomberg LP

   EMail: robert@raszuk.net


   Dante Pacella
   Verizon

   EMail: dante.j.pacella@verizon.com


   Arkadiy Gulko
   Thomson Reuters

   EMail: arkadiy.gulko@thomsonreuters.com