idnits 2.17.1 draft-coras-lisp-re-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 25, 2013) is 4049 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'IPLANE' is defined on line 445, but no explicit reference was found in the text == Outdated reference: A later version (-06) exists of draft-farinacci-lisp-mr-signaling-01 == Outdated reference: A later version (-12) exists of draft-farinacci-lisp-te-01 == Outdated reference: A later version (-22) exists of draft-ietf-lisp-lcaf-01 ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) ** Obsolete normative reference: RFC 6830 (Obsoleted by RFC 9300, RFC 9301) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group F. Coras 3 Internet-Draft A. Cabellos-Aparicio 4 Intended status: Informational J. Domingo-Pascual 5 Expires: August 29, 2013 Technical University of 6 Catalonia 7 F. Maino 8 D. Farinacci 9 cisco Systems 10 February 25, 2013 12 LISP Replication Engineering 13 draft-coras-lisp-re-02 15 Abstract 17 This document describes a method to build and optimize inter-domain 18 LISP router distribution trees for locator-based unicast and 19 multicast replication of EID-based multicast packets. 21 Status of this Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at http://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on August 29, 2013. 38 Copyright Notice 40 Copyright (c) 2013 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 56 2. Definition of Terms . . . . . . . . . . . . . . . . . . . . . 4 57 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 58 4. Overlay Distribution Tree . . . . . . . . . . . . . . . . . . 5 59 4.1. Overlay Management Considerations . . . . . . . . . . . . 6 60 5. Automated Computation of RTR Level . . . . . . . . . . . . . . 7 61 5.1. Algorithm for Computing Optimized Distribution Trees . . . 8 62 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 63 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 64 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 65 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 66 9.1. Normative References . . . . . . . . . . . . . . . . . . . 10 67 9.2. Informative References . . . . . . . . . . . . . . . . . . 10 68 Appendix A. MADDBST heuristic . . . . . . . . . . . . . . . . . . 11 69 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12 71 1. Introduction 73 The Locator/Identifier Separation Protocol (LISP) [RFC6830] provides 74 the mechanisms for the separation of Location and Identity semantics 75 presently overloaded by IP addresses. The split results in the 76 creation of two namespaces that unambigously identify edge-site 77 network objects, Endpoint IDs (EIDs), and core routing objects, 78 Routing LOCators (RLOCs). Apart from aiding the scalablity of the 79 core routing infrastructure, the decoupling also enables the 80 (re)implementation of new or existing inter-domain routing 81 mechanisms. 83 One such mechanism is inter-domain IP source-specific multicast (SSM) 84 [RFC4607]. In this sense, [RFC6831] defines the procedures carried 85 out for delivering multicast packets from a source host in a LISP 86 site to receivers residing in the same domain or in other LISP or 87 non-LISP sites when an underlying multicast infrastructure exists. 88 The signaling protocol it specifies for conveying (S-EID,G) state and 89 building the distribution tree connecting the xTRs of the source and 90 receiver domains is PIM [RFC4601]. An alternative method that uses 91 Map-Requests instead of PIM for propagating (S-EID,G) state from 92 multicast receiver site ETRs to source site ITR is established in 93 [I-D.farinacci-lisp-mr-signaling]. 95 Although desirable to use multicast routing in the core network when 96 available, a mismatch between the multicast capabilities of receiver 97 ETRs and source ITR might impede their multicast interconnection. In 98 such a case, unicast RLOC encapsulation will be necessary to deliver 99 multicast packets directly to the ETRs. This however leads to high 100 ITR head-end replication for large sets of receiving ETRs. 101 Therefore, to reduce the replication load of the ITR and scale the 102 service with the number of multicast receivers, the ITR may choose to 103 offload replication to a set of RTRs. 105 The current document describes how multicast RTRs can be used to 106 build an inter-domain distribution tree rooted at the ITR that can 107 perform unicast and/or multicast encapsulated replication of 108 multicast packets. This concept, of distributing the replication 109 load from ITR to other RTRs downstream on the core overlay 110 distribution tree, is known as Replication Engineering or LISP-RE. 111 Since unicast replication in such overlays can be suboptimal when 112 compared to the underlay network, methods to optimize packet delivery 113 over the distribution tree are also presented. 115 This specification does not describe how (S-EID,G) state is built in 116 source and receiver domains, nor does it describe how such state is 117 propagated from receiver ETRs to source ITR. This specification 118 defines how (S-EID,G) map-cache state is built in the ITR, RTRs and 119 ETRs participating in the overlay distribution tree when they are not 120 connectable by multicast. 122 2. Definition of Terms 124 The terminology in this document is consistent with the definitions 125 in [RFC6830] and [RFC6831] however, it is extended to account for 126 LISP-RE concepts: 128 Delivery Group (DG): The outer destination address of a multicast 129 encapsulated multicast packet with an EID source. 131 Unicast Replication: Is the notion of replicating a multicast packet 132 with an EID source address at an ITR or RTR by encapsulating it 133 into a unicast packet. That is, the oif-list of a multicast map- 134 cache entry can not only have interfaces present for link-layer 135 replication and multicast encapsulation but also for unicast 136 encapsulation. 138 Overlay Distribution Tree: A degree-constrained spanning tree that 139 represents the path followed by unicast and/or multicast 140 encapsulated multicast packets from the root (ITR) to the leaves 141 (ETRs) through intermediary nodes (RTRs). The ITR and RTRs 142 unicast and/or multicast replicate packets to their tree children. 144 LISP Replication Node: A router (either the ITR or an RTR) 145 participating and replicating packets downstream in the 146 distribution tree. 148 Multicast Ingress Tunnel Router (ITR): An ITR as specificed in 149 [I-D.ietf-lisp] that participates as the root in the distribution 150 tree. 152 Multicast Egress Tunnel Router (ETR): An ETR as specified in 153 [I-D.ietf-lisp] that participates as a leaf in the distribution 154 tree. ETR are the only members of the tree that do not unicast 155 replicate. 157 Multicast Re-encapsulating Tunnel Router (RTR): An RTR as specified 158 in [I-D.farinacci-lisp-te] that participates as an intermediary 159 node in the distribution tree. 161 3. Overview 163 This specification describes a method to diminish the ITR's 164 replication load by using RTRs to build an inter-domain distribution 165 tree. The tree is managed by the source domain's Map-Server. RTRs 166 join the overlay on manual configuration and advertise to the Map- 167 Server their availability to replicate traffic for a multicast 168 channel (S-EID,G). Out of all the RTRs registering for the same 169 multicast channel, the Map-Server builds one mapping and organizes 170 the RLOCs in a multi-level hierarchy. The hierarchy is rooted at the 171 ITR and computed based on the manually configured information RTRs 172 register or by means of local policy and algorithms. ETRs always 173 join the overlay as leafs and their attachment prompts the creation 174 of a path, which traverses the RTR hierarchy, to the ITR. The path 175 is built at receiver request by incrementally linking all 176 distribution tree levels, starting at the joining ETR up to the 177 source ITR. 179 The way the distribution tree is built has several benefits. First, 180 it ensures that packets in the source domain do not reach the ITR if 181 no ETR is joined. Second, it ensures that packets are forwarded from 182 ITR to all ETRs without mapping database lookups. Finally, the 183 multicast source is allowed to roam since a first level RTR, when 184 informed of the roam event, can do a new database lookup to find the 185 new ITR to join to. 187 4. Overlay Distribution Tree 189 This section describes the signalling the ITR, RTRs and ETRs use in 190 order to participate in the overlay and build a distribution tree. 191 The signalling messages used are described in 192 [I-D.farinacci-lisp-mr-signaling] and [RFC6831]. 194 RTR participation in the overlay is condition by the configuration, 195 manual or automated, of the multicast channel (S-EID,G) the RTR is to 196 perform replication for. Once configured, an RTR Map-Registers 197 (S-EID,G) to the mapping database system with Merge-Semantics. It 198 also registers a list of usable RLOCs and a set of corresponding 199 weights and priorities. If present, information about the level of 200 the hierarchy where the RTR should attach is also conveyed by means 201 of an Replication List Entry canonical address [I-D.ietf-lisp-lcaf]. 202 Since (S-EID,G) is registered with Merge-Semantics, all RTR 203 originated Map-Register messages are aggregated in one, all- 204 encompassing mapping. If no level information is provided or if 205 configured so, an ITR should use local policy and an algorithm to 206 compute a hierarchy and associate a level in it to each entry in the 207 list (more in Section 5). It should be noted that the entries of the 208 mapping are not RLOCs but Replication List entries. 210 When an ETR creates (S-EID,G) state from a site based multicast join, 211 i.e., its oif-list goes non-empty, it must send an upstream Join 212 request. If the ETR does not have multicast connectivity to its 213 upstream and unicast replication must be performed, the ETR requests 214 that a path from ITR to itself, over the RTR hierarchy be 215 constructed. The following procedure is followed to build the path: 217 1. ETR sends a Map-Request/Join-Request for (S-EID,G) multicast 218 channel to the mapping database system. 220 2. The Map-Server receives the request, looks up the mapping 221 associated to (S-EID,G) and conveys it in a Map-Reply to the ETR. 223 3. The ETR selects out of the list of Replication List entries the 224 one with the best RLOC, according to local policy, taking into 225 account the priority and weights and the requirement that it be 226 as high as possible in the hierarchy. It then sends a Map- 227 Request/Join-Request for (S-EID,G) to the RTR that registered the 228 selected RLOC. 230 4. The RTR inserts the ETR's source address in its oif-list for 231 (S-EID,G) and confirms the Map-Request/Join-Request with a Map- 232 Reply. If not already a member of (S-EID,G), it also sends a 233 Map-Request/Join-Request for (S-EID,G) to the mapping database 234 system. From the ensuing Map-Reply, it chooses the best RLOC 235 pertaining to an adjacent upper level RTR, according to local 236 policy and taking into account the associated priority and 237 weights. It then sends a Join-Request for (S-EID,G) to the 238 selected RTR. 240 5. The previous step is recursively repeated up to when the ITR is 241 joined. On completion, there should exist a path from ITR to 242 joining ETR. 244 6. If the ITR is already member of (S-EID,G) the process stops. 245 Otherwise, the ITR sends a PIM join to the intra-domain multicast 246 source. 248 If at any point, when creating a link between two adjacent layers, 249 multicast replication can be performed, instead of unicast one, the 250 router joining its upstream can set as source of the Map-Request/ 251 Join-Request a delivery group. 253 4.1. Overlay Management Considerations 255 When an ETR's oif-list goes empty a Map-Request/Leave-Request is sent 256 to the upstream RTR which will result in the removal of the ETR's 257 associated entry from the RTR's oif-list. The procedure is repeated 258 by the RTR, and it may recurse upstream, if its own oif-list also 259 goes empty. If an RTR departs, it should first change the priority 260 of the RLOCs it registers with the Map-Server to 255 and set its 261 locators as unreachable in the RLOC-Probing replies it sends 262 downstream. Finally, once all adjacent lower level members have sent 263 Map-Request/Leave-Request messages the RTR can stop registering 264 (S-EID,G) with the mapping database system and thus leave the 265 overlay. 267 RLOC failure is detected at control-plane level through RLOC-probing 268 [RFC6830] by both upstream and downstream routers. When an RTR 269 detects the failure of an downstream RLOC, replication towards the 270 affected RLOC should cease but the associated entry should not be 271 removed from the oif-list. The routers downstream of a failed RTR 272 remove the Map-Request/Join-Request associated state and reperform 273 the join procedure. Ways of detecting RLOC failure at data-plane 274 level and of registering backup RLOCs will be discussed in future 275 versions of this document. 277 An overloaded RTR, i.e., one whose fan-out can not be increased, 278 should change the priority of the RLOCs it registers with the mapping 279 database system to 255. In such a situation, the Map-Server updates 280 the associated mapping and informs all routers having requested it 281 about the change through solicit Map Request (SMR) messages. Both 282 new ETRs attaching to the distribution tree and those already 283 connected but reperforming the join procedure must not use the RLOCs 284 with a priority of 255 as specified in [RFC6830]. However, routers 285 having performed Join-Requests prior to the change should not break 286 their existing connections to the affected RTR. 288 All routers part of an (S-EID,G) multicast channel should re-evaluate 289 their attachment point to the distribution tree whenever the Map- 290 Server updates the associated mapping. This ensures the overlay 291 member routers attach to the best suited parent when new RTRs join or 292 previously attached ones stop being overloaded. Change of a parent 293 should be done following a "make before break" procedure. 294 Specifically, the router changing attachment point first connects to 295 the new parent and only afterwards sends the Leave-Request. 297 5. Automated Computation of RTR Level 299 Operators wishing to automate the RTR joining procedure may wish to 300 use an algorithm for computing an optimized distribution tree. The 301 algorithm could be implemented in the Map-Server and its output 302 should be used to associate to all RTRs a level in the distribution 303 tree. Due to the centralized management, on-line switching between 304 algorithms may be possible in accordance to the required distribution 305 tree performance. However, their use of such algorithms is dependent 306 on the presence of overlay topological information. Ways of 307 obtaining topological information will be discussed in future 308 versions of this document. 310 5.1. Algorithm for Computing Optimized Distribution Trees 312 The current document does not recommend an algorithm for computing 313 optimized distribution trees. However, it provides as an example a 314 low computation cost heuristic, which, in the scenarios simulated in 315 [LCAST-TR], can produce latencies between the ITR and the multicast 316 receivers close to unicast ones. Its choice is to be influenced by 317 operational requirements and the hardware constraints of the 318 equipment in charge of running it. Future experiments might result 319 in a recommendation. 321 In what follows, we use the term "distance" when referring to a 322 relative length or amplitude of a metric, observed on a path 323 connecting two points, but when the exact nature of the metric is of 324 no interest. 326 Considering as goal the delivery of content for delay sensitive 327 applications, the function the algorithm minimizes is the maximum 328 distance (e.g. latency or number of AS hops) from a multicast 329 receiver to the ITR source. Notice that the reference is the 330 multicast receiver host and not an ETR. Thus, what matters in 331 deciding a member's position in the distribution tree is not solely 332 its distance to the ITR but also the number of multicast receivers it 333 serves. Then, a router close to the source but serving few receivers 334 might find itself lower in the distribution tree than another with a 335 slightly higher distance to the source but with a larger receiver 336 set. The algorithm optimizes the quality of experience for multicast 337 receivers and not for tunnel routers. 339 The problem described above, that searches for a minimum average 340 distance, degree-bounded spanning tree (MADDBST), can be formally 341 stated as: 343 Definition: Given an undirected complete graph G=(V,E), a designated 344 vertex r belonging to V, for all vertices v in V, a degree bound 345 d(v) <= dmax, dmax a positive integer, a vertex weight function 346 c(v) with positive integer values, and an edge weight function 347 w(e) with positive values, for all edges e in E. Let W(r,v,T) 348 represent the cost of the path linking r and v in the spanning 349 tree T. Find the spanning tree T of G, routed at r, satisfying 350 that d(v) <= dmax and the distance to the source per multicast 351 receiver is minimized. 353 The heuristic used to solve this problem works by incrementally 354 growing a tree, starting at the root node r, until it becomes a 355 spanning tree. For each node v, not yet a tree member, it selects a 356 potential parent node u in the tree T, such that the distance per 357 receiver to r, is minimized. At each step, the node with the 358 smallest metric value is added to the tree and the parent selection 359 is redone. The pseudocode of the heuristic is provided in 360 Appendix A. 362 [SHI] and [BAN] have previously defined and solved similar 363 optimization problems. Shi et al. [SHI] also prove that a 364 particular instance of the problem, where all vertices have weight 1, 365 is NP-complete for degree constraints 2 <= dmax <= |V|-1. 367 The algorithm can optimize an unicast overlay however, it should not 368 be used to optimize multicast underlay delivery. As a result, if 369 multicast is used as underlay between part of the overlay members, 370 once one of the members of such Delivery Group is added to the 371 distribution tree, the others should be marked as attached also. 372 These nodes should receive multicast encapsulated multicast packets 373 from the chosen node over the underlying multicast distribution tree. 375 Finally, since the RTRs do not replicate packets for multicast 376 receiver hosts, prior to applying the MADDBST heuristic, a Minimum 377 Spanning Tree (MST) algorithm should be used to compute the RTR 378 distribution tree. In this case, the MADDBST heuristic should start 379 attaching ETRs having as input the tree resulting from MST. 381 6. Security Considerations 383 Security concerns for LISP-RE the same as for 384 [I-D.ietf-lisp-multicast] and [I-D.farinacci-lisp-mr-signaling]. 386 7. IANA Considerations 388 This memo includes no request to IANA. 390 8. Acknowledgements 392 TODO 394 9. References 395 9.1. Normative References 397 [I-D.farinacci-lisp-mr-signaling] 398 Farinacci, D. and M. Napierala, "LISP Control-Plane 399 Multicast Signaling", draft-farinacci-lisp-mr-signaling-01 400 (work in progress), January 2013. 402 [I-D.farinacci-lisp-te] 403 Farinacci, D., Lahiri, P., and M. Kowal, "LISP Traffic 404 Engineering Use-Cases", draft-farinacci-lisp-te-01 (work 405 in progress), July 2012. 407 [I-D.ietf-lisp] 408 Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, 409 "Locator/ID Separation Protocol (LISP)", 410 draft-ietf-lisp-24 (work in progress), November 2012. 412 [I-D.ietf-lisp-lcaf] 413 Farinacci, D., Meyer, D., and J. Snijders, "LISP Canonical 414 Address Format (LCAF)", draft-ietf-lisp-lcaf-01 (work in 415 progress), January 2013. 417 [I-D.ietf-lisp-multicast] 418 Farinacci, D., Meyer, D., Zwiebel, J., and S. Venaas, 419 "LISP for Multicast Environments", 420 draft-ietf-lisp-multicast-14 (work in progress), 421 February 2012. 423 [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, 424 "Protocol Independent Multicast - Sparse Mode (PIM-SM): 425 Protocol Specification (Revised)", RFC 4601, August 2006. 427 [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for 428 IP", RFC 4607, August 2006. 430 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The 431 Locator/ID Separation Protocol (LISP)", RFC 6830, 432 January 2013. 434 [RFC6831] Farinacci, D., Meyer, D., Zwiebel, J., and S. Venaas, "The 435 Locator/ID Separation Protocol (LISP) for Multicast 436 Environments", RFC 6831, January 2013. 438 9.2. Informative References 440 [BAN] Banerjee, S., Kommareddy, C., Kar, K., Bhattacharjee, B., 441 and S. Khuller, "Construction of an efficient overlay 442 multicast infrastructure for real-time applications", 443 INFOCOM , 2002. 445 [IPLANE] Madhyastha, H., Katz-Bassett, E., Anderson, T., 446 Krishnamurthy, A., and A. Venkataramani, "iPlane: An 447 Information Plane for Distributed Services", USENIX OSDI , 448 2009. 450 [LCAST-TR] 451 Coras, F., Cabellos, A., Domingo, J., Maino, F., and D. 452 Farinacci, "Inter-Domain Multicast: LISP Edge Based 453 Trees", Technical 454 Report http://personals.ac.upc.edu/fcoras/lcast-tr.pdf, 455 2012. 457 [SHI] Shi, S., Turner, J., and M. Waldvogel, "Dimensioning 458 server access bandwidth and multicast routing in overlay 459 networks", NOSSDAV , 2001. 461 Appendix A. MADDBST heuristic 463 INPUT: G = (V,E); r; dmax; w(u,v); c(v); u, v in V 464 OUTPUT: T 466 FOREACH v in V DO 467 delta(v) = w(r,v)/c(v); 468 p(v) = r; 469 END FOREACH 471 T takes (U = {r}, D={}); 472 WHILE U != V DO 473 LET u in U-V be the vertex with the smallest delta(u); 474 U = U U {u}; L = L U {(p(u),u)}; 475 FOREACH v in V-U DO 476 delta(v) = infinity; 477 FOREACH u in U DO 478 IF d(u) < dmax and 479 W{r,u,T} + w(u,v)/c(v) < delta(v) THEN 480 delta(v) = W{r,u,T} + w(u,v)/c(v); 481 p(v) = u; 482 END IF 483 END FOR 484 END FOR 485 END WHILE 487 Figure 1 489 Authors' Addresses 491 Florin Coras 492 Technical University of Catalonia 493 C/Jordi Girona, s/n 494 BARCELONA 08034 495 Spain 497 Email: fcoras@ac.upc.edu 499 Albert Cabellos-Aparicio 500 Technical University of Catalonia 501 C/Jordi Girona, s/n 502 BARCELONA 08034 503 Spain 505 Email: acabello@ac.upc.edu 507 Jordi Domingo-Pascual 508 Technical University of Catalonia 509 C/Jordi Girona, s/n 510 BARCELONA 08034 511 Spain 513 Email: jordi.domingo@ac.upc.edu 515 Fabio Maino 516 cisco Systems 517 Tasman Drive 518 San Jose, CA 95134 519 USA 521 Email: fmaino@cisco.com 523 Dino Farinacci 524 cisco Systems 525 Tasman Drive 526 San Jose, CA 95134 527 USA 529 Email: farinacci@gmail.com