Internet Engineering Task Force                              Dave Thaler
INTERNET-DRAFT                                                 Microsoft
Expires May 1999                                        15 November 1998

               Multipath Issues in Unicast and Multicast

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".

1. Introduction

Various routing protocols, including OSPF [1] and IS-IS, explicitly
allow "Equal-Cost Multipath" routing. Some router implementations also
allow equal-cost multipath usage with RIP and other routing protocols.
Using equal-cost multipath means that if multiple equal-cost routes to
the same destination exist, they can be discovered and used to provide
load balancing among redundant paths.

The effect of multipath routing on a forwarder is that the forwarder
potentially has several next-hops for any given destination and must
use some method to choose which next-hop should be used for a given
data packet. This memo summarizes current practices, problems, and
solutions.

Draft                     Multipath Issues                 November 1998

2. Concerns

Several router implementations allow multipath forwarding. This is
sometimes done naively via round-robin, where each packet matching a
given destination route is forwarded using the next-hop that follows
the one used for the previous packet, cycling through the next-hops in
turn.
This does provide a form of load balancing, but there are several
problems with a round-robin approach:

Variable Path MTU
   Since each of the redundant paths may have a different MTU, the
   overall path MTU can change on a packet-by-packet basis, negating
   the usefulness of path MTU discovery.

Variable Bandwidth
   Since each of the redundant paths may have a different amount of
   bandwidth available, the available bandwidth may also change on a
   packet-by-packet basis. Rate-adaptive protocols such as TCP are
   designed to adapt their sending rate to the available bandwidth.
   Varying the bandwidth on a packet-by-packet basis interferes with
   TCP's congestion control mechanisms, resulting in much lower
   throughput.

Variable Latencies
   Since each of the redundant paths may have a different latency,
   having packets take separate paths can cause packets to arrive out
   of order, increasing delivery latency and buffering requirements.

Debugging
   Common debugging utilities such as ping and traceroute are much less
   reliable in the presence of multiple paths and may even present
   completely wrong results.

In multicast routing, the problem with multiple paths is that multicast
routing protocols prevent loops and duplicates by constructing a single
tree to all receivers of the same group address. Multicast routing
protocols deployed today (DVMRP, PIM-DM, PIM-SM) construct shortest-path
trees rooted at either the source or another router known as a Core or
Rendezvous Point. Hence, they ensure that duplicates do not arise by
having a given tree use only a single next-hop toward the root of the
tree.

3. Requirements

All of the problems outlined in the previous section arise when packets
in the same unicast or multicast "flow" (or session) are split among
multiple paths.
The natural solution is therefore to ensure that packets for the same
flow always use the same path.

Two additional features are desirable:

Minimal disruption
   When multipath is used, meaning that multiple routes contribute
   valid next-hops, routes are more likely to be added to or deleted
   from consideration than when only the "best" route is used (in
   which case metric changes in alternate routes have no effect on
   traffic paths). Hence, it is desirable to minimize the number of
   active flows affected by the addition or deletion of a next-hop.

Fast implementation
   The amount of additional computation required to forward a packet
   must be as small as possible. For example, when doing round-robin,
   this computation might consist of incrementing a next-hop index
   modulo the number of next-hops.

4. Solutions

We now describe two possible methods for improving the performance of
multipath forwarding and then discuss their applicability to unicast
and multicast forwarding.

Modulo-N Hash
   To select a next-hop from the list of N next-hops, the router
   computes a hash over the packet header fields that identify a flow
   and uses the result modulo N as the next-hop index. This has the
   advantage of being fast, at the expense of approximately (N-1)/N of
   all flows changing paths whenever a next-hop is added or removed.

Highest Random Weight (HRW)
   The router uses a simple pseudo-random number function, seeded with
   the packet header fields that identify a flow as well as a next-hop
   identifier (address or index), to assign a weight to each of the N
   next-hops. The next-hop receiving the highest weight is chosen. This
   has the advantage of minimizing the number of flows affected by a
   next-hop addition or deletion, but is approximately N times as
   expensive as a modulo-N hash. An analysis of various deterministic
   weight functions can be found in [3].
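
The two selection methods above can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not a reference
implementation: the flow key is taken to be only the source and
destination addresses (as recommended for stateless unicast forwarders
in Section 4.1), SHA-256 truncated to 64 bits stands in for the
unspecified pseudo-random weight function, and the helper names
(flow_key, modulo_n, hrw, moved_fraction) are hypothetical. The script
also counts how many flows change next-hop when one next-hop is
removed.

```python
import hashlib
import ipaddress
from typing import Callable, List, Sequence


# Hypothetical flow key: only the source and destination addresses,
# as suggested for forwarders without per-flow state (Section 4.1).
def flow_key(src: str, dst: str) -> bytes:
    return ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed


def _digest(data: bytes) -> int:
    # Assumption: SHA-256 truncated to 64 bits stands in for the
    # unspecified "simple pseudo-random number function".
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def modulo_n(key: bytes, next_hops: Sequence[str]) -> str:
    """Modulo-N hash: one digest per packet, index = digest mod N."""
    return next_hops[_digest(key) % len(next_hops)]


def hrw(key: bytes, next_hops: Sequence[str]) -> str:
    """Highest Random Weight: weight each next-hop, keep the largest.

    The weight is seeded with both the flow key and the next-hop
    identifier, so this costs N digests instead of one.
    """
    return max(next_hops, key=lambda hop: _digest(key + hop.encode()))


Selector = Callable[[bytes, Sequence[str]], str]


def moved_fraction(select: Selector, keys: List[bytes],
                   before: Sequence[str], after: Sequence[str]) -> float:
    """Fraction of flows whose next-hop differs between two next-hop sets."""
    moved = sum(1 for k in keys if select(k, before) != select(k, after))
    return moved / len(keys)


if __name__ == "__main__":
    hops = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    flows = [flow_key(f"198.51.100.{i}", f"203.0.113.{j}")
             for i in range(1, 51) for j in range(1, 51)]
    fewer = hops[:-1]  # simulate deleting one next-hop
    print("modulo-N moved:", moved_fraction(modulo_n, flows, hops, fewer))
    print("HRW moved:     ", moved_fraction(hrw, flows, hops, fewer))
```

With four next-hops reduced to three, the modulo-N selector is expected
to move roughly three quarters of the flows, in line with the (N-1)/N
estimate, while HRW moves only the flows whose highest-weight next-hop
was the one removed, roughly a quarter. The price is one digest per
next-hop on every lookup, which is why HRW is most attractive when its
result can be cached in per-flow state.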

The applicability of these two alternatives depends on (at least) two
factors: whether the forwarder maintains per-flow state, and how
precious CPU is to a multipath forwarder.

If per-flow state is maintained in a multipath forwarder, then the
next-hop can be computed by the router when the state is created. This
entails no additional computation at packet forwarding time, since the
next-hop is precomputed. In this case, any method can be used,
including round-robin, random, modulo-N, or HRW. Hash functions such as
modulo-N and HRW are better if the forwarder state may be deleted for
any reason during the lifetime of a flow, since subsequent next-hop
computations by the router will select the same path as long as the
set of next-hops is unchanged. This also improves the usefulness of
debugging utilities such as traceroute. Finally, to maximize the
stability of paths (and hence the usefulness of traceroute, etc.), the
use of HRW is recommended.

If per-flow state is not maintained by the forwarder, then using
multiple next-hops requires that the next-hop be calculated at packet
arrival time. When CPU is more precious than stability of flow paths, a
simple modulo-N hash over all packet header fields identifying a flow is
recommended.

4.1. Unicast Forwarding

Depending on the implementation, unicast forwarding may or may not keep
per-flow state. We recommend that routers whose forwarders keep
per-flow state use HRW at state creation time (and at next-hop deletion
time) to select the next-hop, and that forwarders without per-flow
state use a modulo-N hash over the source and destination addresses.

4.2. Multicast Forwarding

Today's multicast forwarding engines use a cache of forwarding entries
indexed by group (or group prefix) and source (or source prefix).
This means that today's multicast forwarders always keep per-flow state,
although for some multicast routing protocols the "flow" may be fairly
coarse (e.g., traffic from all sources to the same destination). Since
per-flow state is kept by the forwarder, it is recommended that the
router always use HRW to select the next-hop.

Routers using explicit-joining protocols such as PIM-SM [4] should thus
use the multipath information when determining to which neighbor a join
message should be sent. For example, when multiple next-hops exist for
a given Rendezvous Point (RP) toward which a (*,G) Join should be sent,
it is recommended that HRW be used to select the next-hop to use for
each group.

5. Redundant Parallel Links

A related problem occurs when multiple parallel links are used between
the same pair of routers. A common solution is to bundle the links
together into a "super"-link, which is then used for routing. For
multicast forwarding, this results in the parallel links being reduced
to a single next-hop (over the combined link), which can be used to
prevent duplicates. When a unicast or multicast packet is queued to the
combined link, some method, such as those discussed earlier, is still
required to determine the physical link on which to transmit the
packet. If the parallel links are identical, then most of the concerns
discussed in this document are avoided with the combined link. The
exception is packet reordering, which can still occur with round-robin,
adversely affecting TCP.

6. Security Considerations

This document discusses issues with various methods of choosing a
next-hop from among multiple valid next-hops. As such, it does not
directly impact the security of the Internet infrastructure or its
applications.

7. References

[1] Moy, J., "OSPF Version 2", RFC 2178, July 1997.

[2] Semeria, C., and T. Maufer, "Introduction to IP Multicast Routing",
    draft-ietf-mboned-intro-multicast-03.txt, October 1997.

[3] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
    Increase Hit Rates", IEEE/ACM Transactions on Networking, February
    1998.

[4] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
    Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
    "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
    Specification", RFC 2362, June 1998.

8. Author's Address

Dave Thaler
Microsoft
One Microsoft Way
Redmond, WA 98052
Phone: +1 425 703 8835
EMail: dthaler@microsoft.com