Network Working Group                                       B. Carpenter
Internet-Draft                                         Univ. of Auckland
Intended status: Informational                         February 18, 2010
Expires: August 22, 2010


  Using the IPv6 flow label for equal cost multipath routing in tunnels
                      draft-carpenter-flow-ecmp-01

Abstract

   The IPv6 flow label has certain restrictions on its use.  This
   document describes how those restrictions apply when using the flow
   label for load balancing by equal cost multipath routing,
   particularly for tunneled traffic.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 22, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the BSD
   License.

Table of Contents

   1.  Introduction
   2.  Guidelines
   3.  Security Considerations
   4.  IANA Considerations
   5.  Acknowledgements
   6.  Change log
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address

1.  Introduction

   When several network paths between the same two nodes are known by
   the routing system to be equally good (in terms of capacity and
   latency), it may be desirable to share traffic among them.  This is
   known as equal cost multipath routing (ECMP).  There are of course
   numerous possible approaches to this, but certain goals need to be
   met:

   o  Roughly equal share of traffic on each path.

   o  Work-conserving method (no idle time when queue is non-empty).

   o  Minimize or avoid out-of-order delivery for individual traffic
      flows.

   There is some conflict between these goals: for example, strictly
   avoiding idle time could cause a small packet sent on an idle path to
   overtake a bigger packet from the same flow, causing out-of-order
   delivery.

   One approach to ECMP is this: if there are N equally good paths to
   choose from, then form a hash code modulo(N) from each packet header,
   and use the resulting value to select a particular path.
   If the hash values have an even statistical distribution, this
   method will share traffic roughly equally between the N paths.  If
   the header fields included in the hash are well chosen, all packets
   from a given flow will generate the same hash, so out-of-order
   delivery will not occur.  Assuming a large number of flows from many
   sources is involved, it is also probable that the method will be
   work-conserving, since the queue for each link will remain non-empty.

   The question with such a method is which IP header fields to include.
   A minimal choice in the routing system is simply to use a hash of the
   source and destination IP addresses.  This is necessary and
   sufficient to avoid out-of-order delivery, and with a wide variety of
   sources and destinations, as one finds in the core of the network,
   probably sufficient to achieve work-conserving load sharing.  In
   practice, implementations often use the 5-tuple {dest addr, source
   addr, protocol, dest port, source port}.  However, including the
   protocol and destination port numbers in the hash makes it slightly
   more expensive to compute without particularly improving the hash
   distribution, due to the prevalence of well-known port numbers and
   popular protocol numbers.  Source ports, on the other hand, are quite
   well distributed [Lee09].

   The situation is different in tunneled scenarios.  Assume that
   traffic from many sources to many destinations is aggregated in a
   single IP-in-IP tunnel from tunnel end point (TEP) A to TEP B (see
   the figure below).  Then all the tunnel packets have source address A
   and destination address B.  In all probability they also have the
   same port and protocol numbers.  If there are multiple paths between
   routers R1 and R2, and ECMP is applied, the 5-tuple and its hash will
   be constant and no load sharing will be achieved.
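   The hash-and-modulo path selection described above, and the reason it
   fails for tunneled traffic, can be illustrated with a short sketch.
   This is a hypothetical example, not a router implementation: the
   helper name ecmp_path, the use of SHA-256 (real routers typically use
   much cheaper hardware hash functions), and the example addresses from
   the documentation prefix 2001:db8::/32 are all assumptions.  It also
   assumes IPv6-in-IPv6 encapsulation (next header 41), where the outer
   header carries no ports; these are shown as 0.

```python
import hashlib
from collections import Counter
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """ECMP hash input: {dest addr, source addr, protocol, dest port, source port}."""
    dst: str
    src: str
    proto: int
    dport: int
    sport: int


def ecmp_path(ft: FiveTuple, n_paths: int) -> int:
    """Hypothetical helper: hash the 5-tuple and reduce it modulo N.

    SHA-256 is only a stand-in for whatever hash a router really uses.
    """
    key = f"{ft.dst}|{ft.src}|{ft.proto}|{ft.dport}|{ft.sport}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return h % n_paths


N = 4  # assumed number of equal cost paths between R1 and R2

# Untunneled traffic: 1000 TCP flows (protocol 6) from distinct sources
# and source ports towards one web server (dest port 80).
direct = [
    FiveTuple("2001:db8:2::1", f"2001:db8:1::{i:x}", 6, 80, 49152 + i)
    for i in range(1000)
]

# The same flows carried in one IPv6-in-IPv6 tunnel from TEP A to TEP B.
# R1 sees only the outer header, so every packet has the same 5-tuple
# (next header 41, no outer ports, shown here as 0).
tunneled = [FiveTuple("2001:db8:b::1", "2001:db8:a::1", 41, 0, 0) for _ in direct]

direct_load = Counter(ecmp_path(ft, N) for ft in direct)
tunneled_load = Counter(ecmp_path(ft, N) for ft in tunneled)

print("direct:  ", sorted(direct_load.items()))
print("tunneled:", sorted(tunneled_load.items()))
```

   Running the sketch shows the untunneled flows spread over all four
   paths, while the same flows seen through the tunnel all land on a
   single path, because the outer 5-tuple never changes.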

        _____           _____               _____           _____
       | TEP |_________| R1  |-------------| R2  |_________| TEP |
       |__A__|         |_____|-------------|_____|         |__B__|
               tunnel           ECMP here          tunnel

   Also, for IPv6, the input to a 5-tuple hash would then be quite large
   (128 + 128 + 8 + 16 + 16 = 296 bits), which could be an issue for
   some hardware implementations.  The question therefore arises whether
   the 20-bit flow label in IPv6 packets would be suitable for use in an
   ECMP hash.

   The flow label is left experimental by [RFC2460] but is better
   defined by [RFC3697].  We quote three rules from that RFC:

   1.  "The Flow Label value set by the source MUST be delivered
       unchanged to the destination node(s)."

   2.  "IPv6 nodes MUST NOT assume any mathematical or other properties
       of the Flow Label values assigned by source nodes."

   3.  "Router performance SHOULD NOT be dependent on the distribution
       of the Flow Label values.  Especially, the Flow Label bits alone
       make poor material for a hash key."

   These rules, especially the last one, have caused designers to
   hesitate about using the flow label in support of ECMP.  The fact
   today is that most nodes do not set a non-zero value in the flow
   label, and the first rule definitely forbids the routing system from
   doing so once a packet has left the source node.  Considering normal
   IPv6 traffic, the fact that the flow label is typically zero means
   that it would add no value to an ECMP hash.  But neither would it do
   any harm to the distribution of the hash values.  If the community at
   some stage agrees to set pseudo-random flow labels in the majority of
   traffic flows, this would add to the value of the hash.

   However, in the case of an IP-in-IPv6 tunnel, the TEP is itself the
   source node of the outer packets.  Therefore, a TEP may freely set a
   flow label in the outer IPv6 header of the packets it sends into the
   tunnel.
   In particular, it may follow the [RFC3697] suggestion to set a
   pseudo-random value.

   The last two rules quoted above need to be seen in the context of
   [RFC3697], which assumes that routers using the flow label in some
   way will take part in some method of establishing flow state: "To
   enable flow-specific treatment, flow state needs to be established on
   all or a subset of the IPv6 nodes on the path from the source to the
   destination(s)."  The RFC should perhaps have made clear that a
   router that has participated in flow state establishment can know,
   rather than assume, properties of the resulting flow label values.
   If a router knows these properties, rule 2 is irrelevant, and it can
   choose to deviate from rule 3.

   In the tunneling situation sketched above, routers R1 and R2 can rely
   on the flow labels set by TEP A and TEP B being assigned by a known
   method.  This allows a safe ECMP method to be based on the flow label
   without breaching [RFC3697].

2.  Guidelines

   We assume that the routers supporting ECMP (R1 and R2 in the above
   figure) are unaware that they are handling tunneled traffic.  If the
   IPv6 flow label is to be included in an ECMP hash in the tunneled
   scenario shown above, the following guidelines are suggested:

   o  Inner packets should be encapsulated in an outer IPv6 packet whose
      source and destination addresses are those of the tunnel end
      points (TEPs).

   o  The flow label in the outer packet must be set by the sending TEP
      to a pseudo-random 20-bit value in accordance with [RFC3697].  The
      same flow label value must be used for all packets in a single
      user flow, as determined by the IP header fields of the inner
      packet.

   o  Thus, the TEP will need to classify all packets into flows, once
      it has determined that they should enter a given tunnel, and then
      write the relevant flow label into the outer header.
      A user flow could be defined most simply by its {destination,
      source} address pair (coarse ECMP) or by its 5-tuple {dest addr,
      source addr, protocol, dest port, source port} (fine ECMP).  This
      is an implementation detail in the TEP.

   o  It may be possible to make this classifier stateless, by using a
      suitable hash of the inner 5-tuple as the pseudo-random value.

   o  In router(s) liable to perform ECMP for packets whose source
      address is a TEP, the ECMP hash should minimally include the
      triple {dest addr, source addr, flow label} to meet the [RFC3697]
      rules.  In practice, since the routers are assumed to be unaware
      of tunneled traffic, this means adding the flow label to the
      existing 5-tuple hash.

      *  For tunnel packets, it is harmless for the hash to also include
         {protocol, dest port, source port}, which will be constant.

      *  For non-tunnel packets, it is harmless for the hash to also
         include the flow label, which is currently zero in normal
         traffic, and could only improve the hash if set.

3.  Security Considerations

   The flow label is not protected in any way and can be forged by an
   on-path attacker.  Off-path attackers are extremely unlikely to guess
   a valid flow label.  In either case, the worst an attacker could do
   against ECMP is to selectively overload a particular path.

4.  IANA Considerations

   This document requests no action by IANA.

5.  Acknowledgements

   This document was suggested by corridor discussions at IETF 76.  Joel
   Halpern made crucial comments on an early version.  The author is
   grateful to Qinwen Hu for general discussion about the flow label.
   Valuable comments and contributions were made by Shane Amante, Jarno
   Rajahalme, and others.

   This document was produced using the xml2rfc tool [RFC2629].

6.  Change log

   draft-carpenter-flow-ecmp-01: updated after comments, 2010-02-18

   draft-carpenter-flow-ecmp-00: original version, 2010-01-19

7.  References

7.1.  Normative References

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC3697]  Rajahalme, J., Conta, A., Carpenter, B., and S. Deering,
              "IPv6 Flow Label Specification", RFC 3697, March 2004.

7.2.  Informative References

   [Lee09]    Lee, D., Carpenter, B., and N. Brownlee, "Observations of
              UDP to TCP Ratio and Port Numbers", Technical Report,
              2009.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

Author's Address

   Brian Carpenter
   Department of Computer Science
   University of Auckland
   PB 92019
   Auckland, 1142
   New Zealand

   Email: brian.e.carpenter@gmail.com