idnits 2.17.1

draft-leymann-banana-load-rebalance-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 6, 2018) is 2270 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'BIC' is mentioned on line 208, but not defined

  -- Duplicate reference: RFC3168, mentioned in 'RFC3168', was also mentioned
     in 'RFC6040'.

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    BANANA                                                        N. Leymann
3    Internet Draft                                              C. Heidemann
4    Intended Category: Informational                     Deutsche Telekom AG
5                                                                     L. Geng
6                                                                China Mobile
7                                                                     J. Shen
8                                                      China Telecom Co., Ltd
9                                                                    M. Zhang
10                                                                    L. Chen
11                                                                     Huawei
12                                                                  M. Cullen
13                                                          Painless Security
14   Expires: August 10, 2018                                February 6, 2018

16              BANdwidth Aggregation for interNet Access (BANANA)
17                      Load Rebalance for Bonding Tunnels
18                  draft-leymann-banana-load-rebalance-02.txt

20   Abstract

22      BANdwidth Aggregation for interNet Access (BANANA) makes use of a
23      subscriber's multiple points of attachment to the Internet to provide
24      the subscriber with higher bandwidth and reliability than what is
25      provided by any single one of these attachments.

27      Various tunnel-based methods have been developed to realize BANANA.
28      This document specifies a throughput-increasing mechanism that can be
29      commonly adopted by bonding tunnel methods. Basically, the ingress
30      node adaptively adjusts its load distribution function according to
31      the quality of the bonding tunnels so as to make the best use of the
32      bonding bandwidth.

34   Status of this Memo

36      This Internet-Draft is submitted to IETF in full conformance with the
37      provisions of BCP 78 and BCP 79.

39      Internet-Drafts are working documents of the Internet Engineering
40      Task Force (IETF), its areas, and its working groups. Note that
41      other groups may also distribute working documents as
42      Internet-Drafts.

44      Internet-Drafts are draft documents valid for a maximum of six months
45      and may be updated, replaced, or obsoleted by other documents at any
46      time. It is inappropriate to use Internet-Drafts as reference
47      material or to cite them other than as "work in progress."

49      The list of current Internet-Drafts can be accessed at
50      http://www.ietf.org/1id-abstracts.html

52      The list of Internet-Draft Shadow Directories can be accessed at
53      http://www.ietf.org/shadow.html

55   Copyright and License Notice

57      Copyright (c) 2018 IETF Trust and the persons identified as the
58      document authors. All rights reserved.

60      This document is subject to BCP 78 and the IETF Trust's Legal
61      Provisions Relating to IETF Documents
62      (http://trustee.ietf.org/license-info) in effect on the date of
63      publication of this document. Please review these documents
64      carefully, as they describe your rights and restrictions with respect
65      to this document. Code Components extracted from this document must
66      include Simplified BSD License text as described in Section 4.e of
67      the Trust Legal Provisions and are provided without warranty as
68      described in the Simplified BSD License.

70   Table of Contents

72      1. Introduction
73      2. Acronyms and Terminology
74      3. Problem: Bonding Reordering Buffer Bloating
75      4. Related Work
76      5. Load Rebalance
77        5.1. Adaptive Splitting Ratio
78        5.2. Adaptive Sequence Alignment
79      6. Protocol Extensions
80      7. Security Considerations
81      8. IANA Considerations
82      9. References
83        9.1. Normative References
84        9.2. Informative References
85      Author's Addresses

87   1. Introduction

89      BANdwidth Aggregation for interNet Access (BANANA) enables
90      subscribers to make use of multiple access technologies to achieve
91      reliable and high bandwidth Internet access. Various bonding tunnel
92      technologies have been proposed to realize BANANA [GREbond] [GTPbond]
93      [MIPbond].
94      Since bonding tunnels distribute traffic per packet, a latency
95      difference between the two tunnels may reorder the packets of a
96      single traffic flow that is split across them. Therefore, a
97      reordering buffer is used at the egress node to restore the packet
98      order. It is referred to as the "bonding reordering buffer" below.

100     The egress node places a limit (see OUTOFORDER_TIMER in [RFC2890]) on
101     the time that a packet can wait in the bonding reordering buffer and
102     places a limit on the number of packets in the bonding reordering
103     buffer (MAX_REORDER_BUFFER, see MAX_PERFLOW_BUFFER in [RFC2890]). Any
104     packet that would cause violation of either of the two limits MUST be
105     forcibly delivered by the egress node. Bloating of the bonding
106     reordering buffer may break these two limits; the resulting forced
107     delivery causes mass loss of TCP packets, and the throughput of the
108     bonding tunnels may decrease dramatically. It is therefore important
109     to minimize the usage of the bonding reordering buffer (or "Bonding
110     Reordering Buffer Size") in order to reduce the possibility of
111     breaking the above two limits.

113     BANANA may measure the Round Trip Time (RTT) and data rate of each
114     tunnel and monitor the usage of the bonding reordering buffer. Based
115     on the measurement, the ingress node may dynamically adjust the
116     traffic distribution function in order to achieve a higher throughput
117     of the bonding tunnels. For example, it may adaptively update the
118     splitting ratio or adaptively arrange the packet sequence into the
119     bonding tunnels.

121  2. Acronyms and Terminology

123     CIR: Committed Information Rate [RFC2697]

125     RTT: Round Trip Time

127     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
128     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
129     document are to be interpreted as described in RFC 2119 [RFC2119].

131  3. Problem: Bonding Reordering Buffer Bloating

133     A latency difference between the two tunnels reorders the packets of
134     a traffic flow that is split across these two tunnels. The bonding
135     reordering buffer based on the bonding sequence number at the egress
136     is used to "absorb" this latency difference. Figure 3.1 illustrates
137     the operation of the reordering.

139                           +-+
140                           |7|  Bonding Sequence Number
141                           +-+
142                           .4.  Sequence Number
143                           ...
144     +-+         +----->----Tunnel 1---->------+          +-+
145     |8|         |                             |          |1|
146     +-+  +--------------+             +---------------+  +-+
147     ---->| Distribution |             | Recombination |---->
148          +--------------+             +---------------+
149                 |                             |  ^|
150                 +----->----Tunnel 2---->------+  |v
151                     +-+ +-+ +-+              +-------+
152                     |6| |4| |2|              | +-+-+ |Bonding
153                     +-+ +-+ +-+              | |5|3| |Reordering
154                     .3. .2. .1.              | +-+-+ |Buffer
155                     ... ... ...              +-------+

157              Figure 3.1: Bonding Tunnel Reordering Operation

159     [RFC2890] places two limits on the reordering buffer of a tunnel. One
160     is the timer limit: OUTOFORDER_TIMER and the other is the size limit:
161     MAX_PERFLOW_BUFFER. For bonding tunnels, the first limit is reused
162     while the second parameter becomes the maximum bonding reordering
163     buffer size of the entire bonding tunnel rather than a specific flow.

165     [RFC5681] defines Flight Size as the amount of data that has been
166     sent but not yet cumulatively acknowledged. In this document, the
167     Flight Size of a tunnel indicates the amount of data that has been
168     sent by the ingress node into this tunnel but has not yet passed
169     through the reordering buffer (which is not shown in the figure) of
170     this tunnel.
171     The Flight Size of the entire bonding tunnel is the amount of data
172     that the ingress node has sent via either tunnel but that has not
173     yet passed through the bonding reordering buffer. From the sequence
174     number of the last packet it has sent and the latest sequence number
175     acknowledged by the egress node, the ingress node can monitor the
176     Flight Size of a tunnel. For the entire bonding tunnel, the egress
177     node might acknowledge the bonding sequence number via either of the
178     two tunnels. The maximum bonding sequence number acknowledged across
179     the two tunnels is the latest acknowledged bonding sequence number.

181     As shown in Figure 3.1, the Flight Sizes of the tunnels can be used
182     to estimate the load of the tunnels and the usage of the bonding
183     reordering buffer. Suppose the Flight Size of tunnel_1 is F_1, the
184     Flight Size of tunnel_2 is F_2, the Flight Size of the entire bonding
185     tunnel is F_B while the Bonding Reordering Buffer Size is B. B can be
186     calculated as
187        B = F_B - F_1 - F_2
188          = 6 - 1 - 3
189          = 2

191     The bonding reordering buffer may bloat due to the large delay
192     difference of the two tunnels. This bonding reordering buffer
193     bloating issue might lead to the violation of the timer and/or the
194     buffer size limit. The egress node has to deliver the violating
195     packets, which will cause mass packet loss and retransmission of the
196     carried TCP traffic. Throughput of the bonded tunnels will drop
197     dramatically. Therefore, it is always important to minimize the size
198     of the bonding reordering buffer.

200  4. Related Work

202     Several TCP congestion-avoidance algorithms are implemented for
203     congestion control in the Internet. TCP New Reno, defined by
204     [RFC6582], improves retransmission during the fast-recovery phase.
205     In the absence of SACK [RFC2018], TCP New Reno responds to partial
206     acknowledgments (ACKs that cover new data, but not all the data
207     outstanding when loss was detected) and sends the next packet beyond
208     the ACKed sequence number. TCP [BIC] uses binary search to
209     iteratively find the proper congestion window size in each RTT
210     interval. [CUBIC] is a less aggressive and more systematic
211     derivative of BIC, in which the window is a cubic function of time,
212     which improves RTT fairness.

214     However, traditional TCP congestion-avoidance algorithms are not
215     applicable to bonding tunnels for the following reasons. Bonding
216     tunnels adopt per-packet rather than per-flow load balancing. Bonding
217     tunnels are established between a pair of network devices rather than
218     host-to-host. The ingress node of bonding tunnels cannot alter the
219     traffic sending rate. It does not keep sending buffers, so it cannot
220     retransmit lost packets either.

222     Explicit Congestion Notification (ECN [RFC3168]) signals impending
223     network congestion by setting a mark in the IP header instead of
224     dropping packets. When the receiver echoes the congestion indication
225     to the sender, the sender should reduce its transmission rate
226     accordingly. The ECN mechanism could be applicable to tunnelling
227     scenarios, but the mechanism itself must be specifically designed
228     [RFC6040].

230  5. Load Rebalance

232     Parameters such as the Round-Trip Time and the packet loss rate of
233     each tunnel, the usage of the bonding reordering buffer and the data
234     rate of the tunnels might be measured. The measurement could be done
235     in either a one-way or a two-way manner. ECN is a special case of
236     such measurement. If the underlying network infrastructure of the
237     bonding tunnels supports ECN, the congestion indications of ECN could
238     be used as measured information as well.
239     Measurements might be carried by data packets or control messages.

241     Based on the measured information, the ingress node can judge whether
242     one tunnel is already congested so that the traffic proportion to be
243     loaded on it should be decreased. The ingress node can therefore
244     promptly adjust the traffic distribution function to realize a "load
245     rebalance". This load rebalance helps the BANANA system to make the
246     best use of the bandwidth of the two tunnels, and to reduce the queue
247     length in the bonding reordering buffer before the congestion control
248     of the user's TCP traffic reacts.

250  5.1. Adaptive Splitting Ratio

252     A coloring mechanism is used to achieve per-packet traffic
253     distribution across bonded tunnels [GREbond] [GTPbond]. The coloring
254     mechanism is defined by [RFC2697] and [RFC2698]. The Committed
255     Information Rate (CIR) determines the traffic rate distributed into a
256     given tunnel. The CIR of the primary tunnel is fixed while the CIR of
257     the secondary tunnel can be tuned dynamically. The ingress node may
258     monitor the latency of the two tunnels via the measurement of RTT. If
259     the latency difference of the two tunnels exceeds a pre-configured
260     threshold (a value in the range from 0 to 100 ms), the CIR for the
261     secondary tunnel is decreased (e.g., by a half). Otherwise, its CIR
262     is additively increased, up to the maximum traffic rate of the
263     secondary tunnel. As the ingress node tunes the CIR, the traffic
264     splitting ratio will be adaptively changed as well.

266  5.2. Adaptive Sequence Alignment

268     The usage of the bonding reordering buffer is monitored and reported
269     to the ingress node in a timely manner. A threshold for this usage is
270     pre-configured according to bandwidth or calculated in real time
271     according to the traffic sending rate.
272     Whenever this threshold is exceeded, the ingress node intentionally
273     steers the next run of incoming packets to the lightly loaded (or
274     faster) tunnel until the usage of the bonding reordering buffer drops
275     below the threshold.

277     Alternatively, an RTT difference threshold could be used in the same
278     way, i.e., the ingress node will temporarily stop sending packets to
279     the heavily loaded (or slower) tunnel when the RTT difference of the
280     two tunnels is detected to be larger than that threshold.

282  6. Protocol Extensions

284     TBD.

286     The specification about protocol extensions in this document is
287     intended to be applicable to various bonding tunnel protocols.

289  7. Security Considerations

291     Security should be considered by specific bonding tunnel protocols.

293  8. IANA Considerations

295     This document does not require any allocations by the IANA and
296     therefore does not have any new IANA considerations.

298  9. References

300  9.1. Normative References

302     [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
303               Requirement Levels", BCP 14, RFC 2119, DOI
304               10.17487/RFC2119, March 1997.

307     [RFC2697] Heinanen, J. and R. Guerin, "A Single Rate Three Color
308               Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999.

311     [RFC2698] Heinanen, J. and R. Guerin, "A Two Rate Three Color
312               Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999.

315     [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE",
316               RFC 2890, DOI 10.17487/RFC2890, September 2000.

319     [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
320               Notification", RFC 6040, DOI 10.17487/RFC6040, November
321               2010.

323     [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
324               NewReno Modification to TCP's Fast Recovery Algorithm",
325               RFC 6582, DOI 10.17487/RFC6582, April 2012.

328     [CUBIC]   Rhee, I. and L. Xu, "CUBIC: A New TCP-Friendly High-Speed
329               TCP Variant",

332  9.2. Informative References

334     [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
335               Selective Acknowledgment Options", RFC 2018, DOI
336               10.17487/RFC2018, October 1996.

338     [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
339               Explicit Congestion Notification (ECN) to IP", RFC 3168,
340               DOI 10.17487/RFC3168, September 2001.

343     [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
344               Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

347     [GREbond] Leymann, N., Heidemann, C., Zhang, M., et al, "GRE Tunnel
348               Bonding", draft-zhang-gre-tunnel-bonding, work in progress.

350     [GTPbond] Muley, P., Henderichx, W., Liang, G., and H. Liu, "Network
351               based Bonding solution for Hybrid Access",
352               draft-muley-network-based-bonding-hybrid-access, work in
353               progress.

354     [MIPbond] Seite, P., Yegin, A., and S. Gundavelli, "Multihoming
355               support for Residential Gateways",
356               draft-seite-dmm-rg-multihoming, work in progress.

358  Author's Addresses

360     Nicolai Leymann
361     Deutsche Telekom AG
362     Winterfeldtstrasse 21-27
363     Berlin 10781
364     Germany

366     Phone: +49-170-2275345
367     Email: n.leymann@telekom.de

369     Cornelius Heidemann
370     Deutsche Telekom AG
371     Heinrich-Hertz-Strasse 3-7
372     Darmstadt 64295
373     Germany

375     Phone: +4961515812721
376     Email: heidemannc@telekom.de

378     Liang Geng
379     China Mobile
380     32 Xuanwumen West Street,
381     Xicheng District, Beijing, 100053,
382     China

384     EMail: gengliang@chinamobile.com

386     Jun Shen
387     China Telecom Co., Ltd
388     109 West Zhongshan Ave, Tianhe District
389     Guangzhou 510630
390     P.R. China

392     EMail: shenjun@gsta.com

394     Mingui Zhang
395     Huawei Technologies
396     No.156 Beiqing Rd. Haidian District,
397     Beijing 100095 P.R. China

399     EMail: zhangmingui@huawei.com

400     Lihao Chen
401     Huawei Technologies
402     No.156 Beiqing Rd. Haidian District,
403     Beijing 100095 P.R. China

405     EMail: lihao.chen@huawei.com

407     Margaret Cullen
408     Painless Security
409     14 Summer St. Suite 202
410     Malden, MA 02148 USA

412     EMail: margaret@painless-security.com
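
Illustrative sketch (not part of the draft). The buffer estimate of Section 3, the CIR adjustment of Section 5.1 and the threshold test of Section 5.2 can be written down in a few lines of Python. Everything named here is hypothetical: reorder_buffer_usage, SecondaryCir, steer_to_faster_tunnel, the additive step size, and the use of the absolute RTT difference as the "latency difference" are assumptions made for the example. The draft itself only states that the secondary CIR is decreased (e.g., by a half) when the threshold is exceeded and additively increased otherwise.

```python
from dataclasses import dataclass


def reorder_buffer_usage(f_b: int, f_1: int, f_2: int) -> int:
    """Section 3 estimate B = F_B - F_1 - F_2 (all values in packets)."""
    return f_b - f_1 - f_2


@dataclass
class SecondaryCir:
    """Hypothetical holder for the tunable CIR of the secondary tunnel."""

    cir: float           # current CIR of the secondary tunnel
    max_rate: float      # maximum traffic rate of the secondary tunnel
    step: float          # additive increase per update (assumed value)
    threshold_ms: float  # pre-configured latency-difference threshold

    def update(self, rtt_primary_ms: float, rtt_secondary_ms: float) -> float:
        """Halve the CIR above the threshold, else increase it additively."""
        # Assumption: the latency difference is approximated by |RTT1 - RTT2|.
        if abs(rtt_primary_ms - rtt_secondary_ms) > self.threshold_ms:
            self.cir /= 2  # "e.g., by a half" in Section 5.1
        else:
            # Additive increase, capped at the secondary tunnel's maximum rate.
            self.cir = min(self.cir + self.step, self.max_rate)
        return self.cir


def steer_to_faster_tunnel(buffer_usage: int, threshold: int) -> bool:
    """Section 5.2: True while new packets should go to the faster tunnel."""
    return buffer_usage > threshold


if __name__ == "__main__":
    print(reorder_buffer_usage(6, 1, 3))  # values of Figure 3.1, prints 2
    cir = SecondaryCir(cir=1000.0, max_rate=1200.0, step=100.0, threshold_ms=50.0)
    print(cir.update(20.0, 100.0))  # difference of 80 ms exceeds 50 ms: halved
    print(cir.update(20.0, 30.0))   # difference of 10 ms: additive increase
```

Halving on violation and adding a fixed step otherwise is one simple way to realize the decrease/increase rule; a deployment would choose the step, the threshold and the update interval from its own measurements.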