BANANA                                                        N. Leymann
Internet Draft                                              C. Heidemann
Intended Category: Informational                     Deutsche Telekom AG
                                                                 L. Geng
                                                            China Mobile
                                                                 J. Shen
                                                  China Telecom Co., Ltd
                                                                M. Zhang
                                                                 L. Chen
                                                                  Huawei
                                                               M. Cullen
                                                       Painless Security
Expires: February 16, 2018                               August 15, 2017

           BANdwidth Aggregation for interNet Access (BANANA)
                   Load Rebalance for Bonding Tunnels
               draft-leymann-banana-load-rebalance-01.txt

Abstract

   BANdwidth Aggregation for interNet Access (BANANA) makes use of a
   subscriber's multiple points of attachment to the Internet to provide
   the subscriber with higher bandwidth and reliability than what is
   provided by any single one of these attachments.

   Various tunnel-based methods have been developed to realize BANANA.
   This document specifies a throughput-increasing mechanism that can be
   commonly adopted by bonding tunnel methods. In essence, the ingress
   node adaptively adjusts its load distribution function according to
   the quality of the bonding tunnels so as to make the best use of the
   bonded bandwidth.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Acronyms and Terminology
   3. Problem: Bonding Reordering Buffer Bloating
   4. Related Work
   5. Load Rebalance
     5.1. Adaptive Splitting Ratio
   6. Protocol Extensions
   7. Security Considerations
   8. IANA Considerations
   9. References
     9.1. Normative References
     9.2. Informative References
   Author's Addresses

1. Introduction

   BANdwidth Aggregation for interNet Access (BANANA) enables
   subscribers to make use of multiple access technologies to achieve
   reliable and high-bandwidth Internet access. Various bonding tunnel
   technologies have been proposed to realize BANANA [GREbond] [GTPbond]
   [MIPbond].
   Because bonding tunnels distribute traffic per packet, a latency
   difference between the two tunnels may reorder the packets of a
   single traffic flow that is split across them. Therefore, a
   reordering buffer is used at the egress node of the bonding tunnels
   to restore the packet order. It is referred to as the "bonding
   reordering buffer" in the rest of this document.

   The egress node places a limit (see OUTOFORDER_TIMER in [RFC2890]) on
   the time that a packet can wait in the bonding reordering buffer and
   places a limit on the number of packets in the bonding reordering
   buffer (MAX_REORDER_BUFFER, see MAX_PERFLOW_BUFFER in [RFC2890]). Any
   packet that would cause violation of either of the two limits MUST be
   forcibly delivered by the egress node. Bloating of the bonding
   reordering buffer may break these two limits; the resulting forced
   delivery causes mass loss of TCP packets, and the throughput of the
   bonding tunnels may decrease dramatically. It is therefore important
   to minimize the usage of the bonding reordering buffer (the "Bonding
   Reordering Buffer Size") in order to reduce the possibility of
   breaking the above two limits.

   BANANA may measure the Round Trip Time (RTT) and data rate of each
   tunnel and monitor the usage of the bonding reordering buffer. Based
   on these measurements, the ingress node may dynamically adjust the
   traffic distribution function in order to achieve a higher throughput
   of the bonding tunnels. For example, it may adaptively update the
   splitting ratio or adaptively arrange the sequence in which packets
   are sent into the bonding tunnels.

2. Acronyms and Terminology

   CIR: Committed Information Rate [RFC2697]

   RTT: Round Trip Time

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Problem: Bonding Reordering Buffer Bloating

   A latency difference between the two tunnels reorders the packets of
   a traffic flow that is split across these two tunnels. The bonding
   reordering buffer at the egress, which operates on the bonding
   sequence number, is used to "absorb" this latency difference.
   Figure 3.1 illustrates the reordering operation.

                             +-+
                             |7|  Bonding Sequence Number
                             +-+
                             .4.  Sequence Number
                             ...
   +-+         +----->----Tunnel 1---->------+          +-+
   |8|         |                             |          |1|
   +-+  +--------------+             +---------------+  +-+
   ---->| Distribution |             | Recombination |---->
        +--------------+             +---------------+
               |                             |      ^|
               +----->----Tunnel 2---->------+      |v
                   +-+   +-+   +-+               +-------+
                   |6|   |4|   |2|               | +-+-+ |Bonding
                   +-+   +-+   +-+               | |5|3| |Reordering
                   .3.   .2.   .1.               | +-+-+ |Buffer
                   ...   ...   ...               +-------+

            Figure 3.1: Bonding Tunnel Reordering Operation

   [RFC2890] places two limits on the reordering buffer of a tunnel. One
   is the timer limit, OUTOFORDER_TIMER, and the other is the size
   limit, MAX_PERFLOW_BUFFER. For bonding tunnels, the first limit is
   reused, while the second becomes the maximum bonding reordering
   buffer size of the entire bonding tunnel rather than of a specific
   flow.

   [RFC5681] defines Flight Size as the amount of data that has been
   sent but not yet cumulatively acknowledged. In this document, the
   Flight Size of a tunnel indicates the amount of data that has been
   sent by the ingress node into this tunnel but has not yet passed
   through the reordering buffer (which is not shown in the figure) of
   this tunnel.
   The Flight Size of the entire bonding tunnel indicates the
   amount of data that has been sent by the ingress node through either
   tunnel but has not yet passed through the bonding reordering buffer.
   From the sequence number of the last packet sent by the ingress node
   and the latest sequence number acknowledged by the egress node, the
   ingress node can monitor the Flight Size of a tunnel. For the entire
   bonding tunnel, the egress node might acknowledge the bonding
   sequence number via either of the two tunnels. The larger of the
   bonding sequence numbers acknowledged via the two tunnels is the
   latest acknowledged bonding sequence number.

   As shown in Figure 3.1, the Flight Sizes of the tunnels can be used
   to estimate the load of the tunnels and the usage of the bonding
   reordering buffer. Suppose the Flight Size of tunnel_1 is F_1, the
   Flight Size of tunnel_2 is F_2, the Flight Size of the entire bonding
   tunnel is F_B, and the Bonding Reordering Buffer Size is B. For the
   state shown in Figure 3.1, B can be calculated as

         B = F_B - F_1 - F_2
           = 6 - 1 - 3
           = 2

   The bonding reordering buffer may bloat due to a large delay
   difference between the two tunnels. This bloating might lead to
   violation of the timer limit and/or the buffer size limit. The egress
   node then has to deliver the violating packets, which will cause mass
   packet loss and retransmission of the carried TCP traffic. Throughput
   of the bonded tunnels will drop dramatically. Therefore, it is always
   important to minimize the size of the bonding reordering buffer.

4. Related Work

   Several TCP congestion-avoidance algorithms are implemented for
   congestion control in the Internet. TCP New Reno, defined by
   [RFC6582], improves retransmission during the fast-recovery phase.
   In the absence of SACK [RFC2018], TCP New Reno responds to partial
   acknowledgments (ACKs that cover new data, but not all the data
   outstanding when loss was detected) by sending the next packet beyond
   the ACKed sequence number. BIC TCP [BIC] uses a binary search to
   iteratively find the proper congestion window size in each RTT
   interval. [CUBIC] is a less aggressive and more systematic derivative
   of BIC, in which the window is a cubic function of time so that RTT
   fairness is guaranteed.

   However, traditional TCP congestion-avoidance algorithms are not
   applicable to bonding tunnels, for the following reasons. Bonding
   tunnels adopt per-packet rather than per-flow load balancing. Bonding
   tunnels are established between a pair of network devices rather than
   host-to-host. The ingress node of bonding tunnels is not able to
   alter the sending rate of the traffic. It does not keep sending
   buffers, so it is not able to retransmit lost packets either.

   Explicit Congestion Notification (ECN) [RFC3168] signals impending
   network congestion by setting a mark in the IP header instead of
   dropping packets. When the receiver echoes the congestion indication
   to the sender, the sender should reduce its transmission rate
   accordingly. The ECN mechanism could be applicable to tunnelling
   scenarios, but the mechanism itself must be specifically designed
   [RFC6040].

5. Load Rebalance

   Parameters such as the Round Trip Time and the packet loss rate of
   each tunnel, the usage of the bonding reordering buffer, and the data
   rate of the tunnels might be measured. The measurement could be done
   in either a one-way or a two-way manner. ECN is a special case of
   such measurement. If the underlying network infrastructure of the
   bonding tunnels supports ECN, the congestion indications of ECN could
   be used as measured information as well.
   The measured information might be carried either by data packets or
   by control messages.

   Based on the measured information, the ingress node can judge whether
   a tunnel is already congested, in which case the proportion of
   traffic loaded onto it should be decreased. The ingress node can
   therefore adjust the traffic distribution function in a timely manner
   to realize a "load rebalance". This load rebalance helps the BANANA
   system to make the best use of the bandwidth of the two tunnels and
   to reduce the queue length in the bonding reordering buffer before
   the congestion control of the user's TCP traffic reacts.

5.1. Adaptive Splitting Ratio

   A coloring mechanism is used to achieve per-packet traffic
   distribution across bonded tunnels [GREbond] [GTPbond]. Coloring
   mechanisms are defined in [RFC2697] and [RFC2698]. The Committed
   Information Rate (CIR) determines the traffic rate distributed into a
   given tunnel. The CIR of the primary tunnel is fixed, while the CIR
   of the secondary tunnel can be tuned dynamically. The ingress node
   may monitor the latency of the two tunnels by measuring the RTT. If
   the latency difference between the two tunnels exceeds a
   pre-configured threshold (a value in the range from 0 to 100 ms), the
   CIR of the secondary tunnel is decreased (e.g., by a half).
   Otherwise, its CIR is additively increased, up to the maximum traffic
   rate of the secondary tunnel. As the ingress node tunes the CIR, the
   traffic splitting ratio changes adaptively as well.

6. Protocol Extensions

   TBD.

   The specification about protocol extensions in this document is
   intended to be applicable to various bonding tunnel protocols.

7. Security Considerations

   Security should be considered by specific bonding tunnel protocols.

8. IANA Considerations

   This document does not require any allocations by the IANA and
   therefore does not have any new IANA considerations.

9. References

9.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997.

   [RFC2697] Heinanen, J. and R. Guerin, "A Single Rate Three Color
             Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999.

   [RFC2698] Heinanen, J. and R. Guerin, "A Two Rate Three Color
             Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999.

   [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE",
             RFC 2890, DOI 10.17487/RFC2890, September 2000.

   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
             Notification", RFC 6040, DOI 10.17487/RFC6040, November
             2010.

   [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
             NewReno Modification to TCP's Fast Recovery Algorithm",
             RFC 6582, DOI 10.17487/RFC6582, April 2012.

   [CUBIC]   Rhee, I. and L. Xu, "CUBIC: A New TCP-Friendly High-Speed
             TCP Variant".

9.2. Informative References

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
             Selective Acknowledgment Options", RFC 2018, DOI
             10.17487/RFC2018, October 1996.

   [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
             Explicit Congestion Notification (ECN) to IP", RFC 3168,
             DOI 10.17487/RFC3168, September 2001.

   [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
             Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [GREbond] Leymann, N., Heidemann, C., Zhang, M., et al, "GRE Tunnel
             Bonding", draft-zhang-gre-tunnel-bonding, work in progress.

   [GTPbond] Muley, P., Henderickx, W., Liang, G., and H. Liu, "Network
             based Bonding solution for Hybrid Access",
             draft-muley-network-based-bonding-hybrid-access, work in
             progress.

   [MIPbond] Seite, P., Yegin, A., and S. Gundavelli, "Multihoming
             support for Residential Gateways",
             draft-seite-dmm-rg-multihoming, work in progress.
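
   The buffer-usage estimate of Section 3 (B = F_B - F_1 - F_2) and the
   CIR tuning rule of Section 5.1 can be sketched in a few lines of
   Python. This is a minimal illustration under stated assumptions, not
   a specified algorithm. The names reorder_buffer_usage, SecondaryCir,
   step_bps and decrease_factor are hypothetical. The size of the
   additive step and the use of the absolute RTT difference as the
   "latency difference" are assumptions as well, since this document
   leaves them open.

```python
from dataclasses import dataclass


def reorder_buffer_usage(f_bonding: int, f_tunnel_1: int, f_tunnel_2: int) -> int:
    """Estimate B = F_B - F_1 - F_2 (Section 3); all inputs use the same unit."""
    usage = f_bonding - f_tunnel_1 - f_tunnel_2
    if usage < 0:
        raise ValueError("inconsistent Flight Sizes: F_B < F_1 + F_2")
    return usage


@dataclass
class SecondaryCir:
    """Hypothetical controller for the CIR of the secondary tunnel (Section 5.1)."""

    cir_bps: float                # current CIR of the secondary tunnel
    max_rate_bps: float           # maximum traffic rate of the secondary tunnel
    threshold_ms: float           # pre-configured threshold, 0 to 100 ms
    step_bps: float               # additive increase per update (assumed value)
    decrease_factor: float = 0.5  # "e.g., by a half"

    def update(self, rtt_primary_ms: float, rtt_secondary_ms: float) -> float:
        """Apply one adjustment step from the latest RTT measurements."""
        # Assumption: latency difference is the absolute RTT difference.
        latency_diff_ms = abs(rtt_primary_ms - rtt_secondary_ms)
        if latency_diff_ms > self.threshold_ms:
            # Threshold exceeded: decrease the CIR multiplicatively.
            self.cir_bps *= self.decrease_factor
        else:
            # Otherwise increase additively, capped at the maximum rate.
            self.cir_bps = min(self.cir_bps + self.step_bps, self.max_rate_bps)
        return self.cir_bps
```

   With the values of Figure 3.1, reorder_buffer_usage(6, 1, 3) yields
   2. A controller created with a 50 ms threshold halves its CIR when
   the RTTs differ by 70 ms and adds one step when they differ by 20 ms.
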

Author's Addresses

   Nicolai Leymann
   Deutsche Telekom AG
   Winterfeldtstrasse 21-27
   Berlin 10781
   Germany

   Phone: +49-170-2275345
   EMail: n.leymann@telekom.de

   Cornelius Heidemann
   Deutsche Telekom AG
   Heinrich-Hertz-Strasse 3-7
   Darmstadt 64295
   Germany

   Phone: +4961515812721
   EMail: heidemannc@telekom.de

   Liang Geng
   China Mobile
   32 Xuanwumen West Street,
   Xicheng District, Beijing, 100053,
   China

   EMail: gengliang@chinamobile.com

   Jun Shen
   China Telecom Co., Ltd
   109 West Zhongshan Ave, Tianhe District
   Guangzhou 510630
   P.R. China

   EMail: shenjun@gsta.com

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   EMail: zhangmingui@huawei.com

   Lihao Chen
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   EMail: lihao.chen@huawei.com

   Margaret Cullen
   Painless Security
   14 Summer St. Suite 202
   Malden, MA 02148 USA

   EMail: margaret@painless-security.com