idnits 2.17.1

draft-fairhurst-tsvwg-buffers-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == The 'Updates: ' line in the draft header should list only the _numbers_
     of the RFCs which will be updated by this document (if approved); it
     should not include the word 'RFC' in the list.

  -- The draft header indicates that this document updates RFC3819, but the
     abstract doesn't seem to directly say this.  It does mention RFC3819
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

     (Using the creation date from RFC3819, updated by this document, for
     RFC5378 checks: 1999-10-14)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 11, 2013) is 4063 days in the past.  Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC-ED' is mentioned on line 324, but not defined

  -- Obsolete informational reference (is this intentional?): RFC 793
     (Obsoleted by RFC 9293)

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   TSVWG Working Group                                         G. Fairhurst
3   Internet-Draft                                    University of Aberdeen
4   Updates: RFC 3819 (if published) (if approved)                B. Briscoe
5   Intended status: Best Current Practice                                BT
6   Expires: September 12, 2013                               March 11, 2013

8                         Advice on network buffering
9                      draft-fairhurst-tsvwg-buffers-00

11  Abstract

13     This document proposes an update to the advice given in RFC 3819.
14     Subsequent research has altered understanding of buffer sizing and
15     queue management.  Therefore this document significantly revises the
16     previous recommendations on buffering.  The advice applies to all
17     packet buffers, whether in network equipment, end hosts or
18     middleboxes such as firewalls or NATs.  It also applies to
19     packet buffers at any layer: whether subnet, IP, transport or
20     application.

22  Status of This Memo

24     This Internet-Draft is submitted in full conformance with the
25     provisions of BCP 78 and BCP 79.

27     Internet-Drafts are working documents of the Internet Engineering
28     Task Force (IETF).  Note that other groups may also distribute
29     working documents as Internet-Drafts.  The list of current Internet-
30     Drafts is at http://datatracker.ietf.org/drafts/current/.

32     Internet-Drafts are draft documents valid for a maximum of six months
33     and may be updated, replaced, or obsoleted by other documents at any
34     time.
       It is inappropriate to use Internet-Drafts as reference
35     material or to cite them other than as "work in progress."

37     This Internet-Draft will expire on September 12, 2013.

39  Copyright Notice

41     Copyright (c) 2013 IETF Trust and the persons identified as the
42     document authors.  All rights reserved.

44     This document is subject to BCP 78 and the IETF Trust's Legal
45     Provisions Relating to IETF Documents
46     (http://trustee.ietf.org/license-info) in effect on the date of
47     publication of this document.  Please review these documents
48     carefully, as they describe your rights and restrictions with respect
49     to this document.  Code Components extracted from this document must
50     include Simplified BSD License text as described in Section 4.e of
51     the Trust Legal Provisions and are provided without warranty as
52     described in the Simplified BSD License.

54  Table of Contents

56     1.  Introduction
57     2.  Terminology
58     3.  Updated Recommendations on Buffering
59       3.1.  Recommendations Applicable to Any Buffer
60       3.2.  Buffering recommendations for end hosts
61       3.3.  Buffering recommendations for edge routers and switches
62       3.4.  Buffering recommendations for core routers and switches
63       3.5.  Recommendations on Flow Isolation
64     4.  Buffer Management Methods
65       4.1.  Examples of subnetwork buffering
66       4.2.  Examples of methods for active buffer management
67     5.  Security Considerations
68     6.  IANA Considerations
69     7.  Acknowledgments
70     8.  References
71       8.1.  Normative References
72       8.2.
             Informative References
73     Appendix A.  Previous IETF guidance for configuring network buffers
74     Appendix B.  Revision notes
75     Authors' Addresses

77  1.  Introduction

79     [RFC3819] provides guidance on the design of subnetworks and
80     networking equipment.  This document updates this guidance for the
81     topic of Internet buffer configuration and control.  The guidance is
82     aimed at both equipment designers and network operators.

84     All networking devices use buffers to temporarily store packets that
85     are waiting for transmission on an outgoing link during traffic
86     bursts or at times when the capacity of the ingress/egress changes.

88     The congestion control algorithms in TCP (and derivatives of TCP) are
89     designed to try to fully utilise the link that has the least
90     available capacity on the path across the network.  This is called
91     the bottleneck link.  Network link capacities are typically arranged
92     so that it will be rare for a bottleneck to arise in the network
93     core.  However, depending on prevailing patterns of traffic, any link
94     might become the bottleneck (within the host, at an edge router, at a
95     core router, at a switch in the subnet between routers or at some
96     middlebox such as a firewall or a network address translator).
97     Modern TCP stacks are capable of filling a link of any capacity.

99     A buffer that simply discards incoming packets when it is full is
100    called a tail-drop buffer.  A long-running TCP flow will fill a tail-
101    drop buffer and keep it full, so that there is no longer any space to
102    absorb bursts.  This is called a standing queue.  Packets arriving at
103    the tail of a standing queue still work their way through the buffer
104    until they emerge onto the link, but this introduces unnecessary
105    delay to every packet, including those from other sessions sharing
106    the link.
       This can intermittently add intolerable delay to a real-
107    time interactive media session (e.g. voice or video).  Also, most
108    Web pages involve dozens of short back-and-forth exchanges, so adding
109    even a small amount of queuing delay to each round can accumulate
110    considerable delay in the completion of the whole task.

112    The recommended way to avoid these problems is to use an active queue
113    management (AQM) algorithm in every potential bottleneck buffer
114    (subnet, router, middlebox or host), and to enable explicit
115    congestion notification (ECN).  However, if AQM has not been
116    implemented in existing equipment, the next best option is to at
117    least size the buffer so that it is no larger than needed to absorb
118    bursts.

120    This document gives advice on using and configuring AQM algorithms
121    and ECN, and advice on buffer sizing in the absence of such
122    algorithms.

124    The correct buffer size depends on the link rate, so a common problem
125    is where equipment auto-adjusts its rate, often over a wide range, so
126    the buffer size can be badly incorrect.  Advice is also given on how
127    to relate buffer auto-sizing algorithms to rate-adjusting algorithms,
128    and the best static buffer size to configure if auto-sizing has not
129    been implemented.

131    It is difficult to test whether a network might exhibit these
132    problems.  They only appear intermittently, because they depend on
133    four pathologies coinciding: i) a particular buffer has become the
134    bottleneck for a long-running TCP flow, which depends on relative
135    traffic levels in other links, ii) the TCP flow has run for long
136    enough to fill this buffer, iii) the buffer lacks AQM or the AQM is
137    badly configured and iv) the buffer has been badly over-sized.  When
138    all four conditions coincide, the delays can be bad enough to lead
139    to support desk calls.
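
       As an illustrative sketch only, the dependence of buffer size on
       link rate described above can be written as a short Python
       calculation.  The function names, the 100 ms target and the two
       link rates are assumptions chosen for the example; they are not
       taken from RFC 3819 or from any particular equipment.

```python
def buffer_bytes_for(target_delay_s: float, link_rate_bps: int) -> int:
    """Bytes needed to hold target_delay_s worth of data at link_rate_bps."""
    return round(target_delay_s * link_rate_bps / 8)


def queuing_delay_s(buffer_bytes: int, link_rate_bps: int) -> float:
    """Time for a completely full buffer to drain onto the link."""
    return buffer_bytes * 8 / link_rate_bps


# Hypothetical figures, chosen only for illustration.
TARGET_DELAY_S = 0.1            # 100 ms of buffering
HIGH_RATE_BPS = 1_000_000_000   # 1 Gb/s
LOW_RATE_BPS = 10_000_000       # 10 Mb/s

# A static buffer dimensioned for the higher rate ...
fixed_buffer_bytes = buffer_bytes_for(TARGET_DELAY_S, HIGH_RATE_BPS)

# ... holds the intended delay at that rate, but far more at the lower rate.
delay_at_high_rate = queuing_delay_s(fixed_buffer_bytes, HIGH_RATE_BPS)
delay_at_low_rate = queuing_delay_s(fixed_buffer_bytes, LOW_RATE_BPS)

print(f"buffer: {fixed_buffer_bytes} bytes")
print(f"delay at high rate: {delay_at_high_rate:.3f} s")
print(f"delay at low rate:  {delay_at_low_rate:.3f} s")
```

       Under these assumed figures, a buffer sized for 100 ms at the
       higher rate holds about 10 s of data once the link drops to the
       lower rate.  Configuring the buffer as a time rather than a byte
       count lets the size follow the rate.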

141    This document updates section 13 of RFC 3819, which gave guidance to
142    subnet designers on the use and sizing of buffers.  Appendix A
143    reviews that guidance, which now requires considerable revision in
144    the light of subsequent research.  Also, whereas RFC 3819 addressed
145    subnet designers, the advice in this document is relevant to a wider
146    audience, because it concerns buffers wherever they are, including in
147    end-systems and middleboxes, not just in subnet technology.

149 2.  Terminology

151    The document assumes familiarity with the terminology of RFC 3819
152    [RFC3819].

154    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
155    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
156    document are to be interpreted as described in [RFC2119].

158    The term active queue management (AQM) has been applied to
159    technologies that work only at the packet level as well as
160    technologies that identify and police flows with above average rates
161    or that enforce flow-level or user-level policies such as fair
162    queuing.  For this document, we will use the term 'AQM' for
163    technologies or parts of technologies that treat packets
164    indiscriminately, and the term 'policing' for the additional
165    technologies that attempt to enforce some level of behaviour or
166    isolation at the flow or user level of granularity.

168 3.  Updated Recommendations on Buffering

170    This section updates the rules for network buffers in section 13 of
171    RFC 3819.

173 3.1.  Recommendations Applicable to Any Buffer

175    XX Work in Progress, to be included in next revision XX

177    AQM is strongly recommended for any buffer.  Auto-tuned
178    configuration is recommended.

180    Explicit Congestion Notification (ECN) [RFC3168] is also strongly
181    recommended for any buffer (this avoids delays due to timeouts after
182    loss).  It is safe to enable ECN for routers and servers.
183    If concerns arise over the use of ECN, these can be fully addressed by
184    turning off ECN support at the endpoint.  If routers and servers do
185    not enable ECN, where it is deemed safe, it will not be possible
186    for endpoints to turn it on.

188    Buffer size: if AQM is implemented, there is no harm in having a
189    large buffer to absorb bursts.  However, if there is no AQM, it is
190    important to keep the buffer small.

192    o  Too little buffering can result in poor utilisation of the egress
193       link, since many traffic flows are not smooth-paced and bursts of
194       traffic may fail to be buffered.

196    o  Large buffers can help ensure full utilisation of the egress link,
197       but excessive buffering results in slow response to congestion and
198       in unnecessary delay experienced by any flow that shares the
199       egress link.  Such events are not uncommon, since a single long-
200       lived connection using a modern TCP stack can fill any size of
201       network buffer.

203    Auto-sizing is recommended if the line rate is adjustable or auto-
204    adjusts (e.g. setting buffer time, not byte-size).  If auto-sizing
205    has not been implemented, a large buffer is not best.  Too small a
206    buffer reduces link utilisation.  If it is necessary to find a
207    compromise size for adjustable line rates, consider
208    sacrificing some utilisation at lower rates to keep the buffer delay
209    reasonable.

211 3.2.  Buffering recommendations for end hosts

213    XX Work in Progress, to be included in next revision XX

215    Large buffers are not best.  AQM and auto-tuning/auto-sizing are as
216    applicable in end hosts as in network equipment.

218    ECN may even be appropriate (e.g. on a subsystem such as a NIC), but
219    within a host it should be possible to use back-pressure messages
220    instead.

222    Buffer sizing recommendations specific to end-systems.

224 3.3.  Buffering recommendations for edge routers and switches

226    XX Work in Progress, to be included in next revision XX

228    Large is not best.

230    AQM and ECN are strongly recommended.

232    Buffer sizing recommendations specific to edge routers, switches &
233    middleboxes.

235 3.4.  Buffering recommendations for core routers and switches

237    XX Work in Progress, to be included in next revision XX

239    Large is not best.

241    Buffer sizing recommendations specific to core routers & switches.

243 3.5.  Recommendations on Flow Isolation

245    XX Work in Progress, to be included in next revision XX

247    Still a subject of debate and research.  May be able to recommend
248    something here, but more likely will commentate on the debate.

250 4.  Buffer Management Methods

252    This section provides informative documentation of current practice.

254 4.1.  Examples of subnetwork buffering

256    This section provides informative examples of buffer configuration
257    and their impact on network traffic {TBA: to consider whether to
258    bless, deprecate or merely state each of these practices}.

260    o  An Ethernet subnetwork may operate over a range of speeds from a
261       shared 10 Mbps of capacity to over 40 Gbps.  The buffering
262       required depends on the link speed, and many device drivers
263       and operating systems do not adjust their buffering to the
264       available capacity.  The first hop link from a host often has a
265       higher speed than the subsequent links along a network path.

267    o  Subnetwork flow-control can be triggered when a subnetwork link
268       suffers congestion.  An example is the use of Ethernet Pause
269       frames (e.g. by consumer Ethernet switches) to slow a sender
270       emitting traffic towards a congested switch port.  These methods
271       can increase the buffering experienced by the end-to-end flow.

273    o  DOCSIS 3.1 supports transmission up to 300 Mbps.  A current modem
274       can be plugged into a current network.  If a customer's
275       service only supports 10 Mbps, the network equipment may be 30
276       times over-buffered (assuming buffers are dimensioned based on the
277       maximum bit rate).
          The buffer control amendment may be
278       implemented in the modem, and in its provisioning system to
279       address this type of issue.  Similar issues apply for other link
280       technologies, where the offered service is often less than the
281       maximum supported rate.

283    o  On wireless, bandwidth (and hence network capacity) is often
284       highly variable, unless you have a fixed point to point link.
285       Even fixed links may use adaptive methods and propagation
286       conditions can cause the capacity to vary.

288 4.2.  Examples of methods for active buffer management

290    This section provides informative examples of active buffer
291    management.

293    While large buffers can lead to an increase in experienced network
294    delay, they do not necessarily impact the flow delay.  The issue is
295    not how much buffering is provided, but how the provided buffers
296    are used to manage the flow of traffic.

298    Several active buffer/queue management methods have been proposed
299    that can significantly improve performance of flows using a
300    (potentially) congested bottleneck.

302    o  RED

304    o  CoDel

306    o  PI

308    o  etc.

310 5.  Security Considerations

312    Decisions on queue management and buffer sizing are neutral to
313    security considerations if they act indiscriminately over all
314    packets.  Recommendations on treatment or lack of treatment at the
315    flow or user-level can have security considerations, which are TBA.

317    The question of whether end-systems respond to congestion signals is
318    a valid security concern, but outside the scope of this document.

320 6.  IANA Considerations

322    This document does not require any IANA considerations.

324    [RFC-ED]: Please remove this section prior to publication.

326 7.  Acknowledgments

327    This work was part-funded by the European Community under its Seventh
328    Framework Programme through the Reducing Internet Transport Latency
329    (RITE) project (ICT-317700).  The views expressed are solely those of
330    the authors.

332    The authors acknowledge contributions from: Jim Gettys.

334 8.  References

336 8.1.  Normative References

338    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
339               Requirement Levels", BCP 14, RFC 2119, March 1997.

341    [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
342               of Explicit Congestion Notification (ECN) to IP", RFC
343               3168, September 2001.

345    [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
346               Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
347               Wood, "Advice for Internet Subnetwork Designers", BCP 89,
348               RFC 3819, July 2004.

350 8.2.  Informative References

352    [Appenzeller]
353               Appenzeller, G., Keslassy, I., and N. McKeown, "Sizing
354               router buffers", ACM SIGCOMM '04, pages 281-292, New York,
355               NY, USA, 2004.

357    [Ganjali]  Ganjali, Y. and N. McKeown, "Update on Buffer Sizing in
358               Internet Routers", ACM SIGCOMM Computer Communication
359               Review 36, October 2006.

361    [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
362               793, September 1981.

364    [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
365               Jacobson, "RTP: A Transport Protocol for Real-Time
366               Applications", STD 64, RFC 3550, July 2003.

368    [Villamizar]
369               Villamizar, C. and C. Song, "High Performance TCP in
370               ANSNET", ACM Computer Communications Review, 24(5):45-60,
371               1994.

373    [Wischik]  , "TCP Buffer Sizing Advice", .

375 Appendix A.  Previous IETF guidance for configuring network buffers

377    This section reviews previous guidance for configuring network
378    buffers and motivates the need to update these recommendations.

380    Guidance for the use of buffers was provided in section 13 of RFC
381    3819:

383    "each node should have enough buffering to hold one
384    link_bandwidth*link_delay product's worth of data for each TCP
385    connection sharing the link."
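
       The quoted rule can be made concrete with a small calculation.
       The Python sketch below is illustrative only: the function names
       and the figures (a 10 Mb/s link with 50 ms of link delay) are
       hypothetical and do not appear in RFC 3819.

```python
def rfc3819_buffer_bytes(link_bandwidth_bps: int,
                         link_delay_s: float,
                         tcp_connections: int) -> float:
    """One link_bandwidth*link_delay product per TCP connection, in bytes."""
    return link_bandwidth_bps * link_delay_s * tcp_connections / 8


def drain_time_s(buffer_bytes: float, link_bandwidth_bps: int) -> float:
    """Time a completely full buffer takes to drain onto the link."""
    return buffer_bytes * 8 / link_bandwidth_bps


# Hypothetical link, chosen only for illustration.
LINK_BANDWIDTH_BPS = 10_000_000   # 10 Mb/s
LINK_DELAY_S = 0.05               # 50 ms

for connections in (1, 10, 100):
    size = rfc3819_buffer_bytes(LINK_BANDWIDTH_BPS, LINK_DELAY_S, connections)
    delay = drain_time_s(size, LINK_BANDWIDTH_BPS)
    print(f"{connections:>3} connections: {size:>9.0f} bytes, "
          f"{delay:.2f} s to drain when full")
```

       With these assumed values the buffer grows linearly with the
       number of connections, and so does the time a full buffer takes
       to drain: 100 connections imply about 5 s of queue.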

387    However, in today's Internet, a deployment following this
388    recommendation would overly allocate buffering for a network link
389    that supports multiple flows.  This is discussed in the observations
390    below:

392    o  This buffering recommendation is appropriate for a device that
393       supports a single or small number of bulk TCP flows [Villamizar].

395    o  The buffering is unduly large when there are more than a small
396       number of flows (e.g. >10).  The goal of sharing between TCP
397       flows requires only that the buffering is sufficient to hold one
398       link_bandwidth*path_delay product's worth of data for the longest
399       path flow.  The more flows share a link, the less buffering is
400       needed [Appenzeller], unless the egress link becomes congested
401       with so many flows that there are only a few packets per flow
402       buffered.

404    o  Many egress links have a higher level of multiplexing (e.g. >100
405       uncorrelated flows).  This is often found beyond the edge of a
406       network.  In this case, the buffer size may be inversely
407       proportional to the square root of the number of flows (for medium
408       numbers).  For still higher levels of multiplexing, this may be of
409       the order of the logarithm of the number of flows
410       [Wischik][Ganjali].

412    o  Note that while optimal buffering may be a function of the number
413       of concurrent flows, it is not recommended to tune buffering by
414       dynamically estimating the number of flows sharing a network
415       device or path, or by attempting to classify flows as "long",
416       "short", etc.  Such estimates are difficult, due to the wide
417       variety of flow behaviours and the use of aggregation methods
418       (such as tunnels) that hide the traffic of individual flows.

420    o  In deployed scenarios (apart from restricted deployments in
421       operator-controlled subnetworks), it is usually impossible for a
422       router or other network middlebox to know the delay experienced by a
423       flow.
          In the Internet service model this information is only
424       available to end points (e.g. using feedback provided by TCP
425       [RFC0793] or RTCP [RFC3550]).  It is therefore not usually possible
426       for operators to use the end-to-end path delay calculation to
427       determine the size of buffering when configuring network
428       equipment.

430    The discussion in section 13 of RFC 3819 summarises:

432    "In general, it is wise to err in favor of too much buffering rather
433    than too little."

435    While this advice may have been appropriate when routers and
436    subnetworks carried small numbers of flows and had little buffer memory
437    [Villamizar], this advice is now not appropriate for many modern
438    networks.

440    Section 13 of RFC 3819 also motivates using methods such as Active
441    Queue Management (AQM) and ECN [RFC3168].  However, at the time of writing
442    there was little deployment experience, and little understanding of
443    how to configure these methods.  We now argue that these methods
444    should be considered for deployment in operational networks.

446 Appendix B.  Revision notes

448    RFC-Editor: Please remove this section prior to publication

450    Draft 00

452    o  This contains the first draft for comment.

454 Authors' Addresses

456    Godred Fairhurst
457    University of Aberdeen
458    School of Engineering
459    Fraser Noble Building
460    Aberdeen, Scotland  AB24 3UE
461    UK

463    Email: gorry@erg.abdn.ac.uk
464    URI:   http://www.erg.abdn.ac.uk/~gorry

465    Bob Briscoe
466    BT
467    B54/77, Adastral Park
468    Martlesham Heath, Ipswich  IP5 3RE
469    UK

471    Phone: +44 1473 645196
472    Email: bob.briscoe@bt.com
473    URI:   http://bobbriscoe.net/