Internet Engineering Task Force                      K. Subramaniam, Ed.
Internet-Draft                                                  D. Loher
Intended status: Informational                                 Microsoft
Expires: April 30, 2015                                 October 27, 2014


                     Router Buffer Sizes In The WAN
             draft-ietf-lmap-router-buffer-sizes-ksubram-00

Abstract

   This draft identifies the data that needs to be collected and
   analyzed to quantify the buffer sizes used in routers in the Wide
   Area Network (WAN).  The scope of this draft is limited to WAN links
   that have link latencies of 40 to 150 milliseconds.

   Reducing router buffer sizes has many advantages, the most important
   being cost.  However, little data is available today to calculate
   suitable buffer sizes effectively.  This draft details use cases for
   the study and lists the data that needs to be taken into
   consideration to quantify the size of router buffers.  The details
   of the individual measurement metrics are beyond the scope of this
   document, and the draft does not identify methods to gather the
   data.  What it identifies is the need to collect and report this
   empirical data in a readable fashion, providing the ability to study
   and compare data in a more standardized way.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may not be modified, and derivative works of it may not
   be created, except to format it for publication as an RFC or to
   translate it into languages other than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Use Case
     3.1.  Discards with small buffer sizes
     3.2.  Discards with large buffer sizes
   4.  List of required data for study of router buffer sizes
     4.1.  Number of concurrent flows, N
     4.2.  Length of a flow, L
     4.3.  Packet Discards, D
     4.4.  Reason for Packet Discards, R
     4.5.  Resolution of time interval, T
     4.6.  5 Tuple Flow Identity, I
   5.  Conclusion
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Additional Stuff
   Authors' Addresses

1.  Introduction

   "How much buffering do core links need?" is a question that has been
   under study for a while.  The question boils down to quantifying
   buffer sizes that still achieve 100% link utilization and maximum
   throughput at a feasible cost.

   Buffer design can substantially increase costs.  While
   over-buffering seems intuitive, it can complicate the design of
   high-speed routers and lead to higher power consumption, more board
   space, and lower density.  It can also increase end-to-end delay in
   the presence of congestion, which can make congestion more
   persistent.  Additionally, there is always a tradeoff between buffer
   sizes and the capacity of a router.

   On the other hand, under-buffering, while doing away with the above
   drawbacks of over-buffering, could lead us away from our primary
   goal of 100 percent link utilization.  This could happen in a
   scenario using simple Additive Increase Multiplicative Decrease
   (AIMD) for TCP flows, when the sender has packets to send but the
   advertised window size is smaller, and as a result the receiver
   consumes far less than it could.

   The rule of thumb for router buffers [Villamizar] is B = 2RTT*C,
   where B is the buffer size, RTT the Round Trip Time, and C the
   capacity of the bottleneck link.  [RFC3429] also talks about the
   buffer size being at least one TCP window size.

   However, later studies [Appenzeller] show that the rule of thumb
   holds only for a single flow or for a large number of perfectly
   synchronized flows.  Further, they postulate that the required
   buffer size is actually B = (2RTT*C)/sqrt(n), where n is the number
   of flows.  This indicates a significant reduction in buffer memory,
   promoting lower costs.

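   As a worked illustration of the two formulas above, the following
   Python sketch evaluates both buffer sizes for a single link.  The
   function names and the sample inputs (100 ms RTT, 10 Gbit/s
   bottleneck, 10,000 flows) are assumptions chosen for illustration
   and are not taken from any measurement.  RTT is given in seconds and
   C in bits per second, so B comes out in bits.

```python
import math


def rule_of_thumb_bits(rtt_s: float, capacity_bps: float) -> float:
    """Rule of thumb as written in this draft: B = 2RTT*C, in bits."""
    return 2.0 * rtt_s * capacity_bps


def small_buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Reduced size as written in this draft: B = (2RTT*C)/sqrt(n), in bits."""
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return rule_of_thumb_bits(rtt_s, capacity_bps) / math.sqrt(n_flows)


def bits_to_megabytes(bits: float) -> float:
    """Convert bits to megabytes (1 MB = 10**6 bytes)."""
    return bits / 8 / 1e6


if __name__ == "__main__":
    # Hypothetical inputs, within the 40 to 150 ms latency scope of the draft.
    rtt_s = 0.100          # assumed round trip time: 100 ms
    capacity_bps = 10e9    # assumed bottleneck capacity: 10 Gbit/s
    n_flows = 10_000       # assumed number of concurrent flows

    large = rule_of_thumb_bits(rtt_s, capacity_bps)
    small = small_buffer_bits(rtt_s, capacity_bps, n_flows)
    print(f"B = 2RTT*C           : {bits_to_megabytes(large):.1f} MB")
    print(f"B = (2RTT*C)/sqrt(n) : {bits_to_megabytes(small):.1f} MB")
```

   With these assumed inputs the rule of thumb asks for about 250 MB
   of buffering, while the sqrt(n) form asks for about 2.5 MB, which is
   why the number of concurrent flows matters so much to the study.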
   As seen, both large and small buffers have had their proponents.
   However, most of these studies are based on theoretical models and
   simulations.  Today, there is no model or protocol to mine big data
   from a provider's network to answer this question efficiently.  The
   nature of WAN traffic can be uncertain and varying.  Furthermore,
   traffic can vary vastly between individual ISPs.  This document
   argues for a model for mining empirical big data in a provider's
   network, to make it possible to build a network that drives down the
   cost per gigabyte ($/GB) while maximizing link utilization.

   This document outlines use cases for the study of router buffer sizes
   in the WAN and identifies the data that needs to be collected and
   analyzed.  The study could be extended further to the edge and to
   datacenters, but that is outside the scope of this draft.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Use Case

   From an operator's perspective, it is imperative to monitor discards
   and link utilization over WAN links to be able to study router
   buffer sizes.  But these alone do not give an operator enough
   information as to why the discards happened.  The two use cases
   outlined here argue that more data needs to be collected, reported,
   and analyzed.

3.1.  Discards with small buffer sizes

   Trans-Pacific and trans-Atlantic links, with latencies in the range
   of 150 ms and 90 ms respectively, low link utilization of 30
   percent, and small buffers, have seen dropped packets.  The most
   intuitive response has been to increase the buffer sizes for these
   links on noticing packet discards.
   While this might alleviate the issue temporarily, unless the right
   problem has been identified it could readily lead to bufferbloat,
   which has many issues of its own.

3.2.  Discards with large buffer sizes

   Operators have also observed dropped packets on WAN links within
   North America with buffers as large as 125 MB per port and link
   utilization of 60%.  If this happens even though the router has not
   been specifically configured to drop certain types of packets and
   there are no routing misconfigurations, then clearly the issue is
   not the size of the router buffer.

4.  List of required data for study of router buffer sizes

   This section describes the absolute minimum set of data that needs
   to be collected to effectively quantify router buffer size.

   +---+-------------------------+-------------------------------------+
   |   | Data                    | Details                             |
   +---+-------------------------+-------------------------------------+
   | 1 | Number of concurrent    | For aggregate traffic               |
   |   | flows, N                |                                     |
   | 2 | Length of the flow, L   | [Flow stop time - flow start time]  |
   | 3 | Packet Discards, D      | Per interface                       |
   | 4 | Reason for Packet       | Buffer overflow, configuration,     |
   |   | Discards, R             | etc.                                |
   | 5 | Resolution of Time      | [Flow stop time - flow start time]  |
   |   | Interval, T             |                                     |
   | 6 | 5 tuple flow identity,  | Src IP, Dest IP, Src port, Dest     |
   |   | I                       | port, Protocol                      |
   +---+-------------------------+-------------------------------------+

         Table 1: List of required data for Router Buffer Sizes

   A service provider needs to take several attributes into
   consideration to determine the right buffer size for its WAN routers.
   This section details why the six items above have been identified
   as the minimum essential data needed to aid the study of router
   buffer sizes.

4.1.  Number of concurrent flows, N

   Studies [Feldmann] and [Stevens] show that 95% of flows in the
   Internet today are TCP [Postel] flows.  The nature of these flows
   can vary significantly, not only over time but also between
   providers.  Flows that spend most of their time in slow-start
   require significantly less buffering than flows that live mostly in
   congestion avoidance.  For this reason it is important to identify
   the types of concurrent flows that can live on a WAN link.

   Short (non-persistent) flows are those that live for less than one
   RTT, and large (persistent) flows are those whose lifetime is longer
   than one RTT with congestion overhead.  Internet measurements [Avra]
   show that while a small number of large flows account for most of
   the packets transferred, short flows make up most TCP sessions.
   Large flows are known to have a larger effect on buffer sizes.  This
   mix of flows can in turn affect Round Trip Time (RTT), loss
   probability, and flow lengths.  The ability to detect large flows is
   necessary because, while the flows can be constant in steady state,
   the aggregate traffic can keep changing due to varying arrival and
   departure rates.  There needs to be a way to collect and analyze the
   number of concurrent flows at the granularity of the lifetime of
   short flows, as low as one millisecond.

4.2.  Length of a flow, L

   The length of a flow can be defined as its duration,
   [flow stop time - flow start time], or as the number of packets or
   bytes sent in that duration.  Identifying the lengths of flows in a
   provider's network gives information about the mix of short and
   large flows present in the WAN.  This has implications for modeling
   TCP flow control.

4.3.  Packet Discards, D

   The number of packet discards per interface is probably the most
   important metric.
   Of these, discards on outward (WAN) facing interfaces are the more
   relevant to the study of buffer sizes.  Interface discards are
   described in [RFC2893].

4.4.  Reason for Packet Discards, R

   There can be several reasons for packet discards, especially when
   they are observed on lightly utilized links.  Some discards could be
   due to routing misconfigurations, and others could be intentional,
   where the router is configured to drop certain packets.  Clearly
   stating the reason as insufficient buffer space will help narrow
   down the data required.  This is especially true in the case of
   smart buffer allocation, when some ports run out of buffers but not
   others.  For example, a port may have been allocated only 30 percent
   of the total available buffer space while experiencing the highest
   utilization, and see packet drops as a result.  This points to a
   dynamic buffer allocation scheme that is neither adaptive to nor
   predictive of the nature of the WAN traffic.

4.5.  Resolution of time interval, T

   The time interval should be granular enough to capture not only the
   number of concurrent flows in steady state but also the aggregate
   traffic over the lifetime of a short flow.  It should also make it
   possible to correlate the discards per interface with the number of
   concurrent flows.

   Today, the number of concurrent flows can be calculated via IPFIX,
   and the discards via sFlow counters or flows.  With counters, a
   change can take up to twice the configured interval to become
   visible, because of the Nyquist rate.  Reducing the counter export
   interval would increase responsiveness, but at the cost of
   increased overhead and reduced scalability.  On the other hand,
   packet sampling automatically allocates monitoring resources to busy
   links, providing a highly scalable way to quickly detect traffic
   flows wherever they occur in the network.
   Responsiveness is important for more stable control.

4.6.  5 Tuple Flow Identity, I

   A 5 tuple flow is identified by its source IP, destination IP,
   source port, destination port, and protocol, which identify the
   endpoints of a unidirectional flow.  This gives the network operator
   a way to identify offending flows, legitimate elephant flows, and
   high priority flows, which may occur at certain periods during the
   day.  Being able to separate traffic using the 5 tuple further
   increases the strength of the sample set of empirical data available
   for the study of router buffer sizes.

5.  Conclusion

   Numerous issues at different layers have an effect (directly or
   indirectly) on the sizing of router buffers.  We also note that
   there is no study that takes empirical data into consideration.
   Ideally, what would be required is an all-knowing oracle that sees
   the traffic flow on an end-to-end network across all layers.  In the
   absence of such a resource, the first step in the study of router
   buffer sizes is to effectively mine a provider's big data
   repository for the data identified in this draft.

6.  Acknowledgements

7.  IANA Considerations

   This memo includes no request to IANA.

   All drafts are required to have an IANA considerations section (see
   the update of RFC 2434 [I-D.narten-iana-considerations-rfc2434bis]
   for a guide).  If the draft does not require IANA to do anything, the
   section contains an explicit statement that this is the case (as
   above).  If there are no requirements for IANA, the section will be
   removed during conversion into an RFC by the RFC Editor.

8.  Security Considerations

   This document does not introduce new security issues.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [Appenzeller]
              Appenzeller, G., Keslassy, I., and N. McKeown, "Sizing
              Router Buffers", 2004.

   [Avra]     Avrachenkov, K., "Differentiation Between Short and Long
              TCP Flows: Predictability of the Response Time", INRIA
              Sophia Antipolis, 2004.

   [Feldmann]
              Feldmann, A., Rexford, J., and R. Caceres, "Efficient
              policies for carrying Web traffic over flow-switched
              networks", December 1998.

   [I-D.narten-iana-considerations-rfc2434bis]
              Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", draft-narten-iana-
              considerations-rfc2434bis-09 (work in progress), March
              2008.

   [Postel]   Postel, J., "Transmission Control Protocol", September
              1981.

   [RFC2893]  McCloghrie, K. and F. Kastenholz, "The Interfaces Group
              MIB", June 2000.

   [RFC3429]  Bush, R. and D. Meyer, "Some Internet Architectural
              Guidelines and Philosophy", December 2002.

   [Stevens]  Stevens, W., "Transmission Control Protocol", 1994.

   [Villamizar]
              Villamizar, C. and C. Song, "High Performance TCP in
              ANSNET", 1994.

Appendix A.  Additional Stuff

   This becomes an Appendix.

Authors' Addresses

   Kamala Subramaniam (editor)
   Microsoft
   Mountain View, CA  94043
   US

   Phone: +1 919 345 8778
   Email: kasubra@microsoft.com


   Darren Loher
   Microsoft
   Redmond, WA  98052
   US

   Email: daloher@microsoft.com