ICCRG meeting minutes, IETF 80, Tuesday 29 March 2011, 9:00-11:30
Chairs: Michael Welzl, Murari Sridharan

Murari Sridharan: Problems in a Multi-tenant Datacenter

Questions:
Murari Sridharan: Should ICCRG/TCPM look at it?
David McDysan: Are data centers moving into ISP networks?
Murari Sridharan: That comment is from Bob.
Bob Briscoe: Similar experiences with very large data centers; we need better controls.
Chris ??: Content is getting closer to the customer, and the goal is to get content as fast as possible; optimizing TCP connect delay is important.
David McDysan: How many switches, routers and computers do we go through? I support this. Both consumer and enterprise users are shifting to apps. With the emphasis on applications (see the technical plenary), a packet may traverse more processors in the future.
Lars Eggert: As shown in the plenary, data centers are on the agenda for the IRTF. This work could be in scope for a new data center research group.
Murari: This talk partly motivates why the IRTF should look into this. Because of the desire for high performance in the data center, people are providing ad hoc solutions to shallow buffers; the concern is that these mechanisms will not be confined to consenting parties in the data center but will trickle out onto the Internet.
Aaron Falk: Are these only ad hoc solutions?
Murari: Flow fairness is a fundamental issue, so there is more than performance problems with ad hoc solutions. People are building solutions in the hypervisor, under the TCP stack. These solutions should trickle up into TCP/IP with time: they have to move into OS stacks and be shipped.
Aaron Falk: Why would bandwidth partitioning through VLANs not solve the problem?
Murari: The limit for VLANs is in theory 4K; practice is 512K. We are dealing with scale and dynamic configuration, e.g. for VM migration. There are lots of research proposals, e.g. in SIGCOMM.
[??]: So there are two issues: the nature of the data center environment, and the mobility of virtual machines.
Dinesh (Cisco): We are trying to reduce buffers to improve latency; see DCTCP. It is not just about buffer reduction but about task completion time, and not about VLANs. The sawtooth is an issue: with buffering it results in jitter, which is to be avoided in the data center. Application completion time, jitter sensitivity, etc. are all reasons to look at this.

Murari Sridharan: Data Center TCP (DCTCP)

Questions:
Ken Colbert: What are the numbers for slide 8 (Data Center Transport Requirements)?
Murari: They should be in the associated paper.
Lars Eggert: An earlier slide said cut by 50%; slide 18 uses another algorithm: cut with probability alpha?
Murari: Cut by 50%, but not always. The actual algorithm is on slide 19; the earlier slides were examples of the process. Slide 20 is an empirical measurement, not a simulation.
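[For reference: a minimal sketch, in Python, of the DCTCP sender reaction discussed above -- a window cut proportional to the estimated marked fraction alpha rather than a fixed 50%. The gain g = 1/16 follows the published DCTCP paper; the class and variable names are illustrative and this is not the implementation presented in the talk.]

    class DctcpSender:
        def __init__(self, cwnd_bytes, g=1.0 / 16):
            self.cwnd = cwnd_bytes   # congestion window in bytes
            self.alpha = 0.0         # running estimate of the marked fraction
            self.g = g               # EWMA gain (1/16 in the DCTCP paper)

        def on_window_acked(self, bytes_acked, bytes_marked, mss=1460):
            # Called roughly once per RTT with the ACK feedback for one window.
            frac = (bytes_marked / bytes_acked) if bytes_acked else 0.0
            # Update the moving average of the ECN-marked fraction.
            self.alpha = (1 - self.g) * self.alpha + self.g * frac
            if bytes_marked > 0:
                # Reduce in proportion to congestion: alpha near 1 halves the
                # window like standard TCP; a small alpha gives a small cut.
                self.cwnd = max(int(self.cwnd * (1 - self.alpha / 2)), mss)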
David McDysan: Looks interesting; it might be a good idea for the IRTF to look at this in a wider-scale environment with larger round-trip times.
Murari: We don't know if the instantaneous approach works at larger scale.
Aaron Falk: Please submit an IPR disclosure and declare issues.
Murari: An IPR disclosure is coming.
Richard Scheffenegger: The paper lacks a discussion of flow fairness, which is crucial for the Internet. Are there plans to investigate this?
Murari: Data from inside the data center is available; it can be shared on the list.
?? from Google: Question about the value of parameter K under various workloads and large RTTs.
Murari: K is independent of the workload. Yes, we may need to better understand the behavior with variable RTTs.
Nandita Dukkipati: Have you tried this with Linux?
Murari: I am obligated not to look at Linux.
Jim Gettys: Recently a long-standing bug has been found in the Linux output queueing for ECN.
Tim Shepard: What is the effect? Is this with Linux used as an end host or as a router?
Jim Gettys: Not clear on the details; this is just a heads-up.
Andrew McGregor: Is a patch available?
Murari: At the moment this is research.
Ivo Yaschenko, Yahoo: Have you looked at UDP?
Murari: Yes. With UDP the ECN marks were not going up to the flow; our implementation fixes this.
Michael Menth: What is the fairness of TCP vs. DCTCP?
Murari: There will be fairness differences.
Bob Briscoe: How do drops affect DCTCP?
Murari: Drops affect TCP and DCTCP pretty much equally.
Bob Briscoe: Taking DCTCP to the Internet, what happens with existing flows? Setting queues low in AQM would disrupt such traffic.
Ivo ??: Why did you choose ECN? What about another signaling mechanism?
Murari: ECN is there in silicon. We were interested in an end-to-end solution and did not look at other signaling mechanisms.
Dinesh (Cisco): The IEEE mechanisms require state in the switches. The beauty of DCTCP is that the state is in the end systems.
Murari: We would need a new packet format to provide ECN information to parties other than the sender, or to non-TCP parties.
Jukka Manner: The comparison with AQM is not fair; the AQM configuration caused underutilization. What is the effect of changing beta/delta in the multiplicative decrease (making it smaller) in DCTCP?
Murari: It has not been varied. It means that the feedback will be delayed. Incast mitigation is important.
Lars Eggert: The IRTF is subject to the Note Well, so an IPR disclosure is needed. If the IPR disclosure is delayed, a slide would be appropriate, just stating that an IPR disclosure will be made.
Murari: An IPR disclosure will be coming.

Mirja Kuehlewind: Chirping for Congestion Control

Questions:
??: Single hop or multi-hop?
Mirja: End to end.
Phasor Anlon??: Which simulator? How was the bandwidth bottleneck controlled?
Mirja: A simulator similar to ns-2.
?? (red shirt): The time period over which the rate is controlled is larger than the inter-packet gap, but instantaneous rates vary when chirping. Real-world rate control counts packets and becomes active afterwards; this will have an impact. In a simulated network this is different, and these problems might pop up in real life. Thresholds can also mess up the timing.
Mirja: We are using code from the real Linux stack in the simulator. [Discussion of the delays introduced.]
Simon Leinen: This is a great idea. One application for this would be in 3G and wireless networks, where a central server could manage the mobiles. Consider looking at this area to validate the work.
Mirja: We are looking at the simpler scenario with one round-trip time of feedback for now.
Andrew McGregor: Consider that a path may not have a defined bandwidth. If you are going to use this in a more general environment, look at more sophisticated statistical models. Chirps may run into bandwidth rate-limiting problems, etc. Have you thought of something like a Gray code?
Mirja: Chirping provides an estimate; we do not need to chirp continuously.
David McDysan: Interesting research. Regarding chirps interacting, what about checking on phasing to avoid synchronization? Perhaps someone could help model or simulate these problems to help your research.
Mirja: The challenges are at a different level.
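[For reference: a minimal sketch of how the send times for a single chirp could be generated -- a short packet train whose inter-packet gaps shrink geometrically, so that one train probes a whole range of rates. The packet count, packet size, starting rate and gap ratio below are illustrative assumptions, not values taken from the talk.]

    def chirp_send_times(n_packets=16, pkt_size=1500, start_rate_bps=1e6, gamma=1.2):
        # Send times (in seconds) for one chirp. The instantaneous probing rate
        # of gap i is start_rate_bps * gamma**i, so the chirp sweeps from
        # start_rate_bps up to start_rate_bps * gamma**(n_packets - 2). The gap
        # at which queueing delay starts to grow persistently gives the
        # available-bandwidth estimate.
        times = [0.0]
        gap = pkt_size * 8 / start_rate_bps   # gap for the lowest probing rate
        for _ in range(n_packets - 1):
            times.append(times[-1] + gap)
            gap /= gamma                      # geometrically decreasing gaps
        return times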
Bob Briscoe: Capacity Sharing Architecture Design Team update

Discussion postponed.

Gorry Fairhurst: A TCP Modification for Variable-Rate Traffic

Ken Colbert: Why 6 minutes?
Gorry: It is longer than any application will go idle and then resume its rate.
Mirja: If you can't come up with the right number, maybe it's the wrong approach?
Bob Briscoe: What about looking at the history of the variation, whether it has been stable? Use the variability of the link (congestion) to determine the timeout.
Gorry: This applies only to links without congestion. If everything is OK, twice the flight size worked better.
Murari: This proposal is probably bad for data centers. We tried similar things before looking at ECN; transports tended to synchronize some behaviors (bursts?).
Gorry: We looked at long Internet paths. The data center case is different.
Murari: You'll have to explain who should use this.
Mark Handley: The reason for the decay in CWV is that the competing traffic changes. Restarting with an inflated window is harmful: if someone else has started sending during that period, everything goes horribly wrong, everyone gets congestion, everyone backs off.
Gorry: We are proposing taking a risk that congestion will be caused (like that), in return for maximizing performance. If congestion is encountered, the window will back off again. That's why we bring this up in ICCRG; please provide feedback.
Michael Welzl: Really use the previous window after idle? A quick thought: if the app stops sending data, decay the rate so that you still get some feedback.
Gorry: That would be bad for VoIP, for example. We are trying to make it possible for apps to go idle without being required to send traffic to keep the feedback going; going back to slow start is unsatisfactory. Apps should not have an incentive to send idle traffic.
Yuchung Cheng: I like this idea. CWV makes persistent connections useless; in the front end we have to disable CWV. What about congestion management concepts -- could that help with deciding when to back off?
Gorry: (Comment referred back to Briscoe.) Once congestion arises, TCP congestion control applies and TCP backs off.
Andrew McGregor: The stack can have a lot of information. When the app is idle, we lose information. In Linux we tried to weight old evidence lower than new evidence to improve the response after an idle time. TCP should weigh all kinds of evidence at once.
Gorry: Related to Bob's talk.
Mirja: I think the problem may lie in slow start, and that this is not the right approach to solve the problem.
Nandita Dukkipati: CWV is a real problem. Today's stacks are much better and timeouts are less frequent. We need experiments on the Internet to determine whether CWV is useful. In the data center, there is evidence that CWV is harmful.
Gorry: Yes, large initial windows were disruptive. I agree more experimentation is necessary to meet this challenge.

Michael Scharf: Performance and Fairness Evaluation of IW10 and other Fast Startup Schemes

??: Question about Quick-Start. There are situations where Quick-Start performs better.
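[For reference, relating to Gorry Fairhurst's talk above: a minimal sketch contrasting RFC 2861-style decay of the congestion window after an idle period with the behaviour discussed in the talk, which preserves the previous window for a bounded period and caps the restart at twice the recently used flight size. The 6-minute period and the factor of two come from the discussion above; the restart window value and everything else here is an illustrative assumption.]

    RESTART_WINDOW = 4 * 1460   # bytes; assumed restart window after a long idle period

    def cwnd_after_idle_rfc2861(cwnd, idle_time, rto):
        # RFC 2861-style decay: halve the window once per RTO of idle time,
        # never going below the restart window.
        while idle_time >= rto and cwnd > RESTART_WINDOW:
            cwnd = max(cwnd // 2, RESTART_WINDOW)
            idle_time -= rto
        return cwnd

    def cwnd_after_idle_proposed(cwnd, idle_time, prior_flight_size, limit=6 * 60):
        # Sketch of the discussed alternative: keep the previous window across
        # idle periods shorter than the limit (6 minutes in the talk), capped
        # at twice the recently used flight size; otherwise restart.
        if idle_time < limit:
            return min(cwnd, 2 * prior_flight_size)
        return RESTART_WINDOW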