DCLCRG - IETF90 - 25 July 2014 - Toronto, ON, Canada
Chairs: Fred Baker (Cisco), Lingli Deng (China Mobile)
Note taker: Dirk.Kutscher@neclab.eu
Proposed Agenda at http://trac.tools.ietf.org/group/irtf/trac/wiki/DclcIetfNinty

Fred: problem statement
- Erik Nordmark: one aspect of management is TCP (flow) management; another aspect is a DC with multiple tenants -- are we thinking about both?
- Fred: there is a variety of aspects, so yes

Fred: data sharing -- what data can people share?
- Richard Scheffenegger: some companies cannot share anything; some operators have very high security standards, so it might be difficult
- Lars Eggert: floated this idea in London; RGs can give themselves different rules -- that could cover data sharing, and perhaps an NDA could be signed. Also, universities run pretty large data centers
- Dirk: carrier DC measurements could be feasible -- product-specific performance measurement is different. The hope is that someone finds this work interesting enough to share data
- Lars Eggert: without data being brought in, work in the group might be seriously impeded

Bob Briscoe: Reducing Internet Latency -- a survey of techniques and their metrics
- http://riteproject.eu/publications
- http://riteproject.files.wordpress.com/2014/07/latency_preprint-31.pdf
- http://www.bobbriscoe.net/presents/1407ietf/1407latency-survey.pdf
- Lars: this is about Internet latency reduction, not DC latency, right?
- Bob: there are different flows in the picture

Vic Liu: Gap Analysis on Virtualized Network Test
- http://tools.ietf.org/html/draft-liu-dclc-gap-virtual-test
- Lars (on Throughput slide): figure 1 is surprising. Is the hypervisor not getting enough cycles to send packets?
- Vic: the VM does not get enough cycles
- Lars: timers inside the VM cannot be synchronized at high accuracy?
- Vic: they cannot be synchronized, because they are based on two different Linux systems
- Karen Sollins (on Throughput): how many physical CPUs?
- Vic: 2 cores, 24 vCPUs
- Karen: you don't pin vCPUs to physical CPUs?
- Vic: we do pin them
- Dirk: are you sharing measurement data with vendors?
- Vic: currently with one vendor

Dapeng Liu: Latency Test Report
- http://tools.ietf.org/html/draft-shi-dclc-latency-test
- Lars: buffer threshold -- is that the queue depth?
- Dapeng: yes
- Lars: does a threshold of 1% mean limiting to 1% of maximum capacity?
- Dapeng: yes
- Lars: it would be nice to see absolute values, not just percentages; by using percentages, you are losing a bit of information
- Lars (on "The Minimum Delay"): is the minimum latency the lowest latency seen during a test run?
- Dapeng: yes
- Richard Scheffenegger: did you have a ramp-up phase? measuring after a couple of seconds?
- Lingli: each test round has different phases -- a starting phase, a stable phase, and a teardown phase; the statistics are drawn from the middle phase, when the load stays stable for 30 to 60 seconds (see the sketch below)
- Al Morton: how repeatable are the results? Can you repeat them one week later or so? There might be processes sharing the buffer, etc.
- Dapeng: good point. Different things affect the test; with different vendors' products, results could differ.
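A minimal sketch of the per-round statistics step Lingli describes, assuming each round yields timestamped latency samples and fixed phase durations; function and field names are illustrative, not taken from the draft:

    def stable_phase_stats(samples, ramp_up_s, stable_s):
        """samples: list of (timestamp_s, latency_us) tuples for one test round.
        Statistics are drawn only from the stable middle phase; the starting
        and teardown phases are discarded. Returns absolute values (Lars's
        point: percentages alone lose information)."""
        window = [lat for (t, lat) in samples
                  if ramp_up_s <= t < ramp_up_s + stable_s]
        if not window:
            raise ValueError("no samples fall inside the stable phase")
        return {
            "min_us": min(window),  # "minimum delay" = lowest latency in the run
            "avg_us": sum(window) / len(window),
            "max_us": max(window),
        }

    # Example (hypothetical data): a round with a 20 s starting phase and a
    # 60 s stable phase:
    # stats = stable_phase_stats(samples, ramp_up_s=20, stable_s=60)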
Fred Baker: TCP Congestion Control as Latency Control (Tsinghua work)
- Fred: maybe AQM is not the right place to optimize
- Alexander Zimmermann: surprised to see that NewReno does better than SACK TCP?
- Fred: no good answer
- Fred: conclusion: using a delay-based procedure helps a bit but does not solve incast collapse
- Fred: what are the research options, next steps?
- Lars Eggert: TCP pretends that all flows are unrelated; ideas from integrated congestion control (the congestion manager) might help here
- Lars (on map-reduce slide): with SSDs you have less jitter, so the effects are more tightly synchronized than before
- Fred: conclusions: TCP and related protocols should use delay-based or jitter-based procedures such as FAST or CDG
- Fred: what can be done to move away from Map-Reduce-based apps?
- Al Morton: excited about the connection between research and benchmarking

Haibin Song: TCP parameter control
- http://tools.ietf.org/html/draft-song-dclc-tcpdc
- Lars: default TCP parameters were chosen to be safe in all possible scenarios. Assuming that there is an all-knowing entity that can derive optimal parameters for all hosts is a very strong assumption -- traffic patterns might change drastically. How stale is the info that the centralized controller sees?
- Fred: what parameters would you set? RWIN? effective windows?
- Haibin: initial congestion window size
- Richard Scheffenegger: on what timescales? daytime/nighttime seems rather coarse. Lemmings is more about microseconds. If you reduce the timescales, what about overhead?
- Haibin: no answer to that
- Q: if you change TCP parameters dynamically, what about fairness?
- Haibin: no answer to that
- Fred: look at MIT Fastpass -- slotted Aloha in the DC

Bob Briscoe: Accurate TCP ECN
- http://tools.ietf.org/html/draft-ietf-tcpm-accecn-reqs
- "An Enabler for New E2E Transport Behaviours: More Accurate ECN Feedback Reflector (AccECN)"
- Lars (on DCTCP): the safe-operation problem is fixed in the Windows and FreeBSD stacks (a sketch of the DCTCP computation appears at the end of these notes)
- Lars (on extended feedback): do we need to modify DCTCP for this? Would unmodified stacks still work?
- Bob: because of capability negotiation, unmodified stacks would still work
- Lars: do we need to take as many bits as possible now, or is it better to reserve some?
- Bob: trying to make it modular
- Bob: does anyone have concerns about turning off delayed ACKs?
- Gorry Fairhurst: DCs do care about delayed ACKs
- Lars: we cannot make that decision now
- Gorry (on Open Design Issues): what are the implications?
- Bob: with a delayed response, it may take longer to learn that there is a specific codepoint
- Richard Scheffenegger: that's an issue of receiver behaviour -- not of the wire protocol
- Lars: a super-efficient design may be a problem -- difficult to understand, review, debug...

Discussion:
- Lingli (on data sharing): better to start thinking about what data you actually need
- Dirk: yes
- Lars: there may also be existing data that could be leveraged if we specify better what we need
- Fred -- show of hands:
  - most of us find this interesting
  - many want to have another meeting (perhaps at IETF91)
  - people should be on the mailing list
- Bob: the presentations are currently rather random -- should we give the work a purpose?

NOT DISCUSSED:
Bob Briscoe: Network Performance Isolation in Data Centres using Congestion Policing
- http://tools.ietf.org/html/draft-briscoe-conex-data-centre
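For context on the DCTCP exchange under Accurate TCP ECN: DCTCP reacts to the fraction of CE-marked packets per window of data, which is the kind of feedback AccECN is meant to carry back to the sender more accurately than the single echoed bit in classic ECN. A minimal sketch of the sender-side computation, following the published DCTCP update rule; the gain constant and names are illustrative:

    G = 1.0 / 16  # EWMA gain suggested in the DCTCP paper

    def on_window_acked(state, acked_pkts, ce_marked_pkts):
        """Run once per window of acknowledged data.
        state: dict with 'alpha' (smoothed mark fraction) and 'cwnd' (packets)."""
        frac = ce_marked_pkts / acked_pkts if acked_pkts else 0.0
        # Exponentially weighted moving average of the marked fraction;
        # this is why the sender needs a count of marks, not a single bit.
        state["alpha"] = (1 - G) * state["alpha"] + G * frac
        if ce_marked_pkts:
            # Back off in proportion to congestion severity rather than
            # halving the window as classic TCP would.
            state["cwnd"] = max(1.0, state["cwnd"] * (1 - state["alpha"] / 2))
        return state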