BOF: Low Latency Low Loss Scalable throughput (L4S)
Draft Agenda for Berlin
TUESDAY, July 19, 2016
1400-1600 Afternoon Session I
Potsdam I
Chairs: Lars Eggert, Philip Eardley

== Draft agenda ==

1. Introduction - Chairs
  - Tim Chown: is there a mailing list? Yes: tcpprague@ietf.org

2. The problem and very high-level solution - Bob Briscoe
  - Yuchung (Google): do you need to track the per-flow state?
  - Bob: no, just the ECN bit
  - Yuchung: Does the AQM for classic traffic turn the elephant into a rabbit?
  - Linda Dunbar (Huawei): How do you prevent the problem of queueing in a core router? Do you need this there?
  - Bob: Usually you don't get queueing in a core router. You don't want drop or delay in the core.
  - Bob: Once you deal with your primary bottleneck, in the longer term you could put a queue like this in, but queuing tends to get pushed to the edge.

3. Demo: L4S in action - Koen De Schepper
  - Jana Iyengar: explain the second graph
  - Koen: One cluster of delay is the retransmit timeout of 200 ms
  - Dave Taht (bufferbloat.net): what was the baseline RTT?
  - Koen: 7 msec
  - Dave Taht: do you have data with larger (> 100 msec) RTTs?
  - Koen: Yes

4. L4S Applicability to Mobile, without flow inspection - Kevin Smith
  - no questions/comments

5. L4S in a 4G/5G context - Ingemar Johansson
  - no questions/comments

6. DCTCP evolution - Praveen Balasubramanian
  - no questions/comments

7. Discussion about the technology
  - Yuchung: Curvy RED on classic traffic doesn't solve the whole problem, still need DCTCP
  - Bob: DCTCP gives lower latency. Want to enable new applications to motivate deployment
  - Yuchung: can't you tune your AQM?
  - Bob: if you tune too hard, you lose throughput
  - Magnus Westerlund: interactions with video interface (e.g., frame intervals)
  - Koen: concern about rapidly changing throughput. We can guarantee low latency, but throughput might change rapidly. Application must adapt.
  - Magnus: how well does the congestion signal interact, different from what a TCP connection would see.
  - Koen: allows bursts (unlike FQ).
  - Dave Taht: work is an outgrowth of the AQM working group. What AQM has specified is vastly superior to what has been deployed today. Would find your presentation more convincing if you compared it to FQ-Codel.
  - Koen: could show you.
  - Dave Taht: FQ-Codel shows near-zero latency for many forms of traffic.
  - Mat Ford: How does the end system know that it is on a network with Dual-Q coupled AQM?
  - Praveen: end host doesn't know
  - Bob: DCTCP just reverts to TCP Reno upon loss. With classic ECN the end system would see delay before the ECN signal and could guess that classic ECN would be in use.
  - Yuchung: should we change routers before we update hosts?
  - Koen: in parallel, gradually. DCTCP exists today. You can make a TCP that works in both.
  - Yuchung: Does this exist in the drafts?
  - Bob: need fallback if classic ECN is in use instead of L4S ECN
  - Pat Thaler (Broadcom): cover datacentre as well as carrier?
  - Bob: yes. The original reason I started doing this was that there were too many datacentres to deploy DCTCP (in BT). Bottleneck is likely to be on the ToR switch; solution for multi-tenant DCs.
  - Pat Thaler: some apps care about low loss/latency; others don't.
  - David Black: Where are the incentives to keep the other senders out of the low latency queue?
  - Bob: you've misunderstood: all large flows can also be in the L4S queue ...
  - David Black: Assumes only DCTCP is supposed to use ECT(1), what is the incentive for not using this codepoint by other applications that are not DCTCP?
  - Koen: same as today, you could always just ignore the ECN signal. Can employ other techniques such as rate limiting to stop cheating.
  - Lars: can someone misbehaving destroy the queue for everyone?
  - Praveen: will react badly to loss.
  - David Black: some degree of trust that ECT(1) will be set appropriately.
  - Bob: This needs to be mentioned in the security considerations section of the draft.
  - Jana: It's not just a security consideration; it does not need to be an attacker, just a mistaken use.
  - Koen: This is the same situation as with the current network with classic ECN. Eventually you just start dropping packets, as you do in any other AQM.
  - Jana: The AQM WG may be about to close, what do you think your recommendation will be to address this?
  - Lars: Hold this question until the end of the session.
  - Roland Bless: AQM can achieve low delay if you have more than a few flows. Otherwise, you need to change TCP.
  - Koen: tcpprague has to decide how to be scalable, future-safe. Updates may require changes to the queues in routers.
  - Roland Bless: It's not so easy to update the network routers after a change.
  - Christian Huitema: There is a lot of discussion about the relation of ECN feedback with two queues vs. multi-queue (e.g., FQ-Codel). With flow isolation, each flow can have its own feedback. If you look at what you are doing, you are specifying a new network feedback mechanism (please warn me very quickly if the queueing is going up). I have reservations about embedding the square root coupling into the mechanism; TCP is not square root. It's a hack.
  - Koen: with slow start, short flows don't get any feedback. With short RTT you can respond quickly. Not so much with longer RTT flows. There are other mechanisms for short flows.
  - CH: very small fraction of flows are long flows.
  - Koen: video quality performance depends on throughput and low latency. VR video needs lower latency/higher throughput.
  - CH: be careful about hacks
  - Bob: I agree it is worrying to put a formula into the architecture. Having a shallow threshold on the L4S side allows you to test the capacity more quickly to get your short flows going faster.
  - Colin Perkins: good story for not marking classic TCP traffic with ECT(1). Not sure about RTP conferencing applications; incentives might not apply in that case.
  - Bob: I want those flows in the L4S queue.
  - Colin: If I build a non-adaptive video app I may just ignore all the CE-marks, that's an important case to consider for other apps.
  - Bob: Koen has done experiments. If this thing moves to loss, it will control itself. Problem is the same as what we have now.
  - Koen: If someone sends non-CC traffic above bottleneck share, the same problem happens.
  - Colin: The disadvantage is that you push the TCP flows away quickly.
  - Koen: today as well. Don't necessarily have a bigger effect because the queues are smaller.
  - Yuchung: What about delay-based CC?
  - Koen: There is some discussion ongoing. Hard for delay-based CC to coexist with loss-based CC (fairness).
  - Yuchung: ?
  - Koen: fairness, I don't know
  - Bob: I want operators to change things to the right way quickly.
  - Yuchung: So, you think ECN is a more straightforward approach?
  - Bob: if delay is a problem, using delay measurements to control the system won't help.
  - Koen: A combination of loss and delay can help (improve ECN feedback).
  - Lars: don't want to go into the loss vs. delay debate.
  - Jana: The model here is a separation of queues in bottleneck routers; one low latency, one marking ECN bits. Sender response to that also needs to be considered; you could look at doing something different in TCP-Prague.
  - (no answer)
  - Phil: question about FQ-Codel comparison; see Koen at the end.

8. Work required by the IETF - Marcelo Bagnulo

9. Discussion of the work required by IETF
  - Colin Perkins: think you are missing some significant components. Only mentioned TCP and IP. What about non-TCP transports?
  - Marcelo: some mention of others. Think we should initially focus on TCP
  - Colin: existing RFCs of non-TCP transports using ECN. You are violating MUSTs
  - Marcelo: and they also violate TCP MUSTs. Just updating RFC 3168 (?) is enough
  - Colin: if you are updating the response at the IP level, you need to update all of the other RFCs.
  - Bob: we have just written text that does what you suggest we do.
  - Colin: list missing things like circuit breakers in AVT.
  - Bob: updating existing RFCs. New work can adapt.
  - Lars: you are changing things underneath them.
  - Mirja: ongoing work we are doing anyway
  - Gorry: not just RFCs, they are deployed. Other areas of the IETF that will be impacted.
  - Bob: I checked ECN over RTP over UDP. Nobody using ECT(1).
  - Colin: this affects existing work. Not listed. Needs to be added.
  - Tim Shepard: because you are changing things underneath them, they need to change. But if they are using ECT(0) they don't need to change.
  - Lars:
  - Tim: what changes for ECT(0)?
  - Colin: discuss based on this that says nothing about ECT(1). Impacting other drafts even if they don't use ECT(1).
  - Lars: proponents need to check other work that is standardized and ongoing to see what will be affected.
  - Mirja: Some changes to ECN are changes that are independent of L4S. These need to be changed anyway.
  - Christian H: want to change to have high-volume feedback from the network. Need to do the checks of other work.
  - Colin: impacts more WGs than tsvwg.
  - Lars: already a problem that the tsvwg folks need to take into account.
  - David Black: author of RFC 3168 and tsvwg chair; agree with Colin. Experimentation is called for. RFC 3168 prohibits experimentation? Draft to change ECN response is being split into two drafts. What do we need to do to enable experimentation? If we open this up, we are tinkering with things that could break the Internet. What are the appropriate controls to allow us to experiment responsibly?
  - Magnus W: feels that the assumption that everyone is using ECT(0) today is wrong. Some folks may be using ECT(1) today. APIs that allow applications to set ECT will allow more applications to use ECT(1).
  - Koen: Apps can detect ECN-CE response behaviour that has delay and adapt.
  - Magnus W: If you have an API to set the ECN bits for UDP, then you may immediately expect to see more use of this codepoint...

10. Polls - Chairs
  - Lars: do people believe that they understand the proposal being brought forward?
    * rough consensus for understanding
  - Lars: do you believe this is something the IETF should take on?
    * strong consensus for yes
  - Lars: show hands if you want to help with the work
    * approximately 20 hands up. Send an email to tcpprague@ietf.org if you are willing to help. State if you are willing to review documents.
  - Jana: question about who is willing to deploy.
  - Lars: please show your hand if you build equipment or run networks
    * 15-20 hands
  - Mirja: do people believe that the IETF is able to do this work?
  - Lars: if you believe, please hum
    * strong consensus yes
  - Jana: different pieces: dual-Q, response
  - Jana: does tcpprague belong in iccrg?
  - Mirja: the only thing that really needs standardization is the marking to discriminate DCTCP vs. classic
  - Phil: Koen will show demo now.
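
Illustrative sketch (not presented at the session): the discussion above repeatedly refers to using the ECT(1) codepoint to separate DCTCP-style traffic from classic traffic at a two-queue bottleneck. A minimal way to picture that classification step is shown below. The ECN field values are those of RFC 3168; the function name `classify`, the queue labels, and the choice to send CE-marked packets to the low-latency queue are assumptions of this sketch and are not stated in the minutes.

```python
# ECN field: the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte
# (values as defined in RFC 3168).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify(tos_byte: int) -> str:
    """Pick a queue for a packet in a hypothetical two-queue bottleneck.

    Assumption of this sketch: ECT(1) identifies senders with a
    DCTCP-like response, and CE is treated the same way. Everything
    else (Not-ECT, ECT(0)) joins the classic queue.
    """
    ecn = tos_byte & 0b11  # mask off the DSCP bits, keep only the ECN field
    if ecn in (ECT_1, CE):
        return "l4s"
    return "classic"
```

Note that nothing in this step verifies the sender's behaviour, which is the trust and incentive concern David Black, Jana and Colin raise: a sender can set ECT(1) without responding like DCTCP, so any policing (such as the rate limiting Koen mentions) has to happen elsewhere.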