TCPM meeting, IETF-93, Prague, Czech Republic, Wednesday, July 22, 13:00 - 15:30
--------------------------------------------------------------------------------

Chairs: Michael Scharf
        Yoshifumi Nishida
        Pasi Sarolahti

Note takers: Andrew McGregor
             Michael Welzl

WG Status
---------
* Chairs giving WG Status

Gorry Fairhurst: announcement: presentation of TCP stealth draft in TSVWG
Jana Iyengar: announcement: bar BOF on QUIC tonight at 8:20 in Congress 1

TCP Extended Data Offset Option (draft-ietf-tcpm-tcp-edo)
Speaker: chairs for the authors
----------------------------------------------------------
Mirja Kühlewind: was there a large measurement study (of Pasi's implementation)?
Pasi Sarolahti: No. Testing work continues by Joe's student.
Chairs recommend comments on the list. No comments from room.

Analysis of the TCP EDO option (draft-bonaventure-tcpm-edo-analysis)
Speaker: Olivier Bonaventure
----------------------------------------------------------
Aaron Falk: Announcement: RG BOF on Friday about measurement of middleboxes and how they affect transport protocols.
Mirja: tracebox will be presented there.
Michael Scharf: fallback possible with Joe's solution?
Olivier: The fallback should not generate an RST, but should end up with working communication.
Olivier: in MPTCP we need a checksum in the option and ACKs for the options. Not clear if it can be done without this.
Michael: measurement data by next IETF?
Olivier: tracebox is ready, we need people to run it. It's a user space app, you must be root. We have a server.
Michael: suggest to inform mailing list

RFC793bis (draft-ietf-tcpm-rfc793bis)
Speaker: chairs for authors
----------------------------------------------------------
Karen Nielsen: as an implementer we gave some feedback on draft-gont-tcpm-tcp-seq-validation. No cycles to do anything about it but I can resend the comments I gave.
Gorry: About reserved bits: there's a BCP saying it was known behavior (set to 0, ignore on reception) as a correction to 793. Also, will this be a full standard?
Michael S: It should be an Internet Standard.
Mirja: In favor of documenting what's deployed in the wild. Could add in the draft what's newly added because of deployment.
Michael S: We may add non-normative description of variations
Aaron: It's desirable to have consistency on what standards levels mean; maybe you should consult AD and IESG about some of these issues. Room may not have the right breadth of expertise about the standards process. I don't think you can take something that wasn't already a standard and say 'MUST'.
Michael S: challenge is 793 doesn't have many MUSTs
Aaron: I'm talking about the new stuff
Gorry: 6man is updating IPv6 spec at this moment. IESG will be thinking about this sort of thing before we get to it.
Karen: Unfortunate if we end up with features that we all agree should be there all being in the appendix. Propose we try to get consensus and then move these things up.
Jana: seems like a good idea
(slide on 2119 requirements language in a 1122 style table)
Mirja: not sure it's helpful but it doesn't hurt to have it
Karen: our implementers love this
Tim Shepard: Such a table would be great if it's accurate. Should call for volunteers to proofread the table.
Michael S: already called for volunteers for whole document, table simpler than the rest
Mirja: wouldn't it be sufficient to have a pointer to the roadmap?
Michael S: there will be a pointer to it but the roadmap is not normative. Probably not sufficient. I guess it will be referenced as informative.

Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters (draft-bensley-tcpm-dctcp)
Speaker: Praveen Balasubramanian
----------------------------------------------------------
IPR disclosure: Microsoft has granted royalty-free license.
Mirja: Clarification, accurate ECN idea comes from ConEx and earlier work (Re-ECN)
Jana: CWND restart, what is that?
Praveen: That's reset to initial CWND after idle. There is data to suggest this helps.
Jana: Is it paced?
Praveen: No
Jana: Data would be really nice, because this seems counterintuitive.
Praveen: By default Windows Server will use a lower RTO (10ms)
Gorry: When you negotiate DCTCP, you base this on delay?
Praveen: No explicit negotiation in the SYN; if RTT < 10 ms and ECN is negotiated, it will automatically do DCTCP
Gorry: Can SYNs be resent in case of blackholes?
Praveen: Yes, first two are ECN enabled then following are not.
Gorry: Could be long path. 5s?
Praveen: correct but this is for datacenters. Not happening if latency is large.
Gorry: draft doesn't say anything about that. Can it be in the abstract?
Praveen: It's in the intro.
Gorry: Should be stronger in abstract saying it shouldn't be used outside the datacenter
Richard Scheffenegger: How do you figure out latency on handshake in the receiver side?
Praveen: because 3-way handshake you get it on both sides. 10ms is a large enough...
Richard: corner case. 3rd ACK delayed, one measurement below, one above. Just pointing out.
Praveen: known problem.
Bob: Problem if one side has implemented and the other not. Do you intend this draft to freeze your description of DCTCP even though you could evolve beyond that?
Praveen: I think we're close to completion. This is successfully deployed and we see good results.
Bob: But if there is an evolution on TCP itself to do this, what do you plan to do?
Michael S: we can document algorithms if they are deployed. At a certain stage. Cubic would also evolve after we published it here.
Dave Thaler: could use different queues or whatever .. snapshot of DCTCP can be deployed and can work together. For future version, we have to worry similar to now: some do DCTCP and some don't. Future considerations. We don't worry about this now.
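
[Editor's illustration, not part of the discussion] The selection rule Praveen described above (no explicit negotiation in the SYN; DCTCP is used when ECN was negotiated and the handshake RTT is below 10 ms; the first two SYNs request ECN and later retries do not) could be sketched roughly as follows. This is a minimal sketch under those stated assumptions, not the Microsoft implementation: the function and constant names are hypothetical, and only the two numeric values come from the minutes.

```python
# Hypothetical names; only the two numeric values are taken from the minutes.
DCTCP_RTT_THRESHOLD_MS = 10.0  # "if RTT < 10 ms and ECN negotiated"
ECN_SYN_ATTEMPTS = 2           # "first two are ECN enabled then following are not"


def syn_requests_ecn(attempt: int) -> bool:
    """Return True if the SYN for this 1-based attempt should request ECN."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return attempt <= ECN_SYN_ATTEMPTS


def use_dctcp(ecn_negotiated: bool, handshake_rtt_ms: float) -> bool:
    """Select DCTCP only when ECN was negotiated and the handshake RTT is small."""
    return ecn_negotiated and handshake_rtt_ms < DCTCP_RTT_THRESHOLD_MS


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(f"SYN attempt {attempt}: request ECN = {syn_requests_ecn(attempt)}")
    print("ECN negotiated, 0.4 ms RTT:", use_dctcp(True, 0.4))
    print("ECN negotiated, 35 ms RTT:", use_dctcp(True, 35.0))
```

Each end would evaluate `use_dctcp` with its own RTT sample, which is where Richard's corner case comes from: if the third ACK is delayed, one side can measure below the threshold and the other above it.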
Joanna Kulik: If we wanted to play with this in a datacenter, how does it play with existing TCP?
Praveen: this would not coexist.
Joanna: to play with it we need to bring up the whole thing. Or you do what Morgan Stanley suggested, to use DiffServ.
Bob: to answer your question, in AQM yesterday showed another way to do that with Alcatel-Lucent. There was a bar BOF yesterday, mailing list will be called tcp-prague@ietf.org to evolve DCTCP so that it can coexist with other traffic. Point about negotiation: in accurate ECN, we have suggested negotiation of accurate feedback would be implicit. Give me a different meaning for the feedback, that would imply you're doing DCTCP.
Aaron: procedural question, are you bringing this to the IETF to document and let the community know what it is, or are you handing change control to the IETF?
Praveen: document, want to get good interop. Intended to reflect the Microsoft implementation of this protocol.
Michael S: already did adoption call a few months ago, feedback was weak - insufficient. For us as chairs, to adopt this, feedback must be much stronger than what we've seen before.
Andrew McGregor: given this is an informational documentation of something that exists, should the barrier not be much lower?
Michael S: ok but we still had way too little feedback. 3 emails or so.
Jana: work that's going forward, people are actively working on items that are based on it, it's very important and useful to have an Informational RFC that documents what it does. Seems silly to not have documentation.
Mirja: Missed the last call, replied afterwards, did review in the meantime, well written document. I think we're ready to move forward.
Gorry: Worried about the omission of loss as a signal, but otherwise seems like the right thing.
Brian Trammell: sceptical if publishing this would be harmful, so I'm in favor
Lars Eggert: not strong support when we did adoption call, but we got more private feedback from implementers than for other things that are standards track here
Aaron: if WG doesn't want to spend time on it I would fully support publishing it as informational individual submission.
Michael S: title: Microsoft's DCTCP. That's a bit unusual. Ask for feedback.
Lars: 1st version didn't have it, but we added it. If you document something that isn't an IETF document you put the name of the company in the title. But don't care.
Dave Thaler: Microsoft doesn't care about title, but what Lars says. There are multiple implementations, after all.
Mirja: could go for individual submission but we discussed it enough so we should just go for it. WG document should be faster, surely?
Michael S: question is, if it becomes a WG item, what's the group's preference.
Mirja: both are fine.
Aaron: no strong opinion. Would like to see testing the group for WG review of the document. WG cycles is the more limited resource.
Jana: let's get rid of Microsoft in the title so we avoid this silly debate
Tim: If we're going to bikeshed the title, we should fully do it. DCTCP is a much bigger deal than MS, because it's useful in many places. Disappointed that it's called Data Center, because it's useful far more than in DCs. Should take Microsoft out of title.
Mirja: That would be a different document.
Lars: Unfortunately, that's what it's called now, everyone knows it by that name. But that's why the tcp-prague list is called that. But let's keep this DCTCP as name.
Michael S: who has read a recent version? 10 to 15
Who would be willing to review for clarity: many hands - chair says more than 5, so good
Hum: should adopt as Informational RFC. Strong hum. Should not adopt: no hum.
Title: two options, hum if you think that it should be with Microsoft in title. Weak hum.
Removal of company name: significantly stronger hum.
Conclusion: adopt for Informational.
Bob: that still means that it's about the Microsoft thing, it's just not in the title
Michael S: yes

The TCP Echo and TCP Echo Reply Options (draft-zimmermann-tcpm-echo-option)
Speaker: Alex Zimmermann
----------------------------------------------------------
Jana: Given that we have the timestamp option, what are the use cases for this?
Alex: Syncookies, idle detection. This can be more compact as well, for some of these cases. Improved PAWS is possible. Timestamps don't much improve RTT estimation. If we can deprecate timestamps, that would lead to more option space.
Tim: can't see the value, this doesn't save bytes, does it?
Richard Scheffenegger: this is answered in my talk.
Tim: Bob's inner space would also be a use case. There are many possible use cases.
Michael S: alone this draft doesn't make sense, comes along with use cases, adoption would require use cases. Request thinking about this in the WG. Only makes sense to discuss with use cases.
Jana: Agree. The way this works will depend on the use case.
Richard: echo option reflects most recently received, timestamps doesn't; both make sense and have value. Depends on use case.
Jana: certainly nice to have.
Matt Mathis: did you think about encodings? To do this symmetrically you need to use these two options all the time. One could think of more condensed methods.
Richard: Still a proposal. If you have such ideas that's very valuable feedback.

Using the TCP Echo Option for Spurious Retransmission Detection (draft-zimmermann-tcpm-spurious-rxmit)
Speaker: Richard Scheffenegger
----------------------------------------------------------
Jana: I really like the fact that we get away from retransmission ambiguity with this. But: Echo happens only on one packet, right?
Richard: No state at the receiver. Echo data that was in data segments that you did not ack to is lost.
Jana: I'll have to think about this.
Not clear to me how you can get away from loss-of-acks problems etc.
Richard: not perfect
Jana: you indicated it is
Ian Swett: As long as you follow the standard rules for delayed ack, it comes very close to perfect, but reality can be very different and those rules are often violated.
Richard: thought about this problem, please read the draft. In re-ordered state at receiver, if you don't lose acks all the time you get all the signal you need
Alex: and during loss recovery we don't have delayed ACKs.
Michael S: made 3rd party IPR disclosure because Eifel algorithm has IPR disclosure. Since this is closely related, WG must be aware that there might be related IPR. Have to say this as chair.
Lars: I think you're over-interpreting the rules. You're only asked to disclose if you think it applies.
Michael S: but I have looked at the patent.
Richard: use cases extend beyond what Eifel can do, never worse than Eifel
Mirja: Good idea, but how much overhead compared to performance you gain? Echo Option is very generic, receiver does not understand what's inside, receiver may not have incentive to support this because it doesn't know it's useful, so is this an incentive problem?
Karen: SCTP comments not always welcome here but FYI we have made room in chunk header for a corresponding Echo option. Expired draft will come as a new draft in next IETF where we intend to use this kind of option for the same purpose.
Jana: Answering Mirja, to eliminate retrans. ambiguity, not sure this fully does it but it hugely simplifies reasoning, allows recovering from losses of retransmissions etc., so problem worth solving if we can.

Fast Open IPv6 extension (draft-hawari-tcpm-tfo-ipv6-prefixes)
Speaker: Mohammed Hawari
--------------------------------------------------------------
Andrew: Not sure I buy your motivating example; but then there are many reasons to do this. No opinion on mechanism, but the goal seems reasonable.
(much nodding from the back of the room)
[missed some things]
Tommy Pauly: idea to limit what you need to share: if you had a previous cookie you could just try it.
Mohammed: how do you limit the size?
Tommy: could be fairly large. That's what a standard is for, max size could be written in the standard.
Michael S: An option number is not a low-hanging fruit, so will require a lot of discussion.

TCP Alternative Backoff with ECN (ABE) (draft-khademi-alternativebackoff-ecn)
Speaker: Naeem Khademi
--------------------------------------------------------------
Mirja: Slide 11: What is buffer size? You only show cwnd, you don't show link utilization or delay. If you don't utilize fully you get an early notification. For Slow Start you should have a BDP of buffering. If you double cwnd in SS, you have to halve afterwards. That might help, but here you induce a lot of congestion and delay after slow start.
Naeem: possible to configure beta after slow start and in congestion avoidance separately, but it might also contribute to performance because the number of steps in the bad case is limited, every time you get a CE mark and you will eventually reach the capacity line.
Mirja: after presentation: what's the intention of the draft?
Naeem: will update draft and intend to ask for adoption.
Mirja: unnecessary if it doesn't change an RFC
Michael Welzl: it updates a SHOULD
Mirja: it's a lowercase should
Mirja: from our measurements we've seen that basically no one is using ECN, and Cubic uses beta=0.7, in Linux this is already deployed?
Michael W: no, Cubic does it also for loss, now 0.7. So this is 0.8 for ECN, 0.7 for loss.
Michael S: relationship to loss factor?
Naeem: thresholds are not connected in buffers.
Michael S: it's only about how you describe it but the effect is the same. I might be wrong but key point is: doing this for Reno is one thing. Is the same idea applicable to other congestion controls? Cubic? Does it have a benefit or not?
Naeem: all results in tech rep are for Cubic and Reno. Recommended betas are slightly different.
Michael S: from a standards perspective, do we have to issue yet another Reno update, or can we do Reno and others at the same time? This would be much more interesting than just updating Reno.
Bob: I didn't think it was a lower-case should in RFC3168. It says "essentially MUST".
Brian Trammell: I think it's a technical erratum to 3168.
Richard: Insight is that marking threshold is typically lower than drop threshold, therefore buffer sizes are smaller, therefore you want to have a reduced decrease. Work is valuable, Mark Allman's comment is valuable, I would like to see this converge.
Michael S: seems like a perfect topic for ICCRG. Can be discussed there, clarify Mark Allman's proposal of binding beta to a loss multiplication factor, see if it is applicable to more congestion controls than just Reno. Coming up with a beta factor is in ICCRG's scope.

TCP Sendbuffer Advertising (draft-agache-tcpm-sndbufadv)
Speaker: Costin Raiciu
--------------------------------------------------------------
Stuart Cheshire: Regardless of how you encode it, in Apple's OS we make this irrelevant. Backlog in the kernel is bloat, there's no reason that e.g. screensharing benefits from having its bytes mature in a buffer before they can be sent. We made a whole bunch of stuff. What you need is to tell the sender I can go faster if you let me. Queue is really binary, empty or not empty. We just made it empty. And this is in Linux too.
Matt: I believe your algorithm conflates network bottlenecks with receiver bottlenecks. RFC 4898 has a MIB for a set of timers, so it's sender instrumentation.
Costin: you have to look at rwnd also to get the whole picture [missed some in rest]
Matt: RFC 4898 has timers ... to know when you exit ... [missed some]. Different, more robust solution. It's meant to be accessed through an ICMP type message, it's a side channel.
Tim: if you turn off the ACK bit...
Mirja: flag "app-limited or not" would be useful for more general purposes than only TCP.

ECN Data from Apple
Speaker: David Schinazi
--------------------------------------------------------------
David: Apple has turned on ECN in beta releases for testing. In iOS 23% of connections successfully negotiated ECN end-to-end. 22% in Mac OS. Alexa 1M: 59% support
Jana: percentages of what? Users?
David: yes, users everywhere. CE markings confirm what we thought: no one marks right now. iOS 0.08%, 0.01%. Cell phone carriers: some reorder all packets if ECN is enabled.
Stuart: to clarify, was only one carrier. Reordering seemed to be induced by having CE marks; it didn't actually hurt performance. Side effect with rwnd autotuning, so we were sender-side or receiver-side buffer limited. 3rd order effect.
Jana: ECT0 or 1?
David: just ECT0. Half of carriers cleared the TOS byte. That's unfortunate.
Aaron: hiccups paper: mode where client thought it wasn't doing ECN, server thought it was, different state. Did you see this?
David: we own the server
Richard: active measurements made by you, you induced CE marks etc.
David: correct
Richard: look out for stuff you wouldn't dream of.
Stuart: point of announcing this at developer conf and David announcing here is that we have the betas available for people to download, the more we get the more informed the decision we make.
David: we need to get the word out that people should not be clearing the TOS byte. If you notice it in your home net, call customer support.
Michael S: please post the numbers on mailing list.
Richard: there is gear that takes out TOS byte with SYN and rewrites the TOS byte for the entire session. SYN CE marked => entire flow CE marked.
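
[Editor's illustration, not part of the discussion] For readers who want to check captures from their own network for the TOS-byte clearing David mentions, the ECN field can be read out of a TOS / Traffic Class byte as sketched below. This is a minimal sketch assuming the RFC 3168 layout (ECN in the two low-order bits, DSCP in the upper six); the function names are hypothetical.

```python
# ECN occupies the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte
# (RFC 3168); the upper six bits carry the DSCP.
ECN_MASK = 0b11

ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # also what a cleared TOS byte looks like
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos: int) -> str:
    """Return the ECN codepoint name carried in a TOS / Traffic Class byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return ECN_CODEPOINTS[tos & ECN_MASK]


def dscp(tos: int) -> int:
    """Return the 6-bit DSCP value carried in a TOS / Traffic Class byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return tos >> 2


if __name__ == "__main__":
    for tos in (0x00, 0x02, 0x03, 0xB8):
        print(f"TOS 0x{tos:02x}: ECN={ecn_codepoint(tos)}, DSCP={dscp(tos)}")
```

A sender that set ECT(0) puts 0x02 in the low bits. If the byte arrives as 0x00, something on the path cleared it, and if every packet of a flow arrives as CE regardless of load, that matches the rewriting behavior Richard describes.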