TSVAREA: Transport Area Open Meeting
Room 520DEF
9:00 AM

Area Directors:
Magnus Westerlund
Lars Eggert

Marshall Eubanks [scribe]
Gorry Fairhurst [scribe]

Agenda:
Welcome (chairs, 10 minutes)
- agenda bashing
- scribes
RTP over TCP [Lars Eggert]
FECFRAME Introduction (Marshall Eubanks, 15 minutes)
NSIS Update (John Loughney, 15 minutes)
PMTUD Method (Matt Mathis or John Heffner, 25 minutes)
Fragmentation Considered Harmful (John Heffner, 25 minutes)
TCPx2 (Mark Allman, 25 minutes)
Transport-Enhancing Refinements to the Network Layer (Lars Eggert, 25 minutes)

Agenda bashing
-----
Lars Eggert: Why TSVAREA? The other areas have open meetings which are not loaded with documents, but our open meeting [tsvwg] has a full load of documents for various reasons. So we felt the need for another meeting that did not focus on documents. This is the trial balloon. If we have this in the future, no draft work will be done here - that will be done in tsvwg.
-----
First is Interactive RTP over TCP - from the RAI area.
There is no document about the tradeoffs involved, what you might do instead of TCP, etc. I know that there is something done with DCCP. We need to scope the document for using RTP over TCP - what works, what does not. This would be a BCP; is there interest?
-----
Intro to FecFrames
Marshall Eubanks: Why FecFrame? Described use of packet FEC, and history from RMT. There are applications for IPTV and 3GPP. There are several issues to be worked on, including setting up the architecture. The group had its first meeting and is defining requirements.
Matt Mathis: How does this fit relative to transport protocols?
Marshall Eubanks: Are you talking about procedure or protocol?
Matt Mathis: Is this above or below TCP? - Can we make this below TCP to change loss characteristics?
Marshall Eubanks: Could use this.
Magnus Westerlund: It works on the transport protocol payload.
Marshall Eubanks: Yes, not the transport packets...
This is normally used over UDP.
Matt Mathis: In TCP space, congestion loss is never isolated loss (with large windows). - We could make huge use of such things with a segmentation level below the TCP layer (accept a 64K MTU and then apply redundancy below this).
Gorry Fairhurst: Doing FEC above the transport protocol is OK, but doing it below has implications for how you interpret the loss and its relationship to congestion control. It is very hard to differentiate packet loss caused by the link from loss caused by congestion.
?: The stack could provide feedback from TCP downwards.
Tim Shephard: It is hard to tell the difference between loss and congestion; that makes it hard to solve.
Sally Floyd: There are different IETF-sanctioned ways of doing audio and video to work in environments with loss. There are different assumptions that you have when you use these. TCP halves its sending rate in response to loss.
Marshall Eubanks: Equation-based congestion control and loss - streaming methods do not necessarily get packets resent, and need ways to recover from loss.
Sally Floyd: You have a playback time and an unreliable method. You can layer on top of DCCP.
David Black: In some ways use of FEC with TCP has some uses that resemble ECN; in some ways this can be used with RFC 3168. Congestion control is solid. This is potentially turning into a tar-pit problem, and in PWE3 they need to define new congestion control.
Magnus Westerlund: The primary (chartered) method is above the application.
-----
Matt Mathis: - An overview draft on path MTU discovery
We have updated this and are currently in WGLC. The original goal was to replace RFC 1191 and RFC 1981. Now it is an extension to those RFCs. It gives you faster convergence and is more robust - our original goal was to maximize MTU robustness. There are some corner cases that the earlier documents didn't deal with.
You do a classic search algorithm: periodically send a probe that may be too large. If it is not delivered, and it is the only packet not delivered, you can suppress the congestion control response. This is the only part that ignores standard language. This makes path MTU discovery parallel to congestion control based on how it sits in the stack - it uses the same mechanism as the congestion control. It fits better than the ICMP model. What we changed is that the document starts with the interface MTU - the earlier document
The algorithm is insensitive to the start MTU in terms of robustness. SCTP made a change for us which is going forward in another packet. For SCTP you can send a probe plus a pad and only the probe comes back. IP fragmentation itself doesn't have a way to do MTU probing - the application protocol ultimately determines the segment boundaries.
Open issues: It's not clear how important these are. We have occasionally in the field run into inverse MUXes which stripe data across different links. This means that loss may not be for EVERY packet if the links have
There are some secure tunnels that ignore DF bits.
Joe Touch: These devices are IPsec devices, because the IPsec spec says that you can override the DF bit for policy reasons. This is different from the RFC 2003 IP-in-IP spec. This is a problem in the meaning of the DF bit.
Matt Mathis: That will be discussed in the next talk.
There are bridges that are not store-and-forward bridges and have rate adaptation FIFOs. The FIFO has to be big enough to deal with clock instability, and depending on the parameters set, packets > 1500 bytes can be fragmented.
Opportunities: It is dangerous to increase the paranoia of your ICMP implementations if you depend on RFC 1191 to determine your path MTU - this removes that problem. I personally feel that ICMP probing is a layer violation - this fits better in the stack. Finally, the Internet cell size is 1500 bytes. All of the things that were wrong with ATM 10 years ago are wrong with the Internet today.
Jumbograms are really hard to deploy because they require 4 party reconfiguration, that is, both end systems, all the routers, and all other systems sharing broadcast domains.
Scott Bradner: Is this the right field to be tilting at these windmills? There is a standards body for Ethernet.
Matt Mathis: The purpose of the work in the IETF is to undo the horrible mistake with the current
Once the current algorithm is deployed it is a minor tweak to fix it.
?: Every host needs to know the MTU of every other one on the LAN.
Matt Mathis: Right - ARP needs to know.
Bob Briscoe: If right from the start we had talked about MTU in terms of time and not bytes...
Matt Mathis: Right. My choice would be 125 microseconds.
Tom Fallon: Why not DCCP?
Matt Mathis: We could do that if there was really clean text but it would
Lars Eggert: Does this need a tight specification?
Matt Mathis: It doesn't affect interoperability. The details are not so important. This is already shipping in Linux 2.6.17. It is out there now.
?: This is about the sequence number checking - at Cisco we have done the sequence number checking, and also do the duplicate packet checking.
A.J. Pistrani [?]: Is there a problem with different protocols using different paths?
Matt Mathis: One could imagine bizarre things, but most of them are broken already.
John Heffner: We see in IPv4 cases where packets have the DF bit set but the application doesn't know what to do and sends packets that are too big. It would be nice to include text in the draft that says not to do this.
-----
John Heffner: Fragmentation Considered Very Harmful
This is based on a SIGCOMM '87 paper, and we have documented some additional problems. With IPv4 fragmentation you set a 16 bit ID field and all fragments are matched up based on that. So, if you send more than 2^16 datagrams within a timeout, then different fragments can be mis-associated.
Joe Tesh: Doesn't the spec say that you are not allowed to wrap this?
John Heffner: Right.
The spec is correct - you cannot send more than ~100 Megabits per sec in 30 seconds. But many people do - for example, many NFS installations. The problem is actually worse than that. If the first fragment is lost, the rest sit in the buffer until the ID wraps, and then they get combined incorrectly, which causes a bad packet and another orphaned fragment, which creates a cycle that can last for the entire flow. And there can be more than one flow. The UDP checksum is not so good, and is not always used, so UDP is especially at risk. You need to be sending at a fairly high rate per protocol, not per flow.
One workaround is to adjust fragment boundaries.
Another workaround is to shorten the timeout, but this may be hard to do for
The Internet spans more than 5 orders of magnitude. An approach that Linux is taking right now is to have an adaptive timeout on a per peer basis. This requires per peer state, and the source has no idea if the flow is safe or not. The draft includes none of these workarounds - it is strictly informational. IPv6 uses a 32 bit ID field, so it doesn't experience this, yet.
-----
Mark Allman
I don't have a draft, just a topic for discussion. TCP has served us well, but new things are coming up that require changes to TCP and want bits. We don't have a lot of bits in TCP, with the 60 byte header. For example, the window field is only 16 bits - this was solved by window scaling in RFC 1323. More recently, people are saying that we don't have enough option space. So, there are drafts on extending the option space. Another is port numbers - there are two schemes to allow more efficient use of the port numbers. Also, there are only 3 reserved bits. Could another ever be allocated?
So, what to do?
0.) Do nothing. These problems are not all that bad.
1.) As these come up, we do the engineering we can on each problem.
2.) Maybe we shouldn't extend TCP - use DCCP and SCTP
3.) OR, we could move for TCP-ng
4.) We could put a new header on an old protocol - TCPx2
    - just double the size of every header field
    - get a new protocol number
    - keep semantics otherwise the same
    - so you get to leverage the code and maturity of TCP
    Of course, there are plenty of issues.
5.) Or, something else may come up...
?: Please explain why SCTP doesn't cut it.
Mark Allman: That is not my opinion, it is just a possible approach.
? #2 (from Cisco): I love the idea. I think that TCPx2 would be easy to implement.
Randall Stewart: We've spent a lot of energy and a lot of work perfecting another transport. I have a hard time believing that we are running out of ports. If there is something lacking in TCP, refer them to DCCP or SCTP, and extend them if
I spent 6 years pushing SCTP.
Matt Mathis: Deja vu all over again. RFC 1263 - TCP extensions - is exactly this. The thing that we blew in TCP was version negotiation on the fly. One idea is to roll out a lightweight shim protocol. But the real issue is you can't get to the legacy systems. If SCTP is sufficiently rich, then it is the right answer.
Michael Tuexen: You laid out some problems, one of which is that you want to extend TCP. But you have a new protocol - you are starting from zero from a deployment point of view. For SCTP this is just an extension.
Luis [?]: SCTP is a protocol designed for something completely different from what applications expect from TCP. [cries of wrong]. SCTP is much more complex. I think it is a good idea to think about this.
Sally Floyd: I think that this is worth exploring. DCCP in particular is not the answer. My guess is that SCTP is too complex for what you want, but that is not my area of expertise.
Mark Allman: I put up SCTP and DCCP because I think they span the space.
Michael Tuexen: We designed the SCTP API so that you could easily switch back to TCP if you need to.
?: We have some 30+ TCP based applications. The TCP option space is a serious thing, and it needs to be addressed.
I think we should take this in a piecemeal fashion - option 1 - and also continue the discussion of TCPx2.
Joe Touch: For all of the reasons why people say changing TCP will change a lot of code - I don't believe that SCTP is the solution. The problem is that TCP is not extensible. If we could build that into a new version that would be good. Of course, we have had so much success with the IPv6 transition. [laughter]
John Droper [?]: Why does extending the TCP header hurt DCCP and SCTP?
?: What are you trying to achieve? How does this help the real world?
Kevin Fall: You might look at RFC 1705, as that has a new TCP6 header, using 64 bit addresses.
Jim [of HP]: This is only focused on the wire, right? If it is not just the wire, then I would vote for 2.
Lars Eggert: We are going to use TSVWG for the mailing list for this.
[suggestion from the floor to set up a new mailing list - note from Lars: use tsv-area@ietf.org]
------
Lars Eggert: Transport and network.
Layers are good, as they hide internal operations. For example, the network layer says - I will deliver your packets in some order, maybe not deliver some, etc. BUT, in practice, network layer assumptions are common. Here are some:
- Hosts remain at IP addresses for a long time
- paths change slowly
- path characteristics change slowly
The reality is that many of the assumptions are no longer generally true everywhere on the network - for example, running TCP over mobile IP. Traditional methods may not work well in these cases. There are a bunch of solutions, which are not suitable for standards because they are too narrowly focused. What could be appropriate? The idea is to extend the communication abstraction so that the network layer provides information to the transport layer. This is not a new idea: ECN, Quickstart, XCP. Can we think of anything else that works on this principle? We have started a list, ternli@ietf.org. The goal is to start people talking.
Matt Mathis: [scribing interrupted]
Lars Eggert: ICMP is used to get network signals into the end system.
Matt Mathis: Right. They violate the layers and the end to end principle, but they can be extremely valuable.
?: ICMP v6 is almost a different protocol.
Lars Eggert: ICMP is just one way of providing signals. It matters what the information is and what you do with it.
Gordi [?]: ECN, Quickstart and XCP all have end to end verification of what's going on. You don't have that with Source Quench.
Matt Mathis: There are a small number of cases where the transport might want to signal the network layer.
Lars Eggert: You could think of very elaborate sets of information, but maybe a simpler thing would be more deployable.
?: I wanted to comment on the violation of the end to end principle. I am not sure it is. It's wrong to think that you just make a call to something and it returns. It's
Ted Faber: I am concerned with taking unsolicited advice from the network when I don't know where it comes from. If you are going to act on information, you should have some assurance that it is valid.
[?]: In SHIM6 packets can go different paths, so we are working on a failure detection protocol. This is very relevant to what SHIM6 is doing.
Lars Eggert: We have the same with HIP. In HIP there is no way to tell the transport about changes.
------
At the end of open mike, there was a hum for having another TSVAREA meeting in San Diego - the consensus was that there should be.
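
Illustration (not presented at the meeting): a rough sketch of the ID-wrap arithmetic raised in the fragmentation talk. The function name, the 1500-byte datagram size, and the candidate timeouts are assumptions chosen for illustration, so the results need not match the ~100 Megabit figure recorded in the minutes.

```python
def id_wrap_rate_bps(datagram_bytes, reassembly_timeout_s, id_bits=16):
    """Sending rate (bits/s) at which the whole IP ID space is consumed
    within one reassembly timeout, assuming fixed-size datagrams.

    Hypothetical helper for illustration; not taken from the draft.
    """
    datagrams = 2 ** id_bits  # number of distinct ID values
    return datagrams * datagram_bytes * 8 / reassembly_timeout_s


if __name__ == "__main__":
    # Assumed values: 1500-byte datagrams and a few plausible timeouts.
    for timeout in (30, 60, 120):
        rate = id_wrap_rate_bps(1500, timeout)
        print(f"{timeout:>3} s timeout: {rate / 1e6:.1f} Mbit/s")

    # The same calculation with a 32 bit ID field, as in IPv6.
    rate32 = id_wrap_rate_bps(1500, 30, id_bits=32)
    print(f"32 bit ID, 30 s timeout: {rate32 / 1e9:.1f} Gbit/s")
```

Because the ID is shared by all datagrams between the same pair of hosts for a given protocol, the rate that matters is the aggregate across flows, which is the "per protocol, not per flow" point made in the talk.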