Transport Area Open Meeting (TSVAREA)
Tuesday, 2007-07-24, 15:20-17:20, "Monroe" Room
IETF-69, Chicago, IL, USA
===============================================

This session was chaired by Lars Eggert and Magnus Westerlund. Matt
Zekauskas acted as scribe.

AGENDA:

* Licklider Transmission Protocol
* Saratoga: Efficient Transport over Short-Lived Links
* Experiences with Implementing Quick-Start in the Linux Kernel

Lars opened the meeting by noting that the first two presentations
are meant to expose the greater IETF transport community to two
alternatives to TCP that are under development to fit specific needs.
The final presentation shows the results of implementing one of the
experimental IETF protocols, Quick-Start.


Licklider Transmission Protocol
Scott Burleigh

Licklider Transmission Protocol (LTP) provides an ARQ mechanism that
is designed to work well over extremely long and/or variable
round-trip times, in part to enhance the reliability of
delay-tolerant networks. See the accompanying slides for details of
the talk; these minutes reflect explanations given that were not
captured in the slides, plus the group discussion.

The origin of LTP is work on a "convergence layer protocol" for the
delay-tolerant networking stack in DTNRG for an interplanetary
Internet. What it does is simple, although the way it does it is
complex. The protocol provides delay-tolerant retransmission between
two adjacent points. It has found some practical application for
testing water quality in Ireland; Stephen Farrell, who was in the
audience, implemented it over UDP. Convergence layer protocols enable
transmission of "bundles" in an overlay network. The target RTT is in
the range of eight to 40 minutes. The design tolerates lengthy,
irregular interruptions of a link, as would occur if a spacecraft
went behind a planet.

Some transmitted bundles are designated as checkpoints, which trigger
reports from the receiver. If a checkpoint is not acknowledged in a
timely fashion, it is retransmitted. If a report is not acknowledged
in a timely fashion, it is also retransmitted. The timeouts are the
tricky part, and the protocol uses known changes in transmission
state to start and stop timers. This is shown in a figure. The heavy
lines are the checkpoints. There is one missing in the middle, which
triggers the green line. With the blue line, the timer expires and
the checkpoint is retransmitted. The timeout is twice the one-way
packet lifetime. With the yellow line, the checkpoint doesn't arrive
until the recipient is unable to transmit. The sender knows this, and
suspends its timer.
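[Scribe's illustration: the following is a minimal sketch of the
timer discipline described above, not code from the LTP
specification or the presentation; the class and method names and
the link-state callback are illustrative assumptions.]

    # Illustrative sketch (not from the LTP spec): a checkpoint
    # retransmission timer that is suspended while the peer is known
    # to be unable to respond, so a scheduled outage does not cause
    # a spurious retransmission.
    import time

    class CheckpointTimer:
        def __init__(self, one_way_lifetime_s):
            # The timeout is twice the one-way packet lifetime.
            self.timeout = 2 * one_way_lifetime_s
            self.remaining = self.timeout
            self.suspended = False
            self.last_tick = time.monotonic()

        def tick(self):
            """Advance the timer; True means retransmit the
            checkpoint."""
            now = time.monotonic()
            elapsed, self.last_tick = now - self.last_tick, now
            if not self.suspended:
                self.remaining -= elapsed
            return self.remaining <= 0

        def link_state_change(self, peer_can_respond):
            # Known changes in transmission state (e.g., a scheduled
            # occultation) suspend or resume the countdown; suspended
            # time does not count against the timeout.
            self.suspended = not peer_can_respond
            self.last_tick = time.monotonic()

A real LTP engine would presumably drive link_state_change() from the
known contact schedule rather than from observed traffic.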
The slides give a number of points comparing LTP to TCP. Three drafts
have been produced by the IRTF research group; they started going
through the IRSG in April. For more information, start with the LTP
home page.

There were no questions at the end of this presentation, but see the
discussion following the next presentation.


Saratoga: Efficient Transport over Short-Lived Links
Wes Eddy

This presentation will outline some of the more unique features in
the design of Saratoga -- a transport protocol designed for high
utilization of short-lived and extremely asymmetric links -- and will
discuss the class of scenarios where it can be (or already is)
gainfully employed.

Saratoga is another delay-tolerant networking protocol. Wes helped
write the document, which was produced after the protocol was already
well developed; the protocol itself was designed several years ago.
The use case is low-Earth-orbiting (LEO) satellites taking images of
the Earth. The protocol has been in daily use since 2004, and runs
over IP. With these LEO satellites, there is severe link capacity
asymmetry: the current set has 40 Mbps down and 9.6 kbps up; the next
revision is set to have 210 Mbps down and 38.4 kbps up, a worse
ratio. These are point-to-point links, but they encompass the entire
path, and the traffic is scheduled. There is potential for
corruption-based loss, and the data is large (on the order of several
gigabytes). They want to transfer it quickly, before the link
vanishes because the nodes have moved away from each other (e.g., the
LEO satellite drops below the horizon).

The slides describe the basic operation of the protocol, illustrated
with a sample packet exchange. Many aspects are configurable, and
there are optional packets. There is an optional beacon packet
describing capabilities, then metadata describing the data that will
follow (including a name and length), and then a large number of data
packets, transmitted at line rate. These could be lost due to
corruption. Some of the data packets ask for feedback. A
"HOLESTOFILL" packet comes back giving selective negative
acknowledgments, and the missing packets are retransmitted. There is
also a cumulative acknowledgment similar to TCP's. Many fields in the
protocol can have multiple sizes (something like DCCP's 24-48 bit
sequence numbers): small fields if header efficiency is needed, and
128-bit fields for really large things (like the example satellite
image transfers).

Wes described unique features of the protocol, including the beacon,
the use of named content, flexibility in the base transport,
usefulness for both large and small files, and the ability to push or
pull. Wes made the point that the reliability aspects of the protocol
are interesting, including how much of the packet the checksum
covers: because Saratoga can run over UDP-Lite as well as UDP,
checksum coverage is optional and configurable. The protocol can
deliver errored content, or have MD5 checksums cover the whole
object.
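[Scribe's illustration: to make the selective-negative-acknowledgment
idea concrete, a minimal sketch of HOLESTOFILL-style hole tracking,
assuming byte-offset addressing; all names here are invented, and the
actual packet encoding is defined by the Saratoga draft.]

    # Illustrative sketch of HOLESTOFILL-style hole tracking; the
    # real encoding is defined in the Saratoga drafts. Ranges are
    # hypothetical byte offsets within the transferred object.

    class HoleTracker:
        def __init__(self, total_length):
            self.total_length = total_length
            self.received = []  # sorted, non-overlapping (start, end)

        def mark_received(self, start, end):
            """Record bytes [start, end) and coalesce ranges."""
            ranges = sorted(self.received + [(start, end)])
            merged = [ranges[0]]
            for s, e in ranges[1:]:
                ls, le = merged[-1]
                if s <= le:             # overlaps or touches previous
                    merged[-1] = (ls, max(le, e))
                else:
                    merged.append((s, e))
            self.received = merged

        def holes(self):
            """Gaps to report in a HOLESTOFILL-style packet."""
            gaps, cursor = [], 0
            for s, e in self.received:
                if s > cursor:
                    gaps.append((cursor, s))
                cursor = e
            if cursor < self.total_length:
                gaps.append((cursor, self.total_length))
            return gaps

    # Example: a 10-byte object whose middle was lost in transit.
    t = HoleTracker(10)
    t.mark_received(0, 4)
    t.mark_received(7, 10)
    assert t.holes() == [(4, 7)]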
The utility of this protocol has been demonstrated by practical
experience. It can fully utilize links, which was the primary goal.
TCP would have had problems: feedback may not have arrived in time,
and the constrained ACK-path bandwidth would have been an issue. Here
the forward/back channel capacity ratio is 833:1, soon to be 5469:1,
while TCP's accepted bound is around 50:1.

The goal for the document describing this protocol is an experimental
RFC. A few companies depend on Saratoga for day-to-day business
operations (namely www.sstl.co.uk and www.dmcii.com), and it is
likely that the protocol is applicable to similar network situations:
asymmetry, disrupted or unidirectional flows, or lots of bursty data.
Perhaps it could be used by personal area networks, proximity
networking, or in the proposed radio astronomy Square Kilometer
Array. If it is found to be more generally useful, the document could
be retargeted to Proposed Standard.

Lars asked about the experimental RFC status. There is a new process
to ask the ICCRG to review experimental congestion control schemes.
While Saratoga has been in use for several years, letting it be used
on the general Internet is an experiment. (Wes is also ICCRG chair.)
The transport area of the IETF has done work related to satellites in
the past, although nothing new has been done for a while now. Were
they thinking about asking for a working group in the IETF? Wes said
that they are not asking for an experimental RFC now; it would
probably be a year or more before they are ready.

This led to a discussion of the ICCRG review process. The idea is to
review protocols in the ICCRG, then send them back to the IETF, and
the IETF (assuming there was interest) could put them in a working
group or just submit them as AD-sponsored documents. Is there a
current WG with the right expertise? TCPM isn't exactly the right
place; TSVWG has been the transport catch-all.

Gorry Fairhurst noted that we have seen two different foci in the two
presentations, and it was interesting to see both together. One shows
a DTNRG protocol as an Internet service, and the other is something
on top of UDP. (Actually, both presentations mentioned UDP
transport.) Lars said he had no idea whether LTP might eventually
move from the IRTF to the IETF. Gorry said that the concept of
bundles didn't quite fit with IP. Scott said that Stephen ought to
speak to that point; he wasn't sure whether LTP made sense in the
IETF at this point. The research group has been wrestling with what
to call LTP, and where to place it in the network stack, for many
years. Viewed from the link layer, LTP looks like a thin layer of
link-layer ARQ. Viewed from above, with an application or the DTN
bundling protocol sitting on top, it looks like a transport protocol.
The canonical seven or five layers of the stack don't map well to
LTP. It would be hard to say not only which IETF working group would
be appropriate, but whether it is even a transport-area concern.

Lars felt that the same held for Saratoga. Both are used in niche
scenarios currently. Do they need an IETF specification? Who would
benefit from an RFC? Maybe they are not really a topic for the
general Internet, but rather a link-related concern.

Sally Floyd said she reviewed LTP for the IRTF last week, and has
sent out some mail. She thinks the abstract needs a very strong
applicability statement, stronger than what is there currently. LTP
is being proposed as experimental within the IRTF. If the abstract
restricted the context to point-to-point links in an interplanetary
setting, or just talked about a link layer, then it would be
plausible to keep the document in the IRTF and never pass it to the
IETF. If someone uses it with UDP over arbitrary Internet paths, that
moves it to the IETF. Scott said that made a lot of sense, and that
being more expansive in the abstract is a good idea. A number of
issues have surfaced over the last couple of weeks.

Stephen Farrell said that the specific changes to the abstract are
reasonable. However, why should it be the case that every IRTF
experimental RFC needs to be reviewed by the IETF? Sally said that in
her view, if it's just for space, the IRTF is fine. If you want to
use it in the global Internet (including terrestrially), then the
IRTF is not the right place for the final specification. Lars noted
that in that case the difficulty of producing a document would be
much higher than for something tightly scoped within the IRTF.

Aaron Falk, the IRTF Chair, said that he agreed with most of the
discussion so far, with the exception of the suitability of using the
ICCRG for these kinds of protocols. While these may be experimental
transport protocols, he didn't see a congestion control algorithm in
either, so he doesn't think they are in scope for the ICCRG. The
closest thing within the IRTF is the DTNRG. Magnus felt the issue is
what an experimental RFC implies: which algorithm and use of
congestion control should one start with?
Aaron said that the whole reason for the ICCRG to review congestion
control algorithms is that there is not a depth of expertise in that
area within the IETF. Since there isn't any congestion control here,
it doesn't require great depth of expertise to say so, and any review
by the ICCRG would therefore be scope creep. Basically, if a document
specifies a new congestion control mechanism or tweaks an existing
one, then ICCRG review makes sense.

Kevin Fall said that from what we've seen, the protocols don't
necessarily fall into niche-versus-global categories; either could be
run within the general Internet or within a private network. He's not
sure whether they fall into any particular domain for review.
However, he thinks it is fine (and should be a requirement) to state
the environment the protocol is intended to run in, and that if you
go outside that environment you're in trouble.

The TSV chairs said that they didn't see a need to answer the
questions posed quite yet, because they hadn't seen the slides
beforehand, and hence hadn't thought about the suggestion of an
experimental-status RFC. An audience member thought that these
documents were way too early to publish as experimental RFCs out of
the Transport area of the IETF; why can't the IRTF publish
experimental RFCs? It was noted that Saratoga has just been submitted
as a convergence layer draft in DTNRG
(http://www.ietf.org/internet-drafts/draft-wood-dtnrg-saratoga-01.txt),
which is still being digested to see whether it fits; the authors
think it can be made to fit, if that is what people would like.

Wes said that if others are using LTP or Saratoga in something other
than a niche environment, please contact him. Aaron added, speaking
as an individual, that there is clearly a spectrum between running
things over the global Internet and using Internet technologies in
constrained environments; the struggle in the IRTF is to figure out
how much of the organization's time should be spent on work that is
not destined for the global Internet. [Sally?] The transport area has
historically been conservative, and involved in protocols that might
find a way into the global Internet. Applicability statements are
good, but it is important to make sure that if a protocol is easily
applied in the global Internet, and would be dangerous and disruptive
there, the document has clear warning statements, or is perhaps not
published at all. The scope of applicability, and the ease with which
something can be abused, is the issue.


Experiences with Implementing Quick-Start in the Linux Kernel
Michael Scharf and Haiko Strotbek

Quick-Start is a recent experimental TCP extension that requires
modifications in TCP/IP stacks. This presentation addresses some of
the lessons learnt from a Quick-Start implementation in the Linux
kernel.

Michael Scharf started with a quick overview of Quick-Start. One of
the things the experimental RFC calls for is experience, and they
wanted to try it out. Their implementation is in the Linux kernel;
both host and router functions have been implemented in 2.6.20.11,
although only for TCP and IPv4. The implementation is about 1700
lines of code and took about five person-months of effort. Twenty
files were changed; they added about 20 additional variables to the
TCP state and about ten new sysctl options. The changes touched most
of the files related to TCP and IPv4. (Anantha Ramaiah from Cisco was
concerned about the use of "host" and "router": routers today have
many host functions, and he felt host and router are outdated
terms.) The slides explain why so many different parts had to be
touched; the pacing functionality affected a lot.

One design decision is when to trigger Quick-Start, and how. The
implementation adds a new socket option, which gives the application
the possibility of requesting Quick-Start and the desired data rate.
However, perhaps this is better done through heuristics.
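[Scribe's illustration: a hypothetical sketch of what such an
application interface might look like. The option name, number, and
rate encoding below are invented for illustration; the actual
interface is whatever the implementation's patch defines.]

    # Hypothetical illustration of an application requesting
    # Quick-Start via a socket option. TCP_QUICKSTART and the rate
    # encoding are invented names, not the real patch's interface.
    import socket

    TCP_QUICKSTART = 0x1234   # hypothetical option number

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    requested_rate_kbps = 5120  # ask for ~5 Mbps initial rate
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_QUICKSTART,
                        requested_rate_kbps)
    except OSError:
        # Kernel without the Quick-Start patch: fall back to
        # ordinary slow start.
        pass
    sock.connect(("example.net", 80))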
Rate pacing is the newest functionality. It wasn't as bad to
implement as they had expected; Linux has good timer mechanisms in
the kernel. However, it wasn't straightforward: how do you distribute
the timers over the RTT? There are carry-over segments if the
congestion window increase is not an exact multiple of the number of
timers. Joe Ishac asked why so many timers were required; couldn't
this be done with one or two? The reply was that the issue is the
maximum burstiness you are willing to tolerate. This is a parameter;
you have to smooth the packet flow over the RTT. If you do this with
few timers, the data rate is not as precise. [Perhaps the issue is
not with the "number of timers" but the "number of timeouts" that
are tolerated?]
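[Scribe's illustration: to make the carry-over point concrete, a
minimal sketch, not the kernel code; the function and its parameters
are illustrative assumptions. With a window increase of W segments
spread over N pacing timers per RTT, each timer sends about W/N
segments and the remainder is carried over, so no segments are lost
or bunched into one burst.]

    # Illustrative sketch (not the kernel implementation): spread a
    # congestion-window increase of `increase` segments over
    # `n_timers` pacing timers per RTT. When the increase is not an
    # exact multiple of the timer count, the remainder is carried
    # over and distributed across later timers.

    def pacing_schedule(increase, n_timers):
        base, carry = divmod(increase, n_timers)
        schedule, acc = [], 0
        for _ in range(n_timers):
            acc += carry
            extra, acc = divmod(acc, n_timers)  # spread remainder
            schedule.append(base + extra)
        return schedule

    # Example: 10 extra segments over 4 timers -> bursts of at most 3.
    assert pacing_schedule(10, 4) == [2, 3, 2, 3]
    assert sum(pacing_schedule(10, 4)) == 10

Fewer timers would make each burst larger, which is exactly the
burstiness trade-off described in the reply to Joe's question.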
They were worried about timer overhead, but it is not too
significant; the added code is lightweight overall. At the IP layer,
the effort per packet increases measurably when Quick-Start is used,
but the problem is not the Quick-Start functions themselves, which
don't add much; it is the traffic measurements. The conclusion is
that if the router already has some knowledge of link utilization,
then the other requirements of Quick-Start are light.

The benefit of Quick-Start was quite clear in the experiments they
did: 10 Mbps Ethernet links, a 100 ms RTT realized by the netem
module, and a bandwidth-delay product of about 84 segments (10 Mbit/s
x 100 ms is about 125 kbytes, or roughly 84 segments of 1500 bytes).
This is written up in a paper to be presented at IEEE Broadnets this
September ("Performance Analysis of the Quick-Start TCP Extension",
Michael Scharf, University of Stuttgart, Germany;
http://www.broadnets.org/2007/internetprogram.html). The experiments
show a substantial improvement in transfer time, by lowering the time
spent in slow start. If the amount of data is very small, or very
large, the effective improvement is not that great, though. The
experiments also show that Linux performs better than the models used
in simulation, because it uses fast acknowledgment mechanisms.

Michael then mentioned a number of lessons learned. These include
some interference from other Linux mechanisms (see
draft-scharf-tsvwg-quick-start-flow-control-01); option handling is
tricky; and you need information on link capacity, which isn't
exported well by Linux drivers.

Tim Shepherd noted that it's not just link capacity: you need to
determine the entire topology and find where there are queues between
routers. Network administrators might know this, but it would need to
be different for different destinations. This seems to be a real
challenge to get reasonably right through configuration.

Sally said that the Quick-Start RFC is a good example for the issue
raised about the last two talks: the abstract clearly states
deployment restrictions. It is clear that Quick-Start has problems if
there are link-level queues, or any non-IP queues. You can't expect
Quick-Start to work ubiquitously in the global Internet; the
expectation is that if Quick-Start is enabled, it should be enabled
for specific paths only. Sally really appreciated the work done here,
though: implementation is the only way to learn what you really need.
The problem is that you then have to deal with the complexities of
the real world, but the first step is to see whether Quick-Start is
useful anywhere.

David Borman asked for a clarification: you show that the TCP MSS
must be reduced -- does this refer to the calculated MSS, or to what
is actually put in the MSS option? The answer: it refers to the
actual MSS used to segment application data. When you segment
application data, you need to take into account that you might have
IP options in the packet; the first segment is smaller than the rest,
because it needs room for the options. David said that he just wanted
to make sure this was the adjusted size, as opposed to some
requirement to modify the MSS sent in the SYN.

Back to the presentation, Michael mentioned that perhaps
path-capacity probing techniques could be used to avoid needing
feedback from lower layers. ssthresh selection is an important detail
that is hard, and it is not specified by the RFC. The current open
issues are an interaction between rate pacing and the Nagle
algorithm, an implementation for IPv6, and getting path MTU discovery
right.

In summary, there is code that works, but it is highly experimental.
It wasn't too complex to implement, but there are some layering
violations and new concepts, and today's TCP stacks are not really
optimized for it. The specification is OK; the only major issue is
described in the previously mentioned draft. In the future a patch
will be made available, and they are looking at placing the code into
a network processor.

Sally said that she really appreciates this work, and likes it. Lars
agreed, and was happy that Michael offered to present.

This led to the final item: please send Magnus and Lars suggestions
for topics and presentations for the TSVAREA meeting in Vancouver!