Monday, Dec. 7, 1998
Chaired by Scott Bradner & Vern Paxson
Minutes by Aaron Falk & Vern Paxson
Scott Bradner began by observing that many IETF efforts develop custom transport protocols, especially for signaling, because TCP is thought insufficient to meet their requirements. He explained that the goal of the BOF was to understand the requirements leading to these decisions. Discussing what a new protocol might look like was explicitly out of scope; the BOF was to remain focused on requirements.
The first speaker was Shai Herzog, discussing the Common Open Policy Service (COPS). COPS, being developed by the RAP WG, outsources policy to a server, which means it has a client/server (query/response) model. During development, the WG evaluated TCP vs. UDP and determined that TCP was close enough to suffice for their requirements, which were: reliable, in-sequence delivery (critical: transactions are not numbered, so the protocol cannot afford to lose any); and real-time response/quick delivery (TCP is not so good here: slow start hurts real-time priority traffic, which has to wait for the window to open). One requirement TCP does not meet well is providing a dependable connection failure facility.
Next, Carl Rigney discussed RADIUS. RADIUS chose to build on top of UDP rather than TCP for reasons discussed in RFC 2138. The most important requirements were: control over retransmissions, to ensure data gets through in 3 seconds or less (a limit set by human factors); smooth failover to a secondary server when the primary server doesn't respond; and very little connection state. In addition, the single, small query/response exchanges are well suited to datagrams.
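The retransmission and failover requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not RADIUS itself: the function name `udp_query` and the parameters `per_try_timeout` and `deadline` are hypothetical, the payload is opaque bytes rather than a RADIUS packet, and replies are not matched to requests as a real client would need to do.

```python
import socket
import time


def udp_query(servers, payload, per_try_timeout=1.0, deadline=3.0):
    """Send payload to the servers in round-robin order until one replies.

    servers is a list of (host, port) tuples, primary first.
    Returns (reply_bytes, server_address), or (None, None) if the
    overall deadline passes without any reply.
    """
    if not servers:
        raise ValueError("at least one server is required")
    deadline_at = time.monotonic() + deadline
    attempt = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while True:
            remaining = deadline_at - time.monotonic()
            if remaining <= 0:
                return None, None
            # Alternate primary, secondary, primary, ... on each attempt.
            server = servers[attempt % len(servers)]
            attempt += 1
            # The application, not the transport, picks the retransmit timer.
            sock.settimeout(min(per_try_timeout, remaining))
            try:
                sock.sendto(payload, server)
                reply, addr = sock.recvfrom(4096)
            except (socket.timeout, ConnectionError):
                continue  # no answer in time: fail over to the next server
            return reply, addr
```

The only state held is one unconnected socket and a deadline, which is the kind of footprint the datagram approach was chosen for.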
Bill Palter discussed L2TP, which has two types of communication: a control channel and a data channel. The control channel needs sequenced, reliable delivery; is datagram-oriented by design; needs an aggressive connection timeout to support fast recovery from failure; and needs to be lightweight enough to support thousands of tunnels. The data channel is sometimes sequenced, but not reliable (that's the job of the upper-level protocols). The WG chose UDP because it scales easily to very large numbers of connections; it allows rapid failover to other servers; there's very little overhead (kernel state) for idle sessions; and it's closer to the service provided by Frame Relay and ATM, domains where there's experience with protocols similar to L2TP.
Jim Gettys discussed HTTP-NG, an effort to address the perceived shortcomings of HTTP 1.X. The requirements are: record marking; the ability to abort a data stream without closing the connection; multiplexing of data across a transport path (but some data is more valuable than other data, so there is a need to avoid buffer deadlock problems and to give the application some way to identify priority); buffered pipelining to avoid RTT connection establishment costs; most request/responses are for very small objects; visibility into network conditions as measured by the transport protocol (e.g., RTT, available bandwidth), so applications can adapt. However, it is important not to lose the TCP property that efficiency goes *up* as load increases.
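The multiplexing-with-priority requirement can be sketched as a scheduler that interleaves fixed-size chunks from several streams over one transport path. This is an illustration only: the class `PriorityMux`, its methods, and the convention that a lower number means higher priority are assumptions made here, not anything specified by HTTP-NG, and the sketch does not address the buffer deadlock problem.

```python
import heapq
from itertools import count


class PriorityMux:
    """Interleave chunks from several streams, most urgent stream first."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self._heap = []       # entries: (priority, seq, stream_id, remaining)
        self._seq = count()   # tie-breaker: FIFO among equal priorities

    def submit(self, stream_id, priority, data):
        """Queue data for a stream; a lower priority value is sent sooner."""
        heapq.heappush(self._heap, (priority, next(self._seq), stream_id, data))

    def next_chunk(self):
        """Return (stream_id, chunk, is_last), or None if nothing is queued."""
        if not self._heap:
            return None
        priority, seq, stream_id, data = heapq.heappop(self._heap)
        chunk, rest = data[:self.chunk_size], data[self.chunk_size:]
        if rest:
            # Re-queue the remainder with its original sequence number.
            heapq.heappush(self._heap, (priority, seq, stream_id, rest))
        return stream_id, chunk, not rest
```

Because the scheduler re-evaluates priorities after every chunk, a small urgent message submitted in the middle of a bulk transfer goes out after at most one more bulk chunk.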
Jonathan Rosenberg discussed SIP, the Session Initiation Protocol developed by MMUSIC. SIP looks like HTTP in terms of being request/response and text-based. It runs on top of both UDP and TCP, and makes extensive use of proxies. The basic requirements are: low transaction latency, congestion control, a small enough footprint for implementation in standalone devices, and support for multicast. SIP has a requirement for a mix of end-to-end and hop-by-hop reliability (forking proxies), and requests can have multiple responses (provisional and final), with provisional responses not sent reliably. Proxies can be stateless, which makes using TCP problematic. SIP conveys reliability and message semantics together (an ACK = "I got response" and "yes, I want to talk with you"). Requests can be pipelined. A SIP INVITE response takes a substantial time to be generated, but once it is, it must arrive rapidly. Requests and responses are small, a few hundred bytes. Requests are sent reliably hop-by-hop, while ACKs are end-to-end.
Brent Callaghan discussed the NFS version 4 effort. He recalled that NFS was developed for UDP because TCP performance was pretty poor in the target environment, Ethernet LANs. NFS mount options were added for controlling timeouts and transfer sizes. The adaptive timeouts worked okay, but the transfer size adjustment didn't. As NFS started being used over WANs, it evolved towards also being run over TCP, certainly a win on congested WANs, and a better fit for NFSv3's potentially very big reads and writes. The NFSv4 transport wish list: as fast as UDP on LANs; optimized for fast RPC transactions (fast binding, reduced TIME-WAIT); integrated record marking.
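Record marking, which several speakers asked for, is what applications currently have to layer on top of a byte stream themselves. A minimal sketch, assuming a simple 4-byte big-endian length prefix (an assumption for illustration, not the exact ONC RPC record-marking format); the names `frame` and `RecordReader` are hypothetical.

```python
import struct


def frame(record: bytes) -> bytes:
    """Prefix a record with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(record)) + record


class RecordReader:
    """Reassemble whole records from arbitrarily split stream chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add received bytes; return the list of records now complete."""
        self._buf += chunk
        records = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("!I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # the rest of this record has not arrived yet
            records.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return records
```

A transport with integrated record marking would hand the application whole records directly, removing the need for this buffering layer.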
Paul Lin discussed SS7. In the SS7 reference architecture, a "call agent" implements multiple protocols, such as SS7/ISUP standards or H.323. It can use MGCP (Media Gateway Control Protocol) to "remote control" the gateways. Several signaling exchanges occur between MGCs in the network to set up a call. One key requirement is time expectation: the time between dialing the last digit and hearing ringing. Performance requirements or expectations are: layer 3 loss < 10E-7; misordered or duplicated packets < 10E-10; user expectation of call setup = 1-2 sec. This translates into transport requirements of: reliable rendezvous between entities; flow and congestion control without performance degradation; ability to monitor performance (are requirements being met?); confidentiality, integrity & reliability.
Scott Petrack discussed IP Telephony. The usual model is message based, so the transport should respect message (PDU) boundaries, and provide in-order delivery of PDUs. PDUs come in small bursts at call setup and teardown, which makes RTTs added by TCP connection setup expensive. A single request/response can contain several PDUs.
Reliability is the key. TCP reliability is too strict, in the sense that we can in fact tolerate the loss of certain bytes/PDUs (i.e., for DTMF, setup-ack, overlap). But it's also too loose, in the sense that we can't tolerate a single point of failure: need to be able to fail over to a backup IP address (perhaps a different interface on the same host) to emulate "link sets". Timing requirements make retransmission back-off problematic, as it can trigger PSTN timers. In general, the timing requirements are: determinable delay; synchronization with other streams (associated and quasi-associated signaling); synchronization with external clocks. Other requirements are security (PSTN signaling security is built around physical wires authenticating the wire ends) and working well over wireless and mobile links (i.e., corruptive channels).
Dave Oran discussed BGP. BGP peers exchange large messages, requiring some form of segmentation. Because BGP maintains synchronization from startup through shutdown, and long-term associations, it's natural to introduce connections as opposed to datagrams. BGP requires liveness checking, which it does using application-level keep-alives. Since BGP has to rate limit its traffic (to prevent route flaps), it has inherent large-scale flow control, but because the updates are bursty, it needs smaller-scale transport flow control. The utility function is smooth with bandwidth degradation, so TCP congestion semantics are okay. Updates are self-describing, so no sequential ordering or duplicate detection is needed. Multicast would have been very useful; not having it led to kludges like route reflectors. All told, though, going with TCP was a win, because it saved significant development effort, and its deficiencies in retrospect were not that grievous.
After Dave's presentation, further discussion was solicited from the floor. Christian Huitema commented that, while DNS was designed to be transaction-oriented over UDP, with today's very high packet loss rates (5-15%), a transaction may fall back to a retransmission timer, resulting in very poor performance. Furthermore, DNS queries are growing in size, increasing the problem of needing fragmentation for a very widely used protocol.
Joerg Ott commented that most telephony & multimedia conferencing applications are transaction oriented (not the actual multimedia stream, but the signaling to set it up and manage it). This leads to requirements for end-to-end reliability; implicit (for timing reasons) connection setup; robustness against denial of service attacks; tunable parameters per connection and per datagram (reliability level, timeout interval, number of retransmission attempts, initial congestion window, substitution or forced expiration of old data).
Other (unattributed) comments:
(1) Not much was said about visibility at the socket layer into transport conditions. This is really more a control problem than a transport problem per se - an application might decide what to do differently based on information from transport. E.g., if a routing update fails, send out different information rather than just a retransmission.
(2) Earlier comments on the high packet loss rate in the Internet should be tempered by the fact that such rates occur only in isolated parts. Most of the Internet operates at 1% loss rather than 15% loss. A small amount of the apparent loss is due to reordering rather than actual loss.
(3) Dilemma: eliminating the three-way handshake to make for faster transactions reduces security [see below].
(4) Let's keep in mind we're unlikely to get a single new transport that meets everyone's requirements.
Steve Bellovin then discussed some security implications of the requirements. First, removing the three-way handshake opens up security holes. The issue of sequence number guessing attacks is serious. IPSEC is reasonably cheap for 'over the wire' security, but a key question is where do you get the IPSEC keys? Unfortunately, multiple RTTs are needed to connect with a key manager, and one needs loosely-synchronized clocks (to address replay attacks). Other public key management systems will be similarly expensive. The best you can hope for is to cache key management state. But this doesn't work if you talk to a lot of other entities over a short time.
However, it might be that object security is in fact cheaper than transport security (though you still need to watch for replays).
Some transport requirements from a security perspective: a standard multiplexing mechanism for all control channels simplifies the job of enforcement mechanisms such as firewalls; it's important to avoid servers keeping state until they know they can communicate in both directions, in order to prevent flooding attacks; it's important to not wait until the end to add security, as that may kill some potentially good ideas; servers handling a very large number of connections need to be able to manage a very large number of security associations, but also need a transport failover function. SKIP supported this but lost in the IPSEC wars.
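One well-known way to avoid keeping server state before two-way reachability is confirmed is a stateless cookie: the server returns a keyed hash of the client's address and a timestamp, and allocates state only when the client echoes it back. The sketch below illustrates that general technique; it is not something the BOF prescribed. The names `make_cookie` and `check_cookie`, the 30-second lifetime, and the cookie layout are assumptions.

```python
import hashlib
import hmac
import os
import struct

SECRET = os.urandom(32)   # per-server secret; hypothetical key handling
COOKIE_LIFETIME = 30      # seconds; illustrative value only


def make_cookie(client_addr: str, now: float, secret: bytes = SECRET) -> bytes:
    """Build a cookie from a timestamp and a MAC; nothing is stored."""
    ts = struct.pack("!I", int(now))
    mac = hmac.new(secret, ts + client_addr.encode(), hashlib.sha256).digest()
    return ts + mac


def check_cookie(cookie: bytes, client_addr: str, now: float,
                 secret: bytes = SECRET, lifetime: int = COOKIE_LIFETIME) -> bool:
    """Accept only a fresh cookie that this server issued to this address."""
    if len(cookie) != 4 + 32:
        return False
    ts, mac = cookie[:4], cookie[4:]
    expected = hmac.new(secret, ts + client_addr.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    (issued,) = struct.unpack("!I", ts)
    return 0 <= now - issued <= lifetime
```

The cost is the tension noted earlier: the echo adds a round trip before the server commits any state.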
Vern Paxson finished with a summary of the discussion. First, a mailing list for further discussion of requirements will be announced on ietf-announce and e2e-interest. He then sketched the requirements heard during the BOF:
- quick establishment/activation, which is in tension with security considerations (authentication, flooding - not holding state)
- support for application level framing:
  - visibility into network conditions
  - control over reliability
  - the ability to supersede previous application messages
  - want to deal with transport at a 'frame' granularity (record marking)
  - per-message priority control
- minimize state requirements; think of servers with 1e6 connections
- muxing: PDUs muxed, delivered ASAP. Want ACK aggregation across the different communication streams; isolated flow-control; QoS consciousness between the streams.
- failover: transport connection can survive across change in IP address
  - on connection attempt, SYN timeouts are viewed as expensive
  - mid-stream, need to switch to backup interfaces
- Congestion control
  - slow start hit (bursty vs. even flow)
  - snappy after idle (bursty vs. even flow)
  - optionally becoming aggressive during loss (for control traffic whose job is to stem the congestion); e.g., FEC
- transaction-level reliability
- small footprint
- ability for application to indicate "a reply is coming" versus "no more coming now, go ahead and ack, don't delay"
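Two of the items above, superseding previous application messages and forced expiration of old data, can be sketched as a send queue keyed by message identity. This is an illustration under assumptions: the class `SupersedingQueue`, its `key` and `ttl` parameters, and the choice to let a replacement keep the original message's place in line are all hypothetical.

```python
import time
from collections import OrderedDict


class SupersedingQueue:
    """Send queue in which a newer message replaces an unsent older one."""

    def __init__(self):
        self._pending = OrderedDict()   # key -> (payload, expires_at)

    def put(self, key, payload, ttl, now=None):
        """Queue payload under key; an unsent message with that key is replaced."""
        now = time.monotonic() if now is None else now
        # Assigning to an existing key keeps its position in the queue.
        self._pending[key] = (payload, now + ttl)

    def get(self, now=None):
        """Return the next unexpired (key, payload), or None if none remain."""
        now = time.monotonic() if now is None else now
        while self._pending:
            key, (payload, expires_at) = self._pending.popitem(last=False)
            if expires_at > now:
                return key, payload
            # Expired before it could be sent: drop it silently.
        return None
```

With a byte-stream transport the application cannot do this once data has been handed to the socket, which is why the requirement is stated at the transport level.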