IETF July 2000 Proceedings

2.6.3 Endpoint Congestion Management (ecm)

NOTE: This charter is a snapshot of the 48th IETF Meeting in Pittsburgh, Pennsylvania. It may now be out-of-date. Last Modified: 17-Jul-00

Chair(s):

Vern Paxson <vern@aciri.org>

Transport Area Director(s):

Scott Bradner <sob@harvard.edu>
Allison Mankin <mankin@east.isi.edu>

Transport Area Advisor:

Allison Mankin <mankin@east.isi.edu>

Mailing Lists:

General Discussion: ecm@aciri.org
To Subscribe: ecm-request@aciri.org
In Body: (Un)Subscribe ecm your_email_address
Archive: ftp://ftp.ietf.org/ietf-mail-archive/

Description of Working Group:

The Endpoint Congestion Management (ECM) Working Group has two goals:

1) To provide a standard set of congestion control algorithms that transport protocols can take advantage of rather than having to develop their own

2) To develop mechanisms for unifying congestion control across an appropriate subset of an endpoint's active unicast connections.

For ECM, we define a "congestion group" as one or more unicast connections communicating with a common destination, and for which a decision has been made to bundle the connections together into a single flow for purposes of congestion control.

For each congestion group, a Congestion Manager (CM) will track the capacity currently available along the network path(s) used by the group, based on congestion indications such as packet loss information, in a manner analogous to TCP's "congestion window". The CM will then pass this information along to a Scheduler module that determines how that capacity is to be partitioned among the connections in the congestion group.
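The split between a capacity-tracking CM and a separate Scheduler can be illustrated with a minimal sketch. It assumes a TCP-like additive-increase/multiplicative-decrease window and an equal-share scheduling policy; neither is mandated by the charter, and all class, method, and parameter names below are hypothetical.

```python
class CongestionManager:
    """Hypothetical per-group capacity tracker (assumes AIMD, as in TCP)."""

    def __init__(self, mtu=1460, initial_window=1460):
        self.mtu = mtu
        self.cwnd = float(initial_window)  # available capacity, in bytes

    def on_ack(self, acked_bytes):
        # Additive increase: about one MTU per window's worth of acked data.
        self.cwnd += self.mtu * acked_bytes / self.cwnd

    def on_loss(self):
        # Multiplicative decrease, never dropping below one MTU.
        self.cwnd = max(float(self.mtu), self.cwnd / 2)


def equal_share_scheduler(capacity, connections):
    """Hypothetical policy: split the group's capacity evenly."""
    if not connections:
        return {}
    share = capacity / len(connections)
    return {conn: share for conn in connections}


cm = CongestionManager(mtu=1000, initial_window=4000)
cm.on_loss()        # congestion indication from any group member
cm.on_ack(1000)     # feedback that 1000 bytes left the network
shares = equal_share_scheduler(cm.cwnd, ["conn-a", "conn-b"])
```

The CM never needs to know how many connections exist, and the scheduler never sees loss events, which is the separation of mechanism from policy that the charter asks for.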

Determining the granularity of "common destination" (e.g., particular subnet, CIDR mask, specific address, or specific address and range of ports), and making the decision as to which connections should be bundled together into a single congestion group and which go into separate groups, are both difficult problems, because they are heavily influenced by the specifics of both the remote network topology and by possibly-remote policy decisions. It is the hope of the IESG that the IRTF will undertake research into exploring how to address these issues. In the interim, the working group is charged with devising near-term solutions to these problems with sufficient flexibility to accommodate those possible alternative schemes that can be anticipated. The working group is also charged with maintaining good communications with the IRTF effort, should one materialize.
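As an illustration of how the choice of granularity changes which connections share a congestion group, the sketch below derives a grouping key from a destination address and port. The function name, the granularity labels, and the default prefix length are assumptions made for this example only; it uses Python's standard ipaddress module.

```python
import ipaddress


def group_key(dst_addr, dst_port, granularity="address", prefix_len=24):
    """Return a hypothetical congestion-group key for one connection."""
    if granularity == "address":
        # All connections to the same destination address share a group.
        return (dst_addr,)
    if granularity == "prefix":
        # All connections into the same CIDR prefix share a group.
        net = ipaddress.ip_network(f"{dst_addr}/{prefix_len}", strict=False)
        return (str(net),)
    if granularity == "address_port":
        # Only connections to the same address and port share a group.
        return (dst_addr, dst_port)
    raise ValueError(f"unknown granularity: {granularity}")


same_subnet = (group_key("192.0.2.17", 80, "prefix")
               == group_key("192.0.2.200", 443, "prefix"))
same_address = (group_key("192.0.2.17", 80)
                == group_key("192.0.2.200", 443))
```

Here the two connections fall into one group under the prefix policy but into separate groups under the per-address policy, which is the kind of decision the charter describes as topology and policy dependent.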

The CM architecture will stress separation of the mechanisms of determining the current available capacity from the policies of how to then schedule that capacity. It is a requirement that the architecture eventually apply both to TCP and to UDP-based (and other IP-based) transport that includes feedback information concerning congestion (e.g., packet loss or explicit congestion notification). The initial WG deliverables will focus on developing CM for unifying connections that either use TCP or use UDP in a style comparable with TCP in terms of detecting loss and measuring the round-trip time. It is possible that the scope of work will be extended in the future to also include mechanisms for applying congestion control to transports that do not include such feedback.

The WG is also tasked to investigate the architecture's security implications; and the degree to which network stability will be entrusted to correct operation of applications using ALF transports, rather than operating system kernels.

The WG will initially produce four documents:

- An Informational RFC on congestion control principles. The goal of this document is to explain the need for congestion control and what constitutes correct congestion control. One specific goal is to illustrate the dangers of neglecting to apply proper congestion control, including the sometimes elusive argument that congestion control is not needed because the protocol will only be run over well-provisioned paths.

- A Standards Track RFC describing the behavior of a standard-conforming Congestion Manager (how it decides the current available capacity given congestion feedback from the members of a congestion group).

- A Standards Track RFC giving an abstract API for communication between Congestion Manager clients, the Congestion Manager, and the Scheduler. This initial work is confined in scope to supporting congestion groups made up of TCP connections and/or UDP connections that incorporate the same sort of feedback (per packet loss information; RTT computation) as TCP.

- An Informational RFC giving an example of the behavior of one or more Schedulers.

Goals and Milestones:

Feb 00: Congestion control principles document submitted to IESG for publication as Informational

Feb 00: Description of required behavior for Congestion Managers submitted to the IESG for publication as Proposed Standard

Feb 00: Abstract API for congestion management submitted to IESG for publication as Proposed Standard

May 00: Example description of behavior of ECM Scheduler submitted to the IESG for publication as Informational

Jun 00: Working Group rechartered with revised deliverables or shut down

Internet-Drafts:

· Congestion Control Principles

· The Congestion Manager

No Request For Comments

Current Meeting Report

Draft minutes for Endpoint Congestion Management WG meeting Monday, July 31
IETF 48, Pittsburgh

Notes by Aaron Falk, with editing by Vern Paxson.

The chair began the meeting with an overview of the WG's deliverables. The congestion control principles document is complete and in the RFC pipeline. The abstract CM API has completed WG last call. However, it appears that few people have read the document, leading the chair to wonder if there's been sufficient WG review to constitute consensus. A third deliverable is a document specifying correct behavior of a congestion controller. The chair proposed that the API document has sufficient discussion and pointers to previous documents to serve as such. No comments were heard in support of or counter to this proposal.

Hari Balakrishnan then led an extensive overview of draft-ietf-ecm-cm-01.txt, using viewgraphs available from http://nms.lcs.mit.edu.

Summary of the draft: the desire is to integrate CM across all applications -- not just TCP-based ones. It goes just above the IP layer and exposes an API that allows applications to get information about the state of the network.

One issue that needs more discussion is: what should the granularity of a macroflow be? This was discussed at the Nov. 99 IETF. The default is to aggregate all streams to a given address. The grouping and ungrouping API allows this to be changed by an application program.

Suggestion from the floor: why not let the application (cm_update) tell CM whether it's getting receiver feedback or not, rather than using terms for reporting loss like PERSISTENT and TRANSIENT, which allow too much ambiguity? Suggestion: give applications simple and non-ambiguous signals - e.g., it's receiving feedback; no-feedback; non-congestion-related loss.

Vern asked why the grant time is in terms of RTT rather than RTO. Hari replied that RTO would not be appropriate because it's possible to build a TCP-friendly app without a notion of RTO.

Hari proposed removing the notion of rttdev from the cm_query() call. Joe Touch suggested that any mechanism collecting data and reporting aggregate values should be calculating deviations to give a 'credibility weighting' to reported values. (I.e., to distinguish wildly varying values from stable ones). Joe also questioned the utility of reporting a rate if there's no information given as to the interval over which the rate is computed. Hari clarified that rates are over a small time window (i.e., one RTT). Two folks suggested that a jitter measurement would be useful in determining the aggregate jitter for things like RTP streams. This would allow applications to make adjustments using more information than just their individual RTP jitter measurements. Mark Handley suggested keeping rttdev to allow new TCPs access to the info when they start. Vern asked whether it should be defined the way TCP currently computes it (including using a deviation rather than a variance). Joe suggested that good statistics for the CM to report would be those pertaining to a group/macroflow rather than a single stream.
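For reference, the smoothed RTT and mean-deviation estimator commonly used by TCP (the form later specified in RFC 6298, with gains of 1/8 and 1/4) can be sketched as follows. Whether the CM should adopt exactly this definition of rttdev was left open at the meeting; the class and attribute names are hypothetical.

```python
class RttEstimator:
    """Sketch of a TCP-style smoothed RTT and RTT deviation estimator."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT deviation

    def __init__(self):
        self.srtt = None
        self.rttdev = None

    def update(self, sample):
        if self.srtt is None:
            # First measurement: deviation starts at half the sample.
            self.srtt = sample
            self.rttdev = sample / 2
        else:
            # Deviation is updated first, using the previous smoothed RTT.
            self.rttdev = ((1 - self.BETA) * self.rttdev
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt, self.rttdev


est = RttEstimator()
first = est.update(0.100)   # seconds
second = est.update(0.200)
```

Note that this tracks a mean deviation rather than a variance, which is the distinction Vern raised. A CM reporting per-macroflow statistics, as Joe suggested, could feed samples from every stream in the macroflow into one such estimator.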

Vern asked whether there will be a part of the scheduler API that defines the scheduling discipline. The answer is yes.

Joe thought that an application should be able to use a single call of cm_get/setmacroflow() to forward some data to the scheduler, rather than requiring two calls from the app, one to the CM and one to the scheduler. He further emphasized that we really need a scheduler API in order to evaluate the completeness/adequacy of the CM API. cm_get/setmacroflow depends significantly on the choice of scheduler and it's not possible to evaluate the proposal without understanding the scheduler better. Vern stated that the concept was to nail down the CM part now to enable experimentation with schedulers. This is appropriate for Proposed Standard documents - there's plenty of leeway to change them, including recycling them at Proposed.

Unattributed question: what will be the incentive for applications to use CM? Hari replied that they will attain better performance in situations such as slow start. Vern added that in the future the IETF may require new protocols to use CM for congestion management rather than inventing their own.

Hari then raised a pending issue regarding temporarily overriding cwnd restrictions. Suppose a TCP loses a packet due to congestion. The sender calls cm_update(). This causes the CM to cut the window. Now, the outstanding data exceeds cwnd. So what happens to the retransmission? How does it manage to go out? One solution (hack): add a priority parameter to cm_request(), perhaps with the restriction that an application can request at most one high-priority packet per RTT.
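A minimal sketch of the proposed priority override, under the assumption that a high-priority request may bypass the window check at most once per RTT. The class, its fields, and the high_priority flag are illustrative only and are not taken from the draft's API.

```python
class RequestGate:
    """Hypothetical admission check behind a cm_request()-style call."""

    def __init__(self, cwnd, rtt):
        self.cwnd = cwnd              # current window, in bytes
        self.rtt = rtt                # current RTT estimate, in seconds
        self.outstanding = 0          # bytes sent but not yet accounted for
        self.last_priority_grant = None

    def request(self, nbytes, now, high_priority=False):
        # Normal case: grant only if the data fits within the window.
        if self.outstanding + nbytes <= self.cwnd:
            self.outstanding += nbytes
            return True
        # Override: at most one high-priority grant per RTT.
        if high_priority and (self.last_priority_grant is None
                              or now - self.last_priority_grant >= self.rtt):
            self.last_priority_grant = now
            self.outstanding += nbytes
            return True
        return False


gate = RequestGate(cwnd=2000, rtt=0.1)
gate.outstanding = 4000  # the window was just cut below the data in flight
results = [
    gate.request(1000, now=0.0),                      # ordinary send: blocked
    gate.request(1000, now=0.0, high_priority=True),  # retransmission: granted
    gate.request(1000, now=0.05, high_priority=True), # same RTT: blocked
    gate.request(1000, now=0.2, high_priority=True),  # a later RTT: granted
]
```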

Tim Shepard thought that solution was okay, but prefers FACK and rate-halving (which is more aggressive but allows you to keep sending packets). Hari agreed that if you use FACK, this isn't an issue, but we don't want to restrict implementors to doing a TCP-style congestion controller. Tim mentioned that another alternative would be to not change TCP, and only use CM for other apps.

Sally Floyd asked for clarification regarding what a TCP sender tells the CM when it receives dupacks. Answer: any dupack is treated as feedback that packets have left the pipe.

Tim was also concerned with the default policy of grouping TCP connections together based on the same src/dst IP addresses. NAT boxes may mask a lot of complexity behind a single IP dst address. Vern commented that this is a key issue and we are hoping to crystallize an IRTF effort to look at this. He also pointed out that the assumption of sharing network path properties based on common destination address is already in use today, in route caches that include ssthresh and rtt/rttvar information. Matt Mathis stated that if there's a NAT box the behavior will still be safe from a network stability perspective, though traffic may be slowed down unnecessarily. Joe mentioned that a slow link and fast link behind a NAT will result in two connections running at the average of the two rates. Matt countered that this will still result in behavior that is safe, because of loss incurred on the slow link slowing down the entire aggregate. There was further discussion about possibly pathological situations in which the slow link could in fact be overwhelmed; it was not clear how plausible these scenarios are. Sally mentioned that the CM could recognize when two connections sharing congestion control state have vastly different behaviors (RTT, loss rate), and could move one of the connections to a different macroflow.

Vern then asked about a scenario in which the sender transmits 10 packets with TCP, and they all get lost, incurring a timeout. When does the CM know that nothing got through, and that it should adjust its notion of how much data is outstanding? Answer: the app tells the CM that it sent 10 packets, and based on the feedback it received (i.e., only implicit feedback due to a timeout), none were received.
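The timeout scenario can be sketched as follows, assuming a cm_update()-style report that carries the number of packets sent, the number known to have been received, and one of the unambiguous feedback signals suggested from the floor. The enum members, the window reactions, and all names are assumptions for illustration and are not the draft's definitions.

```python
from enum import Enum


class Feedback(Enum):
    RECEIVED = "received"                        # receiver feedback arrived
    LOST_FEEDBACK = "lost_feedback"              # e.g., a timeout
    NON_CONGESTION_LOSS = "non_congestion_loss"  # loss unrelated to congestion


class MacroflowState:
    """Hypothetical per-macroflow accounting, counted in packets."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.outstanding = 0

    def sent(self, npackets):
        self.outstanding += npackets

    def update(self, nsent, nrecd, feedback):
        # Every packet covered by the report is no longer in the pipe.
        self.outstanding = max(0, self.outstanding - nsent)
        if feedback is Feedback.LOST_FEEDBACK:
            self.cwnd = 1                        # assumed timeout reaction
        elif feedback is Feedback.RECEIVED and nrecd < nsent:
            self.cwnd = max(1, self.cwnd // 2)   # assumed loss reaction
        # NON_CONGESTION_LOSS leaves the window unchanged.


flow = MacroflowState(cwnd=10)
flow.sent(10)
flow.update(nsent=10, nrecd=0, feedback=Feedback.LOST_FEEDBACK)
```

After the update the CM's count of outstanding packets is back to zero, so the retransmission is not blocked by stale accounting.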

Joe Touch then raised the question of whether a more complex cm_request() interface is needed, one that can issue a request to send multiple packets. The context in which this comes up is attempting to run a connection very fast, say at 10 Gbps. In this case, a function call is as bad as a kernel crossing. There shouldn't be a correlation between the number of packets to send and the number of function calls. Matt countered that people who are running at that kind of rate are not going to want to do this kind of congestion aggregation anyway.

Vern pointed out that we are defining an abstract API and not a concrete API, so a key question is to what degree this change would affect the abstract API we're documenting. Joe said we would need to delete the wording that the app must call cm_request() per packet. But Hari argued for keeping one call per packet or per MTU, in order to eliminate bursts of packets that lock out other users. Joe thinks this overhead is excessive. Hari suggested we could add something about back-to-back bursts, and asked Sally whether the congestion control principles document includes discussion of bursts. Answer: no, that document isn't meant to be a set of specific mechanisms. Her view is that additional congestion control mechanisms can be defined, and should perhaps be vetted by IETF process, either in ECM or TSV.

The chair then made the following proposal for moving forward with ecm-cm-01:

1. Add opaque (scheduler) data to the API when creating a macroflow.
2. Add specification of how to compute the RTT variation.
3. Change CM_PERSISTENT to CM_LOST_FEEDBACK, etc.
4. Add a comment to the document that some key experience we don't yet have is with scheduling APIs, and that this may lead to possible changes in the CM API.
5. Resolve the issue Joe raised regarding sending multiple packets.

After addressing these, there would be one more WG Last Call. The chair then asked for a show of WG consensus for this plan, with it being understood that the chair would interpret consensus for the plan followed by a successful last call as consensus for the document. A show of hands revealed good consensus and no opposition.

The meeting finished with brief discussion of the Informational document the WG is tasked to produce giving example(s) of implementing one or more CM schedulers. The chair asked for volunteers to begin an outline of the document, write up a particular scheduling policy, or serve as editor, noting that the document is already overdue and the chair hopes to expedite it. There were no public volunteers, however.

Slides

The Congestion Manager
DIAMETER Protocol Evaluation
Agenda