Congestion & Pre-congestion Notification (PCN) Working Group
Meeting Minutes
IETF 68 - Prague
March 19, 2007

Meeting Minutes taken by Dave Borman, Fred Baker, Anna Charny and Spencer Dawkins

Chairs - Scott Bradner & Steven Blake

o Administrivia

Scott Bradner briefly discussed the fact that pcn is now a working group and needs to work to its charter. An opportunity was given to bash the agenda, and no bashing occurred.

o Charter review

Steve Blake read the charter to the working group, dwelling especially on its restrictions: one diffserv domain with trusted members. Milestones are revised after consultation with ADs. The deliverables are a mix of Informational and Proposed Standard documents. The architecture document will be comprehensive, including security and OAM. There are lots of "may consider after completing" items (concatenated diffserv domains, etc.); the group would recharter to work on these.

Stuart Goldman pointed the group to a draft (draft-goldman-pcn) regarding PSTN interactions. The chairs suggested that this belonged on a non-working group list.

Scott: you can have non-wg discussion on non-wg items on non-wg mailing lists.

o PCN Flow Admission and Termination Architecture
  Kwok-Ho Chan (see slides)

Kwok gave an overview of previous work by himself, Jozef Babiarz, Bob Briscoe, and others. Scott noted that while this is prior work, it is not grandfathered as the approach the working group is to take. The working group needs to look at the various proposals and decide on an approach.
The key concepts are given in the following documents:

http://tools.ietf.org/html/draft-briscoe-tsvwg-cl-architecture
"An edge-to-edge Deployment Model for Pre-Congestion Notification: Admission Control over a DiffServ Region", Bob Briscoe, 25-Oct-06

http://tools.ietf.org/html/draft-briscoe-tsvwg-cl-phb
"Pre-Congestion Notification marking", Bob Briscoe, 22-Oct-06

http://tools.ietf.org/html/draft-chan-pcn-problem-statement
"Pre-Congestion Notification Problem Statement", Kwok Ho Chan, 25-Oct-06

Architectural considerations that came out while writing the problem statement need to be considered in the architecture, including the distinction between functions of the data plane and functions of the end systems that interpret its results. Kwok commented that there was value in thinking about this in a trans-domain fashion.

PCN architecture discussions talk about interior nodes and edge nodes. Interior nodes mark; edge nodes use the marking for flow admission and termination decisions.

The architecture and problem statement documents probably need to be merged.

Some work on traffic measurement has been done using simulation. It was done to provoke thought.

Is the architecture the subset or the superset?

Lars: The WG would focus on a single diffserv domain. We need to get the marking semantics right; they will be difficult to change later. We can't change the diffserv architecture, but we can add to it.

Tina Tsou - We will have the general architecture as the superset, right? Right.

Bob Briscoe is concerned about designing the architecture for a single domain. He would argue for designing for the Internet and then figuring out how to use it in a single domain.

Lars: part of the issue is trust domains. We need to be able to design the mechanisms assuming that we are within a common trust domain first, and then figure out trans-domain.

Bob Briscoe - thought we needed to look at multi-domain. Diffserv was multi-domain.

Scott - the focus there was on what a router does, regardless of domain.
Lars - if you have multi-domain, you have to talk about the security story. The IESG thought single domain was best understood. We can discuss when we see the single domain work.

Scott - it would be premature to work on multi-domain if you don't have single domain yet. You can publish an individual draft using "pcn" in the file name even if the topic is beyond the scope of the WG, although it would not be discussed in the working group.

?Woody?Govat? wants to know whether trans-domain work should go to the transport working group.

Lars - not in TSVWG - too confusing to talk in two places at once.

Bob Briscoe - the picture of the PCN architecture isn't the whole internet, just a (big) hop (conceptually an AS).

Scott - there are a number of constructs for how the internet is organized; e.g., a single ISP can have multiple domains. Our charter says one domain. We can say there are other scenarios and multi-domain situations, but leave it at that without going further.

Scott: markings don't leak, but are interpreted by an edge box. The architecture document can state leaking markers and trust between domains as issues, but we don't have to go into them.

Bob: Assume when we get to multi-domain, we'll need a new architecture document. Question: what did we agree on?

Scott: We agreed to follow the working group charter.

Steven: we have an 18 month schedule. There'll be plenty of time to move onward after that.

Kwok continued with a discussion of the duties of an interior node, which boil down to simply, scalably, and interoperably calculating throughput and congestion indicators and marking traffic accordingly. The edge node then needs to do the right thing with the aggregate of those marks, which may include dropping packets or triggering application behavior such as terminating sessions.

Kwok noted that we need common terminology. A show of hands suggested that including a glossary in the architecture draft makes the most sense.
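The split of duties Kwok described (interior nodes meter and mark, edge nodes act on the aggregate of the marks) can be illustrated with a minimal Python sketch. This is not taken from any of the drafts or slides: the class and function names, the per-interval byte threshold, and the two decision thresholds are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size_bytes: int
    marked: bool = False


class InteriorMeter:
    """Hypothetical interior node: marks packets once the bytes seen in the
    current measurement interval exceed a configured threshold."""

    def __init__(self, threshold_bytes_per_interval: int):
        self.threshold = threshold_bytes_per_interval
        self.seen = 0

    def start_interval(self) -> None:
        self.seen = 0

    def process(self, pkt: Packet) -> Packet:
        self.seen += pkt.size_bytes
        if self.seen > self.threshold:
            pkt.marked = True
        return pkt


def marked_fraction(packets) -> float:
    """Edge-side measurement: fraction of received bytes that carry a mark."""
    total = sum(p.size_bytes for p in packets)
    if total == 0:
        return 0.0
    return sum(p.size_bytes for p in packets if p.marked) / total


def edge_decision(fraction: float,
                  block_at: float = 0.05,
                  terminate_at: float = 0.5) -> str:
    """Assumed policy: admit new flows while marking is low, block new flows
    when marking is moderate, terminate some flows when marking is heavy."""
    if fraction >= terminate_at:
        return "terminate"
    if fraction >= block_at:
        return "block"
    return "admit"


if __name__ == "__main__":
    meter = InteriorMeter(threshold_bytes_per_interval=300)
    received = [meter.process(Packet(100)) for _ in range(5)]
    print(edge_decision(marked_fraction(received)))
```

The interior node keeps no per-flow state in this sketch; only the edge interprets the marks, which is the property the discussion above was concerned with preserving.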
Lars noted that using diffserv terminology where applicable would be wise. Anna Charny suggested that the people who may want to work on the terminology may not be the same people who will want to work on the architecture.

Lars - reuse diffserv terminology (please).

Kwok expects an impact on the architecture from flow rate adaptation, cheating detection, multi-domain, and application control. We don't want Interior Nodes to change if we work on these items. Basically, the architecture and terminology are not complete until that issue is settled.

Kwok also mentioned several non-goals that need to be considered while designing the architecture.

Lars noted that standards track outputs of this working group should not change when the matter goes trans-domain. This of course has implications - trans-domain needs to be thought about enough to prevent such rework.

Joe Evers asked whether it would be appropriate to have individual submissions for various parts of the non-goal discussions. Scott indicated that doing non-goal work tends to disrupt the progress of the working group.

Bob Briscoe commented that the scoping issue may make it difficult to produce a stable transdomain solution in a single-domain effort.

Kwok also discussed a survey of encoding. He indicated that the DSCP could be used if PHBs are reworked to allow for additional coding.

Chris Christou noted the Hiccup BOF, where the applicability to emergency communications may be discussed.

Tina Tsou asked whether the measurements occurred on aggregates between ingress/egress pairs.

Kwok - diffserv traffic aggregates within a single diffserv domain are under consideration here.

Scott suggested that the best way to compare marking strategies was multiple drafts by the proponents. Jozef Babiarz indicated that it would be better in a common draft.

Dave McDysan suggested that measures of comparison including performance would be good to include in the performance metrics.
We need to agree on methods and criteria for comparison of different approaches. He suggested compiling a common draft from several separate drafts.

Scott: the idea of comparison criteria is useful; we need to develop that.

o Performance Evaluation of CL-PHB Admission and pre-emption Algorithms
  Joy Zhang

Joy discussed simulation experiments the authors have been doing to validate the architecture. They simulated CBR, on/off voice, synthetic video, and a real video trace.

Her previously reported admission observations were that, when correctly configured, the algorithm worked well on links that had a high ratio of capacity to individual session size.

Dave McDysan commented that the results should have some measure of the real effect. How did the over-admission percentage affect real users whose calls may be dropped?

Anna Charny indicated that the testing Joy was reporting showed over-admission percentage rather than over-preemption. Over-admission in these experiments should not result in dropping any calls unnecessarily, but over-preemption could. It is a milder statement than the above.

Preemption/Termination observations were that over-preemption was possible, especially in a long RTT channel, since it depends on the probable view as seen by differing systems. If all edge devices preempt at the same ECN-CE density and all flows in an aggregate are marked with the same probability, one would expect all sessions to drop simultaneously with some probability when only a few need to (this is a theoretical observation, not something seen in the simulation results).

With multiple bottlenecks, a situation can occur in which one bottleneck sees variation in load when a second bottleneck builds on its marks to mark traffic. She noted that in her simulations this was not as much of a problem as expected.

Georgios Karagiannis - Have you looked into what happens when marked/unmarked packets are lost?

Joy: No.
Simulations assume all packets get through.

Anna Charny: We only assume that some of the marked packets get through.

Argenous ? wanted to know how many sources one needed to make PCN work. Joy said that estimates of such were in her draft.

???: How does a PCN enabled router mark with different RTTs?

Steve: In the charter, we only assume inelastic flows.

Anna Charny: Differences in RTT didn't have much of an effect.

o Pre-Congestion Notification Using Single Marking for Admission and Pre-emption
  Anna Charny

The authors have written an overview that follows the PCN drafts closely. Anna wanted to talk about tradeoffs and next steps.

The core just does marking; the egress measures unmarked traffic. It works like preemption in the CL drafts except that there's only one marking. The ingress computes the sustainable flow termination rate from the marking observed, and also uses this marking for the admission decision.

Good - saves one codepoint (important with MPLS) and one metering measurement.

Bad - must set the K parameter consistently system-wide; excess-rate admission control is more sensitive to parameters and traffic patterns; conflicts with Bob Briscoe's anti-cheating mechanism.

Does the WG need to choose between two (or more) mechanisms, or is there a way to include them all in the standard? It may be possible to define single-marking as a subset of two-threshold marking behavior. If only some core devices support type 1 and type 2, all must revert to type 1 and be configured with the same K constant. Type 1 marking is based on "excess rate", and Type 2 marking is based on "virtual queue-based" marking, as already defined in draft-briscoe-cl-architecture.

Single-marking appears technically viable - fewer implementation changes to existing core equipment, smaller performance impact in the data path of core routers. Does anyone see any major holes?

Joe Babiarz - How does it work with multi-path routing?
Anna: It may be possible that just doing nothing is not too bad from a performance standpoint. Also, the problem with ECMP may be solved if the ingress only chooses among those flows that are already marked when it selects flows for pre-emption. Single-marking and draft-briscoe-cl-architecture both have the same issue with ECMP.

Scott: if you hash the high order bits in route selection, is there a problem?

Joe: yes - The assumption is that one route is congested, and the other isn't.

Joe: Have you considered a disparity in the number of flows?

Anna: This should not matter much. The worst case is when everyone has a small number of flows, and we simulated that.

Joe: When you have on-off traffic, is during your measurements? Say 40/60, how do you address the error when the 40 percent isn't sending?

Anna: We don't explicitly address that error. Our simulations already include such on-off traffic with long periods of silence, and our results include that on-off effect.

Bob: Are you saying it's easier to use a token bucket for this?

Anna: the marking here is the same as it is today. The token marking of the virtual queue will require more work.

Jozef Babiarz: The number of flows is calculated at ingress. Another way is to calculate at egress. Is that a difference?

Anna: it doesn't change things for these measurements, but it is a change to the currently proposed architecture.

Anna: should the WG consider allowing two or more options? Or should there be just one?

Steve: There are tradeoffs, and performance is not the only one, or even the most important one. Implementation is important.

Scott: IPR consideration is another item. The Cisco IPR statement is fine; it's been accepted in some WGs, and rejected in others.

Lars: One reason that the encoding is a proposed standard is for interoperability. If you have multiple versions, you fragment the solution space. The initial hope was for one solution that is good enough for most situations.
Unclear if this is a clever renaming or if there is a conflict.

o Explicit PCN Marking
  Jozef Babiarz

This is only for flow termination, not admission control. Want to stay above Admissible rate, but below Supportable rate. Mark packets above supportable rate.

The charts show two failure situations: one is a fast failover; the second slowly transitions traffic (80% within one second, and the remaining 20% within 5 seconds).

Ingress routers mark packets, egress routers monitor packets for congestion markings, and signal what flows are marked to ingress routers. Ingress routers perform flow termination.

Simulations have been done for voice, video, etc. Preemption does work, but really bursty traffic needs a large token bucket. Termination happens when preemption doesn't happen.

This is a different tradeoff than the previous presentation - react quickly, or avoid over-preemption. The WG needs to discuss what we want to happen. As part of the definition, we need to decide how fast we need to converge. Before people time out and hang up - that's too late.

RTT doesn't matter - do different RTTs in the same network matter? Expect parameters would be selected based on the longest RTT in the system, in which case RTT doesn't matter.

Bob Briscoe: Preemption or flow termination is a last resort when other things haven't worked. Joy was describing a situation to preempt as fast as possible; Joe's is to do preemption slower.

???: Joe commented several times that it works well; what "works well" means is a matter of personal preference. How fast do you need things to converge? If people hang up before things converge, that's a problem.

Scott: you need to think about the human reaction time, not just the system reaction time. If the human time is faster, that's not good.

Joe: simulated different RTTs (2 msec to 800 msec), and reaction time was different for each value. Results presented showed that for an RTT of 50 msec, flow termination took less than one second to bring load down to a supportable rate.
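The marking behavior discussed in this presentation (mark packets above the supportable rate, with bursty traffic needing a larger token bucket) can be illustrated with a minimal Python sketch. It is a generic token-bucket meter, not the algorithm from the presenter's draft or slides; the class name, method name, and all parameter values are hypothetical.

```python
class TokenBucketMarker:
    """Generic token-bucket meter used as an excess-rate marker.

    Assumption for illustration: tokens accumulate at the configured
    supportable rate, and any packet that finds too few tokens is treated
    as excess and marked.
    """

    def __init__(self, rate_bytes_per_s: float, depth_bytes: float):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes  # bucket starts full
        self.last_time = 0.0

    def is_excess(self, now: float, size_bytes: int) -> bool:
        """Return True if the packet arriving at time `now` should be marked."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill, but never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return False
        return True


if __name__ == "__main__":
    small = TokenBucketMarker(rate_bytes_per_s=1000.0, depth_bytes=500.0)
    large = TokenBucketMarker(rate_bytes_per_s=1000.0, depth_bytes=1000.0)
    # A burst of three 250-byte packets arriving at the same instant.
    print([small.is_excess(0.0, 250) for _ in range(3)])
    print([large.is_excess(0.0, 250) for _ in range(3)])
```

With the same configured rate, the deeper bucket absorbs the whole burst while the shallower one marks its tail, which is one way to read the remark that really bursty traffic needs a large token bucket.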
Anna: does it matter when you have different RTTs in the same network?

Joe: we haven't tested that.

Lars: The point is to alleviate congestion quickly. It would be useful to stick to that.

Scott: The term "quickly" is not easily defined.

o Next Steps for WG...

Scott: How do we move forward? The authors whose documents are up already should make sure the documents are consistent with the charter. If you think a document is consistent with the charter, then send it to the chairs, and the mailing list will be used to get consensus. Volunteers are wanted for the documents on the to-do list.

Chris ???: BOF is at 11:50 AM in Congress I.