Pre-Congestion Notification BOF
Monday, November 6 at 0900-1130
Room: Grande Ballroom C
BOF Co-Chairs: Anna Charny and Scott Bradner

1. Agenda Bashing and Administrativia (5 min)

Scott Bradner: Robert Schaffer will be talking about work in another SDO.
No comments.

2. Overview and Introduction (20 min)

Anna Charny:
Details: http://www.ietf.org/internet-drafts/draft-chan-pcn-problem-statement-01.txt

* What is PCN?
  - Admission control & pre-emption using packet marking
  - Separation of core and edge behaviors: core marks, edges measure and act on marking
* Why it's a technical problem of interest
* Why admission control (AC)? stretching traffic assumptions beyond current typical mix to large proportions of inelastic traffic, dealing with failures & anomalous traffic scenarios
* Why yet another AC mechanism? A further scalability step beyond Intserv over DiffServ, aggregate reservations and reservation-based RMD (no core state but still signaling in core)
* State of Maturity: builds on substantial body of prior theoretical work, and substantial recent work by those working on PCN

Why an IETF problem?

* Progressed to stage where standardization of packet marking would be reasonable
* Some open issues, where consensus still hasn't converged (see Phil Eardley's presentation next)
  - marking behavior
  - marking encoding
  - how much aggregation is 'enough'
  - consensus best formed around discussing these issues wrt well-understood deployment models.
* Connection with other groups:
  - RFC3168 ECN compatibility (tsvwg)
  - signaling extensions to be done in relevant groups - RSVP (tsv), NSIS (nsis), SIP (mmusic)
  - pcn over MPLS (tsv)
  - relevance to certain pseudo-wires?
    (pwe3)
  - Ensuring emergency services work across a PCN cloud (ieprep & ecrit)
* Summary:
  - There is a legitimate technical problem
  - There is need of standardization effort
  - The solution is sufficiently mature
  - There are a bounded number of open issues needing resolution

Hannes Tschofenig: There was no official design team working on PCN in TSVWG, correct?
Scott Bradner: Correct.
Stuart Goldman: Are you interacting with organizations outside the IETF?
Scott Bradner: After forming a working group, a message is sent to the new work mailing list, to which many other SDOs are subscribed, to announce it and to determine overlaps.

3. Key Open Issues (10 min)

Phil Eardley

* Standardization issues
  - packet marking (without defining algorithm, but ensuring interoperability)
  - mark encoding (ECN, DSCP?)
  - conversion of packet marking into flow-granularity AC & pre-emption decisions (exactly one behavior, or more than one - perhaps different deployment scenarios need different behaviors: current consensus: exactly one behavior is doable)
* Focusing the work
  - community consensus required on deployment models
  - then community consensus required on assumptions (loosen assumptions for re-chartered phase II) underlying these deployment models that are necessary for PCN to work.
* Main assumptions
  - PCN region is fully enclosed by edge nodes that ensure no PCN traffic can enter without AC. (controlled environment)
  - avoid cheating issues (but trust is the issue that is likely to come up after re-chartering)
  - specific handling of emergency and flow-level precedence out of w-g scope, but ensure we don't preclude what others are doing
* Technical issues
  - compatibility of coding with RFC3168
  - sub-optimalities: ECMP, bi-directional flows & pre-emption
  WG will need to reach consensus on the extent to which standardization effort should address these

Yong Xue, DoD: Why exclude the emergency services work on preemption?
Philip Eardley: It would add a lot of complexity.
They are not really in the expertise of the PCN group.
Anna Charny: Not focusing on *standardizing* emergency services, but scope does include working with other groups who do.
Kwok Ho Chan: Clarification: Policy on MLP etc is an edge-edge issue that we don't need to preclude in focusing on the interior to edge interface.
Robert Shafer: Chair of the Network Interconnection Interoperability Forum gives a brief overview of the organization and the work it does. They started with work on PSTN and now focus more on IP. Their group is looking forward to interacting with the PCN group.
Scott Bradner suggests that members from their group subscribe to the mailing list.
Stuart Goldman: ATIS BTSC SAG & ITU-T would be groups to talk to.
Scott Bradner: Information from these groups would be useful.
Robert Schafer: Emergency announcements to local areas over PSTN - ultimately NIIF may extend into this space. These calls look like regular calls, not specially marked. This would be a hard problem.

4. Overview of Existing Work (10 min)

Kwok Ho Chan

* Deployment Models
  - Controlled Load: RSVP as signaling protocol outside and between PCN edge nodes
    see http://www.ietf.org/internet-drafts/draft-briscoe-cl-architecture-04.txt
  - SIP controlled flow admission & pre-emption model
    see http://www.ietf.org/internet-drafts/draft-babiarz-pcn-sip-cap-00.txt

James Polk: Where does a phone fit in the middle?
Kwok Ho Chan: Phone on edge of PCN region
Dave Oran: Find picture incredibly confusing but suggest to move on
Joe Babiarz: PCN region extends to end-points
David Black: Is the assumption that phones are trusted?
Kwok Ho Chan: Yes
Francois le Faucheur: the call-server is meant to be *above* the data plane, not on the edge of the PCN region
Bruce Davie: Confusingly described. Also may not be legitimate. But the w-g can decide whether or not it is a legitimate model to take on board, don't have to decide in the BOF.
Dave Oran: Is it that the real issue is what the data plane is doing to determine whether to admit the call, with some signaling to set up the call that may be SIP (or anything else)?
Dave Oran: This has nothing to do with SIP. There is something that is required to glue SIP together with PCN.
David Black: Yes, you need a control plane and SIP may be an example.

Kwok Ho Chan continues his presentation

* Measurement and Encoding at Interior Router
  - http://www.ietf.org/internet-drafts/draft-briscoe-tsvwg-cl-phb-03.txt
  - http://www.ietf.org/internet-drafts/draft-zhang-pcn-performance-evaluation-00.txt
  - http://standards.nortel.com/pcn/simulation_results_00.pdf
  - Example packet marking method described in the cl-phb draft above for AC & flow pre-emption
  - Measurement of marking of aggregate for AC & flow pre-emption also described

Lars Eggert: BCP upcoming on how you safely redefine ECN.
Stuart Goldman: How long does it take to get back to normal?
Anna Charny: Depending on the parameters it might take more than 1 second, but for the set of 'reasonable' parameters used in simulation it does preempt within 1 second. A few seconds is the target preemption time. Normally, a call is 2 min. But if quality of service degrades dramatically, then in a few sec most of the users hang up. And if all users are affected, all users can hang up. Therefore you want to preempt before that time, so that you only preempt enough calls to make sure that other calls are not affected. That defines the requirement that pre-emption must work within 1-2 seconds.
Kwok Ho Chan: In some jurisdictions it is not allowed to preempt calls.
Question from the floor (name not recorded): Does preemption by the network allow information to be returned to the end host?
Francois Le Faucheur: Yes.
Bob Briscoe: You don't have to use preemption, but an operator could use it.
Francois: The idea is to add some determinism when a failure happens.
There will be signaling to tear down the call properly to give the end host (user) some feedback.

Kwok Ho Chan continues with the slides:

Repository of PCN drafts is at http://standards.nortel.com/pcn/

Pre-Congestion Notification Problem Statement
  http://www.ietf.org/internet-drafts/draft-chan-pcn-problem-statement-01.txt
Pre-Congestion Notification marking
  http://www.ietf.org/internet-drafts/draft-briscoe-tsvwg-cl-phb-03.txt
An edge-to-edge Deployment Model for Pre-Congestion Notification
  http://www.ietf.org/internet-drafts/draft-briscoe-tsvwg-cl-architecture-04.txt
SIP Controlled Admission and Preemption
  http://www.ietf.org/internet-drafts/draft-babiarz-pcn-sip-cap-00.txt
Pre-Congestion Notification Using Single Marking for Admission and Pre-emption
  http://www.ietf.org/internet-drafts/draft-charny-pcn-single-marking-00.txt
Performance Evaluation of CL-PHB Admission and pre-emption Algorithms
  http://www.ietf.org/internet-drafts/draft-zhang-pcn-performance-evaluation-00.txt

5. Open Discussion on PCN (20 min)

David Black: The PCN region roughly corresponds to a DiffServ domain. However, when we did DiffServ, we looked at cross-domain. There is the risk of making the problem space larger than it initially was. Where this starts from is the trust discussion. The assumption was pretty good and corresponds to DiffServ. Failing to look at the cross domain stuff is too narrow scoped. It does not need to be on the MUST-to-be-solved items.
Scott Bradner: We talked about this. We thought about rechartering to add this cross domain aspect later. Is this something to be solved already in the first round? I don't know. It needs to be addressed at some point in time.
David Black: How much keeping in mind needs to be done to ensure that it can be done later?
Bob Briscoe: Only brought PCN to the IETF a year or so ago once I had a solution for inter-domain deployment where they don't trust each other not to cheat.
However, agree it is best to get the necessary parts standardized for single trust region most urgently, then working on multi-domain with no mutual trust after re-chartering. However, there are some issues that have to be decided for a single domain that may impact on the anti-cheating solution, e.g. PCN wire encoding. So we have to keep in mind the multi-trust-domain scenario during initial standardization.
David Black: As long as 'keep in mind' is taken strongly.
Yakhov Stein: What's all this for? Detect congestion because of a failure? Isn't there an easier mechanism when there is a failure?
Anna Charny: Pre-emption deals with failure, Admission Control does not - it is more for the situations when provisioning may not be the right solution.
Dave Oran: I assumed we could have multiple classes, all with admission control
Bob Briscoe: Can isolate link capacities for each class, and can have borrowing from other classes, but not priority access by another class to the admission controlled resources
Yakhov Stein: What is the timescale of these admission and preemption events?
Bruce Davie: Timescale is that of flow admission arrivals.
Yakhov Stein: Congestion that causes packet forwarding to slow down is a performance hit. Anything that works on buffer level is already too slow.
Anna Charny: We do not do that (do not work at the buffer level). The idea of pre-congestion is to get the notification before the congestion happens.
Francois Le Faucheur: You trigger before you are congested. The mechanism is not to trigger the queue. You can use some sort of threshold or virtual queues.
Dave Oran: I made the naive assumption that this stuff works on a DiffServ class domain only.
Anna Charny: The assumption that it is applied to a DiffServ class is correct for the current scope. Another assumption is that the traffic in the DiffServ class is not elastic.
Dave Oran: Is this necessary? Do we have mechanisms for elastic traffic that work just fine? Can we say inelastic traffic?
Anna Charny: Yes.
Dave Oran: The ability of bandwidth sharing between different classes is not provided.
Bob Briscoe: You don't need to have complete isolation
Liam Casey: I have read the simulation draft. Stability is a huge factor. I am not sure there are large scale simulations which show it is stable. This has to go to work with millions of calls. It is not only about simulations of 5 servers.
Anna Charny: Yes, the current simulations do not come close to the scale of real-world systems. However, we need to talk off-line about what stability means here. We need to understand to what extent stability concerns for traditional congestion control are applicable here.
Scott Bradner: You haven't tried large scale stability simulations?
Anna Charny: No, not yet.
Lars Eggert: How quickly do you react to congestion notification?
Anna Charny: It has to react quickly before people hang up, within 1-2 sec.
Lars Eggert: Is the detection of congestion slower than the detection of pre-congestion?
Anna Charny: Yes
Josef Babiarz: The detection is fast but there is the question on how long it needs to take before taking some actions.
Scott Bradner: Being very aggressive leads to instability.
Lars Eggert: Is there any algorithm between the preemption vs. going down with admitting new flows?
Anna Charny: There is an algorithm for each in the cl-architecture and cl-phb drafts.
Speaker at the mike (name not recorded): Preempting is rather a difficult thing to do. Have you thought about notifying users to fall back to a different codec rather than preempting the conversation? This, however, requires feedback to a user.
Josef Babiarz: Yes.
Anna Charny: There are two different questions: (a) Have we looked at it? (b) Should the working group do it? Regarding (a): yes. Regarding (b): No, not under the proposed charter.
Scott Bradner: That's a reasonable thing to discuss on the mailing list.
Dana Blair: Is there a formal definition of stability?
Scott: Do you have some suggestions?
Anna: I don't know how to answer this question in this context. Stability is typically applied to congestion control systems where the function of interest can adjust up or down. In the framework we propose, Admission Control can only stop admission, and pre-emption can only bring the amount of traffic down. So there are two questions of interest: (a) Admission: Does it stop admitting when expected or does it over- or under-admit? (b) Preemption: Does it pre-empt the right amount or does it over- or under-shoot?
Jim Bound: There is a lot of momentum for admission control. There are a lot of important things in the proposed scope.
Scott Bradner: There are two things: (a) Propose work in the IETF, (b) Keep in mind when designing the core protocol. You need to know what the requirements are to design the protocol.

6. Proposed Charter (20 min)

Anna Charny presents http://standards.nortel.com/pcn/PCN_draft_charter_v01.txt

* Standards track docs to address 3 areas:
  1. When should an interior router mark a packet (i.e. at what traffic level) in order to give early warning of its own congestion?
  2. How should such a mark be encoded in a packet (in the ECN and/or DSCP fields)?
  3. How should these markings (at packet granularity) be converted into admission control and flow pre-emption decisions (at flow granularity)?

Jacob Stein: When should a router mark a packet sounds like implementation.
Scott Bradner/Bruce Davie (both said same thing): Shouldn't describe hardware, but should describe behavior for interoperability.

Anna Charny continues presentation:

* Informational documents:
  1. a Problem Statement

James Polk: Don't see requirements doc. Problem statement isn't the same
Scott Bradner: In general believe requirements doc is a good idea. Also worried about getting bogged down.

Anna Charny continues

  2.
     at least two deployment models, possibly from:
     - like Intserv over DiffServ (RFC2998) but with PCN-enabled DiffServ region and edge nodes decide about admission and flow pre-emption
     - SIP Controlled Admission and Preemption: using trusted SIP endpoints (gateway or host) performing admission and flow pre-emption and PCN-capable routers within the DiffServ region

James Polk: Sounds like a SIP doc for SIPPING
Scott Bradner: This represents requirements to SIPPING
Stuart Goldman: Is it reasonable to add a boilerplate to the charter to say that work will be done with other working groups?
Anna: Sounds good to me.
Scott: Also useful.

Anna Charny continues presentation:

     - Pseudowire: PCN may be used as a congestion avoidance mechanism for end-user deployed pseudowires (collaborate with the PWE3 WG)

Lars Eggert: Not solving the whole pseudo-wire problem, so won't be useful.
Yakhov Stein: Nowhere does it say PWE3 should be over provisioned paths
Lars Eggert: There is a statement that does say this.
Dave Oran: If it requires SIP to change, we have problems, but I doubt it does. It has a dependency on SDP, which has the QoS pre-conditions.
Anna Charny: What would you suggest?
Dave Oran: Change the wording to Application-controlled (rather than SIP-controlled) admission & pre-emption
Stuart Goldman: As SIPPING says they take all changes to SIP, shouldn't this w-g say it takes all PCN-related aspects?
Scott Bradner: Good point
Bruce Davie: Reservation on using PCN for pseudo-wire is that concern with PWE3 is immediate deployment of Internet as-is. There is precious little ECN, let alone PCN deployment (given it's not standardized yet) in the current Internet.
Anna Charny: Should pseudo-wires stay in the charter?
Bruce Davie: No, too forward looking for a phase 1 charter

Anna Charny continues presentation:

  3. analysis of the signaling extensions required to support PCN
     - actual extensions in other w-gs (RSVP, NSIS, SIP) not PCN
  4.
     at least one example solution implementing the framework and its performance evaluation (e.g. simulation)
  5. tradeoffs between different encoding possibilities (e.g. ECN and DSCP marking)

* Initial restriction of scope (assumptions):
  1. Trust assumption: PCN region is a controlled environment

David Black: Trust assumption: need to add 'keeping in mind' interconnection. I won't let you forget that.

  2. aggregation: many flows on any link
  3. inelastic traffic: QoS assurance for r-t apps generating inelastic traffic - voice & video low delay, jitter and packet loss, i.e. Controlled Load Svc [RFC2211].
  4. Specific handling of emergency and other precedence (911, GETS, WPS, MLPP etc.) calls out of scope,
     - *but* PCN must ensure using PCN technology doesn't preclude edge nodes from taking the appropriate emergency actions required by these standards/initiatives
     - PCN Internal Nodes may not be MLPP-aware but they are DSCP aware

Bob Briscoe: Assumption 2 (aggregation) doesn't seem to include SIP endpoint
Matt Mathis: Change assumption 2 to "There are sufficient flows where relevant bottlenecks"
Anna Charny: Will do
Francois le Faucheur: Assumption 2: Add applicability statement to the charter.

Anna Charny continues presentation:

* Re-chartering later
  May relax some of the assumptions when re-chartering (e.g.
  single trust domain assumption)
  Out of scope: general investigation of AC mechanisms - we build on considerable academic and prior IETF work on measurement-based admission control (MBAC)
* Milestones
  Nov 2006           initial Problem statement
  Nov 2006           initial deployment models
  March 2007         initial router marking behavior (including encoding)
  March 2007         initial flow admission control and pre-emption mechanism (including edge node behavior)
  March/July 2007    submit Problem statement
  March/July 2007    submit deployment models
  Nov 2007           submit router marking behavior
  Nov 2007/Mar 2008  submit flow admission control and pre-emption mechanism
  Nov 2007           initial signaling analysis
  Mar 2008           re-charter or close

Francois le Faucheur: How would applicability statement fit into milestones?
Scott Bradner: Personally a stand-alone applicability statement would be my preference.
David Black: Would need an applicability statement specific to the endpoint model.
Scott Bradner: Or not signaling protocol specific
David Black: Need to use SIP as an example app signaling protocol
Dave Oran: Most important aspect is endpoint-endpoint in that deployment model.
Georgios Karagiannis: SIP applicability statement. Yes, SIP just an example application signaling protocol. Also need end-to-concentrator, or concentrator-to-concentrator.

7. Discussion of Proposed Charter (20 min)

No further discussion

8. Summary and Conclusions (10 min)

Scott Bradner: Is this ready for IETF w-g?
[About 150 people in the room.] Of which about 100 voted: ~98 for; 2 against
Scott Bradner: Now the task is to work with the ADs on a charter, to post to list. This BOF does not presuppose who the w-g chairs are in any way.
Lars Eggert: Show of hands please, on who is interested (other than the authors of the drafts) in helping in this w-g?
7 hands up.
Question from the room: Is there a mailing list?
Anna Charny: pcn@ietf.org
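
Illustrative sketch (not presented at the BOF). The split described in the overview (core marks, edges measure and act on marking), together with the "threshold or virtual queues" triggering mentioned in the open discussion, could look roughly like the Python below. This is only a minimal sketch under stated assumptions, not the algorithm of the cl-phb or cl-architecture drafts: the class and function names, the drain rate, the byte threshold and the 5% marked-fraction limit are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualQueueMarker:
    """Interior-node sketch: a virtual queue drained slower than the real
    link, so that it fills up (and triggers marking) before real congestion."""
    drain_rate_bps: float        # hypothetical: set below the real link rate
    threshold_bytes: float       # hypothetical marking threshold
    backlog_bytes: float = 0.0
    last_time_s: float = 0.0

    def on_packet(self, now_s: float, size_bytes: int) -> bool:
        """Account for one packet; return True if it should be marked."""
        elapsed_s = now_s - self.last_time_s
        self.last_time_s = now_s
        # Drain the virtual queue for the time that passed, never below zero.
        drained = elapsed_s * self.drain_rate_bps / 8.0
        self.backlog_bytes = max(0.0, self.backlog_bytes - drained)
        # Add the arriving packet, then compare against the threshold.
        self.backlog_bytes += size_bytes
        return self.backlog_bytes > self.threshold_bytes


def admit_new_flows(marked_bytes: int, total_bytes: int,
                    max_marked_fraction: float = 0.05) -> bool:
    """Edge-node sketch: turn per-packet marks measured on an aggregate into
    a flow-level decision. The 5% limit is an assumption for illustration."""
    if total_bytes == 0:
        return True  # nothing measured yet, so no sign of pre-congestion
    return marked_bytes / total_bytes <= max_marked_fraction
```

In this sketch the interior node keeps no per-flow state; only the edge turns the measured fraction of marked traffic into an admit or block decision, which matches the separation of core and edge behaviors outlined in the overview.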