Differentiated Services WG Meeting
Summer IETF 2000
Pittsburgh, PA
Monday, July 31, 2000, 7:30pm
--------------------

-- Agenda Bashing and Updates of Note - Kathie Nichols, Brian Carpenter (co-chairs)

MIB and Model are near last call; PIB and PDB definition guidelines are in progress. The tunnels draft has completed its informal WG last call and is now in the hands of the IESG.

There are diffserv-related activities going on in other WGs; mpls and policy are of particular note. Other groups that are doing things with the DS Field in the header include dhcp and pilc. There are also performance-related activities in ippm and bmwg. Good liaison relationships are also needed with other pieces of the industry, e.g., third-generation wireless. WG members need to help track things, and point out inconsistencies with/misperceptions of diffserv, as the chairs can't keep track of everything.

-- Diffserv MIB - No presentation; Brian Carpenter (co-chair) led a discussion of open issues. One of the authors, Kwok Ho Chan (Nortel), was present.

Four major open issues from the list:

(1) DiffservActionEntry and DiffservActionCountEntry. Should these entries appear by magic? Proposed resolution is "no", avoiding the need for a status indicator in the MIB. No objections to the proposed resolution.

(2) Issue 23: Do daughter entries of derived table entries need to exist independently of parent entries? This open issue will be taken to the list with a firm deadline for resolution.

(3) Race conditions involved in concurrent creation of TCBs by more than one SNMP manager. There were no objections to the proposed resolution of ignoring such races, because they should be sufficiently rare to not matter in practice. SNMP has similar race conditions elsewhere.

(4) Issue 31: Inheritance - a parent entry often points to a daughter entry using an explicit attribute. The proposal is to make this a row pointer to match usage elsewhere. No objections to this.
Joel Halpern raised the issue of ifDirection occurring throughout the MIB, and suggested a table indexed by it to factor that out; this also cleans up a related issue of how to find the first TCB for traffic moving in a particular direction. This should not change what can be represented in the MIB. Joel will send email to the list summarizing this proposal for review, which will also allow the other authors of the MIB to respond. On the list, this proposal was revised to a table indexed by both interface and direction, factoring both of these out of the MIB.

Overall, there are still a few issues to be resolved, and this needs to be done quickly. There will be a fairly short list discussion followed by WG last call; everyone should expect this to complete well before the next IETF meeting.

-- Diffserv Model - Brian Carpenter led the discussion.

The only open issue is John Strassner's objection to the use of the TCB concept, based on the policy WG's difficulty in using it to manage policy. The underlying issue is that the policy WG's QoS device model will need to match the diffserv model/MIB; the policy WG needs to follow the diffserv WG's direction in this area, but is encountering difficulty in doing so. The TCB concept is in both the model and MIB due to requests from vendors who are implementing diffserv devices, and hence is likely to be necessary to manage policy for such devices. The diffserv and policy WG chairs, along with the appropriate ADs, will consult off-line about how to make progress. Results should be visible shortly, as the model needs to be done soon. A WG last call on the model is expected in the near future.

-- Diffserv PIB - Michael Fine, Cisco

Summary of changes from previous draft:
- 802 PIB and DSCP-to-CoS mapping removed.
- Terminology changed to match diffserv model and MIB.
- Changes to track RAP SPPI (Structure for Policy and Provisioning Information).
- Additional minor changes and cleanups, including combining the QueueSet and DSCP assignment tables.
- Filters and FilterGroups have been removed to the generic framework PIB (a work item in the rap WG); ditto the interface type table.

The PIB is simpler than the full generality of the diffserv model in a number of aspects:
- No counter objects (not useful for policy).
- DSCP is used to determine drop preferences, rather than generic classification.
- No algorithmic droppers; tail-drop and Weighted RED are the only allowed algorithms.
- Meters are always specified via token buckets.
- A DSCP attribute specifies the Mark action.
- Datapath interconnection is fixed, not flexible as in the model.

These simplifications are motivated by the desire to abstract and simplify as part of doing a PIB. Whether all of them are appropriate will need to be discussed on the list.

The current version of the PIB does not support hierarchical queuing or shaping. Both will be required at some point in the future, and there was some discussion about supporting shaping in the first version. Hierarchical queuing support will likely require adding the TCB concept from the model to the PIB - the current PIB does not use TCBs.

Changes needed to the list of scheduling algorithms:
- WRED should be listed as MRED, because WRED is overly specific.
- WRR should be explicitly listed as a scheduler, as it's different from WFQ.
- Plain round robin should be added, even though it's a special case of WRR.

There was some discussion of abstracting the scheduling mechanism to deal only with weights rather than indicating the scheduler algorithm. A note will be sent to the list proposing specific changes.

There's an intellectual property issue in the PIB, and hence there's a disclaimer in the document. There are words in RFC 2026 that should be used to indicate this.

The authors of the PIB will revise it and submit the next version in 3-4 weeks. The plan is to finish discussion of the model and MIB before taking up the PIB on the list.
One consequence of doing this is that the PIB will have to follow the MIB in any areas where they differ.

-- PDB Definition Draft - Kathie Nichols, Packet Design

This is an update of the diffserv BA definition draft based on terminology changes; ignore the -00, as it's not really a -00 version. More minor edits are coming to match diffserv terminology updates. This is not a diffserv applicability statement, and it's probably premature for the WG to attempt to write one.

There was a fair amount of discussion of service definition. Significant work is taking place, currently outside the diffserv WG, on writing service definitions. In some cases, this is based on the mistaken belief that diffserv is about end-to-end services; rather, diffserv to date has been about providing a toolbox from which such services can be built. Something needs to be done, and this PDB document could provide very useful guidance to such efforts. Worked examples are needed of how to build a service out of queues, classifiers, etc., although service definitions should be expressed in terms of parameters and attributes (e.g., bandwidth, queue size, drop thresholds) without dictating specific numbers for those parameters and attributes.

It's important to keep the traffic-based service specification (SLS) distinct from the definition of the service offering (SLA), as the latter is much broader and out of scope for the WG. The PDB definition draft will be updated to talk about how to derive an SLS from a PDB, making a PDB effectively a service definition.

It is not clear whether WG consensus exists for inclusion of the Bulk Handling PDB as an example in this draft. Anyone who cares about this issue one way or another MUST send comments to the list, soon.

The PDB definition guidelines document is informational. The issue of whether PDB definitions themselves (written according to the guidelines in this document) should be standards track, informational, or experimental was explicitly deferred.
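The distinction above between a PDB (parameters and attributes, no fixed numbers) and an SLS (concrete values) can be sketched in code. This is purely illustrative: the field names, the "virtual-wire" template, and the example values are invented for this sketch and do not come from any draft.

```python
from dataclasses import dataclass

@dataclass
class PDBTemplate:
    """A PDB names its tunable parameters without fixing their values."""
    name: str
    phb: str                 # underlying PHB, e.g. "EF"
    parameters: tuple        # parameter names left open by the PDB

@dataclass
class SLS:
    """An SLS instantiates a PDB with concrete values."""
    pdb: str
    values: dict

def derive_sls(template: PDBTemplate, **values) -> SLS:
    """Derive an SLS by supplying a value for every declared parameter."""
    missing = set(template.parameters) - set(values)
    if missing:
        raise ValueError(f"unspecified parameters: {sorted(missing)}")
    return SLS(pdb=template.name, values=values)

# Hypothetical instantiation: values chosen only for illustration.
vw = PDBTemplate("virtual-wire", "EF",
                 ("rate_bps", "jitter_window_s", "max_packet_bytes"))
sls = derive_sls(vw, rate_bps=64_000, jitter_window_s=0.02,
                 max_packet_bytes=200)
```

The point of the shape is that the guidelines document constrains what a PDB must declare, while the numbers belong to the SLS.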
Deployment experiments are in progress, and the WG hopes to see reports on results (suitably filtered to leave out proprietary details and identification of customers); initial multicast and ATM deployment experience did include reporting of these sorts of results back to the appropriate standards bodies. The next revision of this document will include a requirement for deployment experience with a PDB (more than a lab test, but less than a full-scale service offering) before advancing the corresponding PDB definition document.

Minor clarification on PHB groups: a PDB can be constructed using multiple members of the same PHB group, and the draft will be clarified to say so.

This draft will be revised and sent to the list for further discussion.

-- Stateless Prioritized Fair Queuing (draft-venkitaraman-diffserv-spfq-00)

This is based on the dynamic packet state draft presented in Minneapolis. State in the packet header is modified by core routers, avoiding additional router state. The goal is fair allocation to flows sharing a network using a core-stateless architecture, while providing support for intra-flow drop priorities. Packets are marked based on per-flow rates; each link has a calculated threshold used to determine whether to drop or forward based on the packet marking. The example presented marks packets with a per-epoch count that resets to 1 at the start of each epoch. The threshold is a number calculated based on link loading; all packets whose mark is greater than the threshold number are dropped. More complex functionality is possible.

A strong objection was raised to the name of this draft - this is really weighted dropping, not fair queuing, as the packets don't come out in the right order for fair queuing. Approximate fair bandwidth allocation is a better description.

The authors have not done any work to determine how sensitive their results are to the precision (number of bits) available for the packet marks.
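The example presented can be sketched as follows. This is a minimal illustration of the per-epoch marking and threshold drop described above; the function names and the single-threshold model are assumptions of this sketch, not text from the draft.

```python
def mark_packets(flow_ids):
    """Edge marking for one epoch: the n-th packet of a flow is marked n.
    The per-flow counter resets to 1 at each epoch boundary (here, one
    call covers one epoch).  `flow_ids` lists packets in arrival order."""
    counts = {}
    marked = []
    for flow in flow_ids:
        counts[flow] = counts.get(flow, 0) + 1
        marked.append((flow, counts[flow]))
    return marked

def core_forward(marked, threshold):
    """Core behavior, with no per-flow state: forward a packet only if
    its mark does not exceed the link's load-derived threshold.  A
    threshold of k caps every flow at k packets per epoch, giving
    approximately fair bandwidth allocation."""
    return [(flow, mark) for flow, mark in marked if mark <= threshold]
```

For example, with packets `["a", "a", "a", "b"]` in one epoch and a threshold of 2, flow `a` loses its third packet while flow `b` keeps its only one.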
It may be possible to advance a future version of this draft as an Experimental RFC.

-- Policy Based Differentiated Services on AIX - Ashish Mehra, IBM Research

This is a report about an implementation of diffserv on AIX, supporting traffic management for both QoS-aware and QoS-unaware applications. The implementation is policy-based, using the policy WG's QoS schema and an LDAP repository. A diffserv API is available to QoS-aware applications; the agent accessed by that API has command-line and local config file interfaces to override the policy repository. This work will be updated to include the work underway in the snmpconf WG.

The QoS manager is an AIX kernel extension. It classifies sockets, and has access to server-specific info (e.g., user id, application name). The TCP and UDP implementations are specific to those protocols; this made them easier to optimize, but doesn't provide support for protocols that sit directly on top of IP. Shaping is done via configurable timers and triggers (e.g., receipt of a TCP ACK is a trigger). Symmetric multiprocessor issues made this implementation tricky. A problem was encountered in that QoS-unaware apps aren't prepared to handle errors on socket syscalls caused by the presence of diffserv functionality.

The authors plan to extend this work to IP-level packet classification and control, as well as inbound traffic control. AIX will do CBQ at the IP level in the future. See the draft for experimental results. This implementation is capable of reasonably accurate policing and shaping from kilobit to megabit rates; it's being deployed on Internet2, including collaboration with ICAIR. Work is also in progress on integrating with other AIX controls for cpu, memory, etc.
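The timer-driven shaping mentioned in the AIX report can be illustrated with a token-bucket sketch: each packet is released immediately if enough tokens have accumulated, otherwise a timer would release it once the bucket refills. This is a generic illustration of token-bucket shaping, not the AIX implementation; all names and numbers are assumptions of the sketch.

```python
def token_bucket_send_times(arrivals, sizes, rate, burst):
    """Compute when each packet becomes eligible to leave a shaper with
    token rate `rate` (bytes/s) and bucket depth `burst` (bytes).
    `arrivals` are nondecreasing arrival times; packets leave in FIFO
    order.  A held packet's release time is when tokens reach its size;
    in a real shaper a timer would fire at that instant."""
    tokens = burst     # bucket starts full
    clock = 0.0        # time at which token state was last updated
    release = []
    for arrival, size in zip(arrivals, sizes):
        now = max(arrival, clock)                       # FIFO ordering
        tokens = min(burst, tokens + (now - clock) * rate)
        clock = now
        if tokens < size:                               # must wait
            clock += (size - tokens) / rate
            tokens = float(size)
        tokens -= size
        release.append(clock)
    return release
```

With a 1000 B/s rate and 1000 B bucket, three back-to-back 500 B packets at time 0 yield release times 0, 0, and 0.5 s: the first two spend the initial burst, and the third waits for tokens.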
Wednesday, August 2, 2000, 9:00am
--------------------------------

-- Diffserv VW PDB draft - Van Jacobson, Packet Design

This is the successor to an earlier VW BA draft - it's actually an -01 version, even though the name is -00, due to the change from Behavior Aggregate to Per-Domain Behavior terminology.

The goal is to carry circuit-oriented traffic across diffserv clouds without using dedicated virtual circuits. This requires more bandwidth in the diffserv portion of the path than is actually used by the VW traffic. The defining characteristic is the ability to reuse a physical wire of specified bandwidth at egress - this constrains the arrival times of packets, as too big an inter-packet gap causes an output gap.

An important point is that this requires bounding phase jitter (with respect to a fixed-cycle clock), not just inter-arrival jitter, as the first packet that misses its phase deadline causes an output gap (disastrous if this is audio traffic). Inter-arrival jitter bounds can be derived from phase jitter bounds, but the math is more complex. One can have inter-arrival jitter of up to 2x the phase jitter in some cases and still have each packet show up in the right window. What MUST not happen is an accumulation of inter-arrival jitter in a fashion that causes the average arrival rate to fall below the specified output rate. Credit to Grenville Armitage for a good explanation of this on the list.

A subtle point is that there must be some clock phase that puts one packet in each clock period, but not all clock phases must have this property. In essence, the egress router can insert delay to get to the right phase, and then there will be no output gaps.

Practical deployment would be based on delay distribution measurements, either per-node (this draft) or over the entire path (draft-mercankosk). Both approaches have the same overall goal.
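The egress-delay idea can be shown numerically. In this toy sketch (all numbers and names invented for illustration), nominal transmissions are one per clock period, phase jitter is bounded by J, and the egress plays packet k out at time k*period + J; with phase jitter bounded, every packet has arrived by its playout time, so there are no output gaps even when inter-arrival gaps swing well away from the period.

```python
def playout_schedule(arrivals_ms, period_ms, jitter_bound_ms):
    """Egress reconstruction for a fixed-cycle clock: schedule packet k
    at k*period + J, i.e. insert a fixed delay of J (the phase jitter
    bound).  Returns the schedule and whether any packet misses it
    (an output gap)."""
    schedule = [k * period_ms + jitter_bound_ms
                for k in range(len(arrivals_ms))]
    gap = any(arrival > slot for arrival, slot in zip(arrivals_ms, schedule))
    return schedule, gap

# 20 ms clock, 5 ms phase jitter bound.  Arrivals at 3, 24, 41 ms have
# inter-arrival gaps of 21 and 17 ms (swinging around the 20 ms period),
# yet each packet lands inside its phase window, so playout has no gaps.
schedule, gap = playout_schedule([3, 24, 41], 20, 5)
```

This also illustrates the point above that inter-arrival jitter can approach twice the phase jitter while every packet still shows up in the right window.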
Fixed delays don't matter much, as circuits still work when wires are lengthened, but the variable delays are crucial, as failing to accommodate them leads to an output gap. The result is that the amount of phase jitter leads directly to a limit on the maximum supportable rate for this PDB - the fastest rate is 1 MTU-sized packet per phase jitter window. This is end-to-end through the network, which may involve summing things over a number of intermediate nodes.

The draft has been updated to expand the discussion of jitter accumulation due to multiplexing. The underlying idea is that each flow is jittered by a specific other flow exactly once, making it feasible to bound the total jitter.

VW is defined in terms of a common rate and packet size across all flows involved, and things go wrong when this isn't the case. There are three possible ways to improve on this situation:
- Treat all VW traffic as having the shortest service period (highest rate).
- Have an instance of the EF PHB with its own DS codepoint for each traffic rate, served in rate-priority order.
- Separate rate/jitter bounds in the SLS for each service class.

The first one may not be the best approach, and the second one may run into trouble if there are a lot of different traffic rates (e.g., from different voice codecs). Note that there are configuration restrictions in all three cases - there's an upper bound on how much VW/EF traffic can be accommodated, and one can't hand out more than is available; not all subdivisions of the capacity may work in practice.

The approach to using VW is based on operational experience and actual measurements of the phase jitter. Sufficient a priori conditions for properly deploying VW aren't known in general, and the current version of the draft may be overly broad in implying otherwise. Van will respond to a specific example posed by Anna Charny on the list, and the next version of the draft will make clearer the need for measurements of the actual network to be used.
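The "1 MTU-sized packet per phase jitter window" limit translates directly into a rate bound, which can be computed. The specific numbers below are illustrative assumptions, not figures from the draft.

```python
def max_vw_rate_bps(mtu_bytes, phase_jitter_ms):
    """Upper bound on a virtual wire's rate from the rule above: at most
    one MTU-sized packet per end-to-end phase jitter window (the window
    being the phase jitter summed over the path's nodes)."""
    return mtu_bytes * 8 * 1000 / phase_jitter_ms

# Illustrative: a 1500-byte MTU and 10 ms of accumulated phase jitter
# cap the wire at 1.2 Mb/s; halving the jitter doubles the ceiling.
rate = max_vw_rate_bps(1500, 10)
```

This makes concrete why summing jitter over many intermediate nodes matters: the achievable rate falls in direct proportion to the accumulated window.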
Jon Crowcroft mentioned a prototype system that implements VW for intercontinental voice calls.

-- VW PDB Analysis and Extensions - Guven Mercankosk, Australian Telecommunications CRC

This is a "friendly amendment" to the original VW PDB draft (discussed above). New in this document:
- Delay equalization of microflows at domain egress; this is a requirement, not an automatic consequence of other things. This is essentially the same as the discussion of phase jitter and traffic reconstruction at output above.
- Discussion of route pinning, which may be necessary to make this work.
- The jitter window may be independent of the virtual wire rate.
- Delay due to non-EF flows does not reduce achievable VW bandwidth.
- Characterization of the overall transfer delay bound.

Equalization at output to recreate the signal is important when connecting to another DS domain, to avoid loss caused by policing. No conditioning is necessary inside a diffserv domain beyond that provided by the EF PHB configured rate at each node. Ingress shaping is required to match inbound flows to the EF PHB configured rate.

All the conclusions in this document are statistical and based on uniform packet sizes. The author believes that there are corresponding worst-case results, and results for varying packet sizes, but these are not in the current draft.

-- Next steps on VW PDB documents

The basic VW draft will not be advanced without operational experience. Jon Crowcroft's work is welcome news, and the authors of the basic VW draft will look at how to merge in the work reported in draft-mercankosk. The result may be split into multiple documents (e.g., a separate informational one containing all the math). Comments on what should be done here are welcome on the list or directly to the authors of both VW drafts.

-- Precision of PHB definition - Brian Carpenter (co-chair)

This has been called Issue 0 on the list: must PHB definitions provide for mathematically rigorous conformance tests, or are intuitive descriptions good enough?
The answer is a definite "it depends". At one end of the spectrum, the Default PHB is inherently uncharacterizable in this fashion; but since EF is intended for accurately characterizable services, an accurately characterizable PHB seems to be needed.

-- Charny EF Draft - Anna Charny, Cisco

RFC 2598 is not implementable as specified, and the definition is ambiguous as stated, because different people have different opinions about what it means. It's also impossible to test for conformance given the current definition. The claim is that the proposed fix in this draft is required; the draft authors haven't found an alternative, and obvious attempts to fix it on the list haven't worked. The overall goal was to fix bugs in the RFC 2598 definition of EF, and the result is a more rigorous definition that yields rigorous conformance tests.

The intuition behind the proposed fix is that an arriving EF packet must depart no later than the time needed to drain all the EF packets already in the switch/router, plus an implementation-dependent error term, E. That term is a figure of merit for an implementation, indicating how the actual device differs from the ideal device - it includes both latency and jitter. Determining E should be similar to determining RFC 2598's jitter-based rate restriction; the accuracy (smoothness) of the scheduler is involved in both cases.

An important difference in approach is that RFC 2598 uses jitter to constrain the configured rates, whereas the Charny EF draft allows any rate but requires the value of E to be declared. This removes rate restrictions of the current EF definition that are not required for some (e.g., simple) networks. If these restrictions were sufficient to obtain the VW PDB, that would be fine, but the VW PDB draft has to add restrictions. Removing the EF rate restrictions would not affect VW's added restrictions - the PDB definition is the right place to put these restrictions, not the PHB.
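The style of conformance test this enables can be sketched as follows. This is a simplified reading of the idea above, assuming a single EF queue drained at the configured rate with ideal departures tracked as fluid-server finish times; the exact form of the bound in the draft may differ, and all names here are this sketch's own.

```python
def ef_conformant(packets, rate_bps, error_term_s):
    """Check an arriving EF packet departs no later than its ideal
    departure time plus E.  `packets` is a list of
    (arrival_s, size_bits, actual_departure_s) in arrival order; the
    ideal departure is when a fluid server at the configured rate would
    finish the packet, given everything queued ahead of it."""
    ideal_finish = 0.0
    for arrival, size_bits, departed in packets:
        # Fluid server: start when the packet arrives or the previous
        # packet's ideal service completes, whichever is later.
        ideal_finish = max(arrival, ideal_finish) + size_bits / rate_bps
        if departed > ideal_finish + error_term_s:
            return False
    return True
```

Because E is a single declared figure of merit, a device whose departures drift ever further behind the ideal schedule fails this check, matching the point that the error term can be used only once by a stream of traffic and does not reset.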
Another problem is that the current definition of EF disallows internal latency of one or more MTU transmission times, which can be an issue at OC-192 speeds. This draft also corrects that problem.

A lively discussion ensued, including Van Jacobson displaying a diagram of a scenario that is allowed by the Charny EF draft. There was agreement that the scenario shown on Van's slide is allowed by the Charny EF draft, but a lack of consensus on whether it represents an actual problem. Van Jacobson described the scenario as accumulation of too much jitter, but Anna Charny pointed out that arbitrary accumulation of jitter is not allowed by the Charny EF draft, because the E error term can be used only once by a stream of traffic and does not reset.

WG consensus is that the RFC 2598 EF definition needs some changes and clarifications, and that the mathematics involved should be in the same document as the clarified definition. The WG chairs will form an offline design team to figure out what should be done, including people who are authors of neither RFC 2598 nor the Charny EF draft. Both documents, plus the mathematical approach taken by Van Jacobson and Kathie Nichols on the list, will be put before the design team. Design team details will be determined off-line.