Internet Engineering Task Force                Bob Briscoe, Jon Crowcroft
INTERNET-DRAFT                                                    BT, UCL
draft-ietf-tsvwg-ecn-ip-00.txt                                23 Feb 2001
                                                     Expires: 23 Aug 2001

An Open ECN Service in the IP layer

1 Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

2 Abstract

This document contributes to the effort to add explicit congestion notification (ECN) to IP. In the current effort to standardise ECN for TCP it is unavoidably necessary to standardise certain new aspects of IP. However, the IP aspects will not and cannot only be specific to TCP. We specify interaction with features of IP such as fragmentation, differentiated services, multicast forwarding, and a definition of the service offered to higher layer congestion control protocols. This document only concerns aspects related to the IP layer, but includes any aspects likely to be common to all higher layer protocols. Any specification of ECN support in higher layer protocols is expected to appear in a separate specification for each such protocol.
Contents

1 Status of this Memo
2 Abstract
3 Introduction
4 Conventions, definitions and acronyms
5 ECN router marking algorithms and differentiated services
   5.1 Specification of marking behaviour
   5.2 Equivalence between marking and drop behaviour
   5.3 Dependence on ECN-enabled routers
6 Forwarding of ECN for multicast
7 Anycast forwarding of ECN
8 ECN service to higher layer protocols
9 Host congestion control algorithms for ECN
10 Host requirements for ECN
11 ECN and fragmentation
12 Access to the ECN field
13 Security considerations
14 Further work
15 Conclusions
16 Acknowledgements
17 IANA considerations
18 Author contacts
19 Intellectual Property Claims

3 Introduction

This document is intended to improve the specifications incorporating explicit congestion notification (ECN) into IP. It is intended to complement the existing Internet Draft on addition of ECN to TCP/IP [6] in order to hasten the proposal to the IETF standards track. We envisage the current document being absorbed into that I-D where agreement is reached on the issues discussed. Therefore, for brevity, we will not make this document stand alone; we hope the authors of that I-D will not be offended if we presume to write an addendum to the above I-D. We have tried to avoid conflicts with that I-D, wherever possible suggesting additions rather than changes.

The present authors welcome all comments, which should be sent to the addresses given in section 18. Discussion is also welcome on the IETF Transport Area Working Group (tsvwg) mailing list [15].

In this document we only focus on issues with ECN at the IP layer (v4 & v6).
In order to standardise ECN behaviour in TCP it is unavoidably necessary to standardise certain aspects in IP. However, the IP aspects will not and cannot only be specific to TCP. We believe the introduction of ECN into TCP/IP is best achieved in two documents, one on IP and the other on TCP. Therefore, in this document we solely discuss aspects of ECN that will be common to all protocols layered over IP. For the history and status of the endeavour to add ECN to the Internet, also refer to [6]. We share the desire of that work to ensure backwards compatibility, and offer this work with the aim of also ensuring forwards flexibility. In the remainder of this document we first define our terms. Then we focus on router behaviour, in particular differentiated queuing and multicast forwarding. Next we move the focus to host behaviour, particularly clarifying the ECN support that any congestion control protocol should be able to expect from the IP layer. We also give general requirements on such congestion control algorithms. Next we discuss fragmentation and re-assembly issues specific to IPv4. Finally we clarify access rights to ECN fields and discuss other security issues. 4 Conventions, definitions and acronyms The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [4]. Of course, readers should note that this is an Internet draft, and such keywords have no force unless the status of the document moves beyond draft. We use the tuple (ECT, CE) to represent the settings of the flags in the ECN field of the IP packet header. When set, [6] defines them to mean respectively ECN capable transport, and congestion experienced. These two flags in the ECN field can currently be treated separately. If, on the other hand, the two bit field is considered as four code points, currently only three have unanimously proposed uses. 
The fourth (ECT=0, CE=1) remains undefined, but with four speculative uses proposed from various quarters (we agree with one - see later).

For clarity, where appropriate, the terms ECT and CE are used for the ECN flags (bits), while the succinct terms below will always be used in this document to refer to packets with the given code-points:

o markable (ECT=1, CE=*);
o unmarkable (ECT=0, CE=0);
o marked (ECT=1, CE=1);
o unmarked (ECT=1, CE=0).

Note that markable traffic includes marked traffic but that unmarkable traffic does not include unmarked traffic. If required, the intention is to allow these definitions to include more code-points in the future without rewriting the whole document. For instance, in the future, both `markable' and `marked' might be redefined to include (ECT=0, CE=1).

The terms marker, marking, pre-marking and re-marking are already defined concerning the setting of the diffserv code-point [2]. If it is ever not clear from the context whether we are discussing diffserv marking or ECN marking, we will use the terms congestion markable, congestion marked etc.

5 ECN router marking algorithms and differentiated services

5.1 Specification of marking behaviour

The ECN specification for TCP/IP [6] expects the random early detection (RED) algorithm [7, 3] to be used to mark traffic that is markable. It also accepts that other active queue management mechanisms may be developed and used. For instance, a virtual queue has been suggested to trigger marking even before queuing starts [8]. In this proposal, as packets enter the real queue a reference to them is also placed in the virtual queue. But the virtual queue has a smaller buffer and is emptied at a slower rate than the real one. Whenever the virtual queue is in an overflow state, all packets leaving the real queue are marked.
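
As an illustration only, the virtual queue idea described above could be sketched as follows. This is a minimal sketch in Python and is not the algorithm of [8]: the class name VirtualQueueMarker, the parameters real_limit, virtual_limit and virtual_drain, the slot-based timing (one real departure per dequeue call), the use of a counter in place of stored references, and the choice to leave unmarkable packets untouched are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    ect: int = 1  # ECN capable transport flag
    ce: int = 0   # congestion experienced flag


class VirtualQueueMarker:
    """Hypothetical slot-based model: one real departure per dequeue() call."""

    def __init__(self, real_limit, virtual_limit, virtual_drain):
        self.real = deque()
        self.real_limit = real_limit
        self.virtual_limit = virtual_limit  # assumed smaller than real_limit
        self.virtual_drain = virtual_drain  # assumed < 1 packet per real departure
        self.virtual_backlog = 0.0          # a counter stands in for the references
        self.overflowing = False

    def enqueue(self, pkt):
        """Admit a packet to the real queue and account for it virtually."""
        if len(self.real) >= self.real_limit:
            return False  # real queue full: the packet is dropped
        self.real.append(pkt)
        if self.virtual_backlog + 1 > self.virtual_limit:
            self.overflowing = True  # the virtual buffer has no room left
        else:
            self.virtual_backlog += 1
        return True

    def dequeue(self):
        """Serve one real packet; the virtual queue drains more slowly."""
        self.virtual_backlog = max(0.0, self.virtual_backlog - self.virtual_drain)
        if self.virtual_backlog + 1 <= self.virtual_limit:
            self.overflowing = False  # room again: overflow state ends
        if not self.real:
            return None
        pkt = self.real.popleft()
        if self.overflowing and pkt.ect == 1:
            pkt.ce = 1  # mark markable packets leaving during overflow
        return pkt
```

With real_limit=10, virtual_limit=2 and virtual_drain=0.5, a burst of four packets puts the virtual queue into overflow while the real queue is still far from full, so marking starts well before any packet would be dropped.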
We believe that it is important for the marking behaviour of routers to be predictable for the hosts using them. As the art of active queue management evolves, it should not be possible for completely different marking behaviours to be invoked at each router along a path. We wish to point out that a framework for experimentation with and competition between queuing behaviours already exists: the differentiated services architecture [2]. The per hop behaviour (PHB) associated with each diffserv code point (DSCP) can already be specified. The guidelines on PHB specification in the diffserv architecture include the discard behaviour [2, Section 3]. In future, PHBs MUST also define the congestion marking behaviour{1} of markable traffic if they define the discard behaviour of unmarkable traffic. Where appropriate, of course, such a definition MAY simply state that markable traffic is treated as if it were unmarkable. The addition of a need to define marking behaviour UPDATES the guidelines in the diffserv architecture referred to above. In the absence of descriptions of discard and marking behaviour, the implementation will determine the default marking behaviour. Whether the definition of a PHB MUST be through standardisation or MAY be by local definition depends on which pool its code-point falls within [13, Section 6]. The same would obviously apply to the marking behaviour. Thus, the best effort (default) PHB might be standardised by specifying its congestion marking behaviour as the RED algorithm and by giving its parameters. Other PHBs might be offered by network operators each using a different algorithm to trigger congestion notification, such as a virtual queue. ________________________________ {1} Note that congestion marking behaviour is distinct from traffic contract policing behaviour. 
The former doesn't discriminate flows or customers, as distinct from the latter which identifies out of contract traffic on a per-customer basis at the network interface with that customer.

5.2 Equivalence between marking and drop behaviour

The ECN specification for TCP/IP [6] stipulates that a packet should only be congestion marked if it would have been dropped, were it unmarkable. It is even stipulated that this assumption should be embedded in implementations, by stating that the ECT flag should only be checked after the decision has been made to drop a packet.

Exactly mimicking drop behaviour is motivated by the need to provide incentives for hosts to switch to ECN capability when competing with unmarkable flows. Indeed, [6] accepts that research into new criteria will be necessary for environments where all end-nodes are ECN-capable.

It is perfectly possible that future end-to-end congestion control protocols may be developed in conjunction with new router behaviours. For such a new service treatment, the router might be required to drop markable packets under the same conditions as unmarkable packets. However, markable packets would have to be marked at a far lower level of utilisation. In these new protocols, hosts would then be required to react far less severely to a marked packet than to a dropped one. The incentive for sending markable packets into such a service discipline would be the extra feedback from the network, which would make applications of this service behave far more smoothly. Such a service would be valuable for applications that benefited from rate stability.

Another perfectly reasonable possibility is that the incentive to send markable packets into the network will be provided by a lower charge than for unmarkable packets.
Such incentives are not appropriate for the best effort service which best serves its relatively elastic data applications by keeping queues relatively full. However, these incentives make sense for applications requiring the low latency of empty queues. Thus, there is clearly a need to ensure space for future experimentation. Each approach would have to define the standard point of equivalence between the behaviours for markable and unmarkable packets. Nonetheless, it is perfectly reasonable to restrict all protocols within a service treatment to the same standard. Otherwise, routers would have to examine the protocol field to determine the queuing behaviour. Therefore, the equivalence proposed in [6] is appropriate for all protocols using the best effort service. However, it would be unnecessary and probably incorrect to make such a sweeping restriction across every differentiated service. Indeed, it will often be meaningless to mimic the drop behaviour of a PHB that never existed before ECN. In fact, it is perfectly possible that some operators might deny unmarkable traffic access to certain service treatments in the future. To summarise, the point of equivalence between marking behaviour for markable packets and discard behaviour for unmarkable packets MUST be defined, but it MAY be different for each different service treatment. 5.3 Dependence on ECN-enabled routers If a differentiated service is offered that depends on its marking behaviour for optimal functioning, it must also depend on how many and which routers are ECN-enabled. There may be good reason why certain routers cannot be upgraded cost-effectively, or why a neighbouring domain may choose not to upgrade any routers to ECN-capability. Thus, statistics describing the distribution of ECN-enabled routers SHOULD be part of future service level agreements. 
6 Forwarding of ECN for multicast

To the authors' knowledge there are no research papers, let alone proposals in the IETF, specifically on multicast congestion control using ECN. Informal discussions in the research community have only recently started on this subject. A brief, provisional summary of the relevant state of the art from these discussions is given below.

The issue with multicast and ECN solely concerns multicast duplication of the ECN field. Multicast active queue management will be no different to unicast - being dealt with at the egress interfaces of a multicast router, after multicast duplication and forwarding from the ingress.

With loss-based multicast congestion control, there are two main arrangements for where the reaction to congestion occurs in a multicast group:

o Single rate:

   The sender taking account of all receivers: Each receiver feeds back congestion levels to the sender, with suitable controls on implosion, then the sender alters the rate of the group taking all feedback into account. Such approaches tend to suffer from the loss path multiplicity problem, finding more bottlenecks as the group size scales, and consequently causing the rate to `drop to zero' [1]. Hence the next approach is preferred over this;

   The sender choosing a representative receiver: Each receiver feeds back congestion levels to the sender, with suitable controls on implosion, then the sender nominates one receiver (typically the one that would run the slowest independent unicast session). This `acker' runs a tight rate control feedback loop with the sender [14];

o Multi-rate:

   Each receiver independently varies its rate: The sender may arrange for data to be spread across multiple multicast groups with essential data in the `base' group, slightly less essential data in a second and so on (layering).
   Each receiver may then independently leave the least essential groups while remaining joined to the rest until the point where congestion on their leg is reduced to acceptable levels. This is termed receiver-driven layered multicast (RLM [11]).

Currently, multicast duplication doesn't treat any fields in the header distinctively. It is often assumed that the ECN field should simply be duplicated in this way to every egress interface at a multicast router. However, there is concern that simple duplication would multiply the level of congestion seen by the session. This would result in as much congestion marking arriving at receivers as for multiple unicast flows. Where each receiver independently varies its rate (multi-rate), each misses out on the benefit it should derive from joining a multicast group. A multicast group should share the congestion it imposes on competing flows across its membership.

Due to this concern, it has been informally proposed (by Kelly) that when a marked packet is duplicated, all but one randomly chosen copy at each router is reverted to unmarked. The random choice is made for each packet arrival. Unmarkable and unmarked packets are duplicated unchanged, of course. For brevity, we will term this proposal `randomly selected ECN'. The advantage of such an arrangement is that each congestion event is notified to a single receiver. Of course, implementation would be slightly more complex than simple duplication.

When used in multi-rate schemes, randomly selected ECN tends to treat multicast fairly with respect to unicast. However, problems surface if it is used for single-rate schemes. In simple single rate schemes based on randomly selected ECN, if feedback to the sender is triggered on the arrival of each congestion mark, the scheme still suffers from the loss path multiplicity problem.
Selection of a representative receiver is the current preferred way to solve this problem. However randomly selected ECN results in such a low rate of marking at any one receiver that it would be very slow to converge on a suitable choice of acker. The round trip time of each feedback message varies dramatically, but has a mean value of all the congested paths weighted by the congestion on each. Therefore, over time, more marks arrive at receivers closer to bottlenecks. But it takes a lot of time for a large group. It appears that, if a single rate is in use, simple duplication of ECN marking would be more useful, giving richer information to each receiver. Where rate control is co-ordinated by the sender (single rate), allowance can be made for duplication of the marking in the downstream direction during aggregation of congestion feedback in the upstream direction. It is `only' necessary for the level of aggregation to mimic the tree topology, whether exactly or approximately. Therefore it appears that the two types of congestion control scheme require different multicast duplication of the ECN field. Rather than require hosts to control multicast duplication, we propose a third 'hybrid ECN duplication' technique. In this hybrid scheme, when a marked packet is duplicated, all but one randomly chosen copy at each router is changed to be `potentially marked', denoted by the remaining unused code-point (ECT=0, CE=1). The random decision is made for each new packet. Unmarkable, unmarked and potentially marked packets themselves would all be duplicated unchanged. With this hybrid congestion notification, members of a group could extract the information they needed for either the single-rate or the multi-rate approaches. This would avoid having to add a signalling mechanism to request the network to choose one or the other approach, also saving having to secure the signalling. Implementation would be slightly more complex again, of course. 
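
The three duplication options discussed in this section (simple duplication, randomly selected ECN and the proposed hybrid) can be compared in a short sketch. It is illustrative only: the function name duplicate_ecn, the mode strings and the use of Python's random module are assumptions for the example, and code-points are written as (ECT, CE) tuples following section 4.

```python
import random

# Code-points as (ECT, CE) tuples.
UNMARKABLE = (0, 0)
UNMARKED = (1, 0)
MARKED = (1, 1)
POTENTIALLY_MARKED = (0, 1)  # the currently undefined code-point


def duplicate_ecn(codepoint, copies, mode="hybrid", rng=random):
    """Return the ECN code-point given to each of `copies` egress copies.

    mode "simple": every copy keeps the arriving code-point.
    mode "random": one randomly chosen copy of a marked packet stays
                   marked, the others revert to unmarked.
    mode "hybrid": one randomly chosen copy of a marked packet stays
                   marked, the others become potentially marked.
    """
    if mode not in ("simple", "random", "hybrid"):
        raise ValueError("unknown duplication mode: %r" % (mode,))
    # Only marked packets are treated specially; unmarkable, unmarked and
    # potentially marked packets are always duplicated unchanged.
    if codepoint != MARKED or mode == "simple" or copies == 0:
        return [codepoint] * copies
    others = POTENTIALLY_MARKED if mode == "hybrid" else UNMARKED
    result = [others] * copies
    # The random choice is made afresh for each packet.
    result[rng.randrange(copies)] = MARKED
    return result
```

For example, duplicate_ecn(MARKED, 4) yields exactly one marked copy and three potentially marked copies, so a multi-rate receiver can count only marked packets while a single-rate scheme can count both.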
If the hybrid scheme were used, we would have to re-define our definitions of terms in section 4, as follows:

o markable (ECT=1, CE=*) or (ECT=0, CE=1);
o unmarkable (ECT=0, CE=0);
o marked (ECT=1, CE=1);
o potentially marked (ECT=0, CE=1);
o unmarked (ECT=1, CE=0) or (ECT=0, CE=1).

Active queue management would be as before, with markable packets being chosen to be marked. Of course, the implication is that potentially marked packets might be changed to marked packets (ECT=0, CE=1) -> (ECT=1, CE=1) if they hit congestion more than once.

[6] suggests three alternative uses for the extra code-point we require for our hybrid ECN duplication scheme (ECT=0, CE=1):

1. Some other non-ECN-related function;

2. ECN-capable but for alternative semantics to the marked code-point (e.g. `slightly' marked);

3. The extra code-point could be given an identical meaning to the marked code-point so that the two could be alternated randomly throughout a flow depending on a nonce at the sender, allowing the sender to detect 50% of any changes along the path from marked packets to unmarked.

At least for multicast, our proposal rules out categories 1 and 2, which we assume were speculative anyway. For instance, one would imagine that an alternate semantic (e.g. slightly congested) could be implied by a lower marking rate. It is likely that if either of these schemes was needed for unicast, it would also be needed for multicast.

Our scheme is compatible with the nonce scheme (guessing the details which haven't been published yet). The motivation for the nonce scheme is primarily for the sender to detect receivers that under-report congestion feedback. This assumes large senders may wish to act as policers on behalf of the network (their incentive may not be a natural one).
It only works with positive acknowledgements (acks) where the sender can compare the ECN field in the ack with that it sent. The sender accepts that the network destroys the nonce information when it marks a packet, so nacks would not be comparable. To avoid implosion, multicast feedback schemes never use acks. So only nacks are seen by the sender. Therefore a multicast sender might as well originate all packets as (ECT=1, CE=0). And fortunately, in our scheme, the network treats (ECT=0, CE=1) as effectively unmarked when it arrives at congestion downstream of previous congestion. A multicast sender could even arrange to use the nonce scheme in conjunction with our multicast duplication scheme. The motivation might be that some multicast congestion control schemes involve at least one receiver giving ack feedback (e.g. pgmcc [14]). The sender would guarantee an equal ratio of (ECT=1, CE=0) to (ECT=0, CE=1) over a moving window of n packets. It would affect say n/2 randomly selected packets with the nonce and insert padding into the remainder to balance the nonce packets. The multicast router would behave no differently from the description above. Each receiver could then detect the difference between the number of (ECT=1, CE=0) and (ECT=0, CE=1) packets over a moving window of n packets. This would be the level of `potentially marked' traffic. This level would be fairly slow to emerge, and noisy if there were losses too{2}. But it may be enough to decide to drop a layer (in for example RLM [11]) or select the 'slowest' receiver (in for example pgmcc) both of which have some hysteresis anyway. Until this last code point is defined, it is advisable for implementations of ECN on both hosts and routers to avoid optimisations that would make it difficult to treat the two bit ECN field as four code points. The uncertainty over multicast duplication of the ECN field need not hold up standardisation of other aspects of ECN in the IP layer. 
The default behaviour of all existing routers is to dumbly duplicate the ECN field along with the rest of the packet. Whatever the status of the rest of the ECN standardisation effort, simple duplication of the ECN field on multicast routers SHOULD be considered experimental.

________________________________
{2} n would have to be large enough for there to be a high chance of more than two marks in any n packet window. The sender may have to adapt n and re-announce it depending on current feedback, which in itself is a potential security flaw if not done carefully.

7 Anycast forwarding of ECN

Anycast forwarding of the ECN field is no different from unicast.

8 ECN service to higher layer protocols

The IP service layer provides the following three ECN services to any upper layer protocol:

o The data sender MAY request that the packet is treated as markable by the IP layer. Nodes on an end-to-end path MAY honour such a request. If any node on the path cannot honour the request, it silently services the packet as if it were unmarkable.

o The IP layer forwards a request to treat a packet as markable without alteration (with the exception of congestion control proxies - see section 12).

o The IP layer only notifies receiving hosts of congestion experienced by each markable packet through the average marking rate apparent in a flow. The total congestion experienced is roughly the sum of congestion experienced at nodes along the packet's path. There is no guarantee that all or even any nodes on the path will be capable of contributing to this signal. The average marking rate is the ratio of marked packets to the total number of packets in a sample period. The meaning of a certain average marking rate and the sample period are defined by the marking behaviour in the service definition relevant to the packet's diffserv code-point{3}.
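
The third service can be made concrete with a small worked sketch. It is an illustration under stated assumptions, not part of any specification: the function names are hypothetical, the sample period is represented simply as the list of (ECT, CE) code-points received in it, and the path calculation assumes each node marks independently with a small probability, which is one way to see why the total is `roughly' the sum of the per-node contributions.

```python
def average_marking_rate(codepoints):
    """Ratio of marked packets to all packets in one sample period.

    `codepoints` is a sequence of (ECT, CE) tuples, one per packet received
    in the sample period. An empty sample is reported as 0.0 (an assumption;
    the text does not define this case).
    """
    total = len(codepoints)
    if total == 0:
        return 0.0
    marked = sum(1 for cp in codepoints if cp == (1, 1))
    return marked / total


def path_marking_probability(per_node):
    """Probability that a packet arrives marked, assuming each node on the
    path marks independently with the given probability."""
    unmarked = 1.0
    for p in per_node:
        unmarked *= (1.0 - p)
    return 1.0 - unmarked
```

For two nodes marking at 1% and 2%, path_marking_probability([0.01, 0.02]) is 1 - 0.99 * 0.98 = 0.0298, close to the simple sum of 0.03. The approximation degrades as the per-node marking rates grow.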
The IP layer offers these three services to all higher layer protocols, whether or not they use them, simply to avoid having to inspect the protocol field of the IP header to establish whether the ECN service is appropriate. Thus, any higher layer protocol MUST be able to assume these services will be available to it, whatever protocol it is. Of course, this is aside from any access control to this service interface on the host, which may deny access to a capability of this interface dependent on the user running the higher layer protocol. 9 Host congestion control algorithms for ECN All new or updated congestion control protocols standardised through the IETF SHOULD state their applicability for markable as well as unmarkable packets. The ECN specification for TCP/IP [6] stipulates that the congestion control algorithm followed by an ECN-capable data receiver on receipt of a marked packet must be essentially the same as that following a dropped packet. As discussed in section 5.2, router and host algorithms are mutually dependent but need not be cast in stone. The point of equivalence between behaviour for markable and for unmarkable packets on a router will reflect that on a host. If markable traffic is marked at a router when unmarkable traffic is dropped, a mark should be treated like a drop at a host. If on the other hand the two types of traffic are both dropped in the same circumstances at the router, a drop for one should be treated like a drop for the other at the host. It was argued that this latter example seemed likely to be a useful one. A suitable wording for standardisation was given in that earlier section, which would allow room for experimentation across different service treatments. 10 Host requirements for ECN Where a host protocol layer does not implement congestion control (e.g. UDP), it SHOULD offer ECN services to higher layers that are equivalent to those defined in section 8 for the IP layer. 
Specifically, a sending protocol SHOULD honour requests to send markable datagrams; and a receiving protocol should allow higher layer protocols to determine whether received datagrams were markable and to determine whether each is marked.

________________________________
{3} Of course, the host may operate a congestion control algorithm that tends to respond to the average marking rate without directly calculating it (e.g. [6]).

Note that a markable packet is generally a signal to the network to enable ECN behaviour. As new congestion control protocols are defined, it is possible this signal to the network will be overloaded as an end-to-end signal from the data sender to the data receiver to request ECN behaviour. Because multicast protocols generally have to support `late join', it is likely that data receivers may need to determine whether any datagram in a flow is markable. Due to these general requirements, a receiving application MUST be able to determine whether any arriving datagram is markable.

However, future ECN-based congestion control protocols MUST NOT use markable packets before ECN capability has been established. The only exception would be if the protocol were designed to ensure congestion control worked correctly even if such a marked packet arrived at a non-ECN-capable receiver.

Until a use for the (ECT=0, CE=1) code point is defined, host implementations of ECN SHOULD be able to request and to pass on any of the four code-points of the ECN field, rather than just each flag (bit) separately.

Whether hosts SHOULD or MUST implement an ECN version of each particular congestion control protocol (e.g. TCP) is not the concern of this document, which only covers aspects of ECN common to all protocols over IP.

11 ECN and fragmentation

For IPv4, markable traffic MUST have the don't fragment (DF) flag set.
Setting the DF flag and using path maximum transmission unit (MTU) discovery [12] is current best practice anyway [10]. Hence it is not a problem to mandate its use with a new feature of IP. This is not an issue for IPv6, where there is no DF flag because not fragmenting is the only supported behaviour. The rationale for not allowing fragmentation when ECN is enabled is to avoid complications on re-assembly of fragmented datagrams. Some fragments could be marked and others not, making it necessary to decide the marking of the re-assembled datagram before passing it to the congestion control protocol. To use the logical OR of the marking of all fragments might be a pragmatic solution, particularly for congestion control protocols like TCP where one loss per round trip is treated identically to many. However, it is becoming more common to see large numbers of packets per round trip time as data rates increase while packet sizes and the speed of light haven't increased for many years. Therefore it is to be expected that newer congestion control protocols might take more accurate account of the number of packets marked in a round trip. Hence, the inaccuracy of a logical OR during re-assembly at the IP layer is best avoided. A logical OR would also confound the accuracy of congestion avoidance charging [9], if it were shown to be necessary. If an IPv4 packet contains a markable code-point but does not have the DF flag set (an illegal combination), it SHOULD be silently forwarded unless fragmentation is required. If fragmentation is required, an ECN capable router MUST discard it and return an ICMP Destination Unreachable error to the data sender. It MAY contain a code meaning "fragmentation needed and DF set". Alternatively it SHOULD contain a new ICMP code meaning "fragmentation needed but markable code-point used". 
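
The router rules above can be summarised as a small decision function. This is only a sketch of the behaviour of an ECN-capable IPv4 router as described in this section: the names Action, Decision and router_action are hypothetical, `markable' is taken as ECT=1 per section 4, and the treatment of packets with DF set that need fragmentation is the ordinary IPv4 behaviour rather than anything new proposed here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    FRAGMENT = "fragment and forward"
    DISCARD_ICMP = "discard and return ICMP Destination Unreachable"


@dataclass(frozen=True)
class Decision:
    action: Action
    illegal_combination: bool  # markable code-point without DF set


def router_action(ect, df, needs_fragmentation):
    """Decide what an ECN-capable IPv4 router does with one packet.

    ect: 1 if the packet carries a markable code-point (ECT=1).
    df: True if the don't fragment flag is set.
    needs_fragmentation: True if the packet exceeds the next-hop MTU.
    """
    illegal = (ect == 1) and not df
    if not needs_fragmentation:
        # Illegal packets are silently forwarded if they fit.
        return Decision(Action.FORWARD, illegal)
    if df or illegal:
        # DF set: ordinary "fragmentation needed and DF set" handling.
        # Illegal: discard; the ICMP code MAY be the same one, or SHOULD
        # be a new code that is not yet assigned.
        return Decision(Action.DISCARD_ICMP, illegal)
    # Unmarkable packet without DF: ordinary fragmentation.
    return Decision(Action.FRAGMENT, illegal)
```

The illegal_combination flag is carried in the result only so that a caller could choose between the two ICMP codes discussed above; no value for the new code is assumed.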
If such an illegal datagram reaches the data receiver in fragments (perhaps due to a non-ECN-capable router or due to a bug in a node on the path), the receiver MUST discard the datagram and return a similar ICMP message to the data sender, as this may imply an unknown upstream problem.

If such an illegal IPv4 datagram arrives at the data receiver intact, there is no need to take corrective action. The datagram should be silently handled in the normal fashion.

12 Access to the ECN field

This section clarifies exactly what types of node are expected to read or write the flags in the ECN field. In [6] it has been proposed or implied that:

o the ECT flag SHOULD be set by the data sender if it has been established that all ends have ECN capability;

o routers MAY read ECT (we cannot say MUST, because not all routers will be ECN-capable) but MUST NOT alter it;

o whether the data receiver should read ECT once a session is in progress depends on the transport protocol in use{4}.

Also, it has been proposed or implied that:

o the CE flag SHOULD be clear when it leaves the data sender (except for random security checks);

o routers MAY set CE but MUST NOT clear it;

o the receiving host MAY read CE (it may not be ECN-capable), but certainly MUST NOT alter it;

o if the receiving host has enabled an ECN-capable session, it MUST read CE during that session.

This is, of course, quite apart from the discussions on what each node could do, if it chose to misbehave.

We wish to differ on the implied rules concerning what an intermediate node might be allowed to do to these flags. Our goal is to allow future flexibility where there is no reason not to. The rules on changing the ECT flag at an intermediate point have not been explicitly stated, except in the context of tunnels, which we will discuss presently.
Therefore, we will now propose rules for changing the ECN capability of a packet at intermediate nodes, in the most general form we can.

An unmarkable code-point MUST NOT be changed to a markable one by an intermediate node unless that node is able to control congestion on behalf of the data sender in response to ECN signalling and it has established that a downstream node has an ECN-capable transport (sender congestion control proxy).

Changing a markable code-point to unmarkable turns on drop behaviour in downstream routers. This capability may be used by a policer to 'punish' packets outside a contracted or reserved profile. Such packets would no longer be protected by ECN capability, so would be dropped while other packets within profile would merely be marked.

________________________________
{4} In TCP the data receiver MUST still read ECT once a session is in progress (even though ECN capability has been negotiated for the session, some acks will be for re-transmitted packets), and in other transport protocols a data receiver MAY be required to read ECT to determine the ECN capability of the session at any point in a session (e.g. to cater for late joins).

Changing a markable code-point to unmarkable would not generally disable ECN at the data receiver, as it is expected that the markable code-point depends on ECN capability, not the other way round. However, the markable code-point MAY be used as part of the negotiation of ECN capability between data sender and receiver in future congestion control protocols. If the network happened to change a packet being used for this negotiation from markable to unmarkable, this might result in ECN being disabled for a whole session.
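The rule for changing unmarkable to markable, and the policer behaviour described above, can be illustrated with a minimal sketch. All names here (`IntermediateNode`, `may_make_markable`, `police`, `congested_router_action`) are hypothetical and introduced only for illustration; the code models a packet's ECN state as a single boolean and is not a definitive implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntermediateNode:
    """Hypothetical capability record for a node on the path."""
    sender_cc_proxy: bool           # can control congestion on the sender's behalf
    downstream_ecn_transport: bool  # has established an ECN-capable transport downstream


def may_make_markable(node: IntermediateNode) -> bool:
    """Unmarkable -> markable is allowed only for a sender congestion
    control proxy that knows a downstream node is ECN-capable."""
    return node.sender_cc_proxy and node.downstream_ecn_transport


def police(markable: bool, in_profile: bool) -> bool:
    """Return whether the packet is still markable after policing.
    Out-of-profile packets are changed to unmarkable."""
    return markable and in_profile


def congested_router_action(markable: bool) -> str:
    """What a congested downstream ECN router does to the packet."""
    return "mark" if markable else "drop"


# An in-profile packet is merely marked, an out-of-profile one is dropped.
for in_profile in (True, False):
    print(in_profile, congested_router_action(police(True, in_profile)))
```

The policer never needs to drop packets itself in this sketch; it only removes ECN protection, leaving the drop decision to whichever downstream router is actually congested.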
An intermediate node MUST NOT change all packets with a markable code-point to unmarkable unless it is either able to handle ECN signalling on behalf of the data receiver (receiver congestion control proxy) or has arranged to reinstate the markable code-point with a node further downstream (effectively a limited functionality tunnel).

Congestion control proxies may help with the introduction of ECN into the core of the network, even where hosts are not ECN capable. A proxy to transform an intserv reservation at one or many ends of a flow into ECN behaviour in the core has been proposed in [5]. If appropriate, such proxies SHOULD ensure account is taken of the reduction in path length they have introduced.

To recap the position stated in [6] concerning the ECT flag and tunnels, a markable code-point MUST be copied to the active outermost header of a packet at tunnel ingress unless it has also been arranged to reinstate it at tunnel egress. If the full-functionality tunnel behaviour is considered the normal case, this constraint on limited functionality tunnels is effectively a specific case of the above rule concerning changing markable to unmarkable.

13 Security considerations

Authentication of the ECN field depends on whether it is treated as two flags or four code points. This further depends on whether the last undefined code-point (ECT=0, CE=1) is defined to relate to marking capability or to marking itself. Therefore authentication will not be discussed in this draft until the fate of this last code-point is clearer.

Firewalls SHOULD NOT discard packets simply because the ECN field has a non-zero value. In the past, while the currently unused (CU) field of the diffserv field (which phrase includes its previous uses) was truly unused, some firewalls treated any non-zero values as suspicious and discarded such packets.

Note that the requirement in [6] for ECN to be backward compatible is not met for `simple tunnels'.
This is because tunnel end-points MUST implement either the limited or the full functionality options, neither of which is the case with a simple tunnel.

Security is also the main subject of section 12. It is also discussed in subsection 5.2 and section 6 where fairness and incentives to use congestion avoidance are considered.

14 Further work

Considerable further research is required to establish the need for the `potentially markable' ECN code-point for multicast duplication.

Once the fate of the fourth code-point is decided, authentication can be finalised.

Feedback is particularly requested concerning the relative merits of a new ICMP destination unreachable code (section 11), rather than overloading an old one. The argument for taking the approach adopted is that the purpose of an error message should be to identify the error, not identify that one of two errors has occurred. It is assumed that legacy host implementations will report an ICMP error code that is unknown to them opaquely, but such an assumption may be dangerous.

The approach to fragmentation in section 11 effectively gives IPv4 another set of code points for markable datagrams with DF=0, as long as path MTU discovery has been done. However, the extra space is fairly useless, as the DF flag should remain set during a session to allow discovery to detect changes to the path MTU involving non-ECN capable routers. The extra IPv4 code-point will be slightly more useful as a greater proportion of Internet routers become ECN-capable. No such extra code-point is possible with IPv6.

Many of the proposals in this document have not undergone a full security analysis to check for new denial of service threats, etc.

15 Conclusions

This document includes the necessary words to ensure that interactions with more aspects of the IP layer have been specified than in previous Internet drafts.
It is believed that every aspect of this document is additive to [6]. The ability to define new marking behaviours and new host behaviours has been added using the diffserv architecture. This has been achieved without affecting the behaviours already defined for TCP. Similarly, a forward looking approach to fragmentation has been defined. A stake has been placed in the ground warning that multicast duplication of ECN may not be as straightforward as some believed, and allowing room for experimentation. Finally, requirements have been set to ensure that all new standardisation work will promote the use of ECN in preference to loss as a congestion signalling mechanism.

16 Acknowledgements

Thanks to Arnaud Jacquet (BT), Sally Floyd (ACIRI), David Black (EMC) and Martin Karsten (TU Darmstadt) for their help and constructive review comments.

References

[1] Supratik Bhattacharyya, Don Towsley, and Jim Kurose. The loss path multiplicity problem in multicast congestion control. In Proc. IEEE Conference on Computer Communications (Infocom'99), http://www.ieee-infocom.org/1999/papers/06c_04.pdf, March 1999.

[2] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services. Request for comments 2475, Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2475.txt, December 1998.

[3] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, and L. Zhang. Recommendations on queue management and congestion avoidance in the internet. Request for comments 2309, Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2309.txt, April 1998.

[4] Scott Bradner. Key words for use in RFCs to indicate requirement levels. BCP 14, Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2119.txt, March 1997.
(RFC 2119).

[5] Ragnar Andreassen (Ed.). M3I; Requirements specifications; reference model. Deliverable 1, M3I Eu Vth Framework Project IST-1999-11429, http://www.m3i.org/, July 2000.

[6] K. K. Ramakrishnan, Sally Floyd, and David Black. The addition of explicit congestion notification (ECN) to IP. Internet draft, Internet Engineering Task Force, http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-ecn-01.txt, January 2001. (Work in progress) (expires Jul 2001).

[7] Sally Floyd and Van Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4):397-413, August 1993.

[8] Richard J. Gibbens and Frank P. Kelly. Resource pricing and the evolution of congestion control. Automatica, 35, 1999.

[9] Frank P. Kelly, Aman K. Maulloo, and David K. H. Tan. Rate control for communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49, 1998.

[10] C. Kent and J. Mogul. Fragmentation considered harmful. In Proc. SIGCOMM '87 Workshop on Frontiers in Computer Communications Technology, August 1987.

[11] Steven McCanne, Van Jacobson, and Martin Vetterli. Receiver-driven layered multicast. Proc. ACM SIGCOMM'96, Computer Communication Review, 26(4), October 1996.

[12] J. Mogul and S. Deering. Path MTU discovery. Request for comments 1191, Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1191.txt, November 1990.

[13] K. Nichols, S. Blake, F. Baker, and D. Black. Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. Request for comments 2474, Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2474.txt, December 1998.

[14] Luigi Rizzo. pgmcc: A TCP-friendly single-rate multicast congestion control scheme. ACM SIGCOMM Computer Communication Review, 30(4):17-28, September 2000.

[15] IETF secretariat. Transport area working group (tsvwg).
Working group charter, Internet Engineering Task Force, http://www.ietf.cnri.reston.va.us/html.charters/tsvwg-charter.html, Continuously updated.

17 IANA considerations

It has been proposed that the two bit ECN field should be treated as four code-points. The code-point (ECT=0, CE=1) should be treated as experimental. A specific use has been proposed, and other competing proposals listed. The other three code-points are already defined in [6].

A new ICMP Destination unreachable error code meaning "fragmentation needed but markable code-point used" is required by this document. Decimal 16 looks like the next code available, but it won't be officially applied for until this draft has been discussed.

18 Author contacts

Bob Briscoe
Email: bob.briscoe@bt.com
BT Research
B54/130, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Home page: http://www.labs.bt.com/people/briscorj/

Jon Crowcroft
Email: j.crowcroft@cs.ucl.ac.uk
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
UK

19 Intellectual Property Claims

None