Network Working Group                                         J. Babiarz
Internet-Draft                                                  X-G. Liu
Intended status: Informational                                   K. Chan
Expires: May 22, 2008                                             Nortel
                                                                M. Menth
                                                 University of Wuerzburg
                                                       November 19, 2007

                        Three State PCN Marking
                        draft-babiarz-pcn-3sm-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 22, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document proposes a mechanism for admission control and flow
   termination.  It is based on the concept of pre-congestion
   notification (PCN) and uses three different codepoints for packet
   marking: "no pre-congestion", "admission-stop", and "excess-traffic".
   Therefore, the proposal is called three state marking (3sm).  We
   present the behaviour of the edge nodes, which distinguishes 3sm
   from other proposals through its low complexity and its ability to
   cope with multipath routing.  The algorithms required for packet
   metering and marking are explained in detail.

Table of Contents

   1.  Introduction
     1.1.  Requirements Notation
     1.2.  Terminology Used in this Document
     1.3.  Adapted Terminology
     1.4.  New Terminology
   2.  The 3sm Proposal
     2.1.  Packet Marking in 3sm
     2.2.  Admission Control in 3sm
       2.2.1.  Explicit Edge-to-Edge Tunnels (E3Tunnel)
     2.3.  End-to-End on-Path Signalling (End2PS)
       2.3.1.  Operation of Standard RSVP
       2.3.2.  Modification of Standard RSVP to Perform PCN-Based
               Admission Control
     2.4.  Edge-to-Edge on-Path Signalling (Edge2PS)
     2.5.  Flow Termination in 3sm
       2.5.1.  Marked Flow Termination (MFT)
       2.5.2.  Measured Rate Termination (MRT)
   3.  Three State PCN Marker with Marking Frequency Reduction
       (MFR) for Marked Flow Termination (MFT)
     3.1.  ET-Marker
       3.1.1.  Behaviour of SR-Metering and ET-Marking
       3.1.2.  Pseudo Code for the ET-Marker
       3.1.3.  Configuration of the ET-Marker
       3.1.4.  Characteristics of the Proposed ET-Marker
     3.2.  AS-Marker
       3.2.1.  Behaviour of AR-Metering and AS-Marking
       3.2.2.  Pseudo Code for AS-Marker
       3.2.3.  Configuration of the AS-Marker
       3.2.4.  Characteristics of the Proposed AS-Marker
     3.3.  Marking Codepoints
   4.  Benefits and Shortcomings of the 3sm Proposal
     4.1.  Benefits
     4.2.  Shortcomings of 3sm
   5.  Security Considerations
   6.  Changes from Previous Revision
   7.  Acknowledgements
   8.  Informative References
   Appendix A.  Overview of Token Bucket (TB) and Virtual Queue (VQ)
     A.1.  New features for marking algorithm
       A.1.1.  Virtual Queue (VQ)
       A.1.2.  Token Bucket (TB)
       A.1.3.  Tail Marking
       A.1.4.  Tail Marking with Marking Frequency Reduction (MFR)
       A.1.5.  Tail Marking with Packet Size Independent Marking
               (PSIM) and Proportional Marking Frequency
               Reduction (PMFR)
       A.1.6.  Threshold Marking
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Pre-Congestion Notification (PCN) builds on the concepts of
   [RFC3168], "The Addition of Explicit Congestion Notification (ECN) to
   IP".  It is used to implement admission control and flow termination
   for real-time flows (such as voice, video and multimedia streaming)
   in DiffServ-enabled networks [RFC2474], [RFC2475].  Flow admission
   control determines whether a new flow can be added into the network
   without overloading any of its links, whereas flow termination
   reduces the current PCN traffic load by terminating marked flows when
   at least one link in the network is overloaded for some reason.  For
   a general overview, the reader is referred to
   [I-D.ietf-pcn-architecture].

   This document describes the 3sm proposal, which is a specific
   implementation of the general PCN framework
   [I-D.ietf-pcn-architecture].  It relies on three different packet
   markings: "no pre-congestion" (NP), "admission-stop" (AS), and
   "excess-traffic" (ET).  Packets enter a network with NP (unmarked).
   The basic idea is as follows.  Each link in a PCN domain has an
   admissible rate (AR).  If the current PCN traffic rate of a link
   exceeds its AR, all NP-marked PCN packets on that link are re-marked
   with AS.  The PCN egress nodes monitor the packet markings and
   trigger the PCN ingress nodes to admit or reject new flow requests
   depending on whether they observe AS-marked packets or not.
   Similarly, each link has a supportable rate (SR).  If the current
   PCN traffic rate of a link exceeds its SR, some of the PCN packets
   on that link are re-marked with ET.  The PCN egress nodes detect
   ET-marked packets and pass their flow IDs to the appropriate flow
   termination entity.  This concept is called marked flow termination
   (MFT).

   The 3sm architecture expects the above-mentioned marking behaviour
   from PCN interior nodes.  We propose simple metering and marking
   algorithms for that purpose: threshold marking for AS-marking and
   tail marking with marking frequency reduction (MFR) for ET-marking.
   We describe them in Section 3 based on a token bucket approach.  To
   improve the termination behaviour of MFT, we suggest packet size
   independent marking (PSIM) and proportional MFR (PMFR) in Appendix A
   and show that the same behaviour can be specified based on a virtual
   queue.

   The document is structured as follows.  After a short section on
   terminology, Section 2 focuses on the edge behaviour of the 3sm
   proposal.  Section 3 provides detailed algorithms for the metering
   and marking behaviour expected from interior nodes in 3sm.  Section 4
   lists benefits and shortcomings of the 3sm proposal.  Appendix A
   summarizes background information about token bucket (TB) and virtual
   queue (VQ) metering and marking, as they are the basis for the
   proposed metering and marking mechanisms.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Terminology Used in this Document

   The terminology used in this document conforms to the terminology of
   [I-D.ietf-pcn-architecture].  However, we adapted some terminology
   for better readability and added some new terminology that we found
   useful in this document.

1.3.  Adapted Terminology

   [I-D.ietf-pcn-architecture] has chosen very general terms for the
   rate thresholds as they have different semantic meanings in different
   proposals for the implementation of the general PCN architecture.
   For better readability, we use names that are more intuitive within
   the 3sm proposal.

   o  Admissible rate (AR) - PCN-lower-rate

   o  Supportable rate (SR) - PCN-upper-rate

   o  AR-overload - Condition that PCN traffic rate exceeds AR of a link

   o  SR-overload - Condition that PCN traffic rate exceeds SR of a link

   o  No pre-congestion (NP) - Default marking for PCN packets that have
      not been carried over links with any type of pre-congestion
      (AR-overload or SR-overload).

   o  Admission-stop (AS) - PCN-lower-rate-marking

   o  Excess-traffic (ET) - PCN-upper-rate-marking

   o  AS-marking - Action of re-marking NP-marked packets to AS

   o  ET-marking - Action of re-marking NP- or AS-marked packets to ET

1.4.  New Terminology

   We provide a brief definition of the terminology used in this
   document.

   o  PCN - Pre-Congestion Notification meters traffic rates per service
      class on a link and uses packet marking to notify the PCN egress
      nodes whether a certain rate threshold is exceeded on any link of
      the path that the packet takes through the PCN-enabled network.
      The rate thresholds may be significantly lower than the line rates
      such that PCN egress nodes are notified long before queues build
      up in the buffers and real congestion occurs.  PCN is intended for
      the implementation of measurement-based admission control and flow
      termination for real-time inelastic traffic, e.g., voice.  The PCN
      marking in the packet headers needs to be standardized.

   o  ECN Field - Refers to the standardized two-bit field in the IP
      header that is used for signalling Explicit Congestion
      Notification [RFC3168].  In the PCN framework, the ECN field may
      be reused to signal two levels of PCN marking.

   o  Service class - By service class we mean a grouping of packets
      belonging to one or more applications or services that generate
      traffic with similar characteristics and require similar QoS
      treatment.  See [RFC4594] for details.

   o  Admission Control - The function of admitting or blocking requests
      for new flows or sessions to access the network in order to
      prevent AR-overload.

   o  Flow Termination - The function of terminating already admitted
      flows in the sense that they can no longer send PCN traffic.  Only
      flows contributing to SR-overload are terminated, in order to
      reduce SR-overload.

   o  Deployment model - A method to find the PCN information for the
      path of a new flow that requests admission to the PCN domain and
      to make the required admission decision.  Deployment models take
      advantage of information or protocols available in the networking
      scenario where PCN is to be deployed.

   o  Marked flow termination (MFT) - Flow termination function that
      terminates marked flows.

   o  Measured rate termination (MRT) - Flow termination function that
      terminates a certain rate of traffic that was directly or
      indirectly measured by the system.

   o  Marking frequency reduction (MFR) - Marked flow termination (MFT)
      requires that only some, instead of all, PCN packets exceeding the
      supportable rate (SR) of a link are marked.  MFR achieves this,
      e.g., by adding a slowdown factor of "s" tokens to the fill state
      of the token bucket whenever a packet is ET-marked.

   o  Token bucket (TB) - One base mechanism for packet metering and
      marking.

   o  Virtual queue (VQ) - Another, equivalent base mechanism for packet
      metering and marking.
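   To illustrate marking frequency reduction, the following Python
   sketch combines a byte-based token bucket with the slowdown factor s
   described above.  It is a sketch under stated assumptions, not the
   ET-marker specified in Section 3: tokens are assumed to accrue at SR
   (in bytes per second) up to the bucket depth, a packet that finds too
   few tokens is considered to exceed SR, and the class and parameter
   names are hypothetical.  Crediting s tokens after each ET-marked
   packet lets roughly the next s bytes of excess traffic pass unmarked.

```python
class MfrEtMarker:
    """Token-bucket ET-marker with marking frequency reduction (sketch).

    Hypothetical parameters: sr is the token rate (bytes/s), depth the
    bucket size (bytes), and s the slowdown factor (tokens per ET-mark).
    """

    def __init__(self, sr: float, depth: float, s: float) -> None:
        self.sr = sr
        self.depth = depth
        self.s = s
        self.tokens = depth      # start with a full bucket
        self.last = 0.0          # arrival time of the previous packet (s)

    def on_packet(self, now: float, size: int) -> bool:
        """Meter one packet; return True if it is to be ET-marked."""
        # Refill at rate sr, but only up to depth; tokens credited by MFR
        # above depth are kept until they are consumed.
        if self.tokens < self.depth:
            elapsed = now - self.last
            self.tokens = min(self.depth, self.tokens + elapsed * self.sr)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size  # conforms to SR: consume tokens, no marking
            return False
        self.tokens += self.s    # MFR: credit s tokens after an ET-mark
        return True


# No refill (sr = 0) isolates the MFR effect on a stream of 100-byte packets.
with_mfr = MfrEtMarker(sr=0.0, depth=100.0, s=200.0)
print([with_mfr.on_packet(t, 100) for t in range(7)])
# [False, True, False, False, True, False, False]
without_mfr = MfrEtMarker(sr=0.0, depth=100.0, s=0.0)
print([without_mfr.on_packet(t, 100) for t in range(7)])
# [False, True, True, True, True, True, True]
```

   With s equal to two packet sizes, only every third excess packet is
   ET-marked in this example, whereas without MFR every excess packet
   is marked.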

2.  The 3sm Proposal

   First, a high-level overview of the packet marking in 3sm is
   presented.  Then, the edge behaviour for the admission control and
   flow termination functions is explained.

2.1.  Packet Marking in 3sm

   PCN traffic can be classified by a DSCP or a group of DSCPs and is
   forwarded by an appropriate PHB.  For each link, PCN configures an
   admissible rate (AR) and a supportable rate (SR).  PCN traffic enters
   the network with a "no pre-congestion" (NP) mark.  PCN nodes meter
   the PCN traffic on every link.  When the PCN traffic rate on a link
   exceeds the corresponding AR, the PCN node re-marks all NP-marked PCN
   packets to "admission-stop" (AS).

   Similarly, PCN nodes re-mark some non-ET-marked PCN packets to
   "excess-traffic" (ET) when the PCN traffic rate on a link exceeds the
   corresponding SR.  Figure 1 summarizes the relation between the AR
   and SR thresholds and the marking behaviour.  The SR is normally at
   least a delta above the AR and a delta below the maximum service rate
   for PCN traffic, for the sake of stability of the measurement-based
   reactive system.  If the PCN traffic rate is below AR, no packets are
   re-marked; if it is between AR and SR, all NP-marked PCN packets are
   re-marked to AS; and if it is above SR, some non-ET-marked PCN
   packets are re-marked to ET and all other NP-marked PCN packets are
   re-marked to AS.  Hence, the meters and markers operate in a
   marking-aware mode: NP-marked packets can be re-marked to AS or ET,
   AS-marked packets can be re-marked to ET but not to NP, and ET-marked
   packets cannot be re-marked at all.

              PCN traffic rate
                100%^
                    |   AR- and SR-overload:
                    |   re-mark SOME non-ET-marked
                    |   packets to ET and the remaining to AS,
                    |   indicating that AR and SR are exceeded
    Supportable rate|----------------------------------------------
          SR        |   AR-overload:
                    |   re-mark ALL NP-marked packets to AS,
                    |   indicating admissible rate is exceeded
    Admissible rate |----------------------------------------------
          AR        |
                    |   No overload: do not re-mark any packets
                    |
                  0%+------------------------------------------------->

                 Figure 1: Packet re-marking by PCN nodes.
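   The marking-aware re-marking rules of Figure 1 can be summarized as a
   small per-packet decision function.  The following Python sketch is
   illustrative only: the names Mark and remark are hypothetical, the
   metered link rate is assumed to be given, and the flag et_selected
   stands in for the ET-marker of Section 3, which decides which packets
   of an SR-overloaded link are ET-marked.

```python
from enum import IntEnum


class Mark(IntEnum):
    """PCN markings ordered by severity; a packet's marking never decreases."""
    NP = 0  # no pre-congestion
    AS = 1  # admission-stop
    ET = 2  # excess-traffic


def remark(current: Mark, rate: float, ar: float, sr: float,
           et_selected: bool) -> Mark:
    """Return the marking of a PCN packet after it crosses one link.

    rate is the metered PCN traffic rate of the link; et_selected is a
    hypothetical input standing in for the ET-marker's per-packet decision.
    """
    if current is Mark.ET:
        return Mark.ET            # ET-marked packets are never re-marked
    if rate > sr and et_selected:
        return Mark.ET            # SR-overload: SOME non-ET packets become ET
    if rate > ar:
        return Mark.AS            # AR-overload: NP becomes AS, AS stays AS
    return current                # no overload: keep the marking unchanged


# A packet crossing two links: the first AR-overloaded, the second SR-overloaded.
m = remark(Mark.NP, rate=15.0, ar=10.0, sr=20.0, et_selected=False)
m = remark(m, rate=25.0, ar=10.0, sr=20.0, et_selected=True)
print(m.name)  # ET
```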

2.2.  Admission Control in 3sm

   The admission of a flow request depends on the marking that is
   currently observed on the path that the data packets of the future
   flow will take.  However, providing this information is not trivial,
   and it may be achieved differently depending on the networking
   scenario.  Therefore, we introduce three deployment models that use
   their own methods to obtain the PCN information for the path of a
   future data flow.  These deployment models are adapted to specific
   networking scenarios that are in use today and are named after the
   concept that helps to find the path of future data packets:

   o  explicit edge-to-edge tunnels (E3Tunnel)

   o  end-to-end on-path signalling (End2PS)

   o  edge-to-edge on-path signalling (Edge2PS)

2.2.1.  Explicit Edge-to-Edge Tunnels (E3Tunnel)

   Explicit edge-to-edge tunnels provide an IP adjacency from a PCN
   ingress node to a PCN egress node.  As a consequence, the PCN egress
   node of a packet can be derived from the routing table.  In
   addition, we assume that each tunnel is set up on an explicit path.
   This can be realized, e.g., by MPLS label switched paths (LSPs) or
   IP-in-IP tunnelling.  We use the name E3Tunnel both to abbreviate the
   term "explicit edge-to-edge tunnel" and as the name of the
   corresponding deployment model.  An E3Tunnel aggregate is the
   ensemble of PCN traffic carried through the PCN domain over a common
   E3Tunnel.

   In the presence of E3Tunnels, the tunnel over which a packet is
   forwarded is derived from the forwarding table of the PCN ingress
   node.  This information is required so that a new flow can request
   admission to the appropriate E3Tunnel.  The PCN ingress and egress
   nodes have a context for each E3Tunnel they support.  The PCN ingress
   node works per context in two different modes:

   o  the reject mode and

   o  the accept mode.

   In the reject mode, the PCN ingress node rejects new admission
   requests, while in the accept mode it admits them.

   Similarly, the PCN egress node works per context in two corresponding
   modes:

   o  the admission-stop mode and

   o  the admission-continue mode.

   The PCN egress node monitors the markings of the packets received
   from a specific E3Tunnel and, depending on their markings, switches
   to the admission-stop or admission-continue mode.  Depending on its
   mode, the PCN egress node periodically sends "admission-stop" or
   "admission-continue" messages to the PCN ingress node of the same
   E3Tunnel.  If the PCN ingress node receives an "admission-stop"
   message, it switches to its reject mode within the corresponding
   E3Tunnel context, and if it receives an "admission-continue" message,
   it switches to its accept mode.
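   The per-context interaction between the two edge nodes can be
   sketched as a pair of small state machines.  The Python below is
   illustrative only; the class names, the string encoding of the
   messages, and the initial modes are assumptions, and how the egress
   node chooses its mode is left to the egress algorithms of this
   section.

```python
from enum import Enum


class EgressMode(Enum):
    ADMISSION_CONTINUE = "admission-continue"
    ADMISSION_STOP = "admission-stop"


class IngressMode(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


class EgressContext:
    """Per-E3Tunnel state at the PCN egress node (hypothetical)."""

    def __init__(self) -> None:
        self.mode = EgressMode.ADMISSION_CONTINUE  # assumed initial mode

    def periodic_message(self) -> str:
        # Sent periodically to the PCN ingress node of the same E3Tunnel.
        return self.mode.value


class IngressContext:
    """Per-E3Tunnel state at the PCN ingress node (hypothetical)."""

    def __init__(self) -> None:
        self.mode = IngressMode.ACCEPT  # assumed initial mode

    def on_message(self, msg: str) -> None:
        if msg == EgressMode.ADMISSION_STOP.value:
            self.mode = IngressMode.REJECT
        elif msg == EgressMode.ADMISSION_CONTINUE.value:
            self.mode = IngressMode.ACCEPT

    def admit_new_flow(self) -> bool:
        return self.mode is IngressMode.ACCEPT


egress, ingress = EgressContext(), IngressContext()
egress.mode = EgressMode.ADMISSION_STOP      # e.g., marked packets observed
ingress.on_message(egress.periodic_message())
print(ingress.admit_new_flow())  # False
```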

   Different algorithms can be applied by the PCN egress node to decide
   when to send which control messages.  We give two examples.

   o  Option 1 (single-packet-based): If the PCN egress node observes a
      single marked packet from a specific E3Tunnel, it switches to the
      admission-stop mode.  It returns to the admission-continue mode
      after some time unless it continues to observe marked packets
      occasionally.  This option can be implemented without rate
      measurement and even without any counters requiring per-packet
      modification.

   o  Option 2 (CLE-based): The PCN egress node counts the marked and
      unmarked packets from a specific E3Tunnel within a measurement
      interval.  It calculates the congestion level estimate (CLE),
      which is the ratio of the number of marked packets to the number
      of all packets observed during this interval.  If no packet was
      received within the last measurement interval, the PCN egress node
      switches to the admission-continue mode.  If the PCN egress node is
      in the admission-continue mode and the CLE of the last measurement
      interval is larger than a predefined admission-stop threshold, it
      switches to the admission-stop mode.  If the PCN egress node is in
      the admission-stop mode and the CLE is smaller than a predefined
      admission-continue threshold, it switches to the admission-continue
      mode.
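   Both options can be sketched as follows.  The Python below is a
   minimal illustration, not a specification: the hold time of Option 1,
   the two CLE thresholds of Option 2, and all names are hypothetical,
   and "marked" is taken to mean any AS- or ET-marked packet.  Choosing
   an admission-continue threshold below the admission-stop threshold
   gives Option 2 hysteresis, so that the mode does not flap when the
   CLE hovers around a single threshold.

```python
class SinglePacketEgress:
    """Option 1: a marked packet stops admission for `hold` seconds."""

    def __init__(self, hold: float) -> None:
        self.hold = hold                  # hypothetical hold time in seconds
        self.last_marked = float("-inf")  # no marked packet seen yet

    def on_packet(self, now: float, marked: bool) -> None:
        if marked:
            self.last_marked = now        # only marked packets touch state

    def mode(self, now: float) -> str:
        if now - self.last_marked < self.hold:
            return "admission-stop"
        return "admission-continue"


class CleEgress:
    """Option 2: CLE per measurement interval with two thresholds."""

    def __init__(self, stop_threshold: float,
                 continue_threshold: float) -> None:
        if continue_threshold > stop_threshold:
            raise ValueError("continue threshold must not exceed stop threshold")
        self.stop_threshold = stop_threshold
        self.continue_threshold = continue_threshold
        self.mode = "admission-continue"
        self.marked = 0
        self.total = 0

    def on_packet(self, marked: bool) -> None:
        self.total += 1
        if marked:
            self.marked += 1

    def end_interval(self) -> str:
        """Evaluate the finished interval, reset the counters, return the mode."""
        if self.total == 0:
            self.mode = "admission-continue"   # no packets received
        else:
            cle = self.marked / self.total
            if self.mode == "admission-continue" and cle > self.stop_threshold:
                self.mode = "admission-stop"
            elif self.mode == "admission-stop" and cle < self.continue_threshold:
                self.mode = "admission-continue"
        self.marked = self.total = 0
        return self.mode


egress2 = CleEgress(stop_threshold=0.5, continue_threshold=0.1)
for marked in [True] * 6 + [False] * 4:   # CLE = 0.6
    egress2.on_packet(marked)
print(egress2.end_interval())  # admission-stop
```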

2.3.  End-to-End on-Path Signalling (End2PS)

   End-to-end on-path signalling sends PATH messages downstream to
   discover the path of future data packets and RESV messages upstream
   to trigger the admission request for the correct forward path using
   the gathered PATH information.  The deployment model End2PS reuses
   the end-to-end on-path signalling protocol for the probing on which
   the admission decision is based.  We first explain the basic
   operation of standard RSVP and then adapt it to perform PCN-based
   admission control.

2.3.1.  Operation of Standard RSVP

   A popular protocol example is RSVP [RFC2205].  With RSVP, the data
   source issues a PATH message that is carried hop-by-hop over the
   same path that future data packets will take.  To that end, the PATH
   message uses the same source and destination addresses as future
   data packets, and all other header fields that may serve as input
   for routing and load-balancing decisions need to be the same as
   well.  When a PATH message arrives at an RSVP-capable node, a PATH
   state is established pointing to the previous hop before the PATH
   message is forwarded further downstream.  When the PATH message
   arrives at the destination, the destination triggers the end-to-end
   reservation for the flow by sending a RESV message upstream along
   the nodes that set up a PATH state.  In these nodes, the RESV
   message is processed.  In particular, resource admission control is
   performed for the new flow request and, if it succeeds, the node
   forwards the RESV message to the previous hop recorded in the PATH
   state.  This two-pass signalling approach guarantees that the
   reservation is made on the downstream path of the future data flow.
   In contrast to PATH messages, RESV messages carry the source address
   of the sending node and the destination address of the hop pointed
   to by the PATH state.  That way, the information about the downstream
   next hop of the future data stream is conveyed to the previous hop,
   and the flow-related information is stored in a RESV state.  RSVP is
   a soft-state protocol, i.e., the PATH and RESV control messages are
   periodically sent to keep the PATH and RESV states, and thereby the
   flow reservations, alive.  Admission control needs to be performed
   for a flow only once, when no RESV state has been set up yet.

2.3.2.  Modification of Standard RSVP to Perform PCN-Based Admission
        Control

   We assume that interior nodes of a PCN domain are RSVP-disabled.
   That means that they just forward RSVP messages without processing
   them, and PCN ingress and egress nodes are neighboring RSVP-capable
   nodes.  As a consequence, PCN ingress nodes decide whether new flows
   can be admitted and carried through the domain or not.

   When the initial PATH message travels downstream, it is either marked
   or not, and it is eventually received by the PCN egress node.  If no
   PATH state can be found for this flow at the PCN egress node, this
   PATH message is the first one and not a REFRESH message.  If the PATH
   message is the first of the flow and it is marked with
   "admission-stop", the RSVP engine sends back a PATHERR message to
   reject the flow.  If the PATH message is not marked (NP), the RSVP
   PATH state is established at the PCN egress node and the PATH message
   is forwarded further downstream.  REFRESH messages are simply
   forwarded according to standard RSVP.  When the PATH message arrives
   at the destination, a RESV message is sent, and the corresponding
   RESV message eventually arrives at the PCN ingress node.  When no
   RESV state is set up yet, this is the first RESV message and
   admission control must be performed.  By the mere fact that the RESV
   message arrives, the PCN ingress node knows that the corresponding
   initial PATH message was not marked.  Thus, it can admit any flow for
   which a new RESV message arrives.
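   The PCN-specific edge decisions described above fit in a few lines.
   The following Python sketch models only these decisions, not RSVP
   itself: message handling is reduced to returned action strings, flows
   are identified by hypothetical flow IDs, and, as an assumption, an
   initial PATH message carrying ET is rejected like one carrying AS,
   since ET-marking implies that the link also exceeds its AR.

```python
from enum import Enum


class Mark(Enum):
    NP = "no-pre-congestion"
    AS = "admission-stop"
    ET = "excess-traffic"


class PcnEgressRsvp:
    """PATH handling at the PCN egress node (hypothetical sketch)."""

    def __init__(self) -> None:
        self.path_state: dict[str, str] = {}  # flow ID -> previous RSVP hop

    def on_path(self, flow_id: str, prev_hop: str, mark: Mark) -> str:
        if flow_id in self.path_state:
            return "forward"              # REFRESH: standard RSVP handling
        if mark is not Mark.NP:
            return "send PATHERR"         # marked initial PATH: reject flow
        self.path_state[flow_id] = prev_hop  # prev_hop is the PCN ingress node
        return "forward"


class PcnIngressRsvp:
    """RESV handling at the PCN ingress node (hypothetical sketch)."""

    def __init__(self) -> None:
        self.resv_state: set[str] = set()

    def on_resv(self, flow_id: str) -> str:
        if flow_id in self.resv_state:
            return "refresh"              # admission already decided
        self.resv_state.add(flow_id)
        return "admit"                    # a first RESV implies an unmarked PATH


egress, ingress = PcnEgressRsvp(), PcnIngressRsvp()
print(egress.on_path("flow-1", "ingress-A", Mark.AS))  # send PATHERR
print(egress.on_path("flow-2", "ingress-A", Mark.NP))  # forward
print(ingress.on_resv("flow-2"))                       # admit
```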

   A single probe is sufficient in 3sm because 3sm marks all packets
   when AR-overload occurs.  Note that RSVP is only one example for
   End2PS; End2PS works equally well with other two-pass on-path
   signalling protocols.

2.4.  Edge-to-Edge on-Path Signalling (Edge2PS)

   In the absence of explicit edge-to-edge tunnels or end-to-end on-path
   signalling protocols, the PCN ingress node still needs PCN
   information about the path that the packets of a new flow will take
   in order to make a qualified admission decision.  To that end, we
   propose to use an edge-to-edge on-path signalling protocol (Edge2PS).

   Again, we assume that all interior nodes of the PCN domain are
   RSVP-disabled.  When a request for the admission of a new flow
   arrives, e.g., signalled via SIP INVITE or other means, the PCN
   ingress and egress nodes act as RSVP proxies for the source and
   destination nodes of the data flow and set up a reservation request
   between the PCN ingress and egress nodes.  The source and
   destination addresses of the PATH messages are those of the actual
   data flow.  This guarantees that they are carried on the same path
   as future data packets.  The PCN edge nodes do not forward the RSVP
   control messages outside the PCN domain.  Instead, the PCN egress
   node returns a RESV message to the PCN ingress node.  The RSVP
   messages created by PCN nodes on behalf of the flow need to be
   distinguished in some way from regular end-to-end RSVP messages
   because they must not be forwarded outside the PCN domain.

   With this addition, the modification of standard RSVP presented
   above can be used to admit flows based on a single probe message.
   Note that there is no strict need to use RSVP for Edge2PS; a less
   complex two-pass protocol suffices.

2.5.  Flow Termination in 3sm

   Although flows are admitted only if the PCN traffic rate does not
   exceed the admissible rate (AR) on any link of their paths, the PCN
   traffic rate on a link may still exceed the SR, e.g., due to changed
   sending behaviour of admitted flows or due to route changes after a
   failure.

   With 3sm, two different options for flow termination can be
   supported: marked flow termination (MFT) and measured rate
   termination (MRT).  We explain them in the following.

2.5.1.  Marked Flow Termination (MFT)

   Each PCN egress node monitors its received PCN traffic.  If it
   detects an ET-marked packet, it identifies the corresponding PCN
   ingress node and sends the flow ID belonging to the ET-marked packet
   in a "traffic-reduction" message to the flow termination entity
   (e.g., the appropriate PCN ingress node) for termination.  To save
   signalling overhead, several flow IDs may be signalled in a single
   "traffic-reduction" message.

   Marked flow termination can be applied with any deployment model.
   How the PCN ingress node of an ET-marked packet is derived depends on
   the deployment model:

   o  E3Tunnel: The adjacency from which the packet is received defines
      the E3Tunnel, so the corresponding PCN ingress node is known.

   o  End2PS or Edge2PS: The PCN egress node can map the packets to
      RSVP reservations using RSVP classifiers, and the corresponding
      RSVP PATH state contains the address of the PCN ingress node.
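   A PCN egress node performing MFT can be sketched as follows.  This
   Python is an illustration under assumptions, not a protocol
   definition: ingress_of is a hypothetical lookup standing in for the
   E3Tunnel context or the RSVP PATH state, batch_size is a hypothetical
   limit on the number of flow IDs per "traffic-reduction" message, and
   reporting each flow ID only once is an added assumption, since a
   flow selected for termination may contribute several ET-marked
   packets.

```python
from collections import defaultdict


class MftEgress:
    """Batches flow IDs of ET-marked packets per PCN ingress node (sketch)."""

    def __init__(self, ingress_of: dict[str, str], batch_size: int) -> None:
        self.ingress_of = ingress_of        # flow ID -> PCN ingress node
        self.batch_size = batch_size        # max flow IDs per message
        self.pending = defaultdict(list)    # ingress -> flow IDs to signal
        self.reported: set[str] = set()     # flow IDs already signalled

    def on_et_packet(self, flow_id: str) -> list[tuple[str, list[str]]]:
        """Record an ET-marked packet; return any messages ready to send."""
        if flow_id in self.reported:
            return []                       # assumption: report a flow once
        self.reported.add(flow_id)
        ingress = self.ingress_of[flow_id]
        self.pending[ingress].append(flow_id)
        if len(self.pending[ingress]) >= self.batch_size:
            return [(ingress, self.pending.pop(ingress))]
        return []

    def flush(self) -> list[tuple[str, list[str]]]:
        """Emit all partially filled messages, e.g., when a timer expires."""
        messages = list(self.pending.items())
        self.pending.clear()
        return messages


egress = MftEgress({"f1": "A", "f2": "A", "f3": "B"}, batch_size=2)
egress.on_et_packet("f1")
print(egress.on_et_packet("f2"))   # [('A', ['f1', 'f2'])]
```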

   Marked flow termination (MFT) requires that only some of the packets
   that exceed the supportable rate are ET-marked.  Otherwise, MFT
   terminates too many flows.  With MFT, only a small fraction of the
   traffic is removed within a round-trip time, but the process
   continues if the PCN traffic rate still exceeds the supportable rate
   (SR).  Therefore, the PCN traffic rate is gradually reduced until it
   drops below SR.  Then, the flow termination process stops since no
   more packets are ET-marked.

538	2.5.2.  Measured Rate Termination (MRT)

   Measured rate termination can be applied only in the E3Tunnel
   deployment model.  It requires that all packets exceeding SR are
   ET-marked.  Each PCN egress node measures the rate of ET-marked
   packets (ETR) per E3Tunnel and, if it is larger than zero, signals
   that ETR to the corresponding flow termination entity in a
   "traffic-reduction" message.  The flow termination entity may be the
   corresponding PCN ingress node, which then terminates a subset of the
   flows carried over the respective E3Tunnel such that the rate of
   these flows is about ETR.  However, this is not an easy task since
   the actual rates of the flows are in general not known.  If
   SR-overload continues in spite of the flow termination, the PCN
   egress node must wait some time before it sends a new
   "traffic-reduction" message.  This ensures that the impact of the
   previous message is already reflected in the newly measured rate of
   ET-marked traffic (ETR).
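   The egress side of MRT can be sketched as follows.  The fixed
   measurement interval, the hold-off time, and all names are
   hypothetical choices for illustration.  The draft only requires that
   the egress waits long enough for the previous reduction to become
   visible in the measured ETR.

```python
from collections import defaultdict
from typing import Dict, Optional


class MrtEgress:
    """Hypothetical MRT egress: per-E3Tunnel ETR measurement with hold-off."""

    def __init__(self, interval: float, holdoff: float):
        self.interval = interval    # assumption: fixed measurement interval (s)
        self.holdoff = holdoff      # assumption: minimum gap between messages (s)
        self.et_bytes: Dict[str, int] = defaultdict(int)
        self.last_sent: Dict[str, float] = {}

    def receive(self, tunnel: str, mark: str, size: int) -> None:
        # Only ET-marked bytes contribute to the ETR.
        if mark == "ET":
            self.et_bytes[tunnel] += size

    def end_of_interval(self, tunnel: str, now: float) -> Optional[float]:
        """Return the ETR (bytes/s) to signal, or None if no message is sent."""
        etr = self.et_bytes.pop(tunnel, 0) / self.interval
        if etr <= 0:
            return None
        last = self.last_sent.get(tunnel)
        if last is not None and now - last < self.holdoff:
            return None   # the previous reduction may not yet be reflected
        self.last_sent[tunnel] = now
        return etr
```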

555	   Measured rate termination can reduce SR-overload very quickly, but it
556	   has several drawbacks:

558	   o  Rate measurement of ET-marked packets is complex.

560	   o  The choice of the right subset of flows is difficult and requires
561	      that the rates of individual flows are known.

   As marked flow termination (MFT) is simpler than measured rate
   termination (MRT), we propose MFT as the preferred method for flow
   termination in 3sm.

567	3.  Three State PCN Marker with Marking Frequency Reduction (MFR) for
568	    Marked Flow Termination (MFT)

570	   The three state PCN marker (3sm) meters PCN packet streams per link
571	   and performs packet re-marking according to Figure 1.  As a
572	   consequence the following three marking states are required:

574	   o  no pre-congestion (NP),

   o  admission-stop (AS), and

   o  excess-traffic (ET).

   Conceptually, the meter meters each packet and passes the packet
   together with the metering result to the marker, which marks the
   packet according to that result.  This is illustrated in Figure 2.
   The marking may be encoded in the ECN field [RFC3168] of packets
   belonging to a specified PHB.

585	                         +------------+
586	                         |   Result   |
587	                         |            V
588	                     +-------+    +--------+
589	                     |       |    |        |
590	   Packet stream ===>| Meter |===>| Marker |===> Marked packet stream
591	                     |       |    |        |
592	                     +-------+    +--------+

594	           Figure 2: Block diagram of meter and marker function.

   The behaviour of the two functions is often described by a single
   metering and marking algorithm.  We therefore use short names for
   these algorithms.  The algorithm that meters packets relative to the
   admissible rate (AR) and re-marks them to AS is called the
   "AS-marker".  The algorithm that meters packets relative to the
   supportable rate (SR) and re-marks them to ET is called the
   "ET-marker".

602	   In the following, we explain the behaviour of the ET- and AS-marker
603	   using token bucket (TB) based algorithms.  The packet sizes counted
604	   by the meters and markers pertain to the size of the IP packet
605	   including its header bytes.  Equivalent virtual queue (VQ) based
606	   algorithms are presented in Appendix A.  Other implementations
607	   approximating the described behaviours can be used.

609	3.1.  ET-Marker

611	   We describe the behaviour for SR-metering and ET-marking, present
612	   pseudo code, explain its configuration, and discuss its behaviour.
613	   We use object-oriented notation for most variables.

615	3.1.1.  Behaviour of SR-Metering and ET-Marking

   We propose an ET-marker based on a token bucket with tail marking and
   marking frequency reduction (see Appendix A for an explanation of the
   different options).  The TB has a bucket of size TB.size, which is
   continuously filled with tokens at rate TB.rate.  When a PCN packet
   arrives, it is re-marked to "ET" if the fill state of the bucket
   (TB.fill) in tokens is smaller than the packet size (packet.size) in
   bytes.  In that case, "s" additional tokens are added to the bucket.
   Otherwise, the fill state is reduced by packet.size tokens.  The
   slowdown parameter "s" reduces the marking frequency of the
   algorithm.

627	3.1.2.  Pseudo Code for the ET-Marker

629	   The behaviour of the token bucket with tail marking and marking
630	   frequency reduction for SR-metering and ET-marking is expressed by
631	   the following pseudo code.  It requires the time variable
632	   TB.lastUpdate indicating when the fill state of TB was last updated
633	   and a global variable "now" providing the current time.  A PCN packet
634	   has the variables packet.mark showing its marking (NP, AS, ET) and
635	   packet.size showing its size.

637	    Input: pcn packet

639	      TB.fill = min(TB.size, TB.fill + TB.rate * (now - TB.lastUpdate));
640	      TB.lastUpdate = now;
641	      if (TB.fill < packet.size)
642	          packet.mark = ET;
643	          TB.fill = min(TB.size, TB.fill + s);
644	      else
645	          TB.fill = TB.fill - packet.size;
646	      endif

648	    Output: void
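   The pseudo code above transcribes directly into Python, which makes
   it easy to experiment with the ET-marker.  This is a sketch, not a
   reference implementation.  It assumes rates in bytes per second and
   a bucket that starts full.  The helper "count_marks" is a
   hypothetical test scenario: a constant bit rate stream at 1.2 times
   SR.

```python
from dataclasses import dataclass

NP, AS, ET = "NP", "AS", "ET"


@dataclass
class Packet:
    size: int          # IP packet size in bytes, including the header
    mark: str = NP


class EtMarker:
    """Token bucket with tail marking and MFR (SR-metering and ET-marking)."""

    def __init__(self, rate: float, size: float, s: float, now: float = 0.0):
        self.rate = rate        # TB.rate = supportable rate SR (bytes/s)
        self.size = size        # TB.size = supportable burst size SBS (bytes)
        self.s = s              # slowdown parameter (tokens)
        self.fill = size        # assumption: bucket starts full
        self.last_update = now

    def process(self, pkt: Packet, now: float) -> None:
        # Add the tokens earned since the last update, capped at the bucket size.
        self.fill = min(self.size, self.fill + self.rate * (now - self.last_update))
        self.last_update = now
        if self.fill < pkt.size:
            pkt.mark = ET                                   # excess traffic
            self.fill = min(self.size, self.fill + self.s)  # marking frequency reduction
        else:
            self.fill -= pkt.size


def count_marks(s: float, n: int = 600) -> int:
    """Hypothetical scenario: 100-byte packets at 1200 bytes/s, SR = 1000 bytes/s."""
    marker = EtMarker(rate=1000.0, size=2000.0, s=s)
    marked = 0
    for i in range(n):
        pkt = Packet(size=100)
        marker.process(pkt, now=i * 100 / 1200.0)
        marked += int(pkt.mark == ET)
    return marked
```

   Once the bucket has drained, s=0 ET-marks roughly one packet in six,
   which matches the 20% excess over SR.  Setting s to one packet size
   (100) roughly halves the number of ET-marked packets.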

   Several enhancements of this algorithm are presented in
   Appendix A.1.5.  The enhanced algorithm marks packets independently
   of their size and applies MFR proportionally to the packet size.
   Early results show that this equalizes the termination probability
   for flows with different packet sizes and makes the time to remove
   the overload independent of packet sizes.  In addition, MFR is also
   applied when packets have already been ET-marked by upstream nodes.
   This reduces potential overtermination in the case of multiple
   bottleneck links.  These enhancements are simple, but they make the
   algorithm slightly more complex.  Therefore, more performance results
   and discussion are needed to decide whether they should become the
   standard marking behaviour [I-D.babiarz-pcn-explicit-marking],
   [I-D.menth-pcn-performance].

663	3.1.3.  Configuration of the ET-Marker

665	   The following parameters must be configured:

667	   o  TB.rate: supportable rate (SR)

669	   o  TB.size: supportable burst size (SBS), needs to be set
670	      appropriately

672	   o  Slowdown parameter "s": needs to be set appropriately

674	3.1.4.  Characteristics of the Proposed ET-Marker

676	   The proposed algorithm can be applied with and without marking
677	   frequency reduction (MFR), i.e., s>0 and s=0, respectively.

679	   a) No marking frequency reduction (noMFR, s=0)

   If the slowdown parameter is set to s=0, MFR is switched off, and a
   simplified version of the given algorithm can be used instead.  If
   the PCN traffic rate on a link constantly exceeds its SR, the fill
   state of the TB decreases.  Arriving packets for which the number of
   tokens in the bucket does not suffice are ET-marked.  The size of the
   token bucket, i.e., the supportable burst size (SBS), controls how
   fast the marker reacts to a traffic rate above SR.  If SBS is set to
   a low value, packets are already marked at a rate lower than SR when
   bursts are present.  If SBS is set to a high value, marking starts
   with a delay when the PCN traffic rate exceeds SR.  A useful property
   of this option is that the rate of the ET-marked packets equals the
   rate of the traffic that exceeds SR (excess traffic rate), provided
   that there is no packet loss.  This observation can be used for
   "measured rate termination" (MRT): the PCN egress node measures the
   rate of ET-marked traffic per ingress-egress aggregate and signals it
   to the corresponding flow termination entity.

   Measured rate termination (MRT) is also an option to realize flow
   termination in 3sm, but the preferred option for 3sm is marked flow
   termination (MFT).  The drawback of noMFR arises in conjunction with
   MFT: too many packets are ET-marked and, therefore, too many flows
   are terminated.  This is the motivation for reducing the marking
   frequency by setting s>0.

705	   b) Marking frequency reduction (MFR, s>0)

   If the slowdown parameter is set to a value s>0, MFR is achieved
   because, for each marked packet, up to "s" additional bytes that
   exceed SR can pass the ET-marker without being re-marked to ET.
   Thus, increasing the slowdown parameter "s" decreases the number of
   ET-marked packets within a short time period.  In combination with
   MFT, a suitable "s" is required to terminate sufficiently many flows
   quickly without terminating more flows than necessary.
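   A rough token balance quantifies the MFR effect described above.
   This estimate is a sketch under stated assumptions, not a result
   from the referenced studies.  It assumes that all PCN packets have
   the same size P bytes and that the PCN traffic rate R stays above SR
   for an interval of length T.  It further assumes that the bucket
   stays near empty during that interval and that the cap at TB.size is
   never reached when "s" tokens are added.  N_pass and N_ET are the
   numbers of unmarked and ET-marked packets.  The tokens earned, plus
   the tokens added by MFR, are spent on unmarked packets.  Every
   arriving byte is either unmarked or ET-marked.

```latex
\begin{aligned}
\mathit{SR}\,T + s\,N_{ET} &\approx P\,N_{pass}, \qquad R\,T = P\,(N_{pass} + N_{ET}) \\
\Rightarrow\quad \frac{N_{ET}}{T} &\approx \frac{R - \mathit{SR}}{P + s}, \qquad
\mathit{ETR} = P\,\frac{N_{ET}}{T} \approx (R - \mathit{SR})\,\frac{P}{P + s}
\end{aligned}
```

   With s=0 the estimate reduces to ETR = R - SR, the excess traffic
   rate that MRT exploits.  Setting s = P halves the rate of ET-marked
   packets.  For example, with R = 1.2 SR the ET-marked rate drops from
   0.2 SR to about 0.1 SR.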

716	3.2.  AS-Marker

718	   We explain the behaviour for AR-metering and AS-marking, present
719	   pseudo code, explain its configuration, and discuss its behaviour.

721	3.2.1.  Behaviour of AR-Metering and AS-Marking

   We propose an AS-marker based on a token bucket with threshold
   marking (see Appendix A for an explanation of the different
   options).  The TB has a bucket of size TB.size, which is continuously
   filled with tokens at rate TB.rate.  The AR-meter and AS-marker
   consider only packets that are not ET-marked.  When a non-ET-marked
   PCN packet arrives, it is re-marked to "AS" if the fill state of the
   bucket (TB.fill) in tokens is smaller than the marking threshold
   (TB.threshold); otherwise, it is not re-marked.  In both cases, the
   fill state is then reduced by packet.size tokens, but not below
   zero.

   The AS-marker is sensitive to ET-markings in the sense that only
   non-ET-marked packets are considered for re-marking.

739	3.2.2.  Pseudo Code for AS-Marker

   The behaviour of the token bucket with threshold marking for
   AR-metering and AS-marking is expressed by the following pseudo code,
   using the same nomenclature as above.

745	   Input: pcn packet

      TB.fill = min(TB.size, TB.fill + TB.rate * (now - TB.lastUpdate));
      TB.lastUpdate = now;
      if (packet.mark <> ET)   // meter and mark only non-ET-marked packets
        if (TB.fill < TB.threshold)
            packet.mark = AS;
        endif
        TB.fill = max(0, TB.fill - packet.size);
      endif
754	      endif

756	   Output: void
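   The AS-marker pseudo code can be transcribed into Python in the same
   way.  This sketch assumes rates in bytes per second and a bucket that
   starts full.  The helper "marks_at" is a hypothetical constant bit
   rate scenario, not part of the draft.

```python
from dataclasses import dataclass

NP, AS, ET = "NP", "AS", "ET"


@dataclass
class Packet:
    size: int          # IP packet size in bytes
    mark: str = NP


class AsMarker:
    """Token bucket with threshold marking (AR-metering and AS-marking)."""

    def __init__(self, rate: float, size: float, threshold: float, now: float = 0.0):
        self.rate = rate              # TB.rate = admissible rate AR (bytes/s)
        self.size = size              # TB.size = admissible burst size ABS (bytes)
        self.threshold = threshold    # TB.threshold (bytes)
        self.fill = size              # assumption: bucket starts full
        self.last_update = now

    def process(self, pkt: Packet, now: float) -> None:
        self.fill = min(self.size, self.fill + self.rate * (now - self.last_update))
        self.last_update = now
        if pkt.mark != ET:            # ET-marked packets are neither metered nor re-marked
            if self.fill < self.threshold:
                pkt.mark = AS         # admission-stop
            self.fill = max(0.0, self.fill - pkt.size)


def marks_at(load: float, n: int = 600) -> list:
    """Hypothetical CBR stream of 100-byte packets at `load` times AR = 1000 bytes/s."""
    marker = AsMarker(rate=1000.0, size=3000.0, threshold=1500.0)
    result = []
    for i in range(n):
        pkt = Packet(size=100)
        marker.process(pkt, now=i * 100 / (1000.0 * load))
        result.append(pkt.mark)
    return result
```

   At 80% of AR no packet is AS-marked.  At 120% of AR, marking starts
   once the fill state has dropped below TB.threshold, and from then on
   every packet is AS-marked.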

758	3.2.3.  Configuration of the AS-Marker

760	   The following parameters must be configured:

762	   o  TB.rate: admissible rate (AR)

   o  TB.size: admissible burst size (ABS), needs to be set
      appropriately

767	   o  TB.threshold: needs to be set appropriately

769	3.2.4.  Characteristics of the Proposed AS-Marker

771	   If the AR is exceeded, the TB fill state continuously decreases, it
772	   eventually falls below its marking threshold TB.threshold, and it
773	   only increases again if the PCN traffic rate on the link falls below
774	   AR.  As a consequence, all packets are AS-marked during that time and
775	   admission of further flows is stopped until the PCN traffic rate
776	   drops below AR.  In particular, all probe packets are AS-marked.

   [TR437] investigates the impact of the TB parameters on the marking
   characteristics, i.e., the percentage of marked packets depending on
   the PCN traffic rate relative to AR.  If the PCN traffic rate is
   above AR, all packets must be AS-marked.  To that end, TB.threshold
   must be set large enough.  If the PCN traffic rate is below AR, no
   packets should be marked.  To that end, TB.size - TB.threshold must
   be set large enough.  This yields marking with clear decisions:
   packets are marked if the PCN rate is above AR, and they are not
   marked if the PCN rate is below AR.  This type of marking is required
   for 3sm.  [I-D.menth-pcn-performance] provides a summary of [TR437].
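   A fluid approximation illustrates the role of the two TB parameters.
   This is a sketch under assumptions (constant PCN rate R, bucket
   initially full, packet granularity ignored), not a result taken from
   [TR437].  Above AR, the fill state decreases at rate R - AR, so
   AS-marking starts after roughly t_AS.  Once the fill state is close
   to zero, it rises between packet arrivals by at most AR times the
   longest inter-arrival gap, Delta t_max.  A threshold above that value
   therefore keeps every packet AS-marked.  Below AR, a burst that
   arrives on a full bucket passes unmarked roughly as long as it does
   not exceed TB.size - TB.threshold.

```latex
\begin{aligned}
t_{AS} &\approx \frac{\mathrm{TB.size} - \mathrm{TB.threshold}}{R - \mathit{AR}}, && R > \mathit{AR}\\
\mathrm{TB.threshold} &> \mathit{AR}\,\Delta t_{\max}, && \text{all packets AS-marked under sustained overload}\\
B_{\max} &\approx \mathrm{TB.size} - \mathrm{TB.threshold}, && \text{largest unmarked burst on a full bucket}
\end{aligned}
```

   As an illustrative example, take AR = 1000 bytes/s, TB.size = 3000
   bytes, TB.threshold = 1500 bytes, and R = 1.2 AR.  Marking then
   starts after about 1500/200 = 7.5 s.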

790	3.3.  Marking Codepoints

792	   PCN metering and marking requires classification of traffic which is
793	   subject to PCN metering and PCN marking (PCN-capable).  Furthermore,
794	   PCN-aware flows that are subject to PCN-marking require at least the
795	   following codepoints:

797	   o  "no-precongestion" (NP),

799	   o  "admission-stop" (AS), and

801	   o  "excess-traffic" (ET).

803	   These signals may be encoded by re-using the two-bit ECN field or by
804	   different DS codepoints.  The actual encoding is out of the scope of
805	   this document.

807	4.  Benefits and Shortcomings of the 3sm Proposal

809	   This section highlights some benefits of the 3sm proposal that we
810	   think are important.  Some of them are based on performance results
811	   reported in [SIM-07], [I-D.babiarz-pcn-explicit-marking],
812	   [I-D.menth-pcn-performance], and [TR437].

814	4.1.  Benefits

   o  3sm does not require the standardization of the exact algorithm in
      the PCN interior node, but only the standardization of its
      behaviour.

820	   o  3sm does not require periodic transmission of measurement results
821	      from PCN egress to PCN ingress.

   o  The supportable rate of a link can be chosen independently of the
      admissible rate, as long as it is not smaller.  This gives maximum
      freedom and does not imply any degradation in resource efficiency
      for resilient admission control [PCN-Config].

828	   o  3sm works well with multipath routing due to marked flow
829	      termination and the application of probing when needed.

831	      *  A flow is terminated only if it is carried over an SR-
832	         overloaded path and if one of its packets was ET-marked.
833	         Therefore, its termination reduces the SR-overload on a
834	         bottleneck link.

836	      *  Conversely, flows that are not carried over a bottleneck link
837	         cannot be ET-marked and, therefore, not terminated.

839	   o  Marked flow termination is very robust

841	      *  The slowdown factor "s" can be reasonably chosen such that the
842	         time to remove the overload is short and that not more traffic
843	         than necessary is terminated.

845	      *  It works well with low and high packet loss.  Packet loss just
846	         increases the time to remove the overload.

848	      *  It works well with a small and large number of flows.  In
849	         particular, the right number of flows is terminated if there is
850	         only one flow per aggregate and an SR-overload of, e.g., 30%.

852	      *  It works well with constant bit rate (CBR) voice, on/off voice,
853	         and variable bit rate (VBR) traffic.  It is not very sensitive
854	         to different traffic characteristics [SIM-07].

856	      *  It works well with multiple bottleneck links in the path of a
857	         flow.

859	      *  It works well with flows that have significantly different
860	         round-trip times (RTT) or flow termination delays.  In
861	         particular, flows with different RTTs suffer the same flow
862	         termination probability.

      *  When bidirectional flows are terminated, the termination of a
         flow entails the termination of both the downstream and the
         upstream flow.  In case of overload on both the downstream and
         the upstream path, flows are terminated on both paths.  This
         increases the termination aggressiveness.  However, marked
         flow termination leads to only little overtermination, as it
         terminates traffic only gradually.

872	      *  Different slowdown factors "s" may be used for the
873	         configuration of marking frequency reduction on different links
874	         within a single PCN domain.

      *  It leads to higher termination probabilities for flows with
         larger packets.  This can be remedied by packet size
         independent marking (PSIM, see Appendix A.1.5).

880	4.2.  Shortcomings of 3sm

882	   o  Marked flow termination leads to higher termination probabilities
883	      for flows with larger packet frequency (higher rate flows).
884	      Currently, there is no mechanism to improve this.

886	5.  Security Considerations

888	   The Three State PCN Marker has no known security concerns.

890	6.  Changes from Previous Revision

892	   Changes in version -01 compared to version -00:

894	   * Abstract: adapted: more general, better summary

896	   * shortened introduction and adapted it to new document structure

898	   * put "1.3 Terminology" into separate section, clarification and some
899	   minor changes

901	   * put "1.2 Overview of PCN" into a separate section on 3sm edge
902	   behaviour

904	   * added more structure and content to that section

906	   o  deployment scenarios for admission control
907	   o  added measured rate termination (MRT) as a non-preferred option to
908	      marked flow termination (MFT) in 3sm

   * introduced AS-marker as abbreviation for AR-metering and AS-marking
     function

   * introduced ET-marker as abbreviation for SR-metering and ET-marking
     function

916	   * removed comment about similarity of 3sm marking and srTCM (RFC2697)

   * current Section 3.1: some reformulation, added some new simulation
     insights, moved optimizations into Appendix A.1.5

   * current Section 3.2: some reformulation, added some new simulation
     insights, changed the AS-marker according to
924	   o  Joe Babiarz's comment saying all packets need to be considered for
925	      AR-metering

   o  Bob Briscoe's comment saying TB.fill should be set to zero if it is
      smaller than packet.size

930	   o  provided insights for the setting of TB parameters to obtain
931	      desired marking behaviour

933	   * new section on benefits of the 3sm proposal added

935	   * A.1 - A.1.4 minor changes (rewording)

937	   * A.1.5 new section on optimization for 3sm marker (PSIM, PMFR, MFR
938	   for already marked packets)

940	   * A.1.6 simplified threshold marking according to Bob's comments

942	   * old A.6 Section on Related Work (srTCM) removed as srTCM without
943	   change cannot be reused for implementation of threshold marking

945	   * removed Appendix B and integrated the comments into the previous
946	   sections of the document if necessary

948	7.  Acknowledgements

950	   The authors would like to thank the following people for reviewing
951	   this draft or earlier versions thereof and for their suggestions to
   make this document more complete: Dave McDysan, Nicolas Chevrollier,
   Frank Lehrieder, Bob Briscoe, and Ben Strulo.

955	8.  Informative References

957	   [I-D.babiarz-pcn-explicit-marking]
958	              Liu, X. and J. Babiarz, "Simulations Results for 3sM",
959	              draft-babiarz-pcn-explicit-marking-01 (work in progress),
960	              July 2007.

962	   [I-D.ietf-pcn-architecture]
963	              Eardley, P., "Pre-Congestion Notification Architecture",
964	              draft-ietf-pcn-architecture-01 (work in progress),
965	              October 2007.

967	   [I-D.menth-pcn-performance]
968	              Menth, M. and F. Lehrieder, "Performance Evaluation of
969	              PCN-Based Algorithms", draft-menth-pcn-performance-00
970	              (work in progress), November 2007.

972	   [Maglaris-88]
              Maglaris et al., "Performance Models of Statistical
              Multiplexing in Packet Video Communications", IEEE
              Transactions on Communications 36, pp. 834-844,
              July 1988.

978	   [PCN-Config]
              Menth, M. and M. Hartmann, "PCN-Based Resilient Network
              Admission Control: The Impact of a Single Bit", 2007,
              (http://www3.informatik.uni-wuerzburg.de/staff/menth/Publications/Menth07-PCN-Config.pdf).

984	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
985	              Requirement Levels", BCP 14, RFC 2119, March 1997.

987	   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
988	              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
989	              Functional Specification", RFC 2205, September 1997.

991	   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
992	              "Definition of the Differentiated Services Field (DS
993	              Field) in the IPv4 and IPv6 Headers", RFC 2474,
994	              December 1998.

996	   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
997	              and W. Weiss, "An Architecture for Differentiated
998	              Services", RFC 2475, December 1998.

1000	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1001	              of Explicit Congestion Notification (ECN) to IP",
1002	              RFC 3168, September 2001.

1004	   [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
1005	              Guidelines for DiffServ Service Classes", RFC 4594,
1006	              August 2006.

   [SIM-07]   Liu, X-G. and J. Babiarz, "Simulation Results for Explicit
              PCN Marking and Flow Termination", February 2007,
              (http://standards.nortel.com/pcn/Simulation_EPCN.pdf).

   [TR437]    Menth, M. and F. Lehrieder, "Comparison of Marking
              Algorithms for PCN-Based Admission Control", Technical
              Report No. 437, October 2007,
              (http://www-info3.informatik.uni-wuerzburg.de/TR/tr437.pdf).

1019	Appendix A.  Overview of Token Bucket (TB) and Virtual Queue (VQ)

   Token buckets (TB) and virtual queues (VQ) are equivalent base
   mechanisms for algorithms that check whether a packet conforms to its
   flow's indicated rate R and burst size S.  Therefore, TB parameters
   are frequently used as traffic descriptors.  TBs and VQs are dual
   approaches.  Packets are TB-conforming as long as sufficient tokens
   are in the bucket at their arrival times.  Likewise, they are
   VQ-conforming as long as sufficient free space is available in the
   queue at their arrival times.  Therefore, TBs and VQs can be used
   interchangeably.  In particular, any algorithm given as a TB
   description can be implemented by a VQ and vice versa.

   In the following, we explain the basic VQ and TB mechanisms
   (Appendix A.1.1 and Appendix A.1.2).  Packets are marked depending on
   the state of the VQ or TB at their arrival time.  There are different
   marking options.  Only those packets that do not conform to their
   flow description may be marked (tail marking, Appendix A.1.3), or
   only some non-conforming packets may be marked (tail marking with
   marking frequency reduction, Appendix A.1.4), or all packets may be
   marked until the flow again reaches conformity (threshold marking,
   Appendix A.1.6).

1042	A.1.  New features for marking algorithm

1044	A.1.1.  Virtual Queue (VQ)

   We use an object-oriented notation to make the algorithms easier to
   read.  The VQ has a VQ rate (VQ.rate) and a queue that can store up
   to VQ.size bytes.  The current length of the queue is denoted by
   VQ.length.  This length is reduced over time at rate VQ.rate.  When a
   packet arrives, it is "accepted" by the VQ and increments VQ.length
   by its size (packet.size) if there is still enough free space in the
   queue to accommodate it; otherwise, it is "rejected".  As the queue
   length decreases continuously over time, the behaviour of a VQ is
   best described by a fluid model.  However, the state of the VQ
   shortly after packet arrivals can be calculated from the current time
   "now" and the length of the VQ at its last update time
   (VQ.lastUpdate) using the following algorithm:

1059	   VQ.length = max(0, VQ.length - (now - VQ.lastUpdate) * VQ.rate);
1060	   VQ.lastUpdate = now;
1061	   if (VQ.length + packet.size <= VQ.size)
1062	           VQ.length = VQ.length + packet.size;
1063	   endif

1065	A.1.2.  Token Bucket (TB)

   The TB is basically the same mechanism, but it looks at the problem
   from a different angle.  The TB has a rate (TB.rate) and a bucket
   that can store up to TB.size tokens.  A token is the
1070	   permission to send one byte.  The current fill state of the bucket is
1071	   denoted by TB.fill.  This fill state is increased over time at rate
1072	   TB.rate.  When a packet arrives, it is "accepted" and decrements
1073	   TB.fill by its size (packet.size) if there are enough tokens in the
1074	   bucket to send the entire packet; otherwise it is "rejected".  As the
1075	   fill state is increased continuously over time, the behaviour of a TB
1076	   is best described by a fluid model.  However, the state of the TB
1077	   shortly after packet arrival can be calculated based on the current
1078	   time "now" and the fill state of the TB at the last update time of
1079	   the TB (TB.lastUpdate) using the following algorithm:

1081	   TB.fill = min(TB.size, TB.fill + (now - TB.lastUpdate) * TB.rate);
1082	   TB.lastUpdate = now;
1083	   if (TB.fill >= packet.size)
1084	           TB.fill = TB.fill - packet.size;
1085	   endif
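   A small simulation can check the VQ/TB duality stated at the
   beginning of this appendix.  This sketch is an illustration under
   assumptions, not part of the draft.  Time is an integer tick and the
   rate is in bytes per tick, so the comparison is exact.  The VQ
   starts empty, the TB starts full, and the helper "same_decisions" is
   hypothetical.  With VQ.size = TB.size, the invariant
   TB.fill = TB.size - VQ.length holds after every packet, so both
   meters accept exactly the same packets.

```python
import random


class VirtualQueue:
    def __init__(self, rate: int, size: int):
        self.rate, self.size = rate, size   # VQ.rate (bytes/tick), VQ.size (bytes)
        self.length = 0                     # assumption: queue starts empty
        self.last_update = 0

    def offer(self, pkt_size: int, now: int) -> bool:
        """Return True if the packet is accepted."""
        self.length = max(0, self.length - (now - self.last_update) * self.rate)
        self.last_update = now
        if self.length + pkt_size <= self.size:
            self.length += pkt_size
            return True
        return False


class TokenBucket:
    def __init__(self, rate: int, size: int):
        self.rate, self.size = rate, size   # TB.rate (tokens/tick), TB.size (tokens)
        self.fill = size                    # assumption: bucket starts full
        self.last_update = 0

    def offer(self, pkt_size: int, now: int) -> bool:
        """Return True if the packet is accepted."""
        self.fill = min(self.size, self.fill + (now - self.last_update) * self.rate)
        self.last_update = now
        if self.fill >= pkt_size:
            self.fill -= pkt_size
            return True
        return False


def same_decisions(seed: int, n: int, rate: int = 10, size: int = 1500) -> bool:
    """Feed identical random arrivals to both meters and compare every decision."""
    rng = random.Random(seed)
    vq, tb = VirtualQueue(rate, size), TokenBucket(rate, size)
    now = 0
    for _ in range(n):
        now += rng.randint(0, 20)
        pkt_size = rng.choice([64, 576, 1500])
        if vq.offer(pkt_size, now) != tb.offer(pkt_size, now):
            return False
        if tb.fill != tb.size - vq.length:  # duality: fill = size - length
            return False
    return True
```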

1087	A.1.3.  Tail Marking

   To check whether the packets of a stream with rate R and maximum
   burst size MBS conform to the description R and MBS, the stream is
   metered either by a VQ with VQ.rate=R and VQ.size=MBS, or by a TB
   with TB.rate=R and TB.size=MBS.  If a packet is accepted by the VQ or
   by the TB, it is marked in-profile.  If it is rejected, it is marked
   out-of-profile.

1096	   The corresponding pseudo codes are for the VQ:

1098	   VQ.length = max(0, VQ.length - (now - VQ.lastUpdate) * VQ.rate);
1099	   VQ.lastUpdate = now;
1100	   if (VQ.length + packet.size <= VQ.size)
1101	           VQ.length = VQ.length + packet.size;
1102	           packet.mark = in-profile;
1103	   else
1104	           packet.mark = out-of-profile;
1105	   endif

1107	   and for the TB:

1109	   TB.fill = min(TB.size, TB.fill + (now - TB.lastUpdate) * TB.rate);
1110	   TB.lastUpdate = now;
1111	   if (TB.fill >= packet.size)
1112	           TB.fill = TB.fill - packet.size;
1113	           packet.mark = in-profile;
1114	   else
1115	           packet.mark = out-of-profile;
1116	   endif

1118	A.1.4.  Tail Marking with Marking Frequency Reduction (MFR)

1120	   The objective of tail marking with MFR is to mark only some of the
1121	   packets that are out-of-profile.  The strength of the reduction can
1122	   be controlled by the slowdown parameter "s".  When a packet is
1123	   classified out-of-profile, the VQ length is decremented by "s" bytes
   and the TB fill state is incremented by "s" tokens, respectively.  As
   a consequence, the VQ and the TB are unlikely to mark consecutive
   packets as out-of-profile, which reduces their marking frequency.

1128	   The corresponding pseudo codes are for the VQ:

   VQ.length = max(0, VQ.length - (now - VQ.lastUpdate) * VQ.rate);
   VQ.lastUpdate = now;
   if (VQ.length + packet.size <= VQ.size)
           VQ.length = VQ.length + packet.size;
           packet.mark = in-profile;
   else
           VQ.length = max(0, VQ.length - s); // marking frequency reduction
           packet.mark = out-of-profile;
   endif

1140	   and for the TB:

   TB.fill = min(TB.size, TB.fill + (now - TB.lastUpdate) * TB.rate);
   TB.lastUpdate = now;
   if (TB.fill >= packet.size)
           TB.fill = TB.fill - packet.size;
           packet.mark = in-profile;
   else
           TB.fill = min(TB.size, TB.fill + s); // marking frequency reduction
           packet.mark = out-of-profile;
   endif

1152	   If the slowdown parameter is set to s=0, the marking algorithm
1153	   behaves like pure tail marking.
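   A short simulation illustrates the effect of the slowdown parameter
   on the VQ variant.  This sketch rests on several assumptions:
   integer ticks, a rate in bytes per tick, and a queue that starts
   empty.  The helper "out_of_profile_count" is hypothetical and offers
   a constant bit rate overload 25% above VQ.rate.

```python
class VqTailMarker:
    """VQ-based tail marking with marking frequency reduction (s=0: pure tail marking)."""

    def __init__(self, rate: int, size: int, s: int):
        self.rate, self.size, self.s = rate, size, s
        self.length = 0                  # assumption: queue starts empty
        self.last_update = 0

    def mark(self, pkt_size: int, now: int) -> str:
        self.length = max(0, self.length - (now - self.last_update) * self.rate)
        self.last_update = now
        if self.length + pkt_size <= self.size:
            self.length += pkt_size
            return "in-profile"
        self.length = max(0, self.length - self.s)   # marking frequency reduction
        return "out-of-profile"


def out_of_profile_count(s: int, n: int = 1000) -> int:
    """Hypothetical overload: one 100-byte packet per tick, VQ.rate = 80 bytes/tick."""
    vq = VqTailMarker(rate=80, size=1000, s=s)
    return sum(vq.mark(100, now=t) == "out-of-profile" for t in range(n))
```

   With s=0, roughly one packet in five is out-of-profile, which matches
   the 20 bytes of excess per tick.  With s=100 (one packet size), the
   count roughly halves.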

1155	A.1.5.  Tail Marking with Packet Size Independent Marking (PSIM) and
1156	        Proportional Marking Frequency Reduction (PMFR)

   For the sake of fairness, large packets should not have a larger
   marking probability; otherwise, this creates incentives to send many
   small packets instead of a few large packets.  Therefore, we propose
   to test at each packet arrival whether an MTU-sized packet could
   still be accommodated.  Then, the marking probability depends only on
   VQ.length or TB.fill, respectively.

1165	   For the sake of better control of the termination aggressiveness,
1166	   more flows should be marked and terminated in the presence of low bit
1167	   rate flows than in the presence of high bit rate flows.  To that end,
1168	   we propose to remove (VQ) or add (TB) the slowdown factor
1169	   proportionally to the packet size.

   The reason for adding "s" tokens to the bucket when a packet is
   marked is the anticipated termination of that flow.  The fill state
   is approximated as if this flow were already terminated, which
   avoids marking too many additional flows.  The same condition holds
   when an already ET-marked packet arrives at the ET-marker.
   Therefore, the same slowdown factor "s" is added to the bucket of the
   ET-marker as if it had ET-marked the packet itself.  This avoids
   terminating more traffic than necessary when a traffic aggregate
   crosses several bottleneck links.  However, this situation is rather
   unlikely, and the effect is clearly visible only under pathological
   conditions.  Therefore, optimal behaviour is not crucial, and the
   algorithm may be simplified by ignoring packets that are already
   ET-marked.

1185	   These are obvious enhancements of the base algorithm, but they are
1186	   only recommended when solid performance results indicate that the
1187	   improvements have a significant impact in practice.  These studies
1188	   are still ongoing.

1190	   The following pseudo codes incorporate all proposed enhancements.  We
1191	   give them for a VQ approach:

   VQ.length = max(0, VQ.length - (now - VQ.lastUpdate) * VQ.rate);
   VQ.lastUpdate = now;
   if (packet.mark <> ET)
      if (VQ.length + MTU <= VQ.size)                        // PSIM
         VQ.length = VQ.length + packet.size;
      else
         VQ.length = max(0, VQ.length - s*packet.size/MTU);  // PMFR
         packet.mark = ET;
      endif
   else
      VQ.length = max(0, VQ.length - s*packet.size/MTU);     // PMFR
   endif

1206	   and for the TB:

   // refill the TB for the time elapsed since the last packet
   TB.fill = min(TB.size, TB.fill + (now - TB.lastUpdate) * TB.rate);
   TB.lastUpdate = now;
   if (packet.mark <> ET)
      if (TB.fill >= MTU)                                    // PSIM
         TB.fill = TB.fill - packet.size;
      else                          // mark; credit TB as if terminated
         TB.fill = min(TB.size, TB.fill + s*packet.size/MTU); // PMFR
         packet.mark = ET;
      endif
   else                             // already ET-marked upstream
      TB.fill = min(TB.size, TB.fill + s*packet.size/MTU);    // PMFR
   endif
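   A matching Python sketch of the TB variant follows, under similar
   assumptions (hypothetical Packet and TBExcessMarker classes, the
   string "ET" as the mark, "now" passed explicitly, and a bucket that
   starts full).  In exact arithmetic, a TB with TB.fill = TB.size -
   VQ.length takes the same marking decisions as the VQ variant, so the
   same packet sequence yields the same marks.

```python
from dataclasses import dataclass
from typing import Optional

ET = "ET"  # hypothetical encoding of the ET mark


@dataclass
class Packet:
    size: int       # bytes
    mark: str = ""  # "" = not marked


@dataclass
class TBExcessMarker:
    """TB-based ET-marker with PSIM and PMFR (illustrative sketch only)."""
    rate: float                   # TB.rate in bytes per time unit
    size: float                   # TB.size in bytes
    mtu: int                      # MTU in bytes
    s: float                      # slowdown factor, assumed in bytes
    fill: Optional[float] = None  # TB.fill, starts full unless given
    last_update: float = 0.0

    def __post_init__(self) -> None:
        if self.fill is None:
            self.fill = self.size

    def process(self, packet: Packet, now: float) -> Packet:
        # Refill the bucket for the time elapsed since the last packet.
        self.fill = min(self.size,
                        self.fill + (now - self.last_update) * self.rate)
        self.last_update = now
        credit = self.s * packet.size / self.mtu  # PMFR credit
        if packet.mark != ET:
            if self.fill >= self.mtu:  # PSIM: test with MTU
                self.fill -= packet.size
            else:
                # Mark, and refill the bucket as if the flow were gone.
                self.fill = min(self.size, self.fill + credit)
                packet.mark = ET
        else:
            # Already ET-marked upstream: credit as if marked here.
            self.fill = min(self.size, self.fill + credit)
        return packet


tb = TBExcessMarker(rate=1000, size=3000, mtu=1500, s=1500)
marks = [tb.process(Packet(1500), t).mark for t in (0, 0, 0, 0, 1.5)]
print(marks)  # ['', '', 'ET', '', '']
```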

A.1.6.  Threshold Marking

   The objective of threshold marking is to mark all packets, e.g.,
   with "rate-exceeded", as long as some packets are out-of-profile
   with respect to the flow parameters R and MBS.  This is achieved by
   setting the VQ or TB size larger than MBS.  Packets are marked if
   the VQ length exceeds MBS or if the fill state of the TB falls below
   TB.size - MBS.  Furthermore, the packet size is always added to the
   queue or removed from the bucket, regardless of whether the packet
   is marked.  Note that the marking thresholds need to be configured
   differently for the VQ and the TB to obtain the same behaviour:

   o  VQ.threshold = MBS;

   o  TB.threshold = TB.size - MBS.

   The corresponding pseudocode for the VQ is:

   VQ.length = max(0, VQ.length - (now - VQ.lastUpdate) * VQ.rate);
   VQ.lastUpdate = now;
   if (VQ.length > VQ.threshold)
      packet.mark = "rate-exceeded";  // packet out-of-profile
   endif
   // the packet size is added whether or not the packet is marked
   VQ.length = min(VQ.size, VQ.length + packet.size);
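   The VQ threshold marker can likewise be sketched in Python.  As
   before, the Packet and VQThresholdMarker classes and the explicit
   timestamp are assumptions made for illustration, and the string
   "rate-exceeded" stands in for the mark.  The sketch shows that a
   packet's size is added to the VQ whether or not the packet is
   marked.

```python
from dataclasses import dataclass

RATE_EXCEEDED = "rate-exceeded"


@dataclass
class Packet:
    size: int       # bytes
    mark: str = ""  # "" = unmarked


@dataclass
class VQThresholdMarker:
    """Virtual-queue threshold marker (illustrative sketch only)."""
    rate: float              # VQ.rate in bytes per time unit
    size: float              # VQ.size, configured larger than MBS
    threshold: float         # VQ.threshold = MBS
    length: float = 0.0      # VQ.length, starts empty
    last_update: float = 0.0

    def process(self, packet: Packet, now: float) -> Packet:
        # Drain the queue for the time elapsed since the last packet.
        self.length = max(0.0,
                          self.length - (now - self.last_update) * self.rate)
        self.last_update = now
        if self.length > self.threshold:
            packet.mark = RATE_EXCEEDED  # packet out-of-profile
        # Add the packet size whether or not the packet was marked.
        self.length = min(self.size, self.length + packet.size)
        return packet


vq = VQThresholdMarker(rate=1000, size=8000, threshold=3000)  # MBS = 3000
marks = [vq.process(Packet(1500), t).mark for t in (0, 0, 0, 0, 0, 5)]
print(marks)  # ['', '', '', 'rate-exceeded', 'rate-exceeded', '']
```

   With these hypothetical values, the fourth and fifth packets of the
   burst are marked, and the packet arriving after the queue has
   drained below MBS is not.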

   The corresponding pseudocode for the TB is:

   TB.fill = min(TB.size, TB.fill + (now - TB.lastUpdate) * TB.rate);
   TB.lastUpdate = now;
   if (TB.fill < TB.threshold)
      packet.mark = "rate-exceeded";  // packet out-of-profile
   endif
   // the packet size is removed once, whether or not it is marked
   TB.fill = max(0, TB.fill - packet.size);
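   A corresponding Python sketch for the TB follows.  It removes the
   packet size from the bucket exactly once, after the threshold check,
   mirroring the order used for the VQ.  The TBThresholdMarker class
   and the initially full bucket are assumptions.  Setting
   TB.threshold = TB.size - MBS reproduces the marking decisions of a
   VQ of the same size with VQ.threshold = MBS.

```python
from dataclasses import dataclass
from typing import Optional

RATE_EXCEEDED = "rate-exceeded"


@dataclass
class Packet:
    size: int       # bytes
    mark: str = ""  # "" = unmarked


@dataclass
class TBThresholdMarker:
    """Token-bucket threshold marker (illustrative sketch only)."""
    rate: float                   # TB.rate in bytes per time unit
    size: float                   # TB.size, configured larger than MBS
    threshold: float              # TB.threshold = TB.size - MBS
    fill: Optional[float] = None  # TB.fill, starts full unless given
    last_update: float = 0.0

    def __post_init__(self) -> None:
        if self.fill is None:
            self.fill = self.size

    def process(self, packet: Packet, now: float) -> Packet:
        # Refill the bucket for the time elapsed since the last packet.
        self.fill = min(self.size,
                        self.fill + (now - self.last_update) * self.rate)
        self.last_update = now
        if self.fill < self.threshold:
            packet.mark = RATE_EXCEEDED  # packet out-of-profile
        # Remove the packet size once, whether or not it was marked.
        self.fill = max(0.0, self.fill - packet.size)
        return packet


MBS = 3000
tb = TBThresholdMarker(rate=1000, size=8000, threshold=8000 - MBS)
marks = [tb.process(Packet(1500), t).mark for t in (0, 0, 0, 0, 0, 5)]
print(marks)  # ['', '', '', 'rate-exceeded', 'rate-exceeded', '']
```

   With TB.size = 8000 bytes and MBS = 3000 bytes (hypothetical
   values), TB.threshold is 5000 bytes, and the fourth and fifth
   packets of the burst are marked, as they would be by a VQ of the
   same size with VQ.threshold = 3000 bytes.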

1255	Authors' Addresses

   Jozef Z. Babiarz
   Nortel
   3500 Carling Avenue
   Ottawa, Ont.  K2H 8E9
   Canada

   Phone: +1-613-763-6098
   Email: babiarz@nortel.com

   Xiao-Gao Liu
   Nortel
   3500 Carling Avenue
   Ottawa, Ont.  K2H 8E9
   Canada

   Phone: +1-613-763-7516
   Email: xgliu@nortel.com

   Kwok Ho Chan
   Nortel
   600 Technology Park Drive
   Billerica, MA  01821
   USA

   Phone: +1-978-288-8175
   Email: khchan@nortel.com

   Dr. Michael Menth
   University of Wuerzburg
   Institute of Computer Science
   Am Hubland, D-97074 Wuerzburg, Room B206
   Germany

   Phone: (+49)-931/888-6644
   Email: menth@informatik.uni-wuerzburg.de

1292	Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1308	Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

1332	Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).