idnits 2.17.1 

draft-ietf-pcn-architecture-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 1457.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1468.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1475.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1481.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 8, 2007) is 6106 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC2119' is defined on line 1298, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.briscoe-re-pcn-border-cheat' is defined on line
     1355, but no explicit reference was found in the text

  == Unused Reference: 'RFC4774' is defined on line 1371, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3168' is defined on line 1388, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-03) exists of
     draft-charny-pcn-single-marking-02

  == Outdated reference: A later version (-07) exists of
     draft-ietf-tsvwg-admitted-realtime-dscp-01

  == Outdated reference: A later version (-02) exists of
     draft-ietf-tsvwg-ecn-mpls-01

  == Outdated reference: A later version (-02) exists of
     draft-ietf-pwe3-congestion-frmwk-00

  == Outdated reference: A later version (-04) exists of
     draft-chan-pcn-encoding-comparison-00


     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Congestion and Pre-Congestion                    Philip Eardley (Editor)
3	Notification Working Group                                            BT
4	Internet-Draft                                            August 8, 2007
5	Intended status: Informational
6	Expires: February 9, 2008

8	                Pre-Congestion Notification Architecture
9	                     draft-ietf-pcn-architecture-00

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on February 9, 2008.

36	Copyright Notice

38	   Copyright (C) The IETF Trust (2007).

40	Abstract

42	   The purpose of this document is to describe a general architecture
43	   for flow admission and termination based on aggregated pre-congestion
44	   information in order to protect the quality of service of established
45	   inelastic flows within a single DiffServ domain.

47	Status

49	Table of Contents

51	   1.  Introduction
52	   2.  Terminology
53	   3.  Assumptions and constraints on scope
54	     3.1.  Assumption 1: Trust - controlled environment
55	     3.2.  Assumption 2: Real-time applications
56	     3.3.  Assumption 3: Many flows and additional load
57	     3.4.  Assumption 4: Emergency use out of scope
58	     3.5.  Other assumptions
59	   4.  High-level functional architecture
60	   5.  Detailed Functional architecture
61	     5.1.  PCN-interior-node functions
62	     5.2.  PCN-ingress-node functions
63	     5.3.  PCN-egress-node functions
64	     5.4.  Admission control functions
65	     5.5.  Probing functions
66	     5.6.  Flow termination functions
67	     5.7.  Addressing
68	     5.8.  Tunnelling
69	     5.9.  Fault handling
70	   6.  Design goals and challenges
71	   7.  Operations and Management
72	     7.1.  Fault OAM
73	     7.2.  Configuration OAM
74	     7.3.  Accounting OAM
75	     7.4.  Performance OAM
76	     7.5.  Security OAM
77	   8.  IANA Considerations
78	   9.  Security considerations
79	   10. Conclusions
80	   11. Acknowledgements
81	   12. Comments Solicited
82	   13. References
83	     13.1. Normative References
84	     13.2. Informative References
85	   Author's Address
86	   Intellectual Property and Copyright Statements

88	1.  Introduction

90	   The purpose of this document is to describe a general architecture
91	   for flow admission and termination based on aggregated (pre-)
92	   congestion information in order to protect the quality of service of
93	   flows within a DiffServ domain, [RFC2475].  This document defines an
94	   architecture for implementing two mechanisms to protect the quality
95	   of service of established inelastic flows within a single DiffServ
96	   domain, where all boundary and interior nodes are PCN-enabled and
97	   trust each other for correct PCN operation.  Flow admission control
98	   determines whether a new flow should be admitted and protects the QoS
99	   of existing PCN-flows in normal circumstances, by preventing
100	   congestion.  However, in abnormal circumstances, for instance a
101	   disaster affecting multiple nodes and causing traffic re-routes,
102	   the QoS on existing PCN-flows may degrade even though care was
103	   exercised when admitting those flows.
104	   Therefore we also propose a mechanism for flow termination, which
105	   removes enough traffic in order to protect the QoS of the remaining
106	   PCN-flows.  As a fundamental building block to enable these two
107	   mechanisms, PCN-interior-nodes generate, encode and transport pre-
108	   congestion information towards the PCN-egress-nodes.

110	   Two rates, a PCN-lower-rate and a PCN-upper-rate, can be associated
111	   with each link of the PCN-domain.  Each rate is used by an algorithm
112	   (specified in another document) that determines how and when a number
113	   of PCN-packets are marked, and how the markings are encoded in packet
114	   headers.  PCN-egress-nodes make measurements of the packet markings
115	   and send information as necessary to the nodes that make the decision
116	   about which PCN-flows to accept/reject or terminate, based on this
117	   information.  Another document will describe the decision-making
118	   algorithms.  Overall the aim is to enable PCN-nodes to give an "early
119	   warning" of potential congestion before there is any significant
120	   build-up of PCN-packets in the queue; the admission control mechanism
121	   limits the PCN-traffic on each link to *roughly* its PCN-lower-rate
122	   and the flow termination mechanism limits the PCN-traffic on each
123	   link to *roughly* its PCN-upper-rate.

125	   We believe that the key benefits of the PCN mechanisms described in
126	   this document are that they are simple, scalable, and robust because:

128	   o  Per flow state is only required at the PCN-ingress-nodes
129	      ("stateless core").  This is required for policing purposes (to
130	      prevent non-admitted PCN traffic from entering the PCN-domain) and
131	      so on.  It is not generally required that other network entities
132	      are aware of individual flows (although they may be in particular
133	      deployment scenarios).

135	   o  Admission control is resilient: QoS reservations are decoupled
136	      from the routing system and so in general admitted flows can
137	      survive capacity, routing or topology changes without additional
138	      signalling.  The PCN-lower-rates can be chosen small enough that
139	      admitted traffic can still be carried after a rerouting in most
140	      failure cases.  This is an important feature as QoS violations in
141	      core networks due to link failures are more likely than QoS
142	      violations due to increased traffic volume, [Iyer].

144	   o  The PCN-marking algorithms only operate on the overall PCN-traffic
145	      on the link, not per flow.

147	   o  The information from these measurements is signalled to the PCN-
148	      egress-nodes by the PCN-marks in the packet headers.  No
149	      additional signalling protocol is required for transporting the
150	      PCN-marks.  Therefore no secure binding is required between data
151	      packets and separate congestion messages.

153	   o  The PCN-egress-nodes make separate measurements, operating on the
154	      overall PCN-traffic, for each PCN-ingress-node, ie not per flow.
155	      Similarly, signalling by the PCN-egress-node of PCN-feedback-
156	      information (which is used for flow admission and termination
157	      decisions) is at the granularity of the ingress-egress-aggregate.

159	   o  The admitted PCN-load is controlled dynamically.  Therefore it
160	      adapts as the traffic matrix changes, and also if the network
161	      topology changes (eg after a link failure).  Hence an operator can
162	      be less conservative when deploying network capacity, and less
163	      accurate in their prediction of the PCN-traffic matrix.

165	   o  The termination mechanism complements admission control.  It
166	      allows the network to recover from sudden unexpected surges of
167	      PCN-traffic on some links, thus restoring QoS to the remaining
168	      flows.  Such scenarios are expected to be rare but not impossible.
169	      They can be caused by large network failures that redirect lots of
170	      admitted PCN-traffic to other links, or by malfunction of the
171	      measurement-based admission control in the presence of admitted
172	      flows that send for a while with an atypically low rate and then
173	      increase their rates in a correlated way.

175	   o  The PCN-upper-rate may be set below the maximum rate that PCN-
176	      traffic can be transmitted on a link, in order to trigger
177	      termination of some PCN-flows before loss of PCN-packets occurs or
178	      to keep the maximum PCN-load on a link below a level configured by
179	      the operator.

181	   Operators of networks will want to use the PCN mechanisms in various
182	   arrangements, for instance depending on how they are performing
183	   admission control outside the PCN-domain (users after all are
184	   concerned about QoS end-to-end), what their particular goals and
185	   assumptions are, and so on.  Several deployment models are possible:

187	   o  An operator may choose to deploy either admission control or flow
188	      termination or both (see Section 7.2).

190	   o  IntServ over DiffServ [RFC2998].  The DiffServ region is PCN-
191	      enabled, RSVP signalling is used end-to-end and the PCN-domain is
192	      a single RSVP hop, ie only the PCN-boundary-nodes process RSVP
193	      messages.  Outside the PCN-domain RSVP messages are processed on
194	      each hop.  This is described in
195	      [I-D.briscoe-tsvwg-cl-architecture].

197	   o  RSVP signalling is originated and/or terminated by proxies, with
198	      application-layer signalling between the end user and the proxy.
199	      For instance SIP signalling with a home hub.

201	   o  Similar to previous bullets but NSIS signalling is used instead of
202	      RSVP.

204	   o  NOTE: Consideration of signalling extensions for specific
205	      protocols is outside the scope of the PCN WG; however, it will
206	      produce a "Requirements for signalling" document as potential
207	      input for the appropriate WGs.

209	   o  Depending on the deployment scenario, the decision-making
210	      functionality (about flow admission and termination) could reside
211	      at the PCN-ingress-nodes or PCN-egress-nodes or at some central
212	      control node in the PCN-domain.  NOTE: The Charter restricts us to
213	      considering the case where it resides at the PCN-boundary-nodes.

215	   o  There are several PCN-domains on the end-to-end path, each
216	      operating PCN mechanisms independently.  NOTE: The Charter
217	      restricts us to considering a single PCN-domain.  A possibility
218	      after re-chartering is to consider operating PCN over concatenated
219	      DiffServ domains that don't trust each other (ie weakening
220	      Assumption 1 about trust, see Section 3.1).

222	   o  The PCN-domain extends to the end users.  NOTE: This is outside
223	      the Charter because it breaks Assumption 3 (aggregation, see
224	      Section 3.3).  Incidentally it doesn't necessarily break
225	      Assumption 1 (trust), because in some environments, eg corporate,
226	      the end user may have a controlled configuration and so be
227	      trusted.  The scenario is described in [I-D.babiarz-pcn-sip-cap].

229	   o  Pseudowire: PCN may be used as a congestion avoidance mechanism
230	      for edge to edge pseudowire emulations
232	      [I-D.ietf-pwe3-congestion-frmwk].  NOTE: Specific consideration of
233	      pseudowires is not in the PCN WG Charter.

235	   o  MPLS: [RFC3270] defines how to support the DiffServ architecture
236	      in MPLS networks.  [I-D.ietf-tsvwg-ecn-mpls] describes how to add
237	      PCN for admission control of microflows into a set of MPLS-TE
238	      aggregates (Multi-protocol label switching traffic engineering).
239	      PCN-marking is done in MPLS's EXP field.  NOTE: This draft is a
240	      TSV WG draft, and is also being reviewed in the MPLS WG.

242	   o  Similarly, it may be possible to extend PCN into Ethernet
243	      networks, where PCN-marking is done in the Ethernet header.  NOTE:
244	      Specific consideration of this extension is outside the PCN WG
245	      Charter.

247	2.  Terminology

249	   o  PCN-domain: a PCN-capable DiffServ domain; a contiguous set of
250	      PCN-enabled DiffServ nodes.

252	   o  PCN-boundary-node: a node that connects one PCN-domain to a node
253	      either in another PCN-domain or in a non PCN-domain.

255	   o  PCN-interior-node: a node in a PCN-domain that is not a PCN-
256	      boundary-node.

258	   o  PCN-node: a PCN-boundary-node or a PCN-interior-node

260	   o  PCN-egress-node: a PCN-boundary-node in its role in handling
261	      traffic as it leaves a PCN-domain.

263	   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
264	      traffic as it enters a PCN-domain.

266	   o  PCN-traffic: A PCN-domain carries traffic of different DiffServ
267	      classes [RFC4594].  Those using the PCN mechanisms are called PCN-
268	      classes (collectively called PCN-traffic) and the corresponding
269	      packets are PCN-packets.  The same network may carry traffic using
270	      other DiffServ classes.

272	   o  Ingress-egress-aggregate: The collection of PCN-packets from all
273	      PCN-flows that travel in one direction between a specific pair of
274	      PCN-boundary-nodes.

276	   o  PCN-lower-rate: a reference rate configured for each link in the
277	      PCN-domain, which is lower than the PCN-upper-rate.  It is used by
278	      an algorithm that determines whether a packet should be PCN-marked
279	      with a first encoding.

281	   o  PCN-upper-rate: a reference rate configured for each link in the
282	      PCN-domain, which is higher than the PCN-lower-rate.  It is used
283	      by an algorithm that determines whether a packet should be PCN-
284	      marked with a second encoding.

286	   o  Threshold-marking: a PCN-marking algorithm such that all PCN-
287	      traffic is marked if the PCN-traffic exceeds a particular rate
288	      (either the PCN-lower-rate or PCN-upper-rate).  NOTE: The
289	      definition reflects the overall intent of the algorithm rather
290	      than its instantaneous behaviour, since the rate measured at a
291	      particular moment depends on the algorithm, its implementation and
292	      the traffic's variance as well as its rate.

294	   o  Excess-rate-marking: a PCN-marking algorithm such that the amount
295	      of PCN-traffic that is PCN-marked is equal to the amount that
296	      exceeds a particular rate (either the PCN-lower-rate or PCN-upper-
297	      rate).  NOTE: The definition reflects the overall intent of the
298	      algorithm rather than its instantaneous behaviour, since the rate
299	      measured at a particular moment depends on the algorithm, its
300	      implementation and the traffic's variance as well as its rate.

302	   o  Pre-congestion: a condition of a link within a PCN-domain in which
303	      the PCN-node performs PCN-marking, in order to provide an "early
304	      warning" of potential congestion before there is any significant
305	      build-up of PCN-packets in the queue.

307	   o  PCN-marking: the process of setting the header in a PCN-packet
308	      based on defined rules, in reaction to pre-congestion.

310	   o  {{if necessary: PCN-lower-rate-marking and PCN-upper-rate-
311	      marking}}

313	   o  PCN-feedback-information: information signalled by a PCN-egress-
314	      node to a PCN-ingress-node or central control node, which is
315	      needed for the flow admission and flow termination mechanisms.
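
   The difference between threshold-marking and excess-rate-marking can
   be illustrated with a minimal sketch.  It captures only the idealised,
   long-run intent of the two definitions above, not any algorithm being
   specified by the WG.  The function names and the use of a single
   measured rate are assumptions made for illustration.

```python
def threshold_marked_fraction(rate: float, reference_rate: float) -> float:
    """Idealised threshold-marking: all PCN-traffic is marked once the
    rate exceeds the reference rate, otherwise none is marked."""
    return 1.0 if rate > reference_rate else 0.0


def excess_rate_marked_fraction(rate: float, reference_rate: float) -> float:
    """Idealised excess-rate-marking: the marked amount equals the amount
    by which the rate exceeds the reference rate."""
    if rate <= 0.0 or rate <= reference_rate:
        return 0.0
    return (rate - reference_rate) / rate


if __name__ == "__main__":
    # Hypothetical link: reference rate of 100 units, offered rates around it.
    for offered in (80.0, 100.0, 120.0, 200.0):
        print(offered,
              threshold_marked_fraction(offered, 100.0),
              excess_rate_marked_fraction(offered, 100.0))
```

   With a reference rate of 100 and an offered rate of 120, the
   threshold variant marks everything whereas the excess-rate variant
   marks one sixth of the traffic (20 out of 120).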

317	3.  Assumptions and constraints on scope

319	   The PCN WG's charter restricts the initial scope by a set of
320	   assumptions.  Here we list those assumptions and explain them.

322	   1.  these components are deployed in a single DiffServ domain, within
323	       which all PCN-nodes are PCN-enabled and trust each other for
324	       truthful PCN-marking and transport

326	   2.  all flows handled by these mechanisms are inelastic and
327	       constrained to a known peak rate through policing or shaping

329	   3.  the number of PCN-flows across any potential bottleneck link is
330	       sufficiently large that stateless, statistical mechanisms can be
331	       effective.  To put it another way, the aggregate bit rate of PCN-
332	       traffic across any potential bottleneck link needs to be
333	       sufficiently large relative to the maximum additional bit rate
334	       added by one flow

336	   4.  PCN-flows may have different precedence, but the applicability of
337	       the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
338	       is out of scope

340	   After completion of the initial phase, the PCN WG may re-charter to
341	   develop solutions for specific scenarios where some of these
342	   restrictions are not in place.  It may also re-charter to consider
343	   applying the PCN mechanisms to additional deployment scenarios
344	   (operation over concatenated DiffServ domains, PCN-aware application
345	   mechanisms etc.).  The WG may also re-charter to investigate
346	   additional response mechanisms that act on (pre-)congestion
347	   information.  One example could be flow-rate adaptation by elastic
348	   applications (rather than flow admission or termination).  Another
349	   example of a possible future work item is the operation of PCN over
350	   concatenated PCN-domains that don't trust each other (perhaps re-
351	   ECN, [I-D.briscoe-re-pcn-border-cheat]).  The details of these work
352	   items are outside the scope of the initial phase, but the WG may
353	   consider their requirements in order to design components that are
354	   sufficiently general to support such extensions in the future.  The
355	   working assumption is that the standards developed in the initial
356	   phase should not need to be modified to accommodate solutions for
357	   scenarios in which these restrictions are removed.

359	3.1.  Assumption 1: Trust - controlled environment

361	   We assume that the PCN-domain is a controlled environment, i.e. all
362	   the nodes in a PCN-domain run PCN and trust each other.  There are
363	   several reasons for proposing this assumption:

365	   o  The PCN-domain has to be encircled by a ring of PCN-boundary-
366	      nodes, otherwise PCN-packets could enter the PCN-domain without
367	      being subject to admission control, which would potentially
368	      destroy the QoS of existing flows.

370	   o  Similarly, a PCN-boundary-node has to trust that all the PCN-nodes
371	      are doing PCN-marking.  A non PCN-node wouldn't be able to signal
372	      that it is suffering pre-congestion, which potentially would lead
373	      to too many PCN-flows being admitted (or too few being
374	      terminated).  Worse, a rogue node could perform various attacks,
375	      as discussed in the Security Considerations section.

377	   One way of assuring the above two points is that the entire PCN-
378	   domain is run by a single operator.  Another possibility is that
379	   there are several operators but they trust each other to a sufficient
380	   level, in their handling of PCN-traffic.

382	3.2.  Assumption 2: Real-time applications

384	   We assume that PCN-packets come from real-time applications
385	   generating inelastic traffic [Shenker] like voice and video requiring
386	   low delay, jitter and packet loss, for example the Controlled Load
387	   Service, [RFC2211], and the Telephony service class, [RFC4594].  This
388	   assumption is to help focus the effort where it looks like PCN would
389	   be most useful, ie the sorts of applications where per flow QoS is a
390	   known requirement.  For instance, this assumption would guide
391	   simulation work.

393	3.3.  Assumption 3: Many flows and additional load

395	   We assume that there are many flows on any bottleneck link in the
396	   PCN-domain (or, to put it another way, the aggregate bit rate of PCN-
397	   traffic across any potential bottleneck link is sufficiently large
398	   relative to the maximum additional bit rate added by one flow).
399	   Measurement-based admission control assumes that the present is a
400	   reasonable prediction of the future: the network conditions are
401	   measured at the time of a new flow request, but the actual network
402	   performance must still be acceptable later, during the call.  One
403	   issue is that if there are only a few variable rate flows, then the
404	   aggregate traffic level may vary a lot, perhaps enough to cause some
405	   packets to get dropped.  If there are many flows then the aggregate
406	   traffic level should be statistically smoothed.  How many flows is
407	   enough depends on a number of things such as the variation in each
408	   flow's rate, the total rate of PCN-traffic, and the size of the
409	   "safety margin" between the traffic level at which we start
410	   admission-marking and at which packets are dropped.
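
   The smoothing effect can be made concrete with a small worked
   calculation.  The sketch below assumes, purely for illustration, that
   flows are independent on/off sources with a common peak rate, each
   active with the same probability.  This traffic model is our
   assumption and is not part of the PCN architecture.

```python
import math


def aggregate_rate_stats(num_flows: int, peak_rate: float, activity: float):
    """Mean and standard deviation of the aggregate rate of independent
    on/off flows (binomial model, an illustrative assumption)."""
    mean = num_flows * activity * peak_rate
    std_dev = peak_rate * math.sqrt(num_flows * activity * (1.0 - activity))
    return mean, std_dev


def relative_variation(num_flows: int, activity: float) -> float:
    """Standard deviation divided by mean; shrinks as 1/sqrt(num_flows)."""
    return math.sqrt((1.0 - activity) / (activity * num_flows))


if __name__ == "__main__":
    # Hypothetical flows with a 64 kbit/s peak rate, active half the time.
    for n in (10, 100, 1000):
        mean, std_dev = aggregate_rate_stats(n, 64.0, 0.5)
        print(n, mean, round(std_dev, 1), round(relative_variation(n, 0.5), 3))
```

   Under these assumptions, increasing the number of flows by a factor
   of 100 reduces the relative variation of the aggregate by a factor of
   10, which is why a smaller "safety margin" suffices on links carrying
   many flows.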

412	   We do not make explicit assumptions on how many PCN-flows are in each
413	   ingress-egress-aggregate.  Performance evaluation work may clarify
414	   whether it is necessary to make any additional assumption on
415	   aggregation at the ingress-egress-aggregate level.

417	3.4.  Assumption 4: Emergency use out of scope

419	   PCN-flows may have different precedence, but the applicability of the
420	   PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc) is out
421	   of scope.

423	3.5.  Other assumptions

425	   It is assumed that PCN-marking is being applied to traffic scheduled
426	   with the expedited forwarding per-hop behaviour, [RFC3246].

428	   It is assumed that PCN-nodes do not perform ECN, [RFC3168], on PCN-
429	   packets.

431	   If a packet that is part of a PCN-flow arrives at a PCN-ingress-node
432	   with its CE (Congestion experienced) codepoint set, then we assume
433	   that the PCN-ingress-node drops the packet.  After its initial
434	   Charter is complete, the WG may decide to work on a mechanism (such
435	   as through a signalling extension) that enables ECN-marking to be
436	   carried transparently across the PCN-domain.

438	4.  High-level functional architecture

440	   The high-level approach is to split functionality between:

442	   o  PCN-interior-nodes 'inside' the PCN-domain, which monitor their
443	      own state of pre-congestion and mark PCN-packets if appropriate.
444	      They are not flow-aware, nor aware of ingress-egress-aggregates.

446	   o  PCN-boundary-nodes at the edge of the PCN-domain, which control
447	      admission of new PCN-flows and termination of existing PCN-flows,
448	      based on information from PCN-interior-nodes.  This information is
449	      in the form of the PCN-marked data packets (which are intercepted
450	      by the PCN-egress-nodes) and not signalling messages.  PCN-
451	      ingress-nodes are flow-aware (required for policing purposes).  In
452	      several deployment scenarios PCN-egress-nodes will also be flow
453	      aware.  (Normally this adds no complexity since a PCN-boundary-
454	      node acts as both a PCN-ingress-node and as a PCN-egress-node.)

456	   The aim of this split is to keep the bulk of the network simple,
457	   scalable and robust, whilst confining policy, application-level and
458	   security interactions to the edge of the PCN-domain.  For example the
459	   lack of flow awareness means that the PCN-interior-nodes don't care
460	   about the flow information associated with the PCN-packets that they
461	   carry, nor do the PCN-boundary-nodes care about which PCN-interior-
462	   nodes their flows traverse.

464	   Flow admission:

466	   At a high level, flow admission control works as follows.  In order
467	   to generate information about the current state of the PCN-domain,
468	   each PCN-node PCN-marks packets if it is "pre-congested".  Exactly
469	   how a PCN-node decides if it is "pre-congested" (the algorithm) and
470	   exactly how packets are "PCN-marked" (the encoding) will be defined
471	   in a separate standards-track document, but at a high level it is
472	   expected to be as follows:

474	   o  the algorithm: a PCN-node meters the amount of PCN-traffic on each
475	      one of its outgoing links.  The measurement is made as an
476	      aggregate of all PCN-packets, and not per flow.  The algorithm has
477	      a configured parameter, PCN-lower-rate.  When the amount of PCN-
478	      traffic exceeds the PCN-lower-rate, PCN-packets are PCN-marked.
479	      See NOTE below for more explanation.

481	   o  the encoding: a PCN-node PCN-marks a PCN-packet (with a first
482	      encoding) by setting fields in the header to specific values.  It
483	      is expected that the ECN and/or DSCP fields will be used.

485	   NOTE: Two main categories of algorithm have been proposed: if the
486	   algorithm uses threshold-marking then all PCN-packets are marked if
487	   the current rate exceeds the PCN-lower-rate, whereas if the algorithm
488	   uses excess-rate-marking the amount marked is equal to the amount in
489	   excess of the PCN-lower-rate.  However, note that this description
490	   reflects the overall intent of the algorithm rather than its
491	   instantaneous behaviour, since the rate measured at a particular
492	   moment depends on the detailed algorithm, its implementation (eg
493	   virtual queue, token bucket...) and the traffic's variance as well as
494	   its rate (eg marking may well continue after a recent overload even
495	   after the instantaneous rate has dropped).
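
   As an illustration of why the instantaneous behaviour depends on the
   implementation, the following is a minimal token-bucket sketch of
   excess-rate-marking.  It is not the algorithm that the WG will
   specify.  The class name, the parameters and the choice of a token
   bucket filled at the PCN-lower-rate are assumptions made for this
   example.

```python
class ExcessRateMarker:
    """Hypothetical token-bucket meter: packets that find too few tokens
    are PCN-marked, so roughly the excess over the reference rate is
    marked once the bucket has drained."""

    def __init__(self, reference_rate_bps: float, bucket_depth_bits: float):
        self.rate = reference_rate_bps      # eg the PCN-lower-rate
        self.depth = bucket_depth_bits      # tolerated burst size
        self.tokens = bucket_depth_bits     # bucket starts full
        self.last_time = 0.0

    def on_packet(self, now: float, size_bits: int) -> bool:
        """Return True if the packet should be PCN-marked."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill at the reference rate, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if self.tokens >= size_bits:
            self.tokens -= size_bits
            return False
        return True


if __name__ == "__main__":
    # Offer 2000 bit/s against a 1000 bit/s reference rate for 10 seconds.
    marker = ExcessRateMarker(reference_rate_bps=1000.0, bucket_depth_bits=1000.0)
    marks = [marker.on_packet(i * 0.05, 100) for i in range(200)]
    print(sum(marks) / len(marks))
```

   In this run the offered load is twice the reference rate, so about
   half of the packets are marked in steady state.  The first packets
   are not marked while the initially full bucket drains, which is one
   example of the measured behaviour differing from the idealised
   definition.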

497	   The PCN-boundary-nodes monitor the PCN-marked packets in order to
498	   extract information about the current state of the PCN-domain.  Based
499	   on this monitoring, a decision is made about whether to admit a
500	   prospective new flow.  Exactly how the admission control decision is
501	   made will be defined separately (at the moment the intention is
502	   that there will be one or more informational-track RFCs), but at a
503	   high level it is expected to be as follows:

505	   o  the PCN-egress-node measures (possibly as a moving average) the
506	      fraction of the PCN-traffic that is PCN-marked.  The fraction is
507	      measured for a specific ingress-egress-aggregate.  If the fraction
508	      is below a threshold value then the new flow is admitted.
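
   The egress measurement and the admission decision described above
   could be sketched as follows.  This is only an illustration: the
   exponentially weighted moving average, the parameter values and all
   names are assumptions, since the actual decision-making algorithms
   are left to other documents.

```python
class AggregateAdmissionState:
    """Hypothetical per ingress-egress-aggregate state kept for the
    admission decision."""

    def __init__(self, admit_threshold: float, smoothing: float = 0.1):
        self.admit_threshold = admit_threshold  # maximum tolerated marked fraction
        self.smoothing = smoothing              # weight given to the newest sample
        self.marked_fraction = 0.0              # moving average of marked fraction

    def on_measurement(self, marked_bytes: int, total_bytes: int) -> None:
        """Fold one measurement interval into the moving average."""
        if total_bytes == 0:
            return  # no PCN-traffic seen in this interval, keep the old estimate
        sample = marked_bytes / total_bytes
        self.marked_fraction += self.smoothing * (sample - self.marked_fraction)

    def admit_new_flow(self) -> bool:
        """Admit only while the smoothed marked fraction is below the threshold."""
        return self.marked_fraction < self.admit_threshold


if __name__ == "__main__":
    state = AggregateAdmissionState(admit_threshold=0.05, smoothing=0.5)
    state.on_measurement(marked_bytes=50, total_bytes=100)
    print(state.admit_new_flow())
```

   One such state object would be kept for each ingress-egress-aggregate,
   for example in a table keyed by the pair of PCN-boundary-nodes.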

510	   Note that the PCN-lower-rate is a parameter that can be configured by
511	   the operator.  It will be set lower than the traffic rate at which
512	   the link becomes congested and the node drops packets.  (Hence, by
513	   analogy with ECN we call our mechanism Pre-Congestion Notification.)

515	   Note also that the admission control decision is made for a
516	   particular ingress-egress-aggregate.  So it is quite possible for a
517	   new flow to be admitted between one pair of PCN-boundary-nodes,
518	   whilst at the same time another admission request is blocked between
519	   a different pair of PCN-boundary-nodes.

521	   Flow termination:

523	   At a high level, flow termination control works as follows.  Each
524	   PCN-node PCN-marks packets in a similar fashion to above.  An obvious
525	   approach is for the algorithm to use a second configured parameter,
526	   PCN-upper-rate, and a second header encoding ("PCN-upper-rate-
527	   marking").  However there is also a proposal to use the same rate and
528	   the same encoding.  Several approaches have been proposed to date
529	   about how to convert this information into a flow termination
530	   decision; at a high level these are as follows:

532	   o  One approach measures the rate of unmarked PCN-traffic (ie not
533	      PCN-upper-rate-marked) at the PCN-egress-node, which is the amount
534	      of PCN-traffic that can actually be supported; the PCN-ingress-
535	      node measures the rate of PCN-traffic that is destined for this
536	      specific PCN-egress-node, and hence can calculate the excess
537	      amount that should be terminated.

539	   o  Another approach instead measures the rate of PCN-upper-rate-
540	      marked traffic and calculates and selects the flows that should be
541	      terminated.

543	   o  Another approach terminates any PCN-flow with a PCN-upper-rate-
544	      marked packet.  It needs a different marking algorithm, otherwise
545	      far too much traffic would be terminated.

547	   o  Another approach uses only one sort of marking, which is based on
548	      the PCN-lower-rate, to decide not only whether to admit more PCN-
549	      flows but also whether any PCN-flows need to be terminated.  It
550	      assumes that the ratio of the (implicit) PCN-upper-rate and the
551	      PCN-lower-rate is the same on all links.  This approach measures
552	      the rate of unmarked PCN-traffic at a PCN-egress-node.  The PCN-
553	      ingress-node uses this measurement to compute the implicit PCN-
554	      upper-rate of the bottleneck link.  It then measures the rate of
555	      PCN-traffic that is destined for this specific PCN-egress-node and
556	      hence can calculate the amount that should be terminated.
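
	   The arithmetic behind the first and last approaches can be sketched
	   as follows.  This is an illustration under simplifying assumptions:
	   rates are already smoothed, packet loss is ignored, and in the
	   single-marking case the rate of unmarked traffic is taken to
	   approximate the PCN-lower-rate of the bottleneck link.  The function
	   and parameter names are hypothetical.

```python
def excess_with_upper_rate_marking(ingress_rate, unmarked_egress_rate):
    """First approach: the unmarked rate seen at the PCN-egress-node is
    the amount that can be supported; the rest should be terminated."""
    return max(0.0, ingress_rate - unmarked_egress_rate)


def excess_with_single_marking(ingress_rate, unmarked_egress_rate,
                               upper_to_lower_ratio):
    """Last approach: derive the implicit PCN-upper-rate of the
    bottleneck from the unmarked rate and a ratio that is assumed to
    be the same on all links."""
    implicit_upper_rate = upper_to_lower_ratio * unmarked_egress_rate
    return max(0.0, ingress_rate - implicit_upper_rate)
```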

558	   Since flow termination is designed for "abnormal" circumstances, it
559	   is quite likely that some PCN-nodes are congested and hence packets
560	   are being dropped and/or significantly queued.  The flow termination
561	   mechanism must bear this in mind.

563	   Note also that the termination control decision is made for a
564	   particular ingress-egress-aggregate.  So it is quite possible for
565	   PCN-flows to be terminated between one pair of PCN-boundary-nodes,
566	   whilst at the same time none are terminated between a different pair
567	   of PCN-boundary-nodes.

569	   Although designed to work together, flow admission and flow
570	   termination are independent mechanisms, and the use of one does not
571	   require or prevent the use of the other (discussed further in Section
572	   7.2).

574	   Information transport:

576	   The transport of pre-congestion information from a PCN-node to a PCN-
577	   egress-node is through PCN-markings in data packet headers; no
578	   signalling protocol messaging is needed.  However, signalling is
579	   needed to transport PCN-feedback-information between the PCN-
580	   boundary-nodes, for example to convey the fraction of PCN-marked
581	   traffic from a PCN-egress-node to the relevant PCN-ingress-node.
582	   Exactly what information needs to be transported will be described in
583	   the future PCN WG document(s) about the boundary mechanisms.  The
584	   signalling could be done by an extension of RSVP or NSIS, for
585	   instance; protocol work will be done by the relevant WG, but for
586	   example [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for
587	   RSVP.

589	   The following are some high-level points about how PCN works:

591	   o  There needs to be a way for a PCN-node to distinguish PCN-traffic
592	      from non PCN-traffic.  They may be distinguished using the DSCP
593	      field and/or ECN field.  [I-D.chan-pcn-encoding-comparison]
594	      discusses further.

596	   o  The PCN mechanisms may be applied to more than one traffic class
597	      (which are distinguished by DSCP).

599	   o  There may be traffic that is more important than PCN, perhaps a
600	      particular application or an operator's control messages.  A PCN-
601	      node may dedicate capacity to such traffic or priority schedule it
602	      over PCN.  In the latter case its traffic needs to contribute to
603	      the PCN meters.

605	   o  There will be traffic less important than PCN, for instance best
606	      effort or assured forwarding traffic.  It will be scheduled at
607	      lower priority than PCN, and use a separate queue or queues.
608	      However, a PCN-node may dedicate some capacity to lower priority
609	      traffic so that it isn't starved.

611	   o  There may be other traffic with the same priority as PCN-traffic.
612	      For instance, Expedited Forwarding sessions that are originated
613	      either without capacity admission or with traffic engineering.  In
615	      [I-D.ietf-tsvwg-admitted-realtime-dscp] the two traffic classes
616	      are called EF and EF-ADMIT.  A PCN-node could either use separate
617	      queues, or separate policers and a common queue; the draft
618	      provides some guidance when each is better, but for instance the
619	      latter is preferred when the two traffic classes are carrying the
620	      same type of application with the same jitter requirements.

622	5.  Detailed Functional architecture

624	   This section is intended to provide a systematic summary of the new
625	   functional architecture in the PCN-domain, which maps to the
626	   functionality required by the PCN-nodes, in addition to
627	   their normal router functions.  The section discusses the
628	   functionality needed for both flow admission control and flow
629	   termination.  It is split into:

631	   1.  functions needed at PCN-interior-nodes

633	   2.  functions needed at PCN-ingress-nodes

635	   3.  functions needed at PCN-egress-nodes

637	   4.  other functions needed for flow admission control

639	   5.  other functions needed for probing (which may be needed
640	       sometimes)

642	   6.  other functions needed for flow termination control

644	   The section then discusses some other detailed topics:

646	   1.  addressing

648	   2.  tunnelling

650	   3.  fault handling

652	5.1.  PCN-interior-node functions

654	   Each link of the PCN-domain is upgraded with the following
655	   functionality:

657	   o  Packet classify - decide whether an incoming packet is a PCN-
658	      packet or not.  Another PCN WG document will specify encoding,
659	      using the DSCP and/or ECN fields.

661	   o  PCN-meter - measure the 'amount of PCN-traffic'.  The measurement
662	      is made as an aggregate of all PCN-packets, and not per flow.

664	   o  PCN-mark - algorithms determine whether to PCN-mark PCN-packets
665	      and what packet encoding is used (as specified in another PCN WG
666	      document).

668	   The same general approach of metering and PCN-marking is performed
669	   for both flow admission control and flow termination; however, the
670	   algorithms and encoding may be different.

672	   These functions are needed for each link of the PCN-region.  They are
673	   therefore needed on all links of PCN-interior-nodes, and on the links
674	   of PCN-boundary-nodes that are internal to the PCN-domain.  There may
675	   be more than one PCN-meter and marker installed at a given link, eg
676	   one for admission and one for termination.

678	5.2.  PCN-ingress-node functions

680	   Each ingress link of the PCN-domain is upgraded with the following
681	   functionality:

683	   o  Packet classify - decide whether an incoming packet is part of a
684	      previously admitted microflow, by using a filter spec (eg DSCP,
685	      source and destination addresses and port numbers)

687	   o  Police - police, by dropping or re-marking with a non-PCN DSCP,
688	      any packets received with a DSCP demanding PCN transport that do
689	      not belong to an admitted flow.  Similarly, police packets that
690	      are part of a previously admitted microflow, to check that the
691	      microflow keeps to the agreed rate or flowspec (eg RFC1633
692	      [RFC1633] and NSIS equivalent).

694	   o  PCN-colour - set the DSCP field or DSCP and ECN fields to the
695	      appropriate value(s) for a PCN-packet.  The draft about PCN-
696	      encoding will discuss further.

698	   o  PCN-meter - make "measurements of PCN-traffic".  Some approaches
699	      to flow termination require the PCN-ingress-node to measure the
700	      (aggregate) rate of PCN-traffic towards a particular PCN-egress-
701	      node.

703	   The first two are policing functions, needed to make sure that PCN-
704	   packets let into the PCN-domain belong to a flow that's been admitted
705	   (and probably also to ensure that the flow doesn't go at a faster
706	   rate than allowed by its service level agreement).  The filter spec
707	   will for example come from the flow request message (outside scope of
708	   PCN WG, see [I-D.briscoe-tsvwg-cl-architecture] for an example using
709	   RSVP).  PCN-colouring allows the rest of the PCN-domain to recognise
710	   PCN-packets.

712	5.3.  PCN-egress-node functions

714	   Each egress link of the PCN-domain is upgraded with the following
715	   functionality:

717	   o  Packet classify - determine which PCN-ingress-node a PCN-packet
718	      has come from.

720	   o  PCN-meter - make "measurements of PCN-traffic".  The
721	      measurement(s) is made as an aggregate (ie not per flow) of all
722	      PCN-packets from a particular PCN-ingress-node.

724	   o  PCN-colour - for PCN-packets, set the DSCP field or DSCP and ECN
725	      fields to the appropriate value(s) for use outside the PCN-domain.

727	   Another PCN WG document, about boundary mechanisms, will describe
728	   what the "measurements of PCN-traffic" are.  This depends on whether
729	   the measurement is targeted at admission control or flow termination.
730	   It also depends on what encoding and PCN-marking algorithms are
731	   specified by the PCN WG.

733	5.4.  Admission control functions

735	   Specific admission control functions can be performed at a PCN-
736	   boundary-node (PCN-ingress-node or PCN-egress-node) or at a
737	   centralised node, but not at normal PCN-interior-nodes.  The
738	   functions are:

740	   o  Make decision about admission - compare the required "measurements
741	      of PCN-traffic" (output of the PCN-egress-node's PCN-meter
742	      function) with some reference level, and hence decide whether to
743	      admit the potential new PCN-flow.  As well as the PCN
744	      measurements, the decision takes account of policy and application
745	      layer requirements.

747	   o  Communicate decision about admission - signal the decision to the
748	      node making the admission control request (which may be outside
749	      the PCN-region), and to the policer (PCN-ingress-node function)

751	   There are various possibilities for how the functionality can be
752	   distributed (we assume the operator would configure which is used):

754	   o  The decision is made at the PCN-egress-node and signalled to the
755	      PCN-ingress-node

757	   o  The decision is made at the PCN-ingress-node, which requires that
758	      the PCN-egress-node signals to the PCN-ingress-node the fraction
759	      of PCN-traffic that is PCN-marked (or whatever the PCN WG agrees
760	      as the required "measurements of PCN-traffic").

762	   o  The decision is made at a centralised node, which requires that
763	      the PCN-egress-node signals its measurements to the centralised
764	      node, and that the centralised node signals to the PCN-ingress-
765	      node about the decision about admission control.  It would be
766	      possible for the centralised node to be one of the PCN-boundary-
767	      nodes, when clearly the signalling would sometimes be replaced by
768	      a message internal to the node.

770	5.5.  Probing functions

772	   Probing functions are optional, and can be used for admission
773	   control.  A PCN-ingress-node generates and sends probe packets in
774	   order to test the pre-congestion level.  Probing is useful or even
775	   essential under the following conditions:

777	   o  when an ingress-egress-aggregate carries no traffic (or too little
778	      traffic for the PCN-egress-node to accurately make the
779	      "measurements of PCN-traffic" that are required for an admission
780	      decision).  It may be that the traffic levels on other ingress-
781	      egress-aggregates are so high that a new flow shouldn't be
782	      admitted on the 'empty' ingress-egress-aggregate.  Probing is
783	      useful to check this.

785	   o  in the presence of multipath routing (ECMP) between the PCN-
786	      boundary-nodes, when some paths are pre-congested there may be
787	      other paths which aren't pre-congested.  Probing is useful to
788	      determine whether the new flow would follow a path that isn't pre-
789	      congested and hence can be admitted.

791	   Probe packets may be simple data addressed to the PCN-egress-node and
792	   require no protocol standardisation, although there will be best
793	   practice for their number, size and rate.  There are two
794	   possibilities for how probing is triggered:

796	   o  the PCN-egress-node requests (signals) the PCN-ingress-node to
797	      generate probe traffic

799	   o  if the PCN-ingress-node knows which PCN-egress-node is associated
800	      with the destination address in the admission request, then the
801	      PCN-ingress-node could know it has no reservation with that PCN-
802	      egress-node and unilaterally start probing.

804	   The probing functions are:

806	   o  Make decision that probing is needed

808	   o  (if required) Communicate the request that probing is needed - the
809	      PCN-egress-node signals to the PCN-ingress-node that probe traffic
810	      is needed

812	   o  Generate probe traffic - the PCN-ingress-node generates the probe
813	      traffic.  The appropriate number (or rate) of probe packets will
814	      depend on the PCN-marking algorithm; for example an excess-rate-
815	      marking algorithm generates fewer PCN-marks than a threshold-
816	      marking algorithm.

818	   o  Forward probe packets - as far as PCN-interior-nodes are
819	      concerned, probe packets must be handled the same as (ordinary
820	      data) PCN-packets.

822	   o  Consume probe packets - the PCN-egress-node consumes probe packets
823	      to ensure that they don't travel beyond the PCN-domain.

825	5.6.  Flow termination functions

827	   Specific termination control functions can be performed at a PCN-
828	   boundary-node (PCN-ingress-node or PCN-egress-node) or at a
829	   centralised node, but not at normal PCN-interior-nodes.  There are
830	   various possibilities for how the functionality can be distributed,
831	   similar to those discussed above in the Admission control section;
832	   the flow termination decision could be made at the PCN-ingress-node,
833	   the PCN-egress-node or at some centralised node.  The functions are:

835	   o  PCN-meter at PCN-egress-node - (as described in Section 5.3) make
836	      "measurements of PCN-traffic" from a particular PCN-ingress-node.

838	   o  (if required) PCN-meter at PCN-ingress-node - make "measurements
839	      of PCN-traffic" being sent towards a particular PCN-egress-node;
840	      again, this is done for the ingress-egress-aggregate and not per
841	      flow.

843	   o  (if required) Communicate "measurements of PCN-traffic" to the
844	      node that makes the flow termination decision.  For example, if
845	      the PCN-ingress-node makes the decision then communicate the PCN-
846	      egress-node's measurements to it (as in
847	      [I-D.briscoe-tsvwg-cl-architecture]).

849	   o  Make decision about flow termination - use the "measurements of
850	      PCN-traffic" to decide which PCN-flow or PCN-flows to terminate.
851	      The decision takes account of policy and application layer
852	      requirements.

854	   o  Communicate decision about flow termination - signal the decision
855	      to the node that is able to terminate the flow (which may be
856	      outside the PCN-region), and to the policer (PCN-ingress-node
857	      function)

859	   One particular proposal, [I-D.charny-pcn-single-marking], for PCN-
860	   marking and performing flow admission and termination would require a
861	   global parameter to be defined on all PCN-boundary-nodes in the PCN-
862	   domain.  [I-D.charny-pcn-single-marking] discusses in full the impact
863	   of this particular proposal on the operation of PCN.

865	5.7.  Addressing

867	   PCN-nodes may need to know the address of other PCN-nodes:

869	   o  in all cases PCN-interior-nodes don't need to know the address of
870	      any other PCN-nodes, except their next hop neighbours

872	   o  in the cases of admission or termination decision by a PCN-
873	      boundary-node, the PCN-egress-node needs to know the address of
874	      the PCN-ingress-node associated with a flow, at a minimum so that
875	      the PCN-ingress-node can be informed to enforce the admission
876	      decision through policing.  The addressing information can be
877	      gathered from signalling, for example as described for RSVP in
878	      [I-D.lefaucheur-rsvp-ecn].  Alternatively, if PCN-traffic is
879	      always tunnelled across the PCN-domain, then the PCN-ingress-
880	      node's address is simply the source address of the outer packet
881	      header.

883	   o  in the cases of admission or termination decision by a central
884	      control node, the PCN-egress-node needs to be configured with the
885	      address of the centralised node.  In addition, depending on the
886	      exact deployment scenario and its signalling, the centralised node
887	      may need to know the addresses of the PCN-ingress-node and PCN-
888	      egress-node, and the PCN-egress-node may need to know the address
889	      of the PCN-ingress-node.  NOTE: Consideration of the centralised
890	      case is out of scope of the initial PCN WG Charter.

892	5.8.  Tunnelling

894	   It is possible that tunnels terminate at a PCN-node.  It is important
895	   that any PCN-marking is preserved after decapsulation, so that it is
896	   still seen by the PCN-egress-node.  To ensure this, on decapsulation
897	   the following rules are applied:

899	   o  the PCN-marking state of the inner and outer headers are compared
900	   o  if the inner header's marking state is more severe then it is
901	      preserved

903	   o  if the outer header's marking state is more severe then it is
904	      copied onto the inner header

906	   o  NB the order of increasing severity is: unmarked; PCN-marking with
907	      first encoding (ie associated with the PCN-lower-rate); PCN-
908	      marking with second encoding (ie associated with the PCN-upper-
909	      rate)

911	   Similarly, if encapsulation is done within the PCN-domain, then the
912	   following rule is applied:

914	   o  any PCN-marking is copied into the outer header
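
	   The decapsulation and encapsulation rules above amount to keeping
	   the more severe marking.  A minimal sketch, in which the marking
	   states are modelled as a hypothetical ordered enumeration rather
	   than as any actual header encoding:

```python
from enum import IntEnum


class PcnMarking(IntEnum):
    """Marking states in order of increasing severity."""
    UNMARKED = 0
    LOWER_RATE_MARKED = 1   # first encoding, PCN-lower-rate
    UPPER_RATE_MARKED = 2   # second encoding, PCN-upper-rate


def decapsulate(inner, outer):
    """Return the marking the inner header carries after decapsulation:
    the more severe of the inner and outer marking states."""
    return max(inner, outer)


def encapsulate(inner):
    """Return the marking for the new outer header: a copy of the inner one."""
    return inner
```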

916	   Tunnelling considerations also depend on which header bits the PCN WG
917	   decides to use.  If the ECN bits are used then
918	   [I-D.briscoe-tsvwg-ecn-tunnel] applies; the rules above conform to
919	   its spirit.  If the DSCP field is used then [RFC2983] needs to be
920	   considered carefully.

922	   An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to
923	   PCN-egress-nodes, in which case the rules above aren't needed.  The
924	   potential reasons for doing such tunnelling are: the PCN-egress-node
925	   then automatically knows the address of the relevant PCN-ingress-node
926	   for a flow; even if ECMP is running, all PCN-packets on a particular
927	   ingress-egress-aggregate follow the same path.  But it also has
928	   drawbacks: additional overhead in terms of bandwidth and processing;
929	   and the effective elimination of ECMP as a load balancing mechanism.

931	5.9.  Fault handling

933	   If a PCN-interior-node fails (or one of its links), then lower layer
934	   protection mechanisms or the regular IP routing protocol will
935	   eventually re-route round it.  If the new route can carry all the
936	   admitted traffic, flows will gracefully continue.  If instead this
937	   causes early warning of pre-congestion on the new route, then
938	   admission control based on pre-congestion notification will ensure
939	   new flows will not be admitted until enough existing flows have
940	   departed.  Finally re-routing may result in heavy (pre-)congestion,
941	   when the flow termination mechanism will kick in.

943	   If a PCN-boundary-node fails then we would like the regular QoS
944	   signalling protocol to take care of things.  As an example
945	   [I-D.briscoe-tsvwg-cl-architecture] considers what happens if RSVP is
946	   the QoS signalling protocol.  The details for a specific signalling
947	   protocol are out of scope of the PCN WG; however, there is a WG
948	   Milestone on generic "Requirements for signalling".

950	6.  Design goals and challenges

952	   Prior work on PCN and similar mechanisms has thrown up a number of
953	   considerations about PCN's design goals (things PCN should be good
954	   at) and some issues that have been hard to solve in a fully
955	   satisfactory manner.  Taken as a whole they represent a list of
956	   trade-offs (it's unlikely that they can all be 100% achieved) and
957	   perhaps also evaluation criteria to help an operator (or the IETF)
958	   decide between options.

960	   The following are key design goals for PCN (based on
961	   [I-D.chan-pcn-problem-statement]):

963	   o  The PCN-enabled packet forwarding network should be simple,
964	      scalable and robust

966	   o  Compatibility with other traffic (i.e. a proposed solution should
967	      work well when non-PCN traffic is also present in the network)

969	   o  Support of different types of real-time traffic (eg should work
970	      well with CBR and VBR voice and video sources treated together)

972	   o  Reaction time of the mechanisms should be commensurate with the
973	      desired application-level requirements (e.g. a termination
974	      mechanism needs to terminate flows before significant QoS issues
975	      are experienced by real-time traffic, and before most users hang
976	      up)

978	   o  Compatibility with different precedence levels of real-time
979	      applications (e.g. preferential treatment of higher precedence
980	      calls over lower precedence calls, [ITU-MLPP]).

982	   The following are open issues.  They are taken from
983	   [I-D.briscoe-tsvwg-cl-architecture] which also describes some
984	   possible solutions (potential solutions are out of scope for this
985	   document).  Note that some may be considered unimportant in general
986	   or in specific deployment scenarios.

988	   o  ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion
989	      is measured on a specific ingress-egress-aggregate.  However, if
990	      the PCN-domain runs ECMP, then traffic on this ingress-egress-
991	      aggregate may follow several different paths - some of the paths
992	      could be pre-congested whilst others are not.  There are two
993	      potential problems:

995	      1.  over-admission: a new flow is admitted (because the pre-
996	          congestion level measured by the PCN-egress-node is
997	          sufficiently diluted by unmarked packets from non-congested
998	          paths that a new flow is admitted), but its packets travel
999	          through a pre-congested PCN-node

1001	      2.  ineffective termination: flows are terminated, however their
1002	          path doesn't travel through the (pre-)congested router(s).

1004	      The overall PCN solution for flow termination must solve the
1005	      second problem, since flow termination is a 'last resort'.  For
1006	      flow admission, the risk of slight over-admission may be
1007	      acceptable (particularly with flow termination as a fall-back), at
1008	      least for some operators.

1010	   o  Bi-Directional Sessions: Many applications have bi-directional
1011	      sessions - hence there are two flows that should be admitted (or
1012	      terminated) as a pair - for instance a bi-directional voice call
1013	      only makes sense if flows in both directions are admitted.
1014	      However, PCN's mechanisms concern admission and termination of a
1015	      single flow, and coordination of the decision for both flows is a
1016	      matter for the signalling protocol and out of scope of PCN.  One
1017	      possible example would use SIP pre-conditions; there are others.

1019	   o  Global Coordination: PCN makes its admission decision based on
1020	      PCN-markings on a particular ingress-egress-aggregate.  Decisions
1021	      about flows through a different ingress-egress-aggregate are made
1022	      independently.  However, one can imagine network topologies and
1023	      traffic matrices where from a global perspective it would be
1024	      better to make a coordinated decision across all the ingress-
1025	      egress-aggregates for the whole PCN-domain.  For example, to block
1026	      (or even terminate) flows on one ingress-egress-aggregate so that
1027	      more important flows through a different ingress-egress-aggregate
1028	      could be admitted.  Mechanisms to solve these problems may well be
1029	      out of scope.

1031	   o  Aggregate Traffic Characteristics: Even when the number of flows
1032	      is stable, the traffic level through the PCN-domain will vary
1033	      because the sources vary their traffic rates.  PCN works best when
1034	      there's not too much variability in the total traffic level at a
1035	      PCN-node's interface (ie in the aggregate traffic from all
1036	      sources).  Too much variation means that a node may (at one
1037	      moment) not be doing any PCN-marking and then (at another moment)
1038	      drop packets because it's overloaded.  This makes it hard to tune
1039	      the admission control scheme to stop admitting new flows at the
1040	      right time.

1042	   o  Flash crowds and Speed of Reaction: PCN is a measurement-based
1043	      mechanism and so has a limited speed of reaction.  For example,
1044	      potentially if a big burst of admission requests occurs in a very
1045	      short space of time (eg prompted by a televote), they could all
1046	      get admitted before enough PCN-marks are seen to block new flows.
1047	      In other words, any additional load offered within the reaction
1048	      time of the mechanism mustn't move the PCN-domain directly from no
1049	      congestion to overload.  This 'vulnerability period' may impact at
1050	      the signalling level, for instance QoS requests shouldn't be
1051	      handled any faster than the vulnerability period.

1053	   o  Compatibility of PCN-encoding with ECN-encoding.  This issue will
1054	      be considered further in [I-D.chan-pcn-encoding-comparison].

1056	7.  Operations and Management

1058	   EDITOR'S NOTE: A re-write of this section is planned; some of the
1059	   sub-sections are very short!  The PCN WG Charter says that the
1060	   architecture document should include security, manageability and
1061	   operational considerations.

1063	   This Section considers operations and management issues, under the
1064	   FCAPS headings: OAM of Faults, Configuration, Accounting, Performance
1065	   and Security.

1067	7.1.  Fault OAM

1069	   Fault OAM is about how to tell the management system (or manual
1070	   operator) that the system has recovered (or not) from a failure.

1072	   Faults include node or link failures, a wrongly configured address in
1073	   a node, a wrong address given in a signalling protocol, a wrongly
1074	   configured parameter in a queueing algorithm, and so on.

1076	7.2.  Configuration OAM

1078	   Perhaps the most important consideration here is that the level of
1079	   detail of the standardisation affects what can be configured.  We
1080	   would like different implementations and configurations (eg choice of
1081	   parameters) that are compliant with the PCN standard to work together
1082	   successfully.

1084	   Obvious configuration parameters are the PCN-lower-rate and PCN-
1085	   upper-rate.  A larger PCN-lower-rate enables more PCN-traffic to be
1086	   admitted on a link, hence improving capacity utilisation.  A PCN-
1087	   upper-rate set further above the PCN-lower-rate allows greater
1088	   increases in traffic (whether due to natural fluctuations or some
1089	   unexpected event) before any flows are terminated, ie minimises the
1090	   chances of unnecessarily triggering the termination mechanism.  A
1091	   greater gap, between the maximum rate at which PCN-traffic can be
1092	   forwarded on a link and the PCN-lower-rate and PCN-upper-rate,
1093	   increases the 'safety margin' - which can cover unexpected surges in
1094	   traffic due to a re-routing event for instance.  For instance an
1095	   operator may want to design their network so that it can cope with a
1096	   failure of any single PCN-node without terminating any flows.
1097	   Setting the rates will therefore depend on things like: the
1098	   operator's requirements, the link's capacity, the typical number of
1099	   flows and perhaps their traffic characteristics, and so on.

1101	   Other configurable parameters concern the PCN-boundary-nodes.  For
1102	   example, the amount of PCN-marked traffic above which new flows are
1103	   blocked.

1105	   Another configuration choice is the distribution of the functions
1106	   concerning flow admission and termination, given in Sections 5.4 and
1107	   5.6, and which could potentially be under the control of a
1108	   configuration parameter.

1110	   Another configuration decision is whether to operate both the
1111	   admission control and termination mechanisms.  Although we suggest
1112	   that an operator uses both, this isn't required and some operators
1113	   may want to implement only one.  For example, an operator could use
1114	   just admission control, solving heavy congestion (caused by re-
1115	   routing) by 'just waiting' - as sessions end, existing microflows
1116	   naturally depart from the system over time, and the admission control
1117	   mechanism will prevent admission of new microflows that use the
1118	   affected links.  So the PCN-domain will naturally return to normal
1119	   operation, but with reduced capacity.  The drawback of this approach
1120	   would be that until PCN-flows naturally depart to relieve the
1121	   congestion, all PCN-flows as well as lower priority services will be
1122	   adversely affected.  On the other hand, an operator could just rely
1123	   for admission control on statically provisioned capacity per PCN-
1124	   ingress-node (regardless of the PCN-egress-node of a flow), as is
1125	   typical in the hose model of the DiffServ architecture [RFC2475].
1126	   Such traffic conditioning agreements can lead to focused overload:
1127	   many flows happen to focus on a particular link and then all flows
1128	   through the congested link fail catastrophically.  The flow
1129	   termination mechanism could then be used to counteract such a
1130	   problem.

1132	   A different possibility is to configure only the PCN-lower-rate and
1133	   hence only do one type of PCN-marking, but generate admission and
1134	   flow termination responses from different levels of marking.  This is
1135	   suggested in [I-D.charny-pcn-single-marking] which gives some of the
1136	   pros and cons of this approach.
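
	   A minimal sketch of the idea, assuming that a PCN-boundary-node
	   observes the fraction of PCN-traffic that is PCN-marked and compares
	   it with two configured levels.  The names admission_level and
	   termination_level are hypothetical, and the referenced draft may
	   derive its responses differently.

```python
from enum import Enum


class Response(Enum):
    ADMIT = "admit new flows"
    BLOCK = "block new flows"
    TERMINATE = "terminate some flows"


def single_marking_response(marked_fraction: float,
                            admission_level: float,
                            termination_level: float) -> Response:
    """Map one kind of PCN-marking onto two different responses.

    Both levels are fractions of marked PCN-traffic (assumed metric);
    the termination level must not be below the admission level.
    """
    if not 0.0 <= admission_level <= termination_level <= 1.0:
        raise ValueError("need 0 <= admission_level <= termination_level <= 1")
    if marked_fraction > termination_level:
        return Response.TERMINATE
    if marked_fraction > admission_level:
        return Response.BLOCK
    return Response.ADMIT
```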

1138	   Another PCN WG document will specify PCN-marking, in particular how
1139	   many PCN-packets get PCN-marked according to what measure of PCN-
1140	   traffic.  For instance, it might specify an algorithm relating the
1141	   current rate of PCN-traffic to the probability of admission-marking
1142	   a packet.  Depending on how tightly this is specified, there are
1143	   potentially quite a few configuration choices, for instance:

1145	   o  does the probability go from 0% at one rate of PCN-traffic (the
1146	      PCN-lower-rate) to 100% at a slightly higher rate (ie threshold-
1147	      marking), or does it 'ramp up' gradually (as in RED)?  Does the
1148	      standard allow both?

1150	   o  how is the current rate of PCN-traffic measured?  Rate cannot be
1151	      measured instantaneously, so how is this smoothed?  A sliding
1152	      window or exponentially weighted moving average?

1154	   o  is the PCN-lower-rate a fixed parameter?  An idea raised in
1155	      [Songhurst] is that the PCN-lower-rate on each router should
1156	      depend on the current amount of non-PCN-traffic; the aim is that
1157	      resource allocation reflects the traffic mix - for instance more
1158	      PCN-traffic could be admitted if the fraction of PCN-traffic was
1159	      higher.  Is this allowed?
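
	   The first two choices can be made concrete with a short sketch.  It
	   is not the marking algorithm (that is left to another PCN WG
	   document); the linear ramp, the parameter ramp_width and the class
	   EwmaRate are assumptions chosen only to show how the options differ.
	   Threshold-marking is modelled here as a step at the PCN-lower-rate.

```python
def marking_probability(rate: float, lower_rate: float,
                        ramp_width: float = 0.0) -> float:
    """Probability of admission-marking a packet at the measured rate.

    ramp_width == 0 models threshold-marking (a step at lower_rate).
    ramp_width > 0 models a gradual, RED-like linear ramp that reaches
    100% at lower_rate + ramp_width.
    """
    if ramp_width < 0:
        raise ValueError("ramp_width must be non-negative")
    if rate <= lower_rate:
        return 0.0
    if ramp_width == 0 or rate >= lower_rate + ramp_width:
        return 1.0
    return (rate - lower_rate) / ramp_width


class EwmaRate:
    """Exponentially weighted moving average of per-interval rate samples."""

    def __init__(self, weight: float):
        if not 0.0 < weight <= 1.0:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight   # share given to the newest sample
        self.value = None      # no estimate until the first sample

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample
        else:
            self.value = (self.weight * sample
                          + (1.0 - self.weight) * self.value)
        return self.value
```

	   A larger weight tracks changes in PCN-traffic faster but smooths
	   less; a sliding window would make the same trade-off through its
	   length.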

1161	   Another question is whether there are any configuration parameters
1162	   that have to be set once to 'globally' control the whole PCN-domain
1163	   (as required by some proposals).  This may affect operational
1164	   complexity and the chances of interoperability problems between kit
1165	   from different vendors.

1167	7.3.  Accounting OAM

1169	   Accounting at the flow level will have to record instances of flow
1170	   admission, rejection and termination, but accounting itself is
1171	   outside the scope of PCN.  The ability to enable or disable flow
1172	   accounting for specific classes of flow and to specify retrieval of
1173	   accounting records in real time for specified classes of flow is a
1174	   general requirement not specific to PCN that may, however, find
1175	   specific use when diagnosing faults affecting PCN operation.

1177	7.4.  Performance OAM

1179	   Performance OAM is about monitoring performance at run-time.  There
1180	   is a wide variety of performance metrics that may be worth
1181	   collecting at PCN-ingress-nodes, PCN-egress-nodes and PCN-interior-
1182	   nodes.  A detailed list of metrics is not part of this architecture
1183	   document, but they would help answer questions such as:

1185	   o  can the operator identify 'hot spots' in the network (links which
1186	      most often do PCN-marking)?  This would help them plan to install
1187	      extra capacity where it is most needed.

1189	   o  what is the rate at which flows are admitted and terminated (for
1190	      each pair of PCN-boundary-nodes)?  Such information would be
1191	      useful for fault management, network planning and service level
1192	      monitoring.
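
	   As a sketch of the first question, assume each PCN-node reports a
	   link identifier for every measurement interval in which that link
	   did PCN-marking.  The report format and the function name hot_spots
	   are hypothetical; the ranking itself is a simple count.

```python
from collections import Counter


def hot_spots(marking_events, top_n=3):
    """Rank links by how often they did PCN-marking.

    marking_events: iterable of link identifiers, one entry per
    measurement interval in which that link PCN-marked traffic.
    Returns a list of (link, count) pairs, most frequent first.
    """
    return Counter(marking_events).most_common(top_n)
```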

1194	7.5.  Security OAM

1196	   Security OAM is about detecting security breaches or near-misses at
1197	   run-time.

1199	8.  IANA Considerations

1201	   This memo includes no request to IANA.

1203	9.  Security considerations

1205	   Security considerations essentially come from the Trust Assumption
1206	   (Section 3.1), ie that all PCN-nodes are PCN-enabled and trust each
1207	   other for truthful PCN-marking and transport.  PCN splits
1208	   functionality between PCN-interior-nodes and PCN-boundary-nodes, and
1209	   the security considerations are somewhat different for each, mainly
1210	   because PCN-boundary-nodes are flow-aware and PCN-interior-nodes are
1211	   not.

1213	   o  because the PCN-boundary-nodes are flow-aware, they are trusted to
1214	      use that awareness correctly.  The degree of trust required
1215	      depends on the kinds of decisions they have to make and the kinds
1216	      of information they need to make them.  For example when the PCN-
1217	      boundary-node needs to know the contents of the sessions for
1218	      making the admission and termination decisions (perhaps based on
1219	      the MLPP precedence), or when the contents are highly classified,
1220	      then the security requirements for the PCN-boundary-nodes involved
1221	      will also need to be high.

1223	   o  the PCN-ingress-nodes police packets to ensure a flow stays
1224	      within its agreed limit, and to ensure that only flows which have
1225	      been admitted contribute PCN-traffic into the PCN-domain.  The
1226	      policer must drop (or perhaps re-mark to a different DSCP) any
1227	      PCN-packets received that are outside this remit.  This is similar
1228	      to the existing IntServ behaviour.  Between them the PCN-boundary-
1229	      nodes must encircle the PCN-domain, otherwise PCN-packets could
1230	      enter the PCN-domain without being subject to admission control,
1231	      which would potentially destroy the QoS of existing flows.

1233	   o  PCN-interior-nodes aren't flow-aware.  This prevents some security
1234	      attacks where an attacker targets specific flows in the data plane
1235	      - for instance for DoS or eavesdropping.

1237	   o  PCN-marking by the PCN-interior-nodes along the packet forwarding
1238	      path needs to be trusted, because the PCN-boundary-nodes rely on
1239	      this information.  For instance a rogue PCN-interior-node could
1240	      PCN-mark all packets so that no flows were admitted.  Another
1241	      possibility is that it doesn't PCN-mark any packets, even when
1242	      it's pre-congested.  More subtly, the rogue PCN-interior-node
1243	      could perform these attacks selectively on particular flows, or it
1244	      could PCN-mark the correct fraction overall, but carefully choose
1245	      which flows it marked.

1247	   o  the PCN-boundary-nodes should be able to deal with DoS attacks and
1248	      state exhaustion attacks based on fast changes in per flow
1249	      signalling.

1251	   o  the signalling between the PCN-boundary-nodes (and possibly a
1252	      central control node) must be protected from attacks.  For example
1253	      the recipient needs to validate that the message is indeed from
1254	      the node that claims to have sent it.  Possible measures include
1255	      digest authentication and protection against replay and man-in-
1256	      the-middle attacks.  For the specific protocol RSVP, hop-by-hop
1257	      authentication is in [RFC2747], and
1258	      [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be
1259	      useful; for a generic signalling protocol the PCN WG document on
1260	      "Requirements for signalling" will describe the requirements in
1261	      more detail.
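
	   The last point (origin validation and replay protection) can be
	   sketched as follows.  This is not the RSVP mechanism of [RFC2747]
	   nor any PCN signalling protocol; it assumes a pre-shared key per
	   sending node, HMAC-SHA-256 as the digest and a strictly increasing
	   sequence number, all of which are choices made for this example.

```python
import hashlib
import hmac


def sign(key: bytes, sender: str, seq: int, payload: bytes) -> bytes:
    """Compute a keyed digest over sender, sequence number and payload.

    The '|' separator is for illustration only; a real protocol needs
    an unambiguous encoding of these fields.
    """
    msg = sender.encode() + b"|" + str(seq).encode() + b"|" + payload
    return hmac.new(key, msg, hashlib.sha256).digest()


class Receiver:
    """Accepts messages only from known senders, unmodified and fresh."""

    def __init__(self, keys):
        self.keys = dict(keys)   # sender name -> pre-shared key (assumed)
        self.last_seq = {}       # sender name -> highest sequence accepted

    def accept(self, sender: str, seq: int, payload: bytes,
               digest: bytes) -> bool:
        key = self.keys.get(sender)
        if key is None:
            return False         # unknown sender
        expected = sign(key, sender, seq, payload)
        if not hmac.compare_digest(expected, digest):
            return False         # forged or altered message
        if seq <= self.last_seq.get(sender, -1):
            return False         # replayed or stale message
        self.last_seq[sender] = seq
        return True
```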

1263	10.  Conclusions

1265	   {ToDo:}

1267	11.  Acknowledgements

1269	   This document is a revised version of [I-D.eardley-pcn-architecture].
1270	   Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R.
1271	   Geib, G. Karagiannis, M. Menth, T. Tsou.  They are therefore
1272	   contributors to this document.

1274	   Thanks to those who've made comments on
1275	   [I-D.eardley-pcn-architecture]: Bob Briscoe, Michael Menth, Lars
1276	   Eggert, Steven Blake, Tina Tsou, Tom Taylor, Ruediger Geib, Joe
1277	   Babiarz, Anna Charny, Joachim Charzinski, Georgios Karagiannis.

1279	   This document is the result of discussions in the PCN WG and
1280	   forerunner activity in the TSVWG.  A number of previous drafts were
1281	   presented to TSVWG: [I-D.chan-pcn-problem-statement],
1282	   [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb],
1283	   [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap],
1284	   [I-D.lefaucheur-rsvp-ecn].  Their authors were: B. Briscoe, P.
1285	   Eardley, D. Songhurst, F. Le Faucheur, A. Charny, J. Babiarz, K.
1286	   Chan, S. Dudley, G. Karagiannis, A. Bader, L. Westberg, J. Zhang, V.
1287	   Liatsos, X-G. Liu.

1289	12.  Comments Solicited

1291	   Comments and questions are encouraged and very welcome.  They can be
1292	   addressed to the IETF PCN working group mailing list <pcn@ietf.org>.

1294	13.  References

1296	13.1.  Normative References

1298	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1299	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1301	13.2.  Informative References

1303	   [I-D.briscoe-tsvwg-cl-architecture]
1304	              Briscoe, B., "An edge-to-edge Deployment Model for Pre-
1305	              Congestion Notification: Admission Control over a
1306	              DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04
1307	              (work in progress), October 2006.

1309	   [I-D.briscoe-tsvwg-cl-phb]
1310	              Briscoe, B., "Pre-Congestion Notification marking",
1311	              draft-briscoe-tsvwg-cl-phb-03 (work in progress),
1312	              October 2006.

1314	   [I-D.charny-pcn-single-marking]
1315	              Charny, A., "Pre-Congestion Notification Using Single
1316	              Marking for Admission and Termination",
1317	              draft-charny-pcn-single-marking-02 (work in progress),
1318	              July 2007.

1320	   [I-D.ietf-tsvwg-admitted-realtime-dscp]
1321	              Baker, F., "DSCPs for Capacity-Admitted Traffic",
1322	              draft-ietf-tsvwg-admitted-realtime-dscp-01 (work in
1323	              progress), March 2007.

1325	   [I-D.babiarz-pcn-sip-cap]
1326	              Babiarz, J., "SIP Controlled Admission and Preemption",
1327	              draft-babiarz-pcn-sip-cap-00 (work in progress),
1328	              October 2006.

1330	   [I-D.ietf-tsvwg-ecn-mpls]
1331	              Davie, B., "Explicit Congestion Marking in MPLS",
1332	              draft-ietf-tsvwg-ecn-mpls-01 (work in progress),
1333	              June 2007.

1335	   [I-D.lefaucheur-rsvp-ecn]
1336	              Le Faucheur, F., "RSVP Extensions for Admission Control
1337	              over Diffserv using Pre-congestion Notification (PCN)",
1338	              draft-lefaucheur-rsvp-ecn-01 (work in progress),
1339	              June 2006.

1341	   [I-D.chan-pcn-problem-statement]
1342	              Chan, K., "Pre-Congestion Notification Problem Statement",
1343	              draft-chan-pcn-problem-statement-01 (work in progress),
1344	              October 2006.

1346	   [I-D.ietf-pwe3-congestion-frmwk]
1347	              Bryant, S., "Pseudowire Congestion Control Framework",
1348	              draft-ietf-pwe3-congestion-frmwk-00 (work in progress),
1349	              February 2007.

1351	   [I-D.briscoe-tsvwg-ecn-tunnel]
1352	              "", <http://www.watersprings.org/pub/id/
1353	              briscoe-tsvwg-ecn-tunnel-00.txt>.

1355	   [I-D.briscoe-re-pcn-border-cheat]
1356	              "", <http://www.watersprings.org/pub/id/
1357	              briscoe-re-pcn-border-cheat-00.txt>.

1359	   [I-D.behringer-tsvwg-rsvp-security-groupkeying]
1360	              "", <http://www.watersprings.org/pub/id/
1361	              behringer-tsvwg-rsvp-security-groupkeying-00.txt>.

1363	   [I-D.eardley-pcn-architecture]
1364	              "", <http://www.watersprings.org/pub/id/
1365	              draft-eardley-pcn-architecture-00.txt>.

1367	   [I-D.chan-pcn-encoding-comparison]
1368	              "", <http://www.watersprings.org/pub/id/
1369	              draft-chan-pcn-encoding-comparison-00.txt>.

1371	   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
1372	              Explicit Congestion Notification (ECN) Field", BCP 124,
1373	              RFC 4774, November 2006.

1375	   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
1376	              and W. Weiss, "An Architecture for Differentiated
1377	              Services", RFC 2475, December 1998.

1379	   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
1380	              J., Courtney, W., Davari, S., Firoiu, V., and D.
1381	              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
1382	              Behavior)", RFC 3246, March 2002.

1384	   [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
1385	              Guidelines for DiffServ Service Classes", RFC 4594,
1386	              August 2006.

1388	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1389	              of Explicit Congestion Notification (ECN) to IP",
1390	              RFC 3168, September 2001.

1392	   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
1393	              Network Element Service", RFC 2211, September 1997.

1395	   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
1396	              Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
1397	              Felstaine, "A Framework for Integrated Services Operation
1398	              over Diffserv Networks", RFC 2998, November 2000.

1400	   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
1401	              P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
1402	              Protocol Label Switching (MPLS) Support of Differentiated
1403	              Services", RFC 3270, May 2002.

1405	   [RFC1633]  Braden, B., Clark, D., and S. Shenker, "Integrated
1406	              Services in the Internet Architecture: an Overview",
1407	              RFC 1633, June 1994.

1409	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1410	              RFC 2983, October 2000.

1412	   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic
1413	              Authentication", RFC 2747, January 2000.

1415	   [ITU-MLPP]
1416	              "Multilevel Precedence and Pre-emption Service (MLPP)",
1417	              ITU-T Recommendation I.255.3, 1990.

1419	   [Iyer]     "An approach to alleviate link overload as observed on an
1420	              IP backbone", IEEE INFOCOM , 2003,
1421	              <http://www.ieee-infocom.org/2003/papers/10_04.pdf>.

1423	   [Shenker]  "Fundamental design issues for the future Internet", IEEE
1424	              Journal on selected areas in communications pp 1176 -
1425	              1188, Vol 13 (7), 1995.

1427	   [Songhurst]
1428	              "Guaranteed QoS Synthesis for Admission Control with
1429	              Shared Capacity", BT Technical Report TR-CXR9-2006-001,
1430	              February 2006, <http://www.cs.ucl.ac.uk/staff/B.Briscoe/
1431	              projects/ipe2eqos/gqs/papers/GQS_shared_tr.pdf>.

1433	Author's Address

1435	   Philip Eardley
1436	   BT
1437	   B54/77, Sirius House Adastral Park Martlesham Heath
1438	   Ipswich, Suffolk  IP5 3RE
1439	   United Kingdom

1441	   Email: philip.eardley@bt.com

1443	Full Copyright Statement

1445	   Copyright (C) The IETF Trust (2007).

1447	   This document is subject to the rights, licenses and restrictions
1448	   contained in BCP 78, and except as set forth therein, the authors
1449	   retain all their rights.

1451	   This document and the information contained herein are provided on an
1452	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1453	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
1454	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
1455	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1456	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1457	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1459	Intellectual Property

1461	   The IETF takes no position regarding the validity or scope of any
1462	   Intellectual Property Rights or other rights that might be claimed to
1463	   pertain to the implementation or use of the technology described in
1464	   this document or the extent to which any license under such rights
1465	   might or might not be available; nor does it represent that it has
1466	   made any independent effort to identify any such rights.  Information
1467	   on the procedures with respect to rights in RFC documents can be
1468	   found in BCP 78 and BCP 79.

1470	   Copies of IPR disclosures made to the IETF Secretariat and any
1471	   assurances of licenses to be made available, or the result of an
1472	   attempt made to obtain a general license or permission for the use of
1473	   such proprietary rights by implementers or users of this
1474	   specification can be obtained from the IETF on-line IPR repository at
1475	   http://www.ietf.org/ipr.

1477	   The IETF invites any interested party to bring to its attention any
1478	   copyrights, patents or patent applications, or other proprietary
1479	   rights that may cover technology that may be required to implement
1480	   this standard.  Please address the information to the IETF at
1481	   ietf-ipr@ietf.org.

1483	Acknowledgment

1485	   Funding for the RFC Editor function is provided by the IETF
1486	   Administrative Support Activity (IASA).