Congestion and Pre-Congestion                   Philip Eardley (Editor)
Notification Working Group                                           BT
Internet-Draft                                         January 14, 2009
Intended status: Informational
Expires: July 18, 2009

           Pre-Congestion Notification (PCN) Architecture
                   draft-ietf-pcn-architecture-09

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on July 18, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.

Abstract

This document describes a general architecture for flow admission and
termination based on pre-congestion information in order to protect
the quality of service of established inelastic flows within a single
DiffServ domain.

Table of Contents

1. Introduction
2. Terminology
3. Benefits
4. Deployment scenarios
5. Assumptions and constraints on scope
   5.1. Assumption 1: Trust and support of PCN - controlled environment
   5.2. Assumption 2: Real-time applications
   5.3. Assumption 3: Many flows and additional load
   5.4. Assumption 4: Emergency use out of scope
6. High-level functional architecture
   6.1. Flow admission
   6.2. Flow termination
   6.3. Flow admission and/or flow termination when there are only two
        PCN encoding states
   6.4. Information transport
   6.5. PCN-traffic
   6.6. Backwards compatibility
7. Detailed Functional architecture
   7.1. PCN-interior-node functions
   7.2. PCN-ingress-node functions
   7.3. PCN-egress-node functions
   7.4. Admission control functions
   7.5. Flow termination functions
   7.6. Addressing
   7.7. Tunnelling
   7.8. Fault handling
8. Challenges
9. Operations and Management
   9.1. Configuration OAM
        9.1.1. System options
        9.1.2. Parameters
   9.2. Performance & Provisioning OAM
   9.3. Accounting OAM
   9.4. Fault OAM
   9.5. Security OAM
10. IANA Considerations
11. Security considerations
12. Conclusions
13. Acknowledgements
14. Comments Solicited
15. Changes
   15.1. Changes from -08 to -09
   15.2. Changes from -07 to -08
   15.3. Changes from -06 to -07
   15.4. Changes from -05 to -06
   15.5. Changes from -04 to -05
   15.6. Changes from -03 to -04
   15.7. Changes from -02 to -03
   15.8. Changes from -01 to -02
   15.9. Changes from -00 to -01
16. Appendix: Possible future work items
   16.1. Probing
        16.1.1. Introduction
        16.1.2. Probing functions
        16.1.3. Discussion of rationale for probing, its downsides and
                open issues
17. References
   17.1. Normative References
   17.2. Informative References
Author's Address

1. Introduction

The purpose of this document is to describe a general architecture
for flow admission and termination based on (pre-)congestion
information in order to protect the quality of service of flows
within a DiffServ domain [RFC2475].  This document defines an
architecture for implementing two mechanisms to protect the quality
of service of established inelastic flows within a single DiffServ
domain, where all boundary and interior nodes are PCN-enabled and are
trusted for correct PCN operation.  Flow admission control determines
whether a new flow should be admitted, in order to protect the QoS of
existing PCN-flows in normal circumstances.  However, in abnormal
circumstances, for instance a disaster affecting multiple nodes and
causing traffic re-routes, the QoS on existing PCN-flows may degrade
even though care was exercised when admitting those flows.  Therefore
this document also describes a mechanism for flow termination, which
removes enough traffic in order to protect the QoS of the remaining
PCN-flows.

As a fundamental building block to enable these two mechanisms, PCN-
interior-nodes generate, encode and transport pre-congestion
information towards the PCN-egress-nodes.  Two rates, a PCN-
threshold-rate and a PCN-excess-rate, are associated with each link
of the PCN-domain.  Each rate is used by a marking behaviour that
determines how and when PCN-packets are marked, and how the markings
are encoded in packet headers.  Overall the aim is to enable PCN-
nodes to give an "early warning" of potential congestion before there
is any significant build-up of PCN-packets in the queue.
PCN-boundary-nodes convert measurements of these PCN-markings into
decisions about flow admission and termination.  In a PCN-domain with
both threshold marking and excess traffic marking enabled, the
admission control mechanism limits the PCN-traffic on each link to
*roughly* its PCN-threshold-rate and the flow termination mechanism
limits the PCN-traffic on each link to *roughly* its PCN-excess-rate.
Other scenarios are discussed later.

The behaviour of PCN-interior-nodes is standardised in other
documents, which are summarised in this document:

o Marking behaviour: threshold marking and excess traffic marking
  [PCN08-2].  Threshold marking marks all PCN-packets if the PCN
  traffic rate is greater than a first configured rate, "PCN-
  threshold-rate".  Excess traffic marking marks a proportion of
  PCN-packets, such that the amount marked equals the traffic rate
  in excess of a second configured rate, "PCN-excess-rate".

o Encoding: a combination of the DSCP field and ECN field in the IP
  header indicates that a packet is a PCN-packet and whether it is
  PCN-marked.  The "baseline" encoding is described in [PCN08-1],
  which standardises two PCN encoding states (PCN-marked and not
  PCN-marked), whilst (experimental) extensions to the baseline
  encoding can provide three encoding states (threshold-marked,
  excess-traffic-marked, not PCN-marked, or perhaps further encoding
  states as suggested in [Westberg08]).  PCN encoding therefore
  defines semantics for the ECN field different from the default
  semantics of [RFC3168], and so its encoding needs to meet the
  guidelines of BCP 124 [RFC4774].

The behaviour of PCN-boundary-nodes is described in Informational
documents.  Several possibilities are outlined in this document;
detailed descriptions and comparisons are in [Charny07-1] and
[Menth08-3].
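As an illustration of the two marking behaviours above, the sketch
below approximates each one with a token bucket that fills at the
configured reference rate.  The standardised algorithms are in
[PCN08-2]; the names, the bucket depth and the half-depth marking
threshold here are illustrative assumptions, not part of any
specification:

```python
# Illustrative sketch only; rates in bytes/s, sizes in bytes.

class TokenBucket:
    """Tracks whether PCN-traffic is arriving faster than `rate`."""
    def __init__(self, rate, depth):
        self.rate = rate        # configured reference rate (bytes/s)
        self.depth = depth      # burst tolerance (bytes)
        self.tokens = depth
        self.last = 0.0

    def update(self, now):
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

def threshold_mark(bucket, pkt_len, now):
    """Threshold-marking: every packet drains the bucket; once the
    PCN-traffic rate exceeds the rate the bucket stays depleted and
    *all* PCN-packets are marked."""
    bucket.update(now)
    bucket.tokens = max(0.0, bucket.tokens - pkt_len)
    return bucket.tokens < bucket.depth / 2   # illustrative threshold

def excess_traffic_mark(bucket, pkt_len, now):
    """Excess-traffic-marking: mark only the traffic in excess of the
    rate, so the marked rate ~ (PCN-traffic rate - PCN-excess-rate)."""
    bucket.update(now)
    if bucket.tokens < pkt_len:
        return True             # part of the excess: mark, keep tokens
    bucket.tokens -= pkt_len    # within rate: consume tokens, no mark
    return False
```

With PCN-traffic arriving at twice the configured rate, the threshold
meter soon marks every packet, whilst the excess-traffic meter marks
roughly half of them, matching the two objectives described above.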
This document describes the PCN architecture at a high level (Section
6) and in more detail (Section 7).  It also defines some terminology
and outlines some benefits, deployment scenarios, and assumptions of
PCN (Sections 2-5).  Finally it outlines some challenges, operations
and management, and security considerations, and some potential
future work items (Sections 8, 9, 11 and Appendix).

2. Terminology

o PCN-domain: a PCN-capable domain; a contiguous set of PCN-enabled
  nodes that perform DiffServ scheduling [RFC2474]; the complete set
  of PCN-nodes whose PCN-marking can in principle influence decisions
  about flow admission and termination for the PCN-domain, including
  the PCN-egress-nodes, which measure these PCN-marks.

o PCN-boundary-node: a PCN-node that connects one PCN-domain to a
  node either in another PCN-domain or in a non PCN-domain.

o PCN-interior-node: a node in a PCN-domain that is not a PCN-
  boundary-node.

o PCN-node: a PCN-boundary-node or a PCN-interior-node.

o PCN-egress-node: a PCN-boundary-node in its role in handling
  traffic as it leaves a PCN-domain.

o PCN-ingress-node: a PCN-boundary-node in its role in handling
  traffic as it enters a PCN-domain.

o PCN-traffic, PCN-packets, PCN-BA: a PCN-domain carries traffic of
  different DiffServ behaviour aggregates (BAs) [RFC2474].  The
  PCN-BA uses the PCN mechanisms to carry PCN-traffic and the
  corresponding packets are PCN-packets.  The same network will carry
  traffic of other DiffServ BAs.  The PCN-BA is distinguished by a
  combination of the DiffServ codepoint (DSCP) and ECN fields.

o PCN-flow: the unit of PCN-traffic that the PCN-boundary-node admits
  (or terminates); the unit could be a single microflow (as defined
  in [RFC2474]) or some identifiable collection of microflows.
o Ingress-egress-aggregate: the collection of PCN-packets from all
  PCN-flows that travel in one direction between a specific pair of
  PCN-boundary-nodes.

o PCN-threshold-rate: a reference rate configured for each link in
  the PCN-domain, which is lower than the PCN-excess-rate.  It is
  used by a marking behaviour that determines whether a packet should
  be PCN-marked with a first encoding, "threshold-marked".

o PCN-excess-rate: a reference rate configured for each link in the
  PCN-domain, which is higher than the PCN-threshold-rate.  It is
  used by a marking behaviour that determines whether a packet should
  be PCN-marked with a second encoding, "excess-traffic-marked".

o Threshold-marking: a PCN-marking behaviour with the objective that
  all PCN-traffic is marked if the PCN-traffic exceeds the PCN-
  threshold-rate.

o Excess-traffic-marking: a PCN-marking behaviour with the objective
  that the amount of PCN-traffic that is PCN-marked is equal to the
  amount that exceeds the PCN-excess-rate.

o Pre-congestion: a condition of a link within a PCN-domain such that
  the PCN-node performs PCN-marking, in order to provide an "early
  warning" of potential congestion before there is any significant
  build-up of PCN-packets in the real queue.  (Hence, by analogy with
  ECN we call our mechanism Pre-Congestion Notification.)

o PCN-marking: the process of setting the header in a PCN-packet
  based on defined rules, in reaction to pre-congestion; either
  threshold-marking or excess-traffic-marking.

o PCN-colouring: the process of setting the header in a PCN-packet by
  a PCN-boundary-node; performed by a PCN-ingress-node so that PCN-
  nodes can easily identify PCN-packets; performed by a PCN-egress-
  node so that the header is appropriate for nodes beyond the PCN-
  domain.
o PCN-feedback-information: information signalled by a PCN-egress-
  node to a PCN-ingress-node (or a central control node), which is
  needed for the flow admission and flow termination mechanisms.

o PCN-admissible-rate: the rate of PCN-traffic on a link up to which
  PCN admission control should accept new PCN-flows.

o PCN-supportable-rate: the rate of PCN-traffic on a link down to
  which PCN flow termination should, if necessary, terminate already
  admitted PCN-flows.

3. Benefits

We believe that the key benefits of the PCN mechanisms described in
this document are that they are simple, scalable, and robust because:

o Per flow state is only required at the PCN-ingress-nodes
  ("stateless core").  This is required for policing purposes (to
  prevent non-admitted PCN-traffic from entering the PCN-domain) and
  so on.  It is not generally required that other network entities
  are aware of individual flows (although they may be in particular
  deployment scenarios).

o Admission control is resilient: with PCN, QoS is decoupled from the
  routing system.  Hence in general admitted flows can survive
  capacity, routing or topology changes without additional
  signalling.  The PCN-admissible-rate on each link can be chosen
  small enough that admitted traffic can still be carried after a
  rerouting in most failure cases [Menth07].  This is an important
  feature as QoS violations in core networks due to link failures are
  more likely than QoS violations due to increased traffic volume
  [Iyer03].

o The PCN-marking behaviours only operate on the overall PCN-traffic
  on the link, not per flow.

o Information about these measurements is signalled to the PCN-
  egress-nodes by the PCN-marks in the packet headers, ie [Style]
  "in-band".  No additional signalling protocol is required for
  transporting the PCN-marks.
  Therefore no secure binding is required between data packets and
  separate congestion messages.

o The PCN-egress-nodes make separate measurements, operating on the
  aggregate PCN-traffic from each PCN-ingress-node, ie not per flow.
  Similarly, signalling by the PCN-egress-node of PCN-feedback-
  information (which is used for flow admission and termination
  decisions) is at the granularity of the ingress-egress-aggregate.
  An alternative approach is that the PCN-egress-nodes monitor the
  PCN-traffic and signal PCN-feedback-information at the granularity
  of one (or a few) PCN-marks.

o The admitted PCN-load is controlled dynamically.  Therefore it
  adapts as the traffic matrix changes, and also if the network
  topology changes (eg after a link failure).  Hence an operator can
  be less conservative when deploying network capacity, and less
  accurate in their prediction of the PCN-traffic matrix.

o The termination mechanism complements admission control.  It allows
  the network to recover from sudden unexpected surges of PCN-traffic
  on some links, thus restoring QoS to the remaining flows.  Such
  scenarios are expected to be rare but not impossible.  They can be
  caused by large network failures that redirect lots of admitted
  PCN-traffic to other links, or by malfunction of the measurement-
  based admission control in the presence of admitted flows that send
  for a while with an atypically low rate and then increase their
  rates in a correlated way.

o Flow termination can also enable an operator to be less
  conservative when deploying network capacity.  It is an alternative
  to running links at low utilisation in order to protect against
  link or node failures.
  This is especially the case with SRLGs (shared risk link groups:
  links that share a resource, such as a fibre, whose failure affects
  all those links [RFC4216]).  Fully protecting traffic against a
  single SRLG failure requires low utilisation (~10%) of the link
  bandwidth on some links before failure [Charny08].

o The PCN-supportable-rate may be set below the maximum rate at which
  PCN-traffic can be transmitted on a link, in order to trigger
  termination of some PCN-flows before loss (or excessive delay) of
  PCN-packets occurs, or to keep the maximum PCN-load on a link below
  a level configured by the operator.

o Provisioning of the network is decoupled from the process of adding
  new customers.  By contrast, with the DiffServ architecture
  [RFC2475] operators rely on subscription-time Service Level
  Agreements, which statically define the parameters of the traffic
  that will be accepted from a customer; each time a new customer is
  added, the operator has to check that sufficient capacity has been
  provisioned to fulfil the new Service Level Agreement.  A PCN-
  domain doesn't need such traffic conditioning.

4. Deployment scenarios

Operators of networks will want to use the PCN mechanisms in various
arrangements, for instance depending on how they are performing
admission control outside the PCN-domain (users after all are
concerned about QoS end-to-end), what their particular goals and
assumptions are, how many PCN encoding states are available, and so
on.

From the perspective of the outside world, a PCN-domain essentially
looks like a DiffServ domain.  PCN-traffic is either transported
across it transparently or policed at the PCN-ingress-node (ie
dropped or carried at a lower QoS).  One difference is that PCN-
traffic has better QoS guarantees than normal DiffServ traffic,
because the PCN mechanisms better protect the QoS of admitted flows.
Another difference may occur in the rare circumstance when there is a
failure: on the one hand some PCN-flows may get terminated, but on
the other hand other flows will get their QoS restored.  Non PCN-
traffic is treated transparently, ie the PCN-domain is a normal
DiffServ domain.

An operator may choose to deploy either admission control or flow
termination or both.  Although designed to work together, they are
independent mechanisms, and the use of one does not require or
prevent the use of the other.

A PCN-domain may have three encoding states (or pedantically, an
operator may choose to use up three encoding states for PCN): not
PCN-marked, threshold-marked, excess-traffic-marked.  Then both PCN
admission control and flow termination can be supported.  As
illustrated in Figure 1, admission control accepts new flows until
the PCN-traffic rate on the bottleneck link rises above the PCN-
threshold-rate, whilst if necessary the flow termination mechanism
terminates flows down to the PCN-excess-rate on the bottleneck link.
                       ==Marking behaviour==   ==PCN mechanisms==
        Rate of       ^
 PCN-traffic on       |
bottleneck link       |                        (as below and also)
                      | (as below)             Drop some PCN-pkts
                      |
      scheduler rate -|------------------------------------------------
    (for PCN-traffic) |
                      | Some pkts              Terminate some
                      | excess-traffic-marked  admitted flows
                      | &                      &
                      | Rest of pkts           Block new flows
                      | threshold-marked
                      |
     PCN-excess-rate -|------------------------------------------------
(=PCN-supportable-rate)
                      | All pkts               Block new flows
                      | threshold-marked
                      |
  PCN-threshold-rate -|------------------------------------------------
(=PCN-admissible-rate)
                      | No pkts                Admit new flows
                      | PCN-marked
                      |

Figure 1: Schematic of how the PCN admission control and flow
termination mechanisms operate as the rate of PCN-traffic increases,
for a PCN-domain with three encoding states.

On the other hand, a PCN-domain may have two encoding states (as in
[PCN08-1]) (or pedantically, an operator may choose to use up two
encoding states for PCN): not PCN-marked, PCN-marked.  Then there are
three possibilities, as discussed in the following paragraphs (see
also Section 6.3).

First, an operator could just use PCN's admission control, solving
heavy congestion (caused by re-routing) by 'just waiting' - as
sessions end, PCN-traffic naturally reduces, and meanwhile the
admission control mechanism will prevent admission of new flows that
use the affected links.  So the PCN-domain will naturally return to
normal operation, but with reduced capacity.  The drawback of this
approach is that, until sufficient sessions have ended to relieve the
congestion, all PCN-flows as well as lower priority services will be
adversely affected.
Second, an operator could just rely for admission control on
statically provisioned capacity per PCN-ingress-node (regardless of
the PCN-egress-node of a flow), as is typical in the hose model of
the DiffServ architecture [RFC2475].  Such traffic conditioning
agreements can lead to focused overload: many flows happen to focus
on a particular link and then all flows through the congested link
fail catastrophically.  PCN's flow termination mechanism could then
be used to counteract such a problem.

Third, both admission control and flow termination can be triggered
from the single type of PCN-marking; the main downside is that
admission control is less accurate [Charny07-2].

Within the PCN-domain there is some flexibility about how the
decision making functionality is distributed.  These possibilities
are outlined in Section 7.4 and also discussed elsewhere, such as in
[Menth08-3].

The flow admission and termination decisions need to be enforced
through per flow policing by the PCN-ingress-nodes.  If there are
several PCN-domains on the end-to-end path, then each needs to police
at its PCN-ingress-nodes.  One exception is if the operator runs both
the access network (not a PCN-domain) and the core network (a PCN-
domain); per flow policing could be devolved to the access network
and not done at the PCN-ingress-node.  Note: to aid readability, the
rest of this draft assumes that policing is done by the PCN-ingress-
nodes.

PCN admission control has to fit with the overall approach to
admission control.  For instance [Briscoe06] describes the case where
RSVP signalling runs end-to-end.  The PCN-domain is a single RSVP
hop, ie only the PCN-boundary-nodes process RSVP messages, with RSVP
messages processed on each hop outside the PCN-domain, as in IntServ
over DiffServ [RFC2998].
It would also be possible for the RSVP signalling to be originated
and/or terminated by proxies, with application-layer signalling
between the end user and the proxy (eg SIP signalling with a home
hub).  A similar example would use NSIS signalling instead of RSVP.

It is possible that a user wants its inelastic traffic to use the PCN
mechanisms but also react to ECN marking outside the PCN-domain
[Sarker08].  Two possible ways to do this are to tunnel all PCN-
packets across the PCN-domain, so that the ECN marks are carried
transparently across the PCN-domain, or to use an encoding like
[Moncaster08].  Tunnelling is discussed further in Section 7.7.

Some further possible deployment models are outlined in the Appendix.

5. Assumptions and constraints on scope

The scope is restricted by the following assumptions:

1. these components are deployed in a single DiffServ domain, within
   which all PCN-nodes are PCN-enabled and are trusted for truthful
   PCN-marking and transport

2. all flows handled by these mechanisms are inelastic and
   constrained to a known peak rate through policing or shaping

3. the number of PCN-flows across any potential bottleneck link is
   sufficiently large that stateless, statistical mechanisms can be
   effective.  To put it another way, the aggregate bit rate of PCN-
   traffic across any potential bottleneck link needs to be
   sufficiently large relative to the maximum additional bit rate
   added by one flow.  This is the basic assumption of measurement-
   based admission control.

4. PCN-flows may have different precedence, but the applicability of
   the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
   is out of scope.

5.1. Assumption 1: Trust and support of PCN - controlled environment

We assume that the PCN-domain is a controlled environment, ie all the
nodes in a PCN-domain run PCN and are trusted.
There are several reasons for this assumption:

o The PCN-domain has to be encircled by a ring of PCN-boundary-nodes,
  otherwise traffic could enter a PCN-BA without being subject to
  admission control, which would potentially degrade the QoS of
  existing PCN-flows.

o Similarly, a PCN-boundary-node has to trust that all the PCN-nodes
  mark PCN-traffic consistently.  A node not performing PCN-marking
  wouldn't be able to alert when it suffered pre-congestion, which
  potentially would lead to too many PCN-flows being admitted (or too
  few being terminated).  Worse, a rogue node could perform various
  attacks, as discussed in the Security Considerations section.

One way of assuring the above two points is that the entire PCN-
domain is run by a single operator.  Another possibility is that
there are several operators that trust each other in their handling
of PCN-traffic.

Note: all PCN-nodes need to be trustworthy.  However if it is known
that an interface cannot become pre-congested then it is not strictly
necessary for it to be capable of PCN-marking.  But this must be
known even in unusual circumstances, eg after the failure of some
links.

5.2. Assumption 2: Real-time applications

We assume that any variation of source bit rate is independent of the
level of pre-congestion.  We assume that PCN-packets come from real-
time applications generating inelastic traffic, ie sending packets at
the rate the codec produces them, regardless of the availability of
capacity [RFC4594].  Examples are voice and video requiring low
delay, jitter and packet loss, the Controlled Load Service [RFC2211],
and the Telephony service class [RFC4594].  This assumption is to
help focus the effort where it looks like PCN would be most useful,
ie the sorts of applications where per flow QoS is a known
requirement.
In other words we focus on PCN providing a benefit to inelastic
traffic (PCN may or may not provide a benefit to other types of
traffic).

As a consequence, it is assumed that PCN-marking is being applied to
traffic scheduled with the expedited forwarding per-hop behaviour
[RFC3246], or a per-hop behaviour with similar characteristics.

5.3. Assumption 3: Many flows and additional load

We assume that there are many PCN-flows on any bottleneck link in the
PCN-domain (or, to put it another way, the aggregate bit rate of PCN-
traffic across any potential bottleneck link is sufficiently large
relative to the maximum additional bit rate added by one PCN-flow).
Measurement-based admission control assumes that the present is a
reasonable prediction of the future: the network conditions are
measured at the time of a new flow request, however the actual
network performance must be acceptable during the call some time
later.  One issue is that if there are only a few variable rate
flows, then the aggregate traffic level may vary a lot, perhaps
enough to cause some packets to get dropped.  If there are many flows
then the aggregate traffic level should be statistically smoothed.
How many flows is enough depends on a number of factors such as the
variation in each flow's rate, the total rate of PCN-traffic, and the
size of the "safety margin" between the traffic level at which we
start admission-marking and at which packets are dropped or
significantly delayed.

We do not make explicit assumptions on how many PCN-flows are in each
ingress-egress-aggregate.  Performance evaluation work may clarify
whether it is necessary to make any additional assumption on
aggregation at the ingress-egress-aggregate level.

5.4. Assumption 4: Emergency use out of scope

PCN-flows may have different precedence, but the applicability of the
PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc) is out
of scope of this document.

6. High-level functional architecture

The high-level approach is to split functionality between:

o PCN-interior-nodes 'inside' the PCN-domain, which monitor their own
  state of pre-congestion and mark PCN-packets as appropriate.  They
  are not flow-aware, nor aware of ingress-egress-aggregates.  The
  functionality is also done by PCN-ingress-nodes for their outgoing
  interfaces (ie those 'inside' the PCN-domain).

o PCN-boundary-nodes at the edge of the PCN-domain, which control
  admission of new PCN-flows and termination of existing PCN-flows,
  based on information from PCN-interior-nodes.  This information is
  in the form of the PCN-marked data packets (which are intercepted
  by the PCN-egress-nodes) and not signalling messages.  Generally
  PCN-ingress-nodes are flow-aware.

The aim of this split is to keep the bulk of the network simple,
scalable and robust, whilst confining policy, application-level and
security interactions to the edge of the PCN-domain.  For example the
lack of flow awareness means that the PCN-interior-nodes don't care
about the flow information associated with PCN-packets, nor do the
PCN-boundary-nodes care about which PCN-interior-nodes their ingress-
egress-aggregates traverse.

In order to generate information about the current state of the PCN-
domain, each PCN-node PCN-marks packets if it is "pre-congested".
613 Exactly when a PCN-node decides if it is "pre-congested" (the 614 algorithm) and exactly how packets are "PCN-marked" (the encoding) 615 will be defined in separate standards-track documents, but at a high 616 level it is as follows: 618 o the algorithms: a PCN-node meters the amount of PCN-traffic on 619 each one of its outgoing (or incoming) links. The measurement is 620 made as an aggregate of all PCN-packets, and not per flow. There 621 are two algorithms, one for threshold-marking and one for excess- 622 traffic-marking. 624 o the encoding(s): a PCN-node PCN-marks a PCN-packet by modifying a 625 combination of the DSCP and ECN fields. In the "baseline" 626 encoding [PCN08-1], the ECN field is set to 11 and the DSCP is not 627 altered. Extension encodings may be defined that, at most, use a 628 second DSCP (eg as in [Moncaster08]) and/or set the ECN field to 629 values other than 11 (eg as in [Menth08-2]). 631 In a PCN-domain the operator may have two or three encoding states 632 available. The baseline encoding provides two encoding states (not 633 PCN-marked, PCN-marked), whilst extended encodings can provide three 634 encoding states (not PCN-marked, threshold-marked, excess-traffic- 635 marked). 637 The PCN-boundary-nodes monitor the PCN-marked packets in order to 638 extract information about the current state of the PCN-domain. Based 639 on this monitoring, a distributed decision is made about whether to 640 admit a prospective new flow or whether to terminate existing 641 flow(s). Sections 7.4 and 7.5 mention various possibilities for how 642 the functionality could be distributed. 644 PCN-marking needs to be configured on all (potentially pre-congested) 645 links in the PCN-domain to ensure that the PCN mechanisms protect all 646 links. The actual functionality can be configured on the outgoing or 647 incoming interfaces of PCN-nodes - or one algorithm could be 648 configured on the outgoing interface and the other on the incoming 649 interface. 
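As an illustrative sketch of the baseline encoding described above: the DSCP occupies the upper six bits of the former TOS octet and the ECN field the lower two (per [RFC2474] and [RFC3168]), and PCN-marking sets the ECN field to 11 without altering the DSCP. The function names and the example DSCP are invented for illustration.

```python
ECN_MASK = 0b11     # ECN field: the two low-order bits of the TOS/Traffic Class octet
PCN_MARKED = 0b11   # baseline encoding: ECN field 11 indicates PCN-marked

def pcn_mark(tos_octet):
    # Baseline PCN-marking: set the ECN field to 11; the DSCP
    # (the upper six bits) is not altered.
    return tos_octet | PCN_MARKED

def is_pcn_marked(tos_octet):
    return (tos_octet & ECN_MASK) == PCN_MARKED

tos = (46 << 2) | 0b01          # hypothetical: DSCP 46 (EF) with ECN field 01
marked = pcn_mark(tos)
print(is_pcn_marked(marked))    # True
print(marked >> 2)              # 46: DSCP unchanged
```

Because only the two ECN bits are touched, a node can PCN-mark a packet without any per-flow state, consistent with the aggregate metering described above.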
The important point is that a consistent choice is made 650 across the PCN-domain to ensure that the PCN mechanisms protect all 651 links. See [PCN08-2] for further discussion. 653 The objective of the threshold-marking algorithm is to threshold-mark 654 all PCN-packets whenever the rate of PCN-packets is greater than some 655 configured rate, the PCN-threshold-rate. The objective of the 656 excess-traffic-marking algorithm is to excess-traffic-mark PCN- 657 packets at a rate equal to the difference between the bit rate of 658 PCN-packets and some configured rate, the PCN-excess-rate. Note that 659 this description reflects the overall intent of the algorithm rather 660 than its instantaneous behaviour, since the rate measured at a 661 particular moment depends on the detailed algorithm, its 662 implementation, and the traffic's variance as well as its rate (eg 663 marking may well continue after a recent overload even after the 664 instantaneous rate has dropped). The algorithms are specified in 665 [PCN08-2]. 667 Admission and termination approaches are detailed and compared in 668 [Charny07-1] and [Menth08-3]. The discussion below is just a brief 669 summary. It initially assumes there are three encoding states 670 available. 672 6.1. Flow admission 674 The objective of PCN's flow admission control mechanism is to limit 675 the PCN-traffic on each link in the PCN-domain to *roughly* its PCN- 676 admissible-rate, by admitting or blocking prospective new flows, in 677 order to protect the QoS of existing PCN-flows. With three encoding 678 states available, the PCN-threshold-rate is configured by the 679 operator as equal to the PCN-admissible-rate on each link. It is set 680 lower than the traffic rate at which the link becomes congested and 681 the node drops packets. 683 Exactly how the admission control decision is made will be defined 684 separately in informational documents. 
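The excess-traffic-marking objective above can be sketched with a simple token bucket (an illustrative model with invented names and parameters, not the normative algorithm of [PCN08-2]): tokens accrue at the configured PCN-excess-rate, and a packet that finds too few tokens is excess-traffic-marked, so over time the marked bytes approximate the traffic in excess of the configured rate.

```python
class ExcessTrafficMarker:
    """Illustrative token-bucket sketch of excess-traffic-marking."""

    def __init__(self, excess_rate_bps, bucket_depth_bytes):
        self.rate = excess_rate_bps / 8.0 / 1000.0   # bytes per millisecond
        self.depth = bucket_depth_bytes
        self.tokens = bucket_depth_bytes
        self.last_ms = 0

    def packet(self, arrival_ms, size_bytes):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (arrival_ms - self.last_ms) * self.rate)
        self.last_ms = arrival_ms
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return False                 # not marked
        return True                      # excess-traffic-marked

# 1000-byte packets every 10 ms = 800 kbit/s against a 400 kbit/s
# PCN-excess-rate: roughly half of the packets end up marked.
m = ExcessTrafficMarker(excess_rate_bps=400e3, bucket_depth_bytes=2000)
marks = sum(m.packet(i * 10, 1000) for i in range(1, 201))
print(marks, "of 200 packets marked")
```

The bucket depth plays the role of the "detailed algorithm" caveat in the text: a deeper bucket tolerates longer bursts before marking, which is one reason the instantaneous marking rate can differ from the long-run objective.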
This document describes two 685 approaches (others might be possible): 687 o the PCN-egress-node measures (possibly as a moving average) the 688 fraction of the PCN-traffic that is threshold-marked. The 689 fraction is measured for a specific ingress-egress-aggregate. If 690 the fraction is below a threshold value then the new flow is 691 admitted, and if the fraction is above the threshold value then it 692 is blocked. The fraction could be measured as an EWMA 693 (exponentially weighted moving average), which has sometimes been 694 called the "congestion level estimate". 696 o the PCN-egress-node monitors PCN-traffic and if it receives one 697 (or several) threshold-marked packets, then the new flow is 698 blocked, otherwise it is admitted. One possibility may be to 699 react to the marking state of an initial flow set-up packet (eg 700 RSVP PATH). Another is that after one (or several) threshold- 701 marks then all flows are blocked until after a specific period of 702 no congestion. 704 Note that the admission control decision is made for a particular 705 pair of PCN-boundary-nodes. So it is quite possible for a new flow 706 to be admitted between one pair of PCN-boundary-nodes, whilst at the 707 same time another admission request is blocked between a different 708 pair of PCN-boundary-nodes. 710 6.2. Flow termination 712 The objective of PCN's flow termination mechanism is to limit the 713 PCN-traffic on each link to *roughly* its PCN-supportable-rate, by 714 terminating some existing PCN-flows, in order to protect the QoS of 715 the remaining PCN-flows. With three encoding states available, the 716 PCN-excess-rate is configured by the operator as equal to the PCN- 717 supportable-rate on each link. It may be set lower than the traffic 718 rate at which the link becomes congested and the node drops packets. 720 Exactly how the flow termination decision is made will be defined 721 separately in informational documents. 
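As an illustrative sketch of the termination objective just stated (all function names and the greedy selection policy are invented for illustration; the approaches that follow differ in exactly what is measured): given a measured PCN-traffic rate and the supportable rate, terminate enough flows to cover the excess.

```python
def excess_to_terminate(measured_rate_bps, supportable_rate_bps):
    # Amount of PCN-traffic above the PCN-supportable-rate that
    # should be removed by terminating flows.
    return max(0.0, measured_rate_bps - supportable_rate_bps)

def pick_flows(flow_rates_bps, excess_bps):
    # Greedily select flows until the terminated rate covers the excess.
    # A real decision could also weigh policy and application-layer
    # requirements [RFC2753].
    chosen, freed = [], 0.0
    for flow_id, rate in sorted(flow_rates_bps.items(), key=lambda kv: -kv[1]):
        if freed >= excess_bps:
            break
        chosen.append(flow_id)
        freed += rate
    return chosen

excess = excess_to_terminate(10e6, 8e6)   # 2 Mbit/s above the supportable rate
print(pick_flows({"flow-a": 1.5e6, "flow-b": 0.8e6, "flow-c": 0.7e6}, excess))
```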
This document describes 722 several approaches (others might be possible): 724 o In one approach the PCN-egress-node measures the rate of PCN- 725 traffic that is not excess-traffic-marked, which is the amount of 726 PCN-traffic that can actually be supported, and communicates this 727 to the PCN-ingress-node. Also the PCN-ingress-node measures the 728 rate of PCN-traffic that is destined for this specific PCN-egress- 729 node, and hence it can calculate the excess amount that should be 730 terminated. 732 o Another approach instead measures the rate of excess-traffic- 733 marked traffic and terminates this amount of traffic. This 734 terminates less traffic than the previous bullet if some nodes are 735 dropping PCN-traffic. 737 o Another approach monitors PCN-packets and terminates some of the 738 PCN-flows that have an excess-traffic-marked packet. (If all such 739 flows were terminated, far too much traffic would be terminated, 740 so a random selection needs to be made from those with an excess- 741 traffic-marked packet, [Menth08-1].) 743 Since flow termination is designed for "abnormal" circumstances, it 744 is quite likely that some PCN-nodes are congested and hence packets 745 are being dropped and/or significantly queued. The flow termination 746 mechanism must accommodate this. 748 Note also that the termination control decision is made for a 749 particular pair of PCN-boundary-nodes. So it is quite possible for 750 PCN-flows to be terminated between one pair of PCN-boundary-nodes, 751 whilst at the same time none are terminated between a different pair 752 of PCN-boundary-nodes. 754 6.3. 
Flow admission and/or flow termination when there are only two PCN 755 encoding states 757 If a PCN-domain has only two encoding states available (PCN-marked 758 and not PCN-marked), ie it is using the baseline encoding [PCN08-1], 759 then an operator has three options (others might be possible): 761 o admission control only: PCN-marking means threshold-marking, ie 762 only the threshold-marking algorithm writes PCN-marks. Only PCN 763 admission control is available. 765 o flow termination only: PCN-marking means excess-traffic-marking, 766 ie only the excess-traffic-marking algorithm writes PCN-marks. 767 Only PCN termination control is available. 769 o both admission control and flow termination: only the excess- 770 traffic-marking algorithm writes PCN-marks, however the configured 771 rate (PCN-excess-rate) is set equal to the PCN-admissible-rate, as 772 shown in Figure 2. [Charny07-2] describes how both admission 773 control and flow termination can be triggered in this case and 774 also gives some of the pros and cons of this approach. The main 775 downside is that admission control is less accurate. 777 ==Marking behaviour== ==PCN mechanisms== 778 Rate of ^ 779 PCN-traffic on | 780 bottleneck link | Terminate some 781 | Further pkts admitted flows 782 | excess-traffic-marked & 783 | Block new flows 784 | 785 | 786 U*PCN-excess-rate -|------------------------------------------------ 787 (=PCN-supportable-rate)| 788 | Some pkts Block new flows 789 | excess-traffic-marked 790 | 791 PCN-excess-rate -|------------------------------------------------ 792 (=PCN-admissible-rate)| 793 | No pkts Admit new flows 794 | PCN-marked 795 | 797 Figure 2: Schematic of how the PCN admission control and flow 798 termination mechanisms operate as the rate of PCN-traffic increases, 799 for a PCN-domain with two encoding states and using the approach of 800 [Charny07-2]. Note: U is a global parameter for all links in the 801 PCN-domain. 803 6.4. 
Information transport 805 The transport of pre-congestion information from a PCN-node to a PCN- 806 egress-node is through PCN-markings in data packet headers, ie "in- 807 band": no signalling protocol messaging is needed. Signalling is 808 needed to transport PCN-feedback-information between the PCN- 809 boundary-nodes, for example to convey the fraction of PCN-marked 810 traffic from a PCN-egress-node to the relevant PCN-ingress-node. 811 Exactly what information needs to be transported will be described in 812 future documents about possible boundary mechanisms. The signalling 813 could be done by an extension of RSVP or NSIS, for instance; 814 [Lefaucheur06] describes the extensions needed for RSVP. 816 6.5. PCN-traffic 818 The following are some high-level points about how PCN works: 820 o There needs to be a way for a PCN-node to distinguish PCN-traffic 821 from other traffic. This is through a combination of the DSCP 822 field and/or ECN field. 824 o It is not advised to have non PCN-traffic that competes for the 825 same capacity as PCN-traffic but, if there is such traffic, there 826 needs to be a mechanism to limit it. "Capacity" means the 827 forwarding bandwidth on a link; "competes" means that non PCN- 828 packets will delay PCN-packets in the queue for the link. Hence 829 more non PCN-traffic results in poorer QoS for PCN. Further, the 830 unpredictable amount of non PCN-traffic makes the PCN mechanisms 831 less accurate and so reduces PCN's ability to protect the QoS of 832 admitted PCN-flows 834 o Two examples of such non PCN-traffic (ie that competes for the 835 same capacity as PCN-traffic) are: 837 1. traffic that is priority scheduled over PCN (perhaps a particular 838 application or an operator's control messages). 840 2. traffic that is scheduled at the same priority as PCN (for 841 example if the Voice-Admit codepoint is used for PCN-traffic 842 [PCN08-1] and there is non-PCN voice-admit traffic in the PCN- 843 domain). 
845 o If there is such non PCN-traffic (ie that competes for the same 846 capacity as PCN-traffic), then PCN's mechanisms should take 847 account of it, in order to improve the accuracy of the decision 848 about whether to admit (or terminate) a PCN-flow. For example, 849 one mechanism is that such non PCN-traffic contributes to the PCN 850 meters (ie is metered by the threshold-marking and excess-traffic- 851 marking algorithms). 853 o There will be non PCN-traffic that doesn't compete for the same 854 capacity as PCN-traffic, because it is forwarded at lower 855 priority. Hence it shouldn't contribute to the PCN meters. 856 Examples are best effort and assured forwarding traffic. However, 857 a PCN-node should dedicate some capacity to lower priority traffic 858 so that it isn't starved. 860 o The document assumes that the PCN mechanisms are applied to a 861 single behaviour aggregate in the PCN-domain. However, it would 862 also be possible to apply them independently to more than one 863 behaviour aggregate, which are distinguished by DSCP. 865 6.6. Backwards compatibility 867 PCN specifies semantics for the ECN field that differ from the 868 default semantics of [RFC3168]. A particular PCN encoding scheme 869 needs to describe how it meets the guidelines of BCP 124 [RFC4774] 870 for specifying alternative semantics for the ECN field. In summary 871 the approach is to: 873 o use a DSCP to allow PCN-nodes to distinguish PCN-traffic that uses 874 the alternative ECN semantics; 876 o define these semantics for use within a controlled region, the 877 PCN-domain; 879 o take appropriate action if ECN capable, non-PCN traffic arrives at 880 a PCN-ingress-node with the DSCP used by PCN. 882 For the baseline encoding [PCN08-1], the 'appropriate action' is to 883 block ECN-capable traffic that uses the same DSCP as PCN from 884 entering the PCN-domain directly. 
Blocking means it is dropped or 885 downgraded to a lower priority behaviour aggregate, or alternatively 886 such traffic may be tunnelled through the PCN-domain. The reason 887 that 'appropriate action' is needed is that the PCN-egress-node 888 clears the ECN field to 00. 890 Extended encoding schemes may take different 'appropriate action'. 892 7. Detailed Functional architecture 894 This section is intended to provide a systematic summary of the new 895 functional architecture in the PCN-domain. First it describes 896 functions needed at the three specific types of PCN-node; these are 897 data plane functions and are in addition to their normal router 898 functions. Then it describes further functionality needed for both 899 flow admission control and flow termination; these are signalling and 900 decision-making functions, and there are various possibilities for 901 where the functions are physically located. The section is split 902 into: 904 1. functions needed at PCN-interior-nodes 906 2. functions needed at PCN-ingress-nodes 908 3. functions needed at PCN-egress-nodes 910 4. other functions needed for flow admission control 911 5. other functions needed for flow termination control 913 Note: Probing is covered in the Appendix. 915 The section then discusses some other detailed topics: 917 1. addressing 919 2. tunnelling 921 3. fault handling 923 7.1. PCN-interior-node functions 925 Each link of the PCN-domain is configured with the following 926 functionality: 928 o Behaviour aggregate classification - determine whether an incoming 929 packet is a PCN-packet or not. 931 o Meter - measure the 'amount of PCN-traffic'. The measurement is 932 made as an aggregate of all PCN-packets, and not per flow. 934 o PCN-mark - algorithms determine whether to PCN-mark PCN-packets 935 and what packet encoding is used. 937 The functions are defined in [PCN08-2] and the baseline encoding in 938 [PCN08-1] (extended encodings are to be defined in other documents). 940 7.2. 
PCN-ingress-node functions 942 Each ingress link of the PCN-domain is configured with the following 943 functionality: 945 o Packet classification - determine whether an incoming packet is 946 part of a previously admitted flow, by using a filter spec (eg 947 DSCP, source and destination addresses and port numbers). 949 o Traffic conditioning - police, by dropping or downgrading, any 950 packets received with a DSCP indicating PCN transport that do not 951 belong to an admitted flow. (A prospective PCN-flow that is 952 rejected could be blocked or admitted into a lower priority 953 behaviour aggregate.) Similarly, police packets that are part of 954 a previously admitted flow, to check that the flow keeps to the 955 agreed rate or flowspec (eg [RFC1633] for a microflow and its NSIS 956 equivalent). 958 o PCN-colour - set the DSCP and ECN fields appropriately for the 959 PCN-domain, for example as in [PCN08-1]. 961 o Meter - some approaches to flow termination require the PCN- 962 ingress-node to measure the (aggregate) rate of PCN-traffic 963 towards a particular PCN-egress-node. 965 The first two are policing functions, needed to make sure that PCN- 966 packets admitted into the PCN-domain belong to a flow that has been 967 admitted and to ensure that the flow keeps to the flowspec agreed (eg 968 doesn't exceed an agreed maximum rate and is inelastic traffic). 969 Installing the filter spec will typically be done by the signalling 970 protocol, as will re-installing the filter, for example after a re- 971 route that changes the PCN-ingress-node (see [Briscoe06] for an 972 example using RSVP). PCN-colouring allows the rest of the PCN-domain 973 to recognise PCN-packets. 975 7.3. PCN-egress-node functions 977 Each egress link of the PCN-domain is configured with the following 978 functionality: 980 o Packet classify - determine which PCN-ingress-node a PCN-packet 981 has come from. 983 o Meter - "measure PCN-traffic" or "monitor PCN-marks". 
985 o PCN-colour - for PCN-packets, set the DSCP and ECN fields to the 986 appropriate values for use outside the PCN-domain. 988 The metering functionality of course depends on whether it is 989 targeted at admission control or flow termination. Alternatives 990 involve the PCN-egress-node "measuring" as an aggregate (ie not per 991 flow) all PCN-packets from a particular PCN-ingress-node, or 992 "monitoring" the PCN-traffic and reacting to one (or several) PCN- 993 marked packets. For PCN-colouring, [PCN08-1] specifies that the PCN- 994 egress-node re-sets the ECN field to 00; other encodings may define 995 different behaviour. 997 7.4. Admission control functions 999 As well as the functions covered above, other specific admission 1000 control functions need to be performed (others might be possible): 1002 o Make decision about admission - based on the output of the PCN- 1003 egress-node's PCN meter function. In the case where it "measures 1004 PCN-traffic", the measured traffic on the ingress-egress-aggregate 1005 is compared with some reference level. In the case where it 1006 "monitors PCN-marks", then the decision is based on whether one 1007 (or several) packets is (are) PCN-marked or not (eg the RSVP PATH 1008 message). In either case, the admission decision also takes 1009 account of policy and application layer requirements [RFC2753]. 1011 o Communicate decision about admission - signal the decision to the 1012 node making the admission control request (which may be outside 1013 the PCN-domain), and to the policer (PCN-ingress-node function) 1014 for enforcement of the decision. 1016 There are various possibilities for how the functionality could be 1017 distributed (we assume the operator would configure which is used): 1019 o The decision is made at the PCN-egress-node and the decision 1020 (admit or block) is signalled to the PCN-ingress-node. 
1022 o The decision is recommended by the PCN-egress-node (admit or 1023 block) but the decision is definitively made by the PCN-ingress- 1024 node. The rationale is that the PCN-egress-node naturally has the 1025 necessary information about PCN-marking on the ingress-egress- 1026 aggregate, but the PCN-ingress-node is the policy enforcement 1027 point [RFC2753], which polices incoming traffic to ensure it is 1028 part of an admitted PCN-flow. 1030 o The decision is made at the PCN-ingress-node, which requires that 1031 the PCN-egress-node signals PCN-feedback-information to the PCN- 1032 ingress-node. For example, it could signal the current fraction 1033 of PCN-traffic that is PCN-marked. 1035 o The decision is made at a centralised node (see Appendix). 1037 Note: Admission control functionality is not performed by normal PCN- 1038 interior-nodes. 1040 7.5. Flow termination functions 1042 As well as the functions covered above, other specific termination 1043 control functions need to be performed (others might be possible): 1045 o PCN-meter at PCN-egress-node - similarly to flow admission, there 1046 are two types of possibilities: to "measure PCN-traffic" on the 1047 ingress-egress-aggregate, and to "monitor PCN-marks" and react to 1048 one (or several) PCN-marks. 1050 o (if required) PCN-meter at PCN-ingress-node - make "measurements 1051 of PCN-traffic" being sent towards a particular PCN-egress-node; 1052 again, this is done for the ingress-egress-aggregate and not per 1053 flow. 1055 o (if required) Communicate PCN-feedback-information to the node 1056 that makes the flow termination decision. For example, as in 1057 [Briscoe06], communicate the PCN-egress-node's measurements to the 1058 PCN-ingress-node. 1060 o Make decision about flow termination - use the information from 1061 the PCN-meter(s) to decide which PCN-flow or PCN-flows to 1062 terminate. The decision takes account of policy and application 1063 layer requirements [RFC2753]. 
1065 o Communicate decision about flow termination - signal the decision 1066 to the node that is able to terminate the flow (which may be 1067 outside the PCN-domain), and to the policer (PCN-ingress-node 1068 function) for enforcement of the decision. 1070 There are various possibilities for how the functionality could be 1071 distributed, similar to those discussed above in the Admission 1072 control section. 1074 7.6. Addressing 1076 PCN-nodes may need to know the address of other PCN-nodes. Note: in 1077 all cases PCN-interior-nodes don't need to know the address of any 1078 other PCN-nodes (except as normal their next hop neighbours, for 1079 routing purposes). 1081 The PCN-egress-node needs to know the address of the PCN-ingress-node 1082 associated with a flow, at a minimum so that the PCN-ingress-node can 1083 be informed to enforce the admission decision (and any flow 1084 termination decision) through policing. There are various 1085 possibilities for how the PCN-egress-node can do this, ie associate 1086 the received packet to the correct ingress-egress-aggregate. It is 1087 not the intention of this document to mandate a particular mechanism. 1089 o The addressing information can be gathered from signalling. For 1090 example, regular processing of an RSVP Path message, as the PCN- 1091 ingress-node is the previous RSVP hop (PHOP) ([Lefaucheur06]). Or 1092 the PCN-ingress-node could signal its address to the PCN-egress- 1093 node. 1095 o Always tunnel PCN-traffic across the PCN-domain. Then the PCN- 1096 ingress-node's address is simply the source address of the outer 1097 packet header. The PCN-ingress-node needs to learn the address of 1098 the PCN-egress-node, either by manual configuration or by one of 1099 the automated tunnel endpoint discovery mechanisms (such as 1100 signalling or probing over the data route, interrogating routing 1101 or using a centralised broker). 1103 7.7. 
Tunnelling 1105 Tunnels may originate and/or terminate within a PCN-domain (eg IP 1106 over IP, IP over MPLS). It is important that the PCN-marking of any 1107 packet can potentially influence PCN's flow admission control and 1108 termination - it shouldn't matter whether the packet happens to be 1109 tunnelled at the PCN-node that PCN-marks the packet, or indeed 1110 whether it's decapsulated or encapsulated by a subsequent PCN-node. 1111 This suggests that the "uniform conceptual model" described in 1112 [RFC2983] should be re-applied in the PCN context. In line with this 1113 and the approach of [RFC4303] and [Briscoe08-2], the following rule 1114 is applied if encapsulation is done within the PCN-domain: 1116 o any PCN-marking is copied into the outer header 1118 Note: A tunnel will not provide this behaviour if it complies with 1119 [RFC3168] tunnelling in either mode, but it will if it complies with 1120 [RFC4301] IPSec tunnelling. 1122 Similarly, in line with the "uniform conceptual model" of [RFC2983], 1123 the "full-functionality option" of [RFC3168], and [RFC4301], the 1124 following rule is applied if decapsulation is done within the PCN- 1125 domain: 1127 o if the outer header's marking state is more severe then it is 1128 copied onto the inner header. 1130 Note: the order of increasing severity is: not PCN-marked; threshold- 1131 marking; excess-traffic-marking. 1133 An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to 1134 PCN-egress-nodes. The PCN-marks shouldn't be visible outside the 1135 PCN-domain, which can be achieved by the PCN-egress-node doing the 1136 PCN-colouring function (Section 7.3) after all the other (PCN and 1137 tunnelling) functions. The potential reasons for doing such 1138 tunnelling are: the PCN-egress-node then automatically knows the 1139 address of the relevant PCN-ingress-node for a flow; even if ECMP is 1140 running, all PCN-packets on a particular ingress-egress-aggregate 1141 follow the same path. 
But it also has drawbacks, for example the 1142 additional overhead in terms of bandwidth and processing, and the 1143 cost of setting up a mesh of tunnels between PCN-boundary-nodes 1144 (there is an N^2 scaling issue). 1146 Potential issues arise for a "partially PCN-capable tunnel", ie where 1147 only one tunnel endpoint is in the PCN domain: 1149 1. The tunnel originates outside a PCN-domain and ends inside it. 1150 If the packet arrives at the tunnel ingress with the same 1151 encoding as used within the PCN-domain to indicate PCN-marking, 1152 then this could lead the PCN-egress-node to falsely measure pre- 1153 congestion. 1155 2. The tunnel originates inside a PCN-domain and ends outside it. 1156 If the packet arrives at the tunnel ingress already PCN-marked, 1157 then it will still have the same encoding when it's decapsulated 1158 which could potentially confuse nodes beyond the tunnel egress. 1160 In line with the solution for partially capable DiffServ tunnels in 1161 [RFC2983], the following rules are applied: 1163 o For case (1), the tunnel egress node clears any PCN-marking on the 1164 inner header. This rule is applied before the 'copy on 1165 decapsulation' rule above. 1167 o For case (2), the tunnel ingress node clears any PCN-marking on 1168 the inner header. This rule is applied after the 'copy on 1169 encapsulation' rule above. 1171 Note that the above implies that one has to know, or determine, the 1172 characteristics of the other end of the tunnel as part of 1173 establishing it. 1175 Tunnelling constraints were a major factor in the choice of the 1176 baseline encoding. As explained in [PCN08-1], with current 1177 tunnelling endpoints only the 11 codepoint of the ECN field survives 1178 decapsulation, and hence the baseline encoding only uses the 11 1179 codepoint to indicate PCN-marking. Extended encoding schemes need to 1180 explain their interactions with (or assumptions about) tunnelling. 
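The encapsulation and decapsulation rules above, including the two partially PCN-capable cases, can be sketched as follows (the marking names follow the severity ordering given earlier; the function signatures are invented for illustration):

```python
# Severity ordering from the text: not PCN-marked < threshold-marked
# < excess-traffic-marked.
SEVERITY = {"not-marked": 0, "threshold-marked": 1, "excess-traffic-marked": 2}

def encapsulate(inner_mark, tunnel_ends_outside_pcn_domain=False):
    # Copy any PCN-marking into the outer header; for case (2), a tunnel
    # ending outside the PCN-domain, also clear the inner header's marking
    # (this rule is applied after the copy-on-encapsulation rule).
    outer_mark = inner_mark
    if tunnel_ends_outside_pcn_domain:
        inner_mark = "not-marked"
    return outer_mark, inner_mark

def decapsulate(outer_mark, inner_mark, tunnel_starts_outside_pcn_domain=False):
    # For case (1), a tunnel starting outside the PCN-domain, first clear
    # any marking on the inner header; then copy the outer marking onto
    # the inner header only if it is more severe.
    if tunnel_starts_outside_pcn_domain:
        inner_mark = "not-marked"
    if SEVERITY[outer_mark] > SEVERITY[inner_mark]:
        inner_mark = outer_mark
    return inner_mark

print(encapsulate("excess-traffic-marked"))
print(decapsulate("threshold-marked", "not-marked"))
# Case (1): a mark that arrived from outside the PCN-domain is discarded.
print(decapsulate("not-marked", "excess-traffic-marked",
                  tunnel_starts_outside_pcn_domain=True))
```

The two boolean flags correspond to knowing the characteristics of the far end of the tunnel, which is why the text notes that this must be determined when the tunnel is established.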
A 1181 lengthy discussion of all the issues associated with layered 1182 encapsulation of congestion notification (for ECN as well as PCN) is 1183 in [Briscoe08-2]. 1185 7.8. Fault handling 1187 If a PCN-interior-node (or one of its links) fails, then lower layer 1188 protection mechanisms or the regular IP routing protocol will 1189 eventually re-route around it. If the new route can carry all the 1190 admitted traffic, flows will gracefully continue. If instead this 1191 causes early warning of pre-congestion on the new route, then 1192 admission control based on pre-congestion notification will ensure 1193 new flows will not be admitted until enough existing flows have 1194 departed. Re-routing may result in heavy (pre-)congestion, in which 1195 case the flow termination mechanism will kick in. 1197 If a PCN-boundary-node fails then we would like the regular QoS 1198 signalling protocol to be responsible for taking appropriate action. 1199 As an example, [Briscoe08-2] considers what happens if RSVP is the QoS 1200 signalling protocol. 1202 8. Challenges 1204 Prior work on PCN and similar mechanisms has thrown up a number of 1205 considerations about PCN's design goals (things PCN should be good 1206 at) and some issues that have been hard to solve in a fully 1207 satisfactory manner. Taken as a whole, this work represents a list of trade- 1208 offs (it is unlikely that they can all be 100% achieved) and can perhaps 1209 serve as evaluation criteria to help an operator (or the IETF) decide 1210 between options. 1212 The following are open issues. They are mainly taken from 1213 [Briscoe06], which also describes some possible solutions. Note that 1214 some may be considered unimportant in general or in specific 1215 deployment scenarios or by some operators. 1217 NOTE: Potential solutions are out of scope for this document. 1219 o ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion 1220 is measured on a specific ingress-egress-aggregate.
However, if 1221 the PCN-domain runs ECMP, then traffic on this ingress-egress- 1222 aggregate may follow several different paths - some of the paths 1223 could be pre-congested whilst others are not. There are three 1224 potential problems: 1226 1. over-admission: a new flow is admitted (because the pre- 1227 congestion level measured by the PCN-egress-node is 1228 sufficiently diluted by unmarked packets from non-congested 1229 paths that a new flow is admitted), but its packets travel 1230 through a pre-congested PCN-node. 1232 2. under-admission: a new flow is blocked (because the pre- 1233 congestion level measured by the PCN-egress-node is 1234 sufficiently increased by PCN-marked packets from pre- 1235 congested paths that a new flow is blocked), but its packets 1236 travel along an uncongested path. 1238 3. ineffective termination: a flow is terminated, but its path 1239 doesn't travel through the (pre-)congested router(s). Since 1240 flow termination is a 'last resort', which protects the 1241 network should over-admission occur, this problem is probably 1242 more important to solve than the other two. 1244 o ECMP and signalling: It is possible that, in a PCN-domain running 1245 ECMP, the signalling packets (eg RSVP, NSIS) follow a different 1246 path than the data packets, which could matter if the signalling 1247 packets are used as probes. Whether this is an issue depends on 1248 which fields the ECMP algorithm uses; if the ECMP algorithm is 1249 restricted to the source and destination IP addresses, then it 1250 will not be an issue. ECMP and signalling interactions are a 1251 specific instance of a general issue for non-traditional routing 1252 combined with resource management along a path [Hancock02]. 1254 o Tunnelling: There are scenarios where tunnelling makes it 1255 difficult to determine the path in the PCN-domain. The problem, 1256 its impact, and the potential solutions are similar to those for 1257 ECMP. 
1259 o Scenarios with only one tunnel endpoint in the PCN domain may make 1260 it harder for the PCN-egress-node to gather from the signalling 1261 messages (eg RSVP, NSIS) the identity of the PCN-ingress-node. 1263 o Bi-Directional Sessions: Many applications have bi-directional 1264 sessions - hence there are two microflows that should be admitted 1265 (or terminated) as a pair - for instance a bi-directional voice 1266 call only makes sense if microflows in both directions are 1267 admitted. However, the PCN mechanisms concern admission and 1268 termination of a single flow, and coordination of the decision for 1269 both flows is a matter for the signalling protocol and out of 1270 scope of PCN. One possible example would use SIP pre-conditions. 1271 However, there are others. 1273 o Global Coordination: PCN makes its admission decision based on 1274 PCN-markings on a particular ingress-egress-aggregate. Decisions 1275 about flows through a different ingress-egress-aggregate are made 1276 independently. However, one can imagine network topologies and 1277 traffic matrices where, from a global perspective, it would be 1278 better to make a coordinated decision across all the ingress- 1279 egress-aggregates for the whole PCN-domain. For example, to block 1280 (or even terminate) flows on one ingress-egress-aggregate so that 1281 more important flows through a different ingress-egress-aggregate 1282 could be admitted. The problem may well be relatively 1283 insignificant. 1285 o Aggregate Traffic Characteristics: Even when the number of flows 1286 is stable, the traffic level through the PCN-domain will vary 1287 because the sources vary their traffic rates. PCN works best when 1288 there is not too much variability in the total traffic level at a 1289 PCN-node's interface (ie in the aggregate traffic from all 1290 sources). 
Too much variation means that a node may (at one
1291 moment) not be doing any PCN-marking and then (at another moment)
1292 drop packets because it is overloaded. This makes it hard to tune
1293 the admission control scheme to stop admitting new flows at the
1294 right time. The problem is therefore more likely with fewer,
1295 burstier flows.
1297 o Flash crowds and Speed of Reaction: PCN is a measurement-based
1298 mechanism and so there is an inherent delay between packet marking
1299 by PCN-interior-nodes and any admission control reaction at PCN-
1300 boundary-nodes. For example, if a big burst of
1301 admission requests occurs in a very short space of time (eg
1302 prompted by a televote), they could potentially all be admitted
1303 before enough PCN-marks are seen to block new flows. In other words, any
1304 additional load offered within the reaction time of the mechanism
1305 must not move the PCN-domain directly from a no-congestion state
1306 to overload. This 'vulnerability period' may have an impact at
1307 the signalling level; for instance, QoS requests should be rate
1308 limited to bound the number of requests able to arrive within the
1309 vulnerability period.
1311 o Silent at start: after a successful admission request the source
1312 may wait some time before sending data (eg waiting for the called
1313 party to answer). Then the risk is that, in some circumstances,
1314 PCN's measurements underestimate what the pre-congestion level
1315 will be when the source does start sending data.
1317 9. Operations and Management
1319 This Section considers operations and management issues, under the
1320 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance
1321 and Security. Provisioning is discussed with performance.
1323 9.1. Configuration OAM
1325 Threshold-marking and excess-traffic-marking are standardised in
1326 [PCN08-2].
However, more diversity in PCN-boundary-node behaviours 1327 is expected, in order to interface with diverse industry 1328 architectures. It may be possible to have different PCN-boundary- 1329 node behaviours for different ingress-egress-aggregates within the 1330 same PCN-domain. 1332 A PCN marking behaviour (threshold-marking, excess-traffic-marking) 1333 is enabled on either the egress or the ingress interfaces of PCN- 1334 nodes. A consistent choice must be made across the PCN-domain to 1335 ensure that the PCN mechanisms protect all links. 1337 PCN configuration control variables fall into the following 1338 categories: 1340 o system options (enabling or disabling behaviours) 1342 o parameters (setting levels, addresses etc) 1344 One possibility is that all configurable variables sit within an SNMP 1345 management framework [RFC3411], being structured within a defined 1346 management information base (MIB) on each node, and being remotely 1347 readable and settable via a suitably secure management protocol 1348 (SNMPv3). 1350 Some configuration options and parameters have to be set once to 1351 'globally' control the whole PCN-domain. Where possible, these are 1352 identified below. This may affect operational complexity and the 1353 chances of interoperability problems between equipment from different 1354 vendors. 1356 It may be possible for an operator to configure some PCN-interior- 1357 nodes so that they don't run the PCN mechanisms, if it knows that 1358 these links will never become (pre-)congested. 1360 9.1.1. System options 1362 On PCN-interior-nodes there will be very few system options: 1364 o Whether two PCN-markings (threshold-marked and excess-traffic- 1365 marked) are enabled or only one. Typically all nodes throughout a 1366 PCN-domain will be configured the same in this respect. However, 1367 exceptions could be made. 
For example, if most PCN-nodes used 1368 both markings, but some legacy hardware was incapable of running 1369 two algorithms, an operator might be willing to configure these 1370 legacy nodes solely for excess-traffic-marking to enable flow 1371 termination as a back-stop. It would be sensible to place such 1372 nodes where they could be provisioned with a greater leeway over 1373 expected traffic levels. 1375 o In the case where only one PCN-marking is enabled, all nodes must 1376 be configured to generate PCN-marks from the same meter (ie either 1377 the threshold meter or the excess traffic meter). 1379 PCN-boundary-nodes (ingress and egress) will have more system 1380 options: 1382 o Which of admission and flow termination are enabled. If any PCN- 1383 interior-node is configured to generate a marking, all PCN- 1384 boundary-nodes must be able to interpret that marking (which 1385 includes understanding, in a PCN-domain that uses only one type of 1386 PCN-marking, whether they are generated by PCN-interior-nodes' 1387 threshold meters or the excess traffic meters). Therefore all 1388 PCN-boundary-nodes must be configured the same in this respect. 1390 o Where flow admission and termination decisions are made: at PCN- 1391 ingress-nodes or at PCN-egress-nodes (or at a centralised node, 1392 see Appendix). Theoretically, this configuration choice could be 1393 negotiated for each pair of PCN-boundary-nodes, but we cannot 1394 imagine why such complexity would be required, except perhaps in 1395 future inter-domain scenarios. 1397 o How PCN-markings are translated into admission control and flow 1398 termination decisions (see Section 6.1 and Section 6.2). 1400 PCN-egress-nodes will have further system options: 1402 o How the mapping should be established between each packet and its 1403 aggregate, eg by MPLS label, by IP packet filterspec; and how to 1404 take account of ECMP. 
1406 o If an equipment vendor provides a choice, there may be options to 1407 select which smoothing algorithm to use for measurements. 1409 9.1.2. Parameters 1411 Like any DiffServ domain, every node within a PCN-domain will need to 1412 be configured with the DSCP(s) used to identify PCN-packets. On each 1413 interior link the main configuration parameters are the PCN- 1414 threshold-rate and PCN-excess-rate. A larger PCN-threshold-rate 1415 enables more PCN-traffic to be admitted on a link, hence improving 1416 capacity utilisation. A PCN-excess-rate set further above the PCN- 1417 threshold-rate allows greater increases in traffic (whether due to 1418 natural fluctuations or some unexpected event) before any flows are 1419 terminated, ie minimises the chances of unnecessarily triggering the 1420 termination mechanism. For instance, an operator may want to design 1421 their network so that it can cope with a failure of any single PCN- 1422 node without terminating any flows. 1424 Setting these rates on first deployment of PCN will be very similar 1425 to the traditional process for sizing an admission controlled 1426 network, depending on: the operator's requirements for minimising 1427 flow blocking (grade of service), the expected PCN traffic load on 1428 each link and its statistical characteristics (the traffic matrix), 1429 contingency for re-routing the PCN traffic matrix in the event of 1430 single or multiple failures, and the expected load from other classes 1431 relative to link capacities [Menth07]. But once a domain is in 1432 operation, a PCN design goal is to be able to determine growth in 1433 these configured rates much more simply, by monitoring PCN-marking 1434 rates from actual rather than expected traffic (see Section 9.2 on 1435 Performance & Provisioning). 1437 Operators may also wish to configure a rate greater than the PCN- 1438 excess-rate that is the absolute maximum rate that a link allows for 1439 PCN-traffic. 
This may simply be the physical link rate, but some
1440 operators may wish to configure a logical limit to prevent starvation
1441 of other traffic classes during any brief period after PCN-traffic
1442 exceeds the PCN-excess-rate but before flow termination brings it
1443 back below this rate.
1445 Threshold-marking requires a threshold token bucket depth to be
1446 configured, excess-traffic-marking needs a value for the MTU (maximum
1447 size of a PCN-packet on the link) and both require setting a maximum
1448 size of their token buckets. It will be preferable to have
1449 rules that set defaults for these parameters, while allowing operators
1450 to change them, for instance if average traffic characteristics
1451 change over time.
1453 The PCN-egress-node may allow configuration of the following:
1455 o how it smooths metering of PCN-markings (eg EWMA parameters)
1457 Whichever node makes admission and flow termination decisions will
1458 contain algorithms for converting PCN-marking levels into admission
1459 or flow termination decisions. These will also require configurable
1460 parameters, for instance:
1462 o an admission control algorithm that is based on the fraction of
1463 marked packets will at least require a marking threshold setting
1464 above which it denies admission to new flows;
1466 o flow termination algorithms will probably require a parameter to
1467 delay termination of any flows until it is more certain that an
1468 anomalous event is not transient;
1470 o a parameter to control the trade-off between how quickly excess
1471 flows are terminated, and over-termination.
1473 One particular approach, [Charny07-2], would require a global
1474 parameter to be defined on all PCN-nodes, but only needs one PCN
1475 marking rate to be configured on each link.
The global parameter is 1476 a scaling factor between admission and termination (the PCN-traffic 1477 rate on a link up to which flows are admitted vs the rate above which 1478 flows are terminated). [Charny07-2] discusses in full the impact of 1479 this particular approach on the operation of PCN. 1481 9.2. Performance & Provisioning OAM 1483 Monitoring of performance factors measurable from *outside* the PCN 1484 domain will be no different with PCN than with any other packet-based 1485 flow admission control system, both at the flow level (blocking 1486 probability etc) and the packet level (jitter [RFC3393], [Y.1541], 1487 loss rate [RFC4656], mean opinion score [P.800], etc). The 1488 difference is that PCN is intentionally designed to indicate 1489 *internally* which exact resource(s) are the cause of performance 1490 problems and by how much. 1492 Even better, PCN indicates which resources will probably cause 1493 problems if they are not upgraded soon. This can be achieved by the 1494 management system monitoring the total amount (in bytes) of PCN- 1495 marking generated by each queue over a period. Given possible long 1496 provisioning lead times, pre-congestion volume is the best metric to 1497 reveal whether sufficient persistent demand has occurred to warrant 1498 an upgrade. Because, even before utilisation becomes problematic, 1499 the statistical variability of traffic will cause occasional bursts 1500 of pre-congestion. This 'early warning system' decouples the process 1501 of adding customers from the provisioning process. This should cut 1502 the time to add a customer when compared against admission control 1503 provided over native DiffServ [RFC2998], because it saves having to 1504 verify the capacity planning process before adding each customer. 
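As an illustrative sketch only (the class and counter names below are invented for this example, and the warning threshold is an arbitrary assumption, not a value this document specifies), the per-queue marking-volume monitoring described above could work as follows: every PCN-marked packet adds its size in bytes to a per-queue counter, and the management system periodically reads the counter and subtracts the previous reading to obtain the incremental pre-congestion volume.

```python
from collections import defaultdict

class MarkingVolumeMonitor:
    """Sketch of an 'early warning' monitor: accumulate the bytes of
    PCN-marking generated by each queue, and compare the volume seen
    between two polls against an operator-chosen warning level."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes   # upgrade-warning level (assumed value)
        self.marked_bytes = defaultdict(int)     # running counter per queue
        self.last_reading = defaultdict(int)     # counter value at the previous poll

    def on_pcn_mark(self, queue_id, packet_size):
        # called whenever the queue PCN-marks a packet
        self.marked_bytes[queue_id] += packet_size

    def poll(self, queue_id):
        """Read the counter, subtract the previous reading, and report
        whether the incremental pre-congestion volume exceeds the
        warning level."""
        current = self.marked_bytes[queue_id]
        volume = current - self.last_reading[queue_id]
        self.last_reading[queue_id] = current
        return volume, volume > self.threshold_bytes
```

Polling frequently, as suggested later for the per-queue counters, lets anomalous events such as re-routes be distinguished from regular fluctuating demand.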
1506 Alternatively, before triggering an upgrade, the long term pre- 1507 congestion volume on each link can be used to balance traffic load 1508 across the PCN-domain by adjusting the link weights of the routing 1509 system. When an upgrade to a link's configured PCN-rates is 1510 required, it may also be necessary to upgrade the physical capacity 1511 available to other classes. But usually there will be sufficient 1512 physical capacity for the upgrade to go ahead as a simple 1513 configuration change. Alternatively, [Songhurst06] describes an 1514 adaptive rather than preconfigured system, where the configured PCN- 1515 threshold-rate is replaced with a high and low water mark and the 1516 marking algorithm automatically optimises how physical capacity is 1517 shared using the relative loads from PCN and other traffic classes. 1519 All the above processes require just three extra counters associated 1520 with each PCN queue: threshold-markings, excess-traffic-markings and 1521 drop. Every time a PCN packet is marked or dropped its size in bytes 1522 should be added to the appropriate counter. Then the management 1523 system can read the counters at any time and subtract a previous 1524 reading to establish the incremental volume of each type of 1525 (pre-)congestion. Readings should be taken frequently, so that 1526 anomalous events (eg re-routes) can be distinguished from regular 1527 fluctuating demand if required. 1529 9.3. Accounting OAM 1531 Accounting is only done at trust boundaries so it is out of scope of 1532 this document, which is confined to intra-domain issues. Use of PCN 1533 internal to a domain makes no difference to the flow signalling 1534 events crossing trust boundaries outside the PCN-domain, which are 1535 typically used for accounting. 1537 9.4. 
Fault OAM 1539 Fault OAM is about preventing faults, telling the management system 1540 (or manual operator) that the system has recovered (or not) from a 1541 failure, and about maintaining information to aid fault diagnosis. 1543 Admission blocking and particularly flow termination mechanisms 1544 should rarely be needed in practice. It would be unfortunate if they 1545 didn't work after an option had been accidentally disabled. 1546 Therefore it will be necessary to regularly test that the live system 1547 works as intended (devising a meaningful test is left as an exercise 1548 for the operator). 1550 Section 7 describes how the PCN architecture has been designed to 1551 ensure admitted flows continue gracefully after recovering 1552 automatically from link or node failures. The need to record and 1553 monitor re-routing events affecting signalling is unchanged by the 1554 addition of PCN to a DiffServ domain. Similarly, re-routing events 1555 within the PCN-domain will be recorded and monitored just as they 1556 would be without PCN. 1558 PCN-marking does make it possible to record 'near-misses'. For 1559 instance, at the PCN-egress-node a 'reporting threshold' could be set 1560 to monitor how often - and for how long - the system comes close to 1561 triggering flow blocking without actually doing so. Similarly, 1562 bursts of flow termination marking could be recorded even if they are 1563 not sufficiently sustained to trigger flow termination. Such 1564 statistics could be correlated with per-queue counts of marking 1565 volume (Section 9.2) to upgrade resources in danger of causing 1566 service degradation, or to trigger manual tracing of intermittent 1567 incipient errors that would otherwise have gone unnoticed. 
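A minimal sketch of such a 'reporting threshold' at the PCN-egress-node follows. The threshold values and the use of a simple marked-byte fraction per measurement interval are assumptions for illustration; this document does not specify either.

```python
def classify_interval(marked_bytes, total_bytes,
                      report_threshold=0.05, block_threshold=0.15):
    """Classify one measurement interval on an ingress-egress-aggregate:
    'ok', a logged 'near-miss' (came close to triggering flow blocking
    without actually doing so), or 'blocking' (admission control would
    deny new flows). Threshold values are arbitrary assumptions."""
    if total_bytes == 0:
        return "ok"
    fraction = marked_bytes / total_bytes
    if fraction >= block_threshold:
        return "blocking"
    if fraction >= report_threshold:
        return "near-miss"       # record for fault diagnosis
    return "ok"
```

Counting how often, and for how long, successive intervals classify as 'near-miss' gives the statistics that can be correlated with the per-queue marking-volume counts of Section 9.2.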
1569 Finally, of course, many faults are caused by failings in the 1570 management process ('human error'): a wrongly configured address in a 1571 node, a wrong address given in a signalling protocol, a wrongly 1572 configured parameter in a queueing algorithm, a node set into a 1573 different mode from other nodes, and so on. Generally, a clean 1574 design with few configurable options ensures this class of faults can 1575 be traced more easily and prevented more often. Sound management 1576 practice at run-time also helps. For instance: a management system 1577 should be used that constrains configuration changes within system 1578 rules (eg preventing an option setting inconsistent with other 1579 nodes); configuration options should also be recorded in an offline 1580 database; and regular automatic consistency checks between live 1581 systems and the database should be performed. PCN adds nothing 1582 specific to this class of problems. 1584 9.5. Security OAM 1586 Security OAM is about using secure operational practices as well as 1587 being able to track security breaches or near-misses at run-time. 1588 PCN adds few specifics to the general good practice required in this 1589 field [RFC4778], other than those below. The correct functions of 1590 the system should be monitored (Section 9.2) in multiple independent 1591 ways and correlated to detect possible security breaches. Persistent 1592 (pre-)congestion marking should raise an alarm (both on the node 1593 doing the marking and on the PCN-egress-node metering it). 1594 Similarly, persistently poor external QoS metrics such as jitter or 1595 MOS should raise an alarm. 
The following are examples of symptoms
1596 that may be the result of innocent faults, rather than attacks, but
1597 until diagnosed they should be logged and trigger a security alarm:
1599 o Anomalous patterns of non-conforming incoming signals and packets
1600 rejected at the PCN-ingress-nodes (eg packets already marked PCN-
1601 capable, or traffic persistently starving token bucket policers).
1603 o PCN-capable packets arriving at a PCN-egress-node with no
1604 associated state for mapping them to a valid ingress-egress-
1605 aggregate.
1607 o A PCN-ingress-node receiving feedback signals about the pre-
1608 congestion level on a non-existent aggregate, or that are
1609 inconsistent with other signals (eg unexpected sequence numbers,
1610 inconsistent addressing, conflicting reports of the pre-congestion
1611 level, etc).
1613 o Pre-congestion marking arriving at a PCN-egress-node with
1614 (pre-)congestion markings focused on particular flows, rather than
1615 randomly distributed throughout the aggregate.
1617 10. IANA Considerations
1619 This memo includes no request to IANA.
1621 11. Security considerations
1623 Security considerations essentially come from the Trust Assumption
1624 (Section 5.1), ie that all PCN-nodes are PCN-enabled and are trusted
1625 to perform PCN-marking and transport truthfully. PCN splits functionality
1626 between PCN-interior-nodes and PCN-boundary-nodes, and the security
1627 considerations are somewhat different for each, mainly because PCN-
1628 boundary-nodes are flow-aware and PCN-interior-nodes are not.
1630 o Because the PCN-boundary-nodes are flow-aware, they are trusted to
1631 use that awareness correctly. The degree of trust required
1632 depends on the kinds of decisions they have to make and the kinds
1633 of information they need to make them. There is nothing specific
1634 to PCN.
1636 o The PCN-ingress-nodes police packets to ensure a PCN-flow sticks 1637 within its agreed limit, and to ensure that only PCN-flows that 1638 have been admitted contribute PCN-traffic into the PCN-domain. 1639 The policer must drop (or perhaps downgrade to a different DSCP) 1640 any PCN-packets received that are outside this remit. This is 1641 similar to the existing IntServ behaviour. Between them the PCN- 1642 boundary-nodes must encircle the PCN-domain, otherwise PCN-packets 1643 could enter the PCN-domain without being subject to admission 1644 control, which would potentially destroy the QoS of existing 1645 flows. 1647 o PCN-interior-nodes are not flow-aware. This prevents some 1648 security attacks where an attacker targets specific flows in the 1649 data plane - for instance for DoS or eavesdropping. 1651 o The PCN-boundary-nodes rely on correct PCN-marking by the PCN- 1652 interior-nodes. For instance a rogue PCN-interior-node could PCN- 1653 mark all packets so that no flows were admitted. Another 1654 possibility is that it doesn't PCN-mark any packets, even when it 1655 is pre-congested. More subtly, the rogue PCN-interior-node could 1656 perform these attacks selectively on particular flows, or it could 1657 PCN-mark the correct fraction overall, but carefully choose which 1658 flows it marked. 1660 o The PCN-boundary-nodes should be able to deal with DoS attacks and 1661 state exhaustion attacks based on fast changes in per flow 1662 signalling. 1664 o The signalling between the PCN-boundary-nodes must be protected 1665 from attacks. For example the recipient needs to validate that 1666 the message is indeed from the node that claims to have sent it. 1667 Possible measures include digest authentication and protection 1668 against replay and man-in-the-middle attacks. For the specific 1669 protocol RSVP, hop-by-hop authentication is in [RFC2747], and 1670 [Behringer07] may also be useful. 1672 Operational security advice is given in Section 9.5. 
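The ingress policing described in the first bullet above can be sketched as a per-flow token bucket. The rate and depth values used below are illustrative assumptions (agreed at admission time for the flow), and as noted above an implementation might downgrade non-conforming PCN-packets to a different DSCP rather than drop them.

```python
class TokenBucketPolicer:
    """Sketch of a per-flow policer at a PCN-ingress-node. Tokens
    accumulate at the flow's agreed rate up to the bucket depth; a
    packet conforms if enough tokens are available, otherwise it is
    dropped (or downgraded to a different DSCP)."""

    def __init__(self, rate_bytes_per_s, depth_bytes):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes    # bucket starts full
        self.last = 0.0              # timestamp of the previous packet

    def conforms(self, packet_size, now):
        # refill tokens for the time elapsed since the last packet
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True              # forward as PCN-traffic
        return False                 # non-conforming: drop or downgrade
```

Traffic persistently starving such a policer is one of the symptoms listed in Section 9.5 that should be logged and trigger a security alarm.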
1674 12. Conclusions
1676 This document describes a general architecture for flow admission and
1677 termination based on pre-congestion information in order to protect
1678 the quality of service of established inelastic flows within a single
1679 DiffServ domain. The main topic is the functional architecture; it
1680 also covers related topics such as the underlying assumptions and
open issues.
1682 13. Acknowledgements
1684 This document is a revised version of an earlier individual draft
1685 authored by: P. Eardley, J. Babiarz, K. Chan, A. Charny, R. Geib, G.
1686 Karagiannis, M. Menth, T. Tsou. They are therefore contributors to
1687 this document.
1689 Thanks to those who have made comments on this document: Lachlan
1690 Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, Scott
1691 Bradner, Bob Briscoe, Jason Canon, Ken Carlberg, Anna Charny, Joachim
1692 Charzinski, Andras Csaszar, Lars Eggert, Ruediger Geib, Wei Gengyu,
1693 Robert Hancock, Fortune Huang, Christian Hublet, Ingemar Johansson,
1694 Georgios Karagiannis, Hein Mekkes, Michael Menth, Toby Moncaster,
1695 Daisuke Satoh, Ben Strulo, Tom Taylor, Hannes Tschofenig, Tina Tsou,
1696 Lars Westberg, Magnus Westerlund, Delei Yu. Thanks to Bob Briscoe,
1697 who extensively revised the Operations and Management section.
1699 This document is the result of discussions in the PCN WG and
1700 forerunner activity in the TSVWG. A number of previous drafts were
1701 presented to TSVWG; their authors were: B. Briscoe, P. Eardley, D.
1702 Songhurst, F. Le Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley,
1703 G. Karagiannis, A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G.
1704 Liu, A. Bhargava.
1706 14. Comments Solicited
1708 Comments and questions are encouraged and very welcome. They can be
1709 addressed to the IETF PCN working group mailing list.
1711 15. Changes
1713 15.1.
Changes from -08 to -09
1715 Small changes to deal with WG Chair comments:
1717 o tweak language in various places to make it more RFC-like and less
1718 that of a scholarly work, for instance from "we propose" to "this
1719 document describes"
1721 o tweak language in various places to make it a stand-alone
1722 architecture document rather than a discussion of the PCN WG. Now
1723 only mentions WG at start of Annex.
1725 o References: IDs are no longer referred to by the draft name
1727 o References: removed some of the less important references to IDs
1729 15.2. Changes from -07 to -08
1731 Small changes from second WG last call:
1733 o Section 2: added definition for PCN-admissible-rate and PCN-
1734 supportable-rate. Small changes to use these terms as follows:
1735 Section 3, bullets 2 & 9; S6.1 para 1; S6.2 para 1; S6.3 bullet 3;
1736 added to Figs 1 & 2.
1738 o added the phrase "(others might be possible)" before the list of
1739 approaches in Section 6.3, 7.4 & 7.5.
1741 o added references to RFC2753 (A framework for policy-based
1742 admission control) in S7.4 & S7.5.
1744 o throughout, updated references now that marking behaviour &
1745 baseline encoding are WG drafts.
1747 o a few typos corrected
1749 15.3. Changes from -06 to -07
1751 References re-formatted to pass ID nits. No other changes.
1753 15.4. Changes from -05 to -06
1755 Minor clarifications throughout; the most significant are as
1756 follows:
1758 o Section 1: added to the list of encoding states in an 'extended'
1759 scheme: "or perhaps further encoding states as suggested in
1760 draft-westberg-pcn-load-control"
1762 o Section 2: added definition for PCN-colouring (to clarify that the
1763 term is used consistently, but differently from 'PCN-marking')
1765 o Section 6.1 and 6.2: added "(others might be possible)" before the
1766 list of high level approaches for making flow admission
1767 (termination) decisions.
1769 o Section 6.2: corrected a significant typo in 2nd bullet (more ->
1770 less)
1772 o Section 6.3: corrected a couple of significant typos in Figure 2
1774 o Section 6.5 (PCN-traffic) re-written for clarity. Non PCN-traffic
1775 contributing to PCN meters is now given as an example (there may
1776 be cases where it doesn't need to be metered).
1778 o Section 7.7: added to the text about encapsulation being done
1779 within the PCN-domain: "Note: A tunnel will not provide this
1780 behaviour if it complies with [RFC3168] tunnelling in either mode,
1781 but it will if it complies with [RFC4301] IPSec tunnelling."
1783 o Section 7.7: added mention of [RFC4301] to the text about
1784 decapsulation being done within the PCN-domain.
1786 o Section 8: deleted the text about design goals, since this is
1787 already covered adequately earlier eg in S3.
1789 o Section 11: replaced the last sentence of bullet 1 by "There is
1790 nothing specific to PCN."
1792 o Appendix: added to open issues: possibility of probing
1793 automatically and periodically.
1795 o References: Split out Normative references (RFC2474 & RFC3246).
1797 15.5. Changes from -04 to -05
1799 Minor nits removed as follows:
1801 o Further minor changes to reflect that baseline encoding is a
1802 consensus, standards-track document, whilst there can be
1803 (experimental track) encoding extensions
1805 o Traffic conditioning updated to reflect discussions in Dublin,
1806 mainly that PCN-interior-nodes don't police PCN-traffic (so
1807 deleted bullet in S7.1) and that it is not advised to have non
1808 PCN-traffic that shares the same capacity (on a link) as PCN-
1809 traffic (so added bullet in S6.5)
1811 o Probing moved into Appendix A and deleted the 'third viewpoint'
1812 (admission control based on the marking of a single packet like an
1813 RSVP PATH message) - since this isn't really probing, and in any
1814 case is already mentioned in S6.1.
1816 o Minor changes to S9 Operations and management - mainly to reflect 1817 that consensus on marking behaviour has simplified things so eg 1818 there are fewer parameters to configure. 1820 o A few terminology-related errors expunged, and two pictures added 1821 to help. 1823 o Re-phrased the claim about the natural decision point in S7.4 1825 o Clarified that extended encoding schemes need to explain their 1826 interactions with (or assumptions about) tunnelling (S7.7) and how 1827 they meet the guidelines of BCP124 (S6.6) 1829 o Corrected the third bullet in S6.2 (to reflect consensus about 1830 PCN-marking) 1832 15.6. Changes from -03 to -04 1834 o Minor changes throughout to reflect the consensus call about PCN- 1835 marking (as reflected in [PCN08-2]). 1837 o Minor changes throughout to reflect the current decisions about 1838 encoding (as reflected in [PCN08-1] and [Moncaster08]). 1840 o Introduction: re-structured to create new sections on Benefits, 1841 Deployment scenarios and Assumptions. 1843 o Introduction: Added pointers to other PCN documents. 1845 o Terminology: changed PCN-lower-rate to PCN-threshold-rate and PCN- 1846 upper-rate to PCN-excess-rate; excess-rate-marking to excess- 1847 traffic-marking. 1849 o Benefits: added bullet about SRLGs. 1851 o Deployment scenarios: new section combining material from various 1852 places within the document. 1854 o S6 (high level functional architecture): re-structured and edited 1855 to improve clarity, and reflect the latest PCN-marking and 1856 encoding drafts. 1858 o S6.4: added claim that the most natural place to make an admission 1859 decision is a PCN-egress-node. 1861 o S6.5: updated the bullet about non-PCN-traffic that uses the same 1862 DSCP as PCN-traffic. 1864 o S6.6: added a section about backwards compatibility with respect 1865 to [RFC4774]. 1867 o Appendix A: added bullet about end-to-end PCN. 1869 o Probing: moved to Appendix B. 1871 o Other minor clarifications, typos etc. 1873 15.7. 
Changes from -02 to -03
1875 o Abstract: Clarified by removing the term 'aggregated'. Follow-up
1876 clarifications later in draft: S1: expanded PCN-egress-nodes
1877 bullet to mention case where the PCN-feedback-information is about
1878 one (or a few) PCN-marks, rather than aggregated information; S3
1879 clarified PCN-meter; S5 minor changes; conclusion.
1881 o S1: added a paragraph about how the PCN-domain looks to the
1882 outside world (essentially it looks like a DiffServ domain).
1884 o S2: tweaked the PCN-traffic terminology bullet: changed PCN
1885 traffic classes to PCN behaviour aggregates, to be more in line
1886 with traditional DiffServ jargon (-> follow-up changes later in
1887 draft); included a definition of PCN-flows (and corrected a couple
1888 of 'PCN microflows' to 'PCN-flows' later in draft)
1890 o S3.5: added possibility of downgrading to best effort, where PCN-
1891 packets arrive at PCN-ingress-node already ECN marked (CE or ECN
1892 nonce)
1894 o S4: added note about whether to talk about PCN operating on an
1895 interface or on a link. In S8.1 (OAM) mentioned that PCN
1896 functionality needs to be configured consistently on either the
1897 ingress or the egress interface of PCN-nodes in a PCN-domain.
1899 o S5.2: clarified that signalling protocol installs flow filter spec
1900 at PCN-ingress-node (& updates after possible re-route)
1902 o S5.6: addressing: clarified
1904 o S5.7: added tunnelling issue of N^2 scaling if you set up a mesh
1906 of tunnels between PCN-boundary-nodes
1907 o S7.3: Clarified the "third viewpoint" of probing (always probe).
1909 o S8.1: clarified that SNMP is only an example; added note that an
1910 operator may be able to avoid running PCN on some PCN-interior-nodes, if
1911 it knows that these links will never become (pre-)congested; added
1912 note that it may be possible to have different PCN-boundary-node
1913 behaviours for different ingress-egress-aggregates within the same
1914 PCN-domain.
1916 o Appendix: Created an Appendix about "Possible work items beyond 1917 the scope of the current PCN WG Charter". Material moved from 1918 near start of S3 and elsewhere throughout draft. Moved text about 1919 centralised decision node to Appendix. 1921 o Other minor clarifications. 1923 15.8. Changes from -01 to -02 1925 o S1: Benefits: provisioning bullet extended to stress that PCN does 1926 not use RFC2475-style traffic conditioning. 1928 o S1: Deployment models: mentioned, as variant of PCN-domain 1929 extending to end nodes, that may extend to LAN edge switch. 1931 o S3.1: Trust Assumption: added note about not needing PCN-marking 1932 capability if known that an interface cannot become pre-congested. 1934 o S4: now divided into sub-sections 1936 o S4.1: Admission control: added second proposed method for how to 1937 decide to block new flows (PCN-egress-node receives one (or 1938 several) PCN-marked packets). 1940 o S5: Probing sub-section removed. Material now in new S7. 1942 o S5.6: Addressing: clarified how PCN-ingress-node can discover 1943 address of PCN-egress-node 1945 o S5.6: Addressing: centralised node case, added that PCN-ingress- 1946 node may need to know address of PCN-egress-node 1948 o S5.8: Tunnelling: added case of "partially PCN-capable tunnel" and 1949 degraded bullet on this in S6 (Open Issues) 1951 o S7: Probing: new section. Much more comprehensive than old S5.5. 1953 o S8: Operations and Management: substantially revised. 1955 o other minor changes not affecting semantics 1957 15.9. 
Changes from -00 to -01
1959 In addition to clarifications and nit squashing, the main changes
1960 are:
1962 o S1: Benefits: added one about provisioning (and contrast with
1963 DiffServ SLAs)
1965 o S1: Benefits: clarified that the objective is also to stop PCN-
1966 packets being significantly delayed (previously only mentioned not
1967 dropping packets)
1969 o S1: Deployment models: added one where policing is done at ingress
1970 of access network and not at ingress of PCN-domain (assume trust
1971 between networks)
1973 o S1: Deployment models: corrected MPLS-TE to MPLS
1975 o S2: Terminology: adjusted definition of PCN-domain
1977 o S3.5: Other assumptions: corrected, so that two assumptions (PCN-
1978 nodes not performing ECN and PCN-ingress-node discarding arriving
1979 CE packet) only apply if the PCN WG decides to encode PCN-marking
1980 in the ECN-field.
1982 o S4 & S5: changed PCN-marking algorithm to marking behaviour
1984 o S4: clarified that PCN-interior-node functionality applies for
1985 each outgoing interface, and added clarification: "The
1986 functionality is also done by PCN-ingress-nodes for their outgoing
1987 interfaces (ie those 'inside' the PCN-domain)."
1989 o S4 (near end): altered to say that a PCN-node "should" dedicate
1990 some capacity to lower priority traffic so that it isn't starved
1991 (was "may")
1993 o S5: clarified to say that PCN functionality is done on an
1994 'interface' (rather than on a 'link')
1996 o S5.2: deleted erroneous mention of service level agreement
1997 o S5.5: Probing: re-written, especially to distinguish probing to
1998 test the ingress-egress-aggregate from probing to test a
1999 particular ECMP path.
2001 o S5.7: Addressing: added mention of probing; added a note that, in
2002 the case where traffic is always tunnelled across the PCN-domain,
2003 the PCN-ingress-node needs to know the address of the
2004 PCN-egress-node.
   o  S5.8: Tunnelling: re-written, especially to provide a clearer
      description of copying on tunnel entry/exit, by adding an
      explanation (keeping tunnel encaps/decaps and PCN-marking
      orthogonal), deleting one bullet ("if the inner header's marking
      state is more severe, then it is preserved" - shouldn't happen),
      and better referencing of other IETF documents.

   o  S6: Open issues: stressed that "NOTE: Potential solutions are out
      of scope for this document" and edited a couple of sentences that
      were close to solution space.

   o  S6: Open issues: added one about scenarios with only one tunnel
      endpoint in the PCN-domain.

   o  S6: Open issues: ECMP: added under-admission as another potential
      risk.

   o  S6: Open issues: added one about "Silent at start".

   o  S10: Conclusions: a small conclusions section added.

16.  Appendix: Possible future work items

   This section mentions some topics that are outside the PCN WG's
   current charter, but which have been mentioned as areas of interest.
   They might be work items for: the PCN WG after a future
   re-chartering; some other IETF WG; another standards body; or an
   operator-specific usage that is not standardised.

   NOTE: It should be crystal clear that this section discusses
   possibilities only.

   The first set of possibilities relates to the restrictions described
   in Section 5:

   o  a single PCN-domain encompasses several autonomous systems that
      do not trust each other; this might use a mechanism like re-PCN
      [Briscoe08-1].

   o  not all the nodes run PCN.  For example, the PCN-domain is a
      multi-site enterprise network.  The sites are connected by a VPN
      tunnel; although PCN doesn't operate inside the tunnel, the PCN
      mechanisms still work properly because of the good QoS on the
      virtual link (the tunnel).  Another example is that PCN is
      deployed on the general Internet (ie widely but not universally
      deployed).

   o  applying the PCN mechanisms to other types of traffic, ie beyond
      inelastic traffic.  For instance, applying the PCN mechanisms to
      traffic scheduled with the Assured Forwarding per-hop behaviour.
      One example could be flow-rate adaptation by elastic applications
      that adapt according to the pre-congestion information.

   o  the aggregation assumption doesn't hold, because the link
      capacity is too low.  Measurement-based admission control is then
      less accurate, with a greater risk of over-admission for
      instance.

   o  the applicability of PCN mechanisms for emergency use (911, GETS,
      WPS, MLPP, etc.)

   Other possibilities include:

   o  Probing.  This is discussed in Section 16.1 below.

   o  The PCN-domain extends to the end users.  The scenario is
      described in [Babiarz06].  The end users need to be trusted to do
      their own policing.  If there is sufficient traffic, then the
      aggregation assumption may hold.  A variant is that the
      PCN-domain extends out as far as the LAN edge switch.

   o  indicating pre-congestion through signalling messages rather than
      in-band (in the form of PCN-marked packets).

   o  the decision-making functionality is at a centralised node rather
      than at the PCN-boundary-nodes.  This requires that the
      PCN-egress-node signals PCN-feedback-information to the
      centralised node, and that the centralised node signals to the
      PCN-ingress-node the decision about admission (or termination).
      It may need the centralised node and the PCN-boundary-nodes to be
      configured with each other's addresses.  The centralised case is
      described further in [Tsou08].

   o  Signalling extensions for specific protocols (eg RSVP, NSIS).
      For example: the details of how the signalling protocol installs
      the flowspec at the PCN-ingress-node for an admitted PCN-flow;
      and how the signalling protocol carries the
      PCN-feedback-information.  Perhaps also for other functions such
      as: coping with failure of a PCN-boundary-node ([Briscoe06]
      considers what happens if RSVP is the QoS signalling protocol);
      or establishing a tunnel across the PCN-domain if it is necessary
      to carry ECN marks transparently.

   o  Policing by the PCN-ingress-node may not be needed if the
      PCN-domain can trust that the upstream network has already
      policed the traffic on its behalf.

   o  PCN for Pseudowire: PCN may be used as a congestion avoidance
      mechanism for edge-to-edge pseudowire emulations [PWE3-08].

   o  PCN for MPLS: [RFC3270] defines how to support the DiffServ
      architecture in MPLS (Multi-Protocol Label Switching) networks.
      [RFC5129] describes how to add PCN for admission control of
      microflows into a set of MPLS aggregates.  PCN-marking is done in
      MPLS's EXP field (which [MPLS08] renames the Traffic Class
      field).

   o  PCN for Ethernet: Similarly, it may be possible to extend PCN
      into Ethernet networks, where PCN-marking is done in the Ethernet
      header.  NOTE: Specific consideration of this extension is
      outside the IETF's remit.

16.1.  Probing

16.1.1.  Introduction

   Probing is a potential mechanism to assist admission control.

   PCN's admission control, as described so far, is essentially a
   reactive mechanism: the PCN-egress-node monitors the pre-congestion
   level for traffic from each PCN-ingress-node, and if the level rises
   then it blocks new flows on that ingress-egress-aggregate.  However,
   it's possible that an ingress-egress-aggregate carries no traffic,
   and so the PCN-egress-node can't make an admission decision using
   the usual method described earlier.
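   As a purely illustrative sketch (not part of the architecture: the
   function name, the return values and the example threshold of 0.5
   are assumptions made for this example only), the reactive decision
   at the PCN-egress-node might look like:

```python
# Illustrative sketch only: names and the 0.5 threshold are assumed
# for this example and are not defined by the PCN architecture.

def admission_decision(marked_octets, total_octets, threshold=0.5):
    """Admission decision for one ingress-egress-aggregate, based on
    the fraction of PCN-marked traffic measured at the PCN-egress-node
    over the last measurement interval."""
    if total_octets == 0:
        # Empty aggregate: no measurement is possible, so the usual
        # method cannot decide - this is the case that motivates
        # probing.
        return "no-measurement"
    fraction_marked = marked_octets / total_octets
    return "block" if fraction_marked >= threshold else "admit"
```

   An aggregate with, say, 600 of 1000 octets PCN-marked would block
   new flows, while an aggregate carrying no traffic at all yields no
   decision by this method.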
   One approach is to be "optimistic" and simply admit the new flow.
   However, it's possible to envisage a scenario where the traffic
   levels on other ingress-egress-aggregates are already so high that
   they're blocking new PCN-flows, and admitting a new flow onto this
   'empty' ingress-egress-aggregate adds extra traffic onto a link that
   is already pre-congested - which may 'tip the balance' so that PCN's
   flow termination mechanism is activated or some packets are dropped.
   This risk could be lessened by configuring on each link a sufficient
   'safety margin' above the PCN-threshold-rate.

   An alternative approach is to make PCN a more proactive mechanism.
   The PCN-ingress-node explicitly determines, before admitting the
   prospective new flow, whether the ingress-egress-aggregate can
   support it.  This can be seen as a "pessimistic" approach, in
   contrast to the "optimism" of the approach above.  It involves
   probing: a PCN-ingress-node generates and sends probe packets in
   order to test the pre-congestion level that the flow would
   experience.

   One possibility is that a probe packet is just a dummy data packet,
   generated by the PCN-ingress-node and addressed to the
   PCN-egress-node.

16.1.2.  Probing functions

   The probing functions are:

   o  Make the decision that probing is needed.  As described above,
      this is when the ingress-egress-aggregate (or the ECMP path -
      Section 8) carries no PCN-traffic.  An alternative is always to
      probe, ie probe before admitting every PCN-flow.

   o  (if required) Communicate the request that probing is needed -
      the PCN-egress-node signals to the PCN-ingress-node that probing
      is needed.

   o  (if required) Generate probe traffic - the PCN-ingress-node
      generates the probe traffic.  The appropriate number (or rate) of
      probe packets will depend on the PCN-marking algorithm; for
      example, an excess-traffic-marking algorithm generates fewer
      PCN-marks than a threshold-marking algorithm, and so will need
      more probe packets.

   o  Forward probe packets - as far as PCN-interior-nodes are
      concerned, probe packets are handled the same as (ordinary data)
      PCN-packets, in terms of routing, scheduling and PCN-marking.

   o  Consume probe packets - the PCN-egress-node consumes probe
      packets to ensure that they don't travel beyond the PCN-domain.

16.1.3.  Discussion of rationale for probing, its downsides and open
         issues

   It is an unresolved question whether probing is really needed, but
   two viewpoints have been put forward as to why it is useful.  The
   first is perhaps the most obvious: there is no PCN-traffic on the
   ingress-egress-aggregate.  The second assumes that multipath (ECMP)
   routing is running in the PCN-domain.  We now consider each in turn.

   The first viewpoint assumes the following:

   o  There is no PCN-traffic on the ingress-egress-aggregate (so a
      normal admission decision cannot be made).

   o  Simply admitting the new flow has a significant risk of leading
      to overload: packets dropped or flows terminated.

   On the former bullet, [Eardley07] suggests that, during the future
   busy hour of a national network with about 100 PCN-boundary-nodes,
   there are likely to be significant numbers of aggregates with very
   few flows under nearly all circumstances.

   The latter bullet could occur if new flows start on many of the
   empty ingress-egress-aggregates, which together overload a link in
   the PCN-domain.
   To be a problem, this would probably have to happen in a short time
   period (a flash crowd) because, after the reaction time of the
   system, other (non-empty) ingress-egress-aggregates that pass
   through the link will measure pre-congestion and so block new flows.
   Also, flows naturally end anyway.

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the
      pre-congestion level of the ingress-egress-aggregate.  But the
      probing traffic itself may cause pre-congestion, causing other
      PCN-flows to be blocked or even terminated - and in the flash
      crowd scenario there will be probing on many
      ingress-egress-aggregates.

   The second viewpoint applies in the case where there is multipath
   routing (ECMP) in the PCN-domain.  Note that ECMP is often used in
   core networks.  There are two possibilities:

   (1) If admission control is based on measurements of the
   ingress-egress-aggregate, then the viewpoint that probing is useful
   assumes:

   o  there's a significant chance that the traffic is unevenly
      balanced across the ECMP paths, and hence there's a significant
      risk of admitting a flow that should be blocked (because it
      follows an ECMP path that is pre-congested) or blocking a flow
      that should be admitted.

   o  Note: [Charny07-3] suggests unbalanced traffic is quite possible,
      even with quite a large number of flows on a PCN-link (eg 1000),
      when Assumption 3 (aggregation) is likely to be satisfied.

   (2) If admission control is based on measurements of pre-congestion
   on specific ECMP paths, then the viewpoint that probing is useful
   assumes:

   o  There is no PCN-traffic on the ECMP path on which to base an
      admission decision.

   o  Simply admitting the new flow has a significant risk of leading
      to overload.

   o  The PCN-egress-node can match a packet to an ECMP path.

   o  Note: This is similar to the first viewpoint and so similarly
      could occur in a flash crowd if new flows start more-or-less
      simultaneously on many of the empty ECMP paths.  Because there
      are several (sometimes many) ECMP paths between each pair of
      PCN-boundary-nodes, it's presumably more likely that an ECMP path
      is 'empty' than that an ingress-egress-aggregate is.  To
      constrain the number of ECMP paths, a few tunnels could be set up
      between each pair of PCN-boundary-nodes.  Tunnelling also solves
      the issue in the bullet immediately above (which is otherwise
      hard because an ECMP routing decision is made independently on
      each node).

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the
      pre-congestion level of the ECMP path.  But there's the risk that
      the probing traffic itself may cause pre-congestion, causing
      other PCN-flows to be blocked or even terminated.

   o  The PCN-egress-node needs to consume the probe packets to ensure
      they don't travel beyond the PCN-domain, since they might confuse
      the destination end node.  This is non-trivial, since probe
      packets are addressed to the destination end node, in order to
      test the relevant ECMP path (ie they are not addressed to the
      PCN-egress-node, unlike in the first viewpoint above).

   The open issues associated with this viewpoint include:

   o  What rate and pattern of probe packets does the PCN-ingress-node
      need to generate, so that there's enough traffic to make the
      admission decision?

   o  What difficulty do the delay (whilst probing is done) and
      possible packet drops cause applications?

   o  Can the delay be alleviated by automatically and periodically
      probing on the ingress-egress-aggregate?  Or does this add too
      much overhead?

   o  Are there other ways of dealing with the flash crowd scenario?
      For instance, by limiting the rate at which new flows are
      admitted; or perhaps by a PCN-egress-node blocking new flows on
      its empty ingress-egress-aggregates when its non-empty ones are
      pre-congested.

   o  (Second viewpoint only) How does the PCN-egress-node disambiguate
      probe packets from data packets (so it can consume the former)?
      The PCN-egress-node must match the characteristic setting of
      particular bits in the probe packet's header or body - but these
      bits must not be used by any PCN-interior-node's ECMP algorithm.
      In the general case this isn't possible, but it should be
      possible for a typical ECMP algorithm (which examines: the source
      and destination IP addresses and port numbers, the protocol ID,
      and the DSCP).

17.  References

17.1.  Normative References

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
              J., Courtney, W., Davari, S., Firoiu, V., and D.
              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
              Behavior)", RFC 3246, March 2002.

17.2.  Informative References

   [RFC1633]  Braden, B., Clark, D., and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, June 1994.

   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
              Network Element Service", RFC 2211, September 1997.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP
              Cryptographic Authentication", RFC 2747, January 2000.
   [RFC2753]  Yavatkar, R., Pendarakis, D., and R. Guerin, "A Framework
              for Policy-based Admission Control", RFC 2753,
              January 2000.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, October 2000.

   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
              Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
              Felstaine, "A Framework for Integrated Services Operation
              over Diffserv Networks", RFC 2998, November 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
              P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
              Protocol Label Switching (MPLS) Support of Differentiated
              Services", RFC 3270, May 2002.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay
              Variation Metric for IP Performance Metrics (IPPM)",
              RFC 3393, November 2002.

   [RFC3411]  Harrington, D., Presuhn, R., and B. Wijnen, "An
              Architecture for Describing Simple Network Management
              Protocol (SNMP) Management Frameworks", STD 62, RFC 3411,
              December 2002.

   [RFC4216]  Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous System
              (AS) Traffic Engineering (TE) Requirements", RFC 4216,
              November 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
              Guidelines for DiffServ Service Classes", RFC 4594,
              August 2006.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.
   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
              Explicit Congestion Notification (ECN) Field", BCP 124,
              RFC 4774, November 2006.

   [RFC4778]  Kaeo, M., "Operational Security Current Practices in
              Internet Service Provider Environments", RFC 4778,
              January 2007.

   [RFC5129]  Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
              Marking in MPLS", RFC 5129, January 2008.

   [P.800]    "Methods for subjective determination of transmission
              quality", ITU-T Recommendation P.800, August 1996.

   [Y.1541]   "Network Performance Objectives for IP-based Services",
              ITU-T Recommendation Y.1541, February 2006.

   [MPLS08]   "Multi-Protocol Label Switching (MPLS) label stack entry:
              "EXP" field renamed to "Traffic Class" field (work in
              progress)", Dec 2008.

   [PCN08-1]  "Baseline Encoding and Transport of Pre-Congestion
              Information", Oct 2008.

   [PCN08-2]  "Marking behaviour of PCN-nodes (work in progress)",
              Oct 2008.

   [PWE3-08]  "Pseudowire Congestion Control Framework (work in
              progress)", May 2008.

   [Babiarz06]
              "SIP Controlled Admission and Preemption (work in
              progress)", Oct 2006.

   [Behringer07]
              "Applicability of Keying Methods for RSVP Security (work
              in progress)", Nov 2007.

   [Briscoe06]
              "An edge-to-edge Deployment Model for Pre-Congestion
              Notification: Admission Control over a DiffServ Region
              (work in progress)", October 2006.

   [Briscoe08-1]
              "Emulating Border Flow Policing using Re-PCN on Bulk Data
              (work in progress)", Sept 2008.

   [Briscoe08-2]
              "Layered Encapsulation of Congestion Notification (work
              in progress)", July 2008.

   [Charny07-1]
              "Comparison of Proposed PCN Approaches (work in
              progress)", November 2007.

   [Charny07-2]
              "Pre-Congestion Notification Using Single Marking for
              Admission and Termination (work in progress)",
              November 2007.
   [Charny07-3]
              "Email to PCN WG mailing list", November 2007.

   [Charny08] "Email to PCN WG mailing list", March 2008.

   [Eardley07]
              "Email to PCN WG mailing list", October 2007.

   [Hancock02]
              "Slide 14 of 'NSIS: An Outline Framework for QoS
              Signalling'", May 2002.

   [Iyer03]   "An approach to alleviate link overload as observed on an
              IP backbone", IEEE INFOCOM, 2003.

   [Lefaucheur06]
              "RSVP Extensions for Admission Control over Diffserv
              using Pre-congestion Notification (PCN) (work in
              progress)", June 2006.

   [Menth07]  "PCN-Based Resilient Network Admission Control: The
              Impact of a Single Bit", Technical Report, 2007.

   [Menth08-1]
              "Edge-Assisted Marked Flow Termination (work in
              progress)", February 2008.

   [Menth08-2]
              "PCN Encoding for Packet-Specific Dual Marking (PSDM)
              (work in progress)", July 2008.

   [Menth08-3]
              "PCN-Based Admission Control and Flow Termination", 2008.

   [Moncaster08]
              "A three state extended PCN encoding scheme (work in
              progress)", June 2008.

   [Sarker08] "Usecases and Benefits of end to end ECN support in PCN
              Domains (work in progress)", November 2008.

   [Songhurst06]
              "Guaranteed QoS Synthesis for Admission Control with
              Shared Capacity", BT Technical Report TR-CXR9-2006-001,
              February 2006.

   [Style]    "Guardian Style", 2007.  Note: this document uses the
              abbreviations 'ie' and 'eg' (not 'i.e.' and 'e.g.'), as
              in many style guides.

   [Tsou08]   "Applicability Statement for the Use of Pre-Congestion
              Notification in a Resource-Controlled Network (work in
              progress)", November 2008.

   [Westberg08]
              "LC-PCN: The Load Control PCN Solution (work in
              progress)", November 2008.
Author's Address

   Philip Eardley
   BT
   B54/77, Sirius House, Adastral Park
   Martlesham Heath
   Ipswich, Suffolk  IP5 3RE
   United Kingdom

   Email: philip.eardley@bt.com