Congestion and Pre-Congestion                    Philip Eardley (Editor)
Notification Working Group                                            BT
Internet-Draft                                          October 20, 2008
Intended status: Informational
Expires: April 23, 2009

             Pre-Congestion Notification (PCN) Architecture
                     draft-ietf-pcn-architecture-08

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 23, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes a general architecture for flow admission and
   termination based on pre-congestion information in order to protect
   the quality of service of established inelastic flows within a single
   DiffServ domain.

Status

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Benefits
   4.  Deployment scenarios
   5.  Assumptions and constraints on scope
     5.1.  Assumption 1: Trust and support of PCN - controlled
           environment
     5.2.  Assumption 2: Real-time applications
     5.3.  Assumption 3: Many flows and additional load
     5.4.  Assumption 4: Emergency use out of scope
   6.  High-level functional architecture
     6.1.  Flow admission
     6.2.  Flow termination
     6.3.  Flow admission and/or flow termination when there are
           only two PCN encoding states
     6.4.  Information transport
     6.5.  PCN-traffic
     6.6.  Backwards compatibility
   7.  Detailed Functional architecture
     7.1.  PCN-interior-node functions
     7.2.  PCN-ingress-node functions
     7.3.  PCN-egress-node functions
     7.4.  Admission control functions
     7.5.  Flow termination functions
     7.6.  Addressing
     7.7.  Tunnelling
     7.8.  Fault handling
   8.  Challenges
   9.  Operations and Management
     9.1.  Configuration OAM
       9.1.1.  System options
       9.1.2.  Parameters
     9.2.  Performance & Provisioning OAM
     9.3.  Accounting OAM
     9.4.  Fault OAM
     9.5.  Security OAM
   10. IANA Considerations
   11. Security considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. Changes
     15.1.  Changes from -07 to -08
     15.2.  Changes from -06 to -07
     15.3.  Changes from -05 to -06
     15.4.  Changes from -04 to -05
     15.5.  Changes from -03 to -04
     15.6.  Changes from -02 to -03
     15.7.  Changes from -01 to -02
     15.8.  Changes from -00 to -01
   16. Appendix: Possible work items beyond the scope of the
       current PCN WG charter
     16.1.  Probing
       16.1.1.  Introduction
       16.1.2.  Probing functions
       16.1.3.  Discussion of rationale for probing, its downsides
                and open issues
   17. References
     17.1.  Normative References
     17.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The purpose of this document is to describe a general architecture
   for flow admission and termination based on (pre-)congestion
   information in order to protect the quality of service of flows
   within a DiffServ domain [RFC2475].  This document defines an
   architecture for implementing two mechanisms to protect the quality
   of service of established inelastic flows within a single DiffServ
   domain, where all boundary and interior nodes are PCN-enabled and are
   trusted for correct PCN operation.  Flow admission control determines
   whether a new flow should be admitted, in order to protect the QoS of
   existing PCN-flows in normal circumstances.
   However, in abnormal circumstances, for instance a disaster affecting
   multiple nodes and causing traffic re-routes, the QoS on existing
   PCN-flows may degrade even though care was exercised when admitting
   those flows.  Therefore we also propose a mechanism for flow
   termination, which removes enough traffic to protect the QoS of the
   remaining PCN-flows.

   As a fundamental building block to enable these two mechanisms,
   PCN-interior-nodes generate, encode and transport pre-congestion
   information towards the PCN-egress-nodes.  Two rates, a
   PCN-threshold-rate and a PCN-excess-rate, are associated with each
   link of the PCN-domain.  Each rate is used by a marking behaviour
   that determines how and when PCN-packets are marked, and how the
   markings are encoded in packet headers.  Overall the aim is to enable
   PCN-nodes to give an "early warning" of potential congestion before
   there is any significant build-up of PCN-packets in the queue.

   PCN-boundary-nodes convert measurements of these PCN-markings into
   decisions about flow admission and termination.  In a PCN-domain with
   both threshold marking and excess traffic marking enabled, the
   admission control mechanism limits the PCN-traffic on each link to
   *roughly* its PCN-threshold-rate and the flow termination mechanism
   limits the PCN-traffic on each link to *roughly* its PCN-excess-rate.
   Other scenarios are discussed later.

   The behaviour of PCN-interior-nodes is standardised in other
   documents, which are summarised in this document:

   o  Marking behaviour: threshold marking and excess traffic marking
      [I-D.ietf-pcn-marking-behaviour].  Threshold marking marks all
      PCN-packets if the PCN traffic rate is greater than a first
      configured rate, "PCN-threshold-rate".
      Excess traffic marking marks a proportion of PCN-packets, such
      that the amount marked equals the traffic rate in excess of a
      second configured rate, "PCN-excess-rate".

   o  Encoding: a combination of the DSCP field and ECN field in the IP
      header indicates that a packet is a PCN-packet and whether it is
      PCN-marked.  The "baseline" encoding is standardised in
      [I-D.ietf-pcn-baseline-encoding], which standardises two PCN
      encoding states (PCN-marked and not PCN-marked), whilst
      (experimental) extensions to the baseline encoding can provide
      three encoding states (threshold-marked, excess-traffic-marked,
      not PCN-marked, or perhaps further encoding states as suggested in
      [I-D.westberg-pcn-load-control]).  PCN encoding therefore defines
      semantics for the ECN field different from the default semantics
      of [RFC3168], and so its encoding needs to meet the guidelines of
      BCP 124 [RFC4774].

   The behaviour of PCN-boundary-nodes is described in Informational
   documents.  Several possibilities are outlined in this document;
   detailed descriptions and comparisons are in
   [I-D.charny-pcn-comparison] and [Menth08].

   This document describes the PCN architecture at a high level (Section
   6) and in more detail (Section 7).  It also defines some terminology
   and outlines some benefits, deployment scenarios, and assumptions of
   PCN (Sections 2-5).  Finally it outlines some challenges, operations
   and management, and security considerations, and some potential
   future work items (Sections 8, 9, 11 and Appendix).

2.  Terminology

   o  PCN-domain: a PCN-capable domain; a contiguous set of PCN-enabled
      nodes that perform DiffServ scheduling [RFC2474]; the complete set
      of PCN-nodes whose PCN-marking can in principle influence
      decisions about flow admission and termination for the PCN-domain,
      including the PCN-egress-nodes, which measure these PCN-marks.

   o  PCN-boundary-node: a PCN-node that connects one PCN-domain to a
      node either in another PCN-domain or in a non PCN-domain.

   o  PCN-interior-node: a node in a PCN-domain that is not a
      PCN-boundary-node.

   o  PCN-node: a PCN-boundary-node or a PCN-interior-node.

   o  PCN-egress-node: a PCN-boundary-node in its role in handling
      traffic as it leaves a PCN-domain.

   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
      traffic as it enters a PCN-domain.

   o  PCN-traffic, PCN-packets, PCN-BA: a PCN-domain carries traffic of
      different DiffServ behaviour aggregates (BAs) [RFC2474].  The
      PCN-BA uses the PCN mechanisms to carry PCN-traffic and the
      corresponding packets are PCN-packets.  The same network will
      carry traffic of other DiffServ BAs.  The PCN-BA is distinguished
      by a combination of the DiffServ codepoint (DSCP) and ECN fields.

   o  PCN-flow: the unit of PCN-traffic that the PCN-boundary-node
      admits (or terminates); the unit could be a single microflow (as
      defined in [RFC2474]) or some identifiable collection of
      microflows.

   o  Ingress-egress-aggregate: The collection of PCN-packets from all
      PCN-flows that travel in one direction between a specific pair of
      PCN-boundary-nodes.

   o  PCN-threshold-rate: a reference rate configured for each link in
      the PCN-domain, which is lower than the PCN-excess-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a first encoding, "threshold-marked".

   o  PCN-excess-rate: a reference rate configured for each link in the
      PCN-domain, which is higher than the PCN-threshold-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a second encoding,
      "excess-traffic-marked".

   o  Threshold-marking: a PCN-marking behaviour with the objective that
      all PCN-traffic is marked if the PCN-traffic exceeds the
      PCN-threshold-rate.

   o  Excess-traffic-marking: a PCN-marking behaviour with the objective
      that the amount of PCN-traffic that is PCN-marked is equal to the
      amount that exceeds the PCN-excess-rate.

   o  Pre-congestion: a condition of a link within a PCN-domain such
      that the PCN-node performs PCN-marking, in order to provide an
      "early warning" of potential congestion before there is any
      significant build-up of PCN-packets in the real queue.  (Hence, by
      analogy with ECN we call our mechanism Pre-Congestion
      Notification.)

   o  PCN-marking: the process of setting the header in a PCN-packet
      based on defined rules, in reaction to pre-congestion; either
      threshold-marking or excess-traffic-marking.

   o  PCN-colouring: the process of setting the header in a PCN-packet
      by a PCN-boundary-node; performed by a PCN-ingress-node so that
      PCN-nodes can easily identify PCN-packets; performed by a
      PCN-egress-node so that the header is appropriate for nodes beyond
      the PCN-domain.

   o  PCN-feedback-information: information signalled by a
      PCN-egress-node to a PCN-ingress-node (or a central control node),
      which is needed for the flow admission and flow termination
      mechanisms.

   o  PCN-admissible-rate: the rate of PCN-traffic on a link up to which
      PCN admission control should accept new PCN-flows.

   o  PCN-supportable-rate: the rate of PCN-traffic on a link down to
      which PCN flow termination should, if necessary, terminate already
      admitted PCN-flows.

3.  Benefits

   We believe that the key benefits of the PCN mechanisms described in
   this document are that they are simple, scalable, and robust because:

   o  Per flow state is only required at the PCN-ingress-nodes
      ("stateless core").  This is required for policing purposes (to
      prevent non-admitted PCN traffic from entering the PCN-domain) and
      so on.
      It is not generally required that other network entities are aware
      of individual flows (although they may be in particular deployment
      scenarios).

   o  Admission control is resilient: with PCN, QoS is decoupled from
      the routing system.  Hence in general admitted flows can survive
      capacity, routing or topology changes without additional
      signalling.  The PCN-admissible-rate on each link can be chosen
      small enough that admitted traffic can still be carried after a
      rerouting in most failure cases [Menth].  This is an important
      feature as QoS violations in core networks due to link failures
      are more likely than QoS violations due to increased traffic
      volume [Iyer].

   o  The PCN-marking behaviours only operate on the overall PCN-traffic
      on the link, not per flow.

   o  The information of these measurements is signalled to the
      PCN-egress-nodes by the PCN-marks in the packet headers, ie
      [Style] "in-band".  No additional signalling protocol is required
      for transporting the PCN-marks.  Therefore no secure binding is
      required between data packets and separate congestion messages.

   o  The PCN-egress-nodes make separate measurements, operating on the
      aggregate PCN-traffic from each PCN-ingress-node, ie not per flow.
      Similarly, signalling by the PCN-egress-node of
      PCN-feedback-information (which is used for flow admission and
      termination decisions) is at the granularity of the
      ingress-egress-aggregate.  An alternative approach is that the
      PCN-egress-nodes monitor the PCN-traffic and signal
      PCN-feedback-information (which is used for flow admission and
      termination decisions) at the granularity of one (or a few)
      PCN-marks.

   o  The admitted PCN-load is controlled dynamically.  Therefore it
      adapts as the traffic matrix changes, and also if the network
      topology changes (eg after a link failure).
      Hence an operator can be less conservative when deploying network
      capacity, and less accurate in their prediction of the PCN-traffic
      matrix.

   o  The termination mechanism complements admission control.  It
      allows the network to recover from sudden unexpected surges of
      PCN-traffic on some links, thus restoring QoS to the remaining
      flows.  Such scenarios are expected to be rare but not impossible.
      They can be caused by large network failures that redirect lots of
      admitted PCN-traffic to other links, or by malfunction of the
      measurement-based admission control in the presence of admitted
      flows that send for a while with an atypically low rate and then
      increase their rates in a correlated way.

   o  Flow termination can also enable an operator to be less
      conservative when deploying network capacity.  It is an
      alternative to running links at low utilisation in order to
      protect against link or node failures.  This is especially the
      case with SRLGs (shared risk link groups, which are links that
      share a resource, such as a fibre, whose failure affects all those
      links [RFC4216]).  A requirement to fully protect traffic against
      a single SRLG failure requires low utilisation (~10%) of the link
      bandwidth on some links before failure [PCN-email-SRLG].

   o  The PCN-supportable-rate may be set below the maximum rate that
      PCN-traffic can be transmitted on a link, in order to trigger
      termination of some PCN-flows before loss (or excessive delay) of
      PCN-packets occurs, or to keep the maximum PCN-load on a link
      below a level configured by the operator.

   o  Provisioning of the network is decoupled from the process of
      adding new customers.
      By contrast, with the DiffServ architecture [RFC2475] operators
      rely on subscription-time Service Level Agreements, which
      statically define the parameters of the traffic that will be
      accepted from a customer, and so the operator has to verify
      provision is sufficient each time a new customer is added to check
      that the Service Level Agreement can be fulfilled.  A PCN-domain
      doesn't need such traffic conditioning.

4.  Deployment scenarios

   Operators of networks will want to use the PCN mechanisms in various
   arrangements, for instance depending on how they are performing
   admission control outside the PCN-domain (users after all are
   concerned about QoS end-to-end), what their particular goals and
   assumptions are, how many PCN encoding states are available, and so
   on.

   From the perspective of the outside world, a PCN-domain essentially
   looks like a DiffServ domain.  PCN-traffic is either transported
   across it transparently or policed at the PCN-ingress-node (ie
   dropped or carried at a lower QoS).  One difference is that
   PCN-traffic has better QoS guarantees than normal DiffServ traffic,
   because the PCN mechanisms better protect the QoS of admitted flows.
   Another difference may occur in the rare circumstance when there is a
   failure: on the one hand some PCN-flows may get terminated, but on
   the other hand other flows will get their QoS restored.  Non
   PCN-traffic is treated transparently, ie the PCN-domain is a normal
   DiffServ domain.

   An operator may choose to deploy either admission control or flow
   termination or both.  Although designed to work together, they are
   independent mechanisms, and the use of one does not require or
   prevent the use of the other.

   A PCN-domain may have three encoding states (or pedantically, an
   operator may choose to use up three encoding states for PCN): not
   PCN-marked, threshold-marked, excess-traffic-marked.
   Then both PCN admission control and flow termination can be
   supported.  As illustrated in Figure 1, admission control accepts new
   flows until the PCN-traffic rate on the bottleneck link rises above
   the PCN-threshold-rate, whilst if necessary the flow termination
   mechanism terminates flows down to the PCN-excess-rate on the
   bottleneck link.

                          ==Marking behaviour==     ==PCN mechanisms==
          Rate of      ^
      PCN-traffic on   |
      bottleneck link  |                            (as below and also)
                       |    (as below)              Drop some PCN-pkts
                       |
       scheduler rate -|----------------------------------------------
    (for PCN-traffic)  |
                       |    Some pkts               Terminate some
                       |  excess-traffic-marked     admitted flows
                       |         &                        &
                       |    Rest of pkts            Block new flows
                       |  threshold-marked
                       |
      PCN-excess-rate -|----------------------------------------------
(=PCN-supportable-rate)|
                       |    All pkts                Block new flows
                       |  threshold-marked
                       |
   PCN-threshold-rate -|----------------------------------------------
 (=PCN-admissible-rate)|
                       |    No pkts                 Admit new flows
                       |  PCN-marked
                       |

   Figure 1: Schematic of how the PCN admission control and flow
   termination mechanisms operate as the rate of PCN-traffic increases,
   for a PCN-domain with three encoding states.

   On the other hand, a PCN-domain may have two encoding states (as in
   [I-D.ietf-pcn-baseline-encoding]) (or pedantically, an operator may
   choose to use up two encoding states for PCN): not PCN-marked,
   PCN-marked.  Then there are three possibilities, as discussed in the
   following paragraphs (see also Section 6.3).

   First, an operator could just use PCN's admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting' - as
   sessions end, PCN-traffic naturally reduces, and meanwhile the
   admission control mechanism will prevent admission of new flows that
   use the affected links.
   So the PCN-domain will naturally return to normal operation, but with
   reduced capacity.  The drawback of this approach would be that, until
   sufficient sessions have ended to relieve the congestion, all
   PCN-flows as well as lower priority services will be adversely
   affected.

   Second, an operator could just rely for admission control on
   statically provisioned capacity per PCN-ingress-node (regardless of
   the PCN-egress-node of a flow), as is typical in the hose model of
   the DiffServ architecture [RFC2475].  Such traffic conditioning
   agreements can lead to focused overload: many flows happen to focus
   on a particular link and then all flows through the congested link
   fail catastrophically.  PCN's flow termination mechanism could then
   be used to counteract such a problem.

   Third, both admission control and flow termination can be triggered
   from the single type of PCN-marking; the main downside is that
   admission control is less accurate [I-D.charny-pcn-single-marking].

   Within the PCN-domain there is some flexibility about how the
   decision making functionality is distributed.  These possibilities
   are outlined in Section 7.4 and also discussed elsewhere, such as in
   [Menth08].

   The flow admission and termination decisions need to be enforced
   through per flow policing by the PCN-ingress-nodes.  If there are
   several PCN-domains on the end-to-end path, then each needs to police
   at its PCN-ingress-nodes.  One exception is if the operator runs both
   the access network (not a PCN-domain) and the core network (a
   PCN-domain); per flow policing could be devolved to the access
   network and not done at the PCN-ingress-node.  Note: to aid
   readability, the rest of this draft assumes that policing is done by
   the PCN-ingress-nodes.

   PCN admission control has to fit with the overall approach to
   admission control.
   For instance [I-D.briscoe-tsvwg-cl-architecture] describes the case
   where RSVP signalling runs end-to-end.  The PCN-domain is a single
   RSVP hop, ie only the PCN-boundary-nodes process RSVP messages, with
   RSVP messages processed on each hop outside the PCN-domain, as in
   IntServ over DiffServ [RFC2998].  It would also be possible for the
   RSVP signalling to be originated and/or terminated by proxies, with
   application-layer signalling between the end user and the proxy (eg
   SIP signalling with a home hub).  A similar example would use NSIS
   signalling instead of RSVP.

   It is possible that a user wants its inelastic traffic to use the PCN
   mechanisms but also react to ECN marking outside the PCN-domain
   [I-D.sarker-pcn-ecn-pcn-usecases].  Two possible ways to do this are
   to tunnel all PCN-packets across the PCN-domain, so that the ECN
   marks are carried transparently across the PCN-domain, or to use an
   encoding like [I-D.moncaster-pcn-3-state-encoding].  Tunnelling is
   discussed further in Section 7.7.

   Some possible deployment models that are outside the current PCN WG
   charter are outlined in the Appendix.

5.  Assumptions and constraints on scope

   The scope of PCN is, at least initially (see Appendix), restricted by
   the following assumptions:

   1.  these components are deployed in a single DiffServ domain, within
       which all PCN-nodes are PCN-enabled and are trusted for truthful
       PCN-marking and transport.

   2.  all flows handled by these mechanisms are inelastic and
       constrained to a known peak rate through policing or shaping.

   3.  the number of PCN-flows across any potential bottleneck link is
       sufficiently large that stateless, statistical mechanisms can be
       effective.  To put it another way, the aggregate bit rate of
       PCN-traffic across any potential bottleneck link needs to be
       sufficiently large relative to the maximum additional bit rate
       added by one flow.
       This is the basic assumption of measurement-based admission
       control.

   4.  PCN-flows may have different precedence, but the applicability of
       the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
       is out of scope.

5.1.  Assumption 1: Trust and support of PCN - controlled environment

   We assume that the PCN-domain is a controlled environment, ie all the
   nodes in a PCN-domain run PCN and are trusted.  There are several
   reasons for proposing this assumption:

   o  The PCN-domain has to be encircled by a ring of
      PCN-boundary-nodes, otherwise traffic could enter a PCN-BA without
      being subject to admission control, which would potentially
      degrade the QoS of existing PCN-flows.

   o  Similarly, a PCN-boundary-node has to trust that all the PCN-nodes
      mark PCN-traffic consistently.  A node not performing PCN-marking
      wouldn't be able to alert when it suffered pre-congestion, which
      potentially would lead to too many PCN-flows being admitted (or
      too few being terminated).  Worse, a rogue node could perform
      various attacks, as discussed in the Security Considerations
      section.

   One way of assuring the above two points is that the entire
   PCN-domain is run by a single operator.  Another possibility is that
   there are several operators that trust each other in their handling
   of PCN-traffic.

   Note: All PCN-nodes need to be trustworthy.  However if it is known
   that an interface cannot become pre-congested then it is not strictly
   necessary for it to be capable of PCN-marking.  But this must be
   known even in unusual circumstances, eg after the failure of some
   links.

5.2.  Assumption 2: Real-time applications

   We assume that any variation of source bit rate is independent of the
   level of pre-congestion.
   We assume that PCN-packets come from real-time applications
   generating inelastic traffic, ie sending packets at the rate the
   codec produces them, regardless of the availability of capacity
   [RFC4594].  Examples are voice and video requiring low delay, jitter
   and packet loss; the Controlled Load Service [RFC2211]; and the
   Telephony service class [RFC4594].  This assumption is to help focus
   the effort where it looks like PCN would be most useful, ie the sorts
   of applications where per flow QoS is a known requirement.  In other
   words we focus on PCN providing a benefit to inelastic traffic (PCN
   may or may not provide a benefit to other types of traffic).

   As a consequence, it is assumed that PCN-marking is being applied to
   traffic scheduled with the expedited forwarding per-hop behaviour
   [RFC3246], or a per-hop behaviour with similar characteristics.

5.3.  Assumption 3: Many flows and additional load

   We assume that there are many PCN-flows on any bottleneck link in the
   PCN-domain (or, to put it another way, the aggregate bit rate of
   PCN-traffic across any potential bottleneck link is sufficiently
   large relative to the maximum additional bit rate added by one
   PCN-flow).  Measurement-based admission control assumes that the
   present is a reasonable prediction of the future: the network
   conditions are measured at the time of a new flow request; however,
   the actual network performance must be acceptable during the call
   some time later.  One issue is that if there are only a few variable
   rate flows, then the aggregate traffic level may vary a lot, perhaps
   enough to cause some packets to get dropped.  If there are many flows
   then the aggregate traffic level should be statistically smoothed.
571 How many flows is enough depends on a number of factors such as the 572 variation in each flow's rate, the total rate of PCN-traffic, and the 573 size of the "safety margin" between the traffic level at which we 574 start admission-marking and at which packets are dropped or 575 significantly delayed. 577 We do not make explicit assumptions on how many PCN-flows are in each 578 ingress-egress-aggregate. Performance evaluation work may clarify 579 whether it is necessary to make any additional assumption on 580 aggregation at the ingress-egress-aggregate level. 582 5.4. Assumption 4: Emergency use out of scope 584 PCN-flows may have different precedence, but the applicability of the 585 PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc) is out 586 of scope for consideration by the PCN WG. 588 6. High-level functional architecture 590 The high-level approach is to split functionality between: 592 o PCN-interior-nodes 'inside' the PCN-domain, which monitor their 593 own state of pre-congestion and mark PCN-packets as appropriate. 594 They are not flow-aware, nor aware of ingress-egress-aggregates. 595 The functionality is also done by PCN-ingress-nodes for their 596 outgoing interfaces (ie those 'inside' the PCN-domain). 598 o PCN-boundary-nodes at the edge of the PCN-domain, which control 599 admission of new PCN-flows and termination of existing PCN-flows, 600 based on information from PCN-interior-nodes. This information is 601 in the form of the PCN-marked data packets (which are intercepted 602 by the PCN-egress-nodes) and not signalling messages. Generally 603 PCN-ingress-nodes are flow-aware. 605 The aim of this split is to keep the bulk of the network simple, 606 scalable and robust, whilst confining policy, application-level and 607 security interactions to the edge of the PCN-domain. 
For example the 608 lack of flow awareness means that the PCN-interior-nodes don't care 609 about the flow information associated with PCN-packets, nor do the 610 PCN-boundary-nodes care about which PCN-interior-nodes its ingress- 611 egress-aggregates traverse. 613 In order to generate information about the current state of the PCN- 614 domain, each PCN-node PCN-marks packets if it is "pre-congested". 615 Exactly when a PCN-node decides if it is "pre-congested" (the 616 algorithm) and exactly how packets are "PCN-marked" (the encoding) 617 will be defined in separate standards-track documents, but at a high 618 level it is as follows: 620 o the algorithms: a PCN-node meters the amount of PCN-traffic on 621 each one of its outgoing (or incoming) links. The measurement is 622 made as an aggregate of all PCN-packets, and not per flow. There 623 are two algorithms, one for threshold-marking and one for excess- 624 traffic-marking. 626 o the encoding(s): a PCN-node PCN-marks a PCN-packet by modifying a 627 combination of the DSCP and ECN fields. In the "baseline" 628 encoding [I-D.ietf-pcn-baseline-encoding], the ECN field is set to 629 11 and the DSCP is not altered. Extension encodings may be 630 defined that, at most, use a second DSCP (eg as in 631 [I-D.moncaster-pcn-3-state-encoding]) and/or set the ECN field to 632 values other than 11 (eg as in [I-D.menth-pcn-psdm-encoding]). 634 In a PCN-domain the operator may have two or three encoding states 635 available. The baseline encoding provides two encoding states (not 636 PCN-marked, PCN-marked), whilst extended encodings can provide three 637 encoding states (not PCN-marked, threshold-marked, excess-traffic- 638 marked). 640 The PCN-boundary-nodes monitor the PCN-marked packets in order to 641 extract information about the current state of the PCN-domain. Based 642 on this monitoring, a distributed decision is made about whether to 643 admit a prospective new flow or whether to terminate existing 644 flow(s). 
Sections 7.4 and 7.5 mention various possibilities for how 645 the functionality could be distributed. 647 PCN-marking needs to be configured on all (potentially pre-congested) 648 links in the PCN-domain to ensure that the PCN mechanisms protect all 649 links. The actual functionality can be configured on the outgoing or 650 incoming interfaces of PCN-nodes - or one algorithm could be 651 configured on the outgoing interface and the other on the incoming 652 interface. The important point is that a consistent choice is made 653 across the PCN-domain to ensure that the PCN mechanisms protect all 654 links. See [I-D.ietf-pcn-marking-behaviour] for further discussion. 656 The objective of the threshold-marking algorithm is to threshold-mark 657 all PCN-packets whenever the rate of PCN-packets is greater than some 658 configured rate, the PCN-threshold-rate. The objective of the 659 excess-traffic-marking algorithm is to excess-traffic-mark PCN- 660 packets at a rate equal to the difference between the bit rate of 661 PCN-packets and some configured rate, the PCN-excess-rate. Note that 662 this description reflects the overall intent of the algorithm rather 663 than its instantaneous behaviour, since the rate measured at a 664 particular moment depends on the detailed algorithm, its 665 implementation, and the traffic's variance as well as its rate (eg 666 marking may well continue after a recent overload even after the 667 instantaneous rate has dropped). The algorithms are specified in 668 [I-D.ietf-pcn-marking-behaviour]. 670 All the presently proposed admission and termination approaches are 671 detailed and compared in [I-D.charny-pcn-comparison] and [Menth08]. 672 The discussion below is just a brief summary. It initially assumes 673 there are three encoding states available. 675 6.1. 
Flow admission 677 The objective of PCN's flow admission control mechanism is to limit 678 the PCN-traffic on each link in the PCN-domain to *roughly* its PCN- 679 admissible-rate, by admitting or blocking prospective new flows, in 680 order to protect the QoS of existing PCN-flows. With three encoding 681 states available, the PCN-threshold-rate is configured by the 682 operator as equal to the PCN-admissible-rate on each link. It is set 683 lower than the traffic rate at which the link becomes congested and 684 the node drops packets. 686 Exactly how the admission control decision is made will be defined 687 separately in informational documents. At a high level two 688 approaches are proposed (others might be possible): 690 o the PCN-egress-node measures (possibly as a moving average) the 691 fraction of the PCN-traffic that is threshold-marked. The 692 fraction is measured for a specific ingress-egress-aggregate. If 693 the fraction is below a threshold value then the new flow is 694 admitted, and if the fraction is above the threshold value then it 695 is blocked. In [I-D.eardley-pcn-architecture] the fraction is 696 measured as an EWMA (exponentially weighted moving average) and 697 termed the "congestion level estimate". 699 o the PCN-egress-node monitors PCN-traffic and if it receives one 700 (or several) threshold-marked packets, then the new flow is 701 blocked, otherwise it is admitted. One possibility may be to 702 react to the marking state of an initial flow set-up packet (eg 703 RSVP PATH). Another is that after one (or several) threshold- 704 marks then all flows are blocked until after a specific period of 705 no congestion. 707 Note that the admission control decision is made for a particular 708 pair of PCN-boundary-nodes. So it is quite possible for a new flow 709 to be admitted between one pair of PCN-boundary-nodes, whilst at the 710 same time another admission request is blocked between a different 711 pair of PCN-boundary-nodes. 713 6.2. 
Flow termination 715 The objective of PCN's flow termination mechanism is to limit the 716 PCN-traffic on each link to *roughly* its PCN-supportable-rate, by 717 terminating some existing PCN-flows, in order to protect the QoS of 718 the remaining PCN-flows. With three encoding states available, the 719 PCN-excess-rate is configured by the operator as equal to the PCN- 720 supportable-rate on each link. It may be set lower than the traffic 721 rate at which the link becomes congested and the node drops packets. 723 Exactly how the flow termination decision is made will be defined 724 separately in informational documents. At a high level several 725 approaches are proposed (others might be possible): 727 o In one approach the PCN-egress-node measures the rate of PCN- 728 traffic that is not excess-traffic-marked, which is the amount of 729 PCN-traffic that can actually be supported, and communicates this 730 to the PCN-ingress-node. Also the PCN-ingress-node measures the 731 rate of PCN-traffic that is destined for this specific PCN-egress- 732 node, and hence it can calculate the excess amount that should be 733 terminated. 735 o Another approach instead measures the rate of excess-traffic- 736 marked traffic and terminates this amount of traffic. This 737 terminates less traffic than the previous bullet if some nodes are 738 dropping PCN-traffic. 740 o Another approach monitors PCN-packets and terminates some of the 741 PCN-flows that have an excess-traffic-marked packet. (If all such 742 flows were terminated, far too much traffic would be terminated, 743 so a random selection needs to be made from those with an excess- 744 traffic-marked packet, [I-D.menth-pcn-emft].) 746 Since flow termination is designed for "abnormal" circumstances, it 747 is quite likely that some PCN-nodes are congested and hence packets 748 are being dropped and/or significantly queued. The flow termination 749 mechanism must accommodate this. 
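The difference between the first two approaches above can be illustrated with a short sketch. It is not a specification of any boundary mechanism: the function names, the rate units and the flow selection policy (terminate flows in the order given until the excess is covered) are assumptions made purely for illustration, since the actual decision procedures are to be defined in separate documents.

```python
def excess_by_difference(ingress_rate, egress_unmarked_rate):
    """First approach: the PCN-ingress-node's measured rate towards this
    PCN-egress-node, minus the rate the egress received that was NOT
    excess-traffic-marked (ie the rate that can actually be supported)."""
    return max(0.0, ingress_rate - egress_unmarked_rate)


def excess_by_marked_rate(egress_marked_rate):
    """Second approach: terminate the rate of excess-traffic-marked
    traffic measured at the PCN-egress-node."""
    return max(0.0, egress_marked_rate)


def select_flows(flow_rates, excess):
    """Hypothetical selection policy (an assumption, not from the draft):
    take flows in the order given until their rates cover the excess."""
    chosen, covered = [], 0.0
    for flow_id, rate in flow_rates:
        if covered >= excess:
            break
        chosen.append(flow_id)
        covered += rate
    return chosen


# Illustrative numbers: the ingress sends 120 units of PCN-traffic, a
# congested node drops 10, and the egress receives 110, of which 20 are
# excess-traffic-marked (so 90 are not).
flows = [("a", 10.0), ("b", 15.0), ("c", 10.0), ("d", 5.0)]
by_difference = excess_by_difference(120.0, 90.0)
by_marked = excess_by_marked_rate(20.0)
print(by_difference, select_flows(flows, by_difference))
print(by_marked, select_flows(flows, by_marked))
```

With these illustrative numbers the second approach asks for less traffic to be terminated than the first, because the dropped packets never reach the PCN-egress-node to be counted as excess-traffic-marked.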
751 Note also that the termination control decision is made for a 752 particular pair of PCN-boundary-nodes. So it is quite possible for 753 PCN-flows to be terminated between one pair of PCN-boundary-nodes, 754 whilst at the same time none are terminated between a different pair 755 of PCN-boundary-nodes. 757 6.3. Flow admission and/or flow termination when there are only two PCN 758 encoding states 760 If a PCN-domain has only two encoding states available (PCN-marked 761 and not PCN-marked), ie it is using the baseline encoding 762 [I-D.ietf-pcn-baseline-encoding], then an operator has three options 763 (others might be possible): 765 o admission control only: PCN-marking means threshold-marking, ie 766 only the threshold-marking algorithm writes PCN-marks. Only PCN 767 admission control is available. 769 o flow termination only: PCN-marking means excess-traffic-marking, 770 ie only the excess-traffic-marking algorithm writes PCN-marks. 772 Only PCN termination control is available. 774 o both admission control and flow termination: only the excess- 775 traffic-marking algorithm writes PCN-marks, however the configured 776 rate (PCN-excess-rate) is set equal to the PCN-admissible-rate, as 777 shown in Figure 2. [I-D.charny-pcn-single-marking] describes how 778 both admission control and flow termination can be triggered in 779 this case and also gives some of the pros and cons of this 780 approach. The main downside is that admission control is less 781 accurate. 
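The three regions that Figure 2 depicts for the third option can be written down as a small classification sketch. This only restates the schematic: in practice the PCN-boundary-nodes infer the situation from PCN-marks rather than observing the link rate directly, and the function name, the treatment of the exact boundary values and the requirement that U is greater than 1 are assumptions made for illustration.

```python
def single_marking_region(link_rate, pcn_excess_rate, u):
    """Classify the PCN-traffic rate on a bottleneck link into the three
    regions of the schematic, for a PCN-domain with two encoding states.

    pcn_excess_rate is configured equal to the PCN-admissible-rate, and
    u * pcn_excess_rate plays the role of the PCN-supportable-rate.
    Treating a rate exactly equal to a boundary as belonging to the
    lower region is an assumption of this sketch.
    """
    if u <= 1:
        raise ValueError("this sketch assumes U > 1")
    if link_rate <= pcn_excess_rate:
        return "admit new flows"
    if link_rate <= u * pcn_excess_rate:
        return "block new flows"
    return "block new flows and terminate some admitted flows"


# Illustrative values only: PCN-excess-rate of 100 units and U = 1.5.
for rate in (80.0, 120.0, 160.0):
    print(rate, single_marking_region(rate, 100.0, 1.5))
```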
783 ==Marking behaviour== ==PCN mechanisms== 784 Rate of ^ 785 PCN-traffic on | 786 bottleneck link | Terminate some 787 | Further pkts admitted flows 788 | excess-traffic-marked & 789 | Block new flows 790 | 791 | 792 U*PCN-excess-rate -|------------------------------------------------ 793 (=PCN-supportable-rate)| 794 | Some pkts Block new flows 795 | excess-traffic-marked 796 | 797 PCN-excess-rate -|------------------------------------------------ 798 (=PCN-admissible-rate)| 799 | No pkts Admit new flows 800 | PCN-marked 801 | 803 Figure 2: Schematic of how the PCN admission control and flow 804 termination mechanisms operate as the rate of PCN-traffic increases, 805 for a PCN-domain with two encoding states and using the approach of 806 [I-D.charny-pcn-single-marking]. Note: U is a global parameter for 807 all links in the PCN-domain. 809 6.4. Information transport 811 The transport of pre-congestion information from a PCN-node to a PCN- 812 egress-node is through PCN-markings in data packet headers, ie "in- 813 band": no signalling protocol messaging is needed. Signalling is 814 needed to transport PCN-feedback-information between the PCN- 815 boundary-nodes, for example to convey the fraction of PCN-marked 816 traffic from a PCN-egress-node to the relevant PCN-ingress-node. 817 Exactly what information needs to be transported will be described in 818 future documents about possible boundary mechanisms. The signalling 819 could be done by an extension of RSVP or NSIS, for instance; protocol 820 work will be done by the relevant WG, but for example 821 [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for RSVP. 823 6.5. PCN-traffic 825 The following are some high-level points about how PCN works: 827 o There needs to be a way for a PCN-node to distinguish PCN-traffic 828 from other traffic. This is through a combination of the DSCP 829 field and/or ECN field. 
831 o It is not advised to have non PCN-traffic that competes for the 832 same capacity as PCN-traffic but, if there is such traffic, there 833 needs to be a mechanism to limit it. "Capacity" means the 834 forwarding bandwidth on a link; "competes" means that non PCN- 835 packets will delay PCN-packets in the queue for the link. Hence 836 more non PCN-traffic results in poorer QoS for PCN. Further, the 837 unpredictable amount of non PCN-traffic makes the PCN mechanisms 838 less accurate and so reduces PCN's ability to protect the QoS of 839 admitted PCN-flows. 841 o Two examples of such non PCN-traffic (ie that competes for the 842 same capacity as PCN-traffic) are: 844 1. traffic that is priority scheduled over PCN (perhaps a particular 845 application or an operator's control messages). 847 2. traffic that is scheduled at the same priority as PCN (for 848 example if the Voice-Admit codepoint is used for PCN-traffic 849 [I-D.ietf-pcn-baseline-encoding] and there is non-PCN voice-admit 850 traffic in the PCN-domain). 852 o If there is such non PCN-traffic (ie that competes for the same 853 capacity as PCN-traffic), then PCN's mechanisms should take 854 account of it, in order to improve the accuracy of the decision 855 about whether to admit (or terminate) a PCN-flow. For example, 856 one mechanism is that such non PCN-traffic contributes to the PCN 857 meters (ie is metered by the threshold-marking and excess-traffic- 858 marking algorithms). 860 o There will be non PCN-traffic that doesn't compete for the same 861 capacity as PCN-traffic, because it is forwarded at lower 862 priority. Hence it shouldn't contribute to the PCN meters. 863 Examples are best effort and assured forwarding traffic. However, 864 a PCN-node should dedicate some capacity to lower priority traffic 865 so that it isn't starved. 867 o This document assumes that the PCN mechanisms are applied to a 868 single behaviour aggregate in the PCN-domain.
However, it would 869 also be possible to apply them independently to more than one 870 behaviour aggregate, which are distinguished by DSCP. 872 6.6. Backwards compatibility 874 PCN specifies semantics for the ECN field that differ from the 875 default semantics of [RFC3168]. A particular PCN encoding scheme 876 needs to describe how it meets the guidelines of BCP 124 [RFC4774] 877 for specifying alternative semantics for the ECN field. In summary 878 the approach is to: 880 o use a DSCP to allow PCN-nodes to distinguish PCN-traffic that uses 881 the alternative ECN semantics; 883 o define these semantics for use within a controlled region, the 884 PCN-domain; 886 o take appropriate action if ECN capable, non-PCN traffic arrives at 887 a PCN-ingress-node with the DSCP used by PCN. 889 For the baseline encoding [I-D.ietf-pcn-baseline-encoding], the 890 'appropriate action' is to block ECN-capable traffic that uses the 891 same DSCP as PCN from entering the PCN-domain directly. Blocking 892 means it is dropped or downgraded to a lower priority behaviour 893 aggregate, or alternatively such traffic may be tunnelled through the 894 PCN-domain. The reason that 'appropriate action' is needed is that 895 the PCN-egress-node clears the ECN field to 00. 897 Extended encoding schemes may take different 'appropriate action'. 899 7. Detailed Functional architecture 901 This section is intended to provide a systematic summary of the new 902 functional architecture in the PCN-domain. First it describes 903 functions needed at the three specific types of PCN-node; these are 904 data plane functions and are in addition to their normal router 905 functions. Then it describes further functionality needed for both 906 flow admission control and flow termination; these are signalling and 907 decision-making functions, and there are various possibilities for 908 where the functions are physically located. The section is split 909 into: 911 1. 
functions needed at PCN-interior-nodes 912 2. functions needed at PCN-ingress-nodes 914 3. functions needed at PCN-egress-nodes 916 4. other functions needed for flow admission control 918 5. other functions needed for flow termination control 920 Note: Probing is covered in the Appendix. 922 The section then discusses some other detailed topics: 924 1. addressing 926 2. tunnelling 928 3. fault handling 930 7.1. PCN-interior-node functions 932 Each link of the PCN-domain is configured with the following 933 functionality: 935 o Behaviour aggregate classification - determine whether an incoming 936 packet is a PCN-packet or not. 938 o Meter - measure the 'amount of PCN-traffic'. The measurement is 939 made as an aggregate of all PCN-packets, and not per flow. 941 o PCN-mark - algorithms determine whether to PCN-mark PCN-packets 942 and what packet encoding is used. 944 The functions are defined in [I-D.ietf-pcn-marking-behaviour] and the 945 baseline encoding in [I-D.ietf-pcn-baseline-encoding] (extended 946 encodings are to be defined in other documents). 948 7.2. PCN-ingress-node functions 950 Each ingress link of the PCN-domain is configured with the following 951 functionality: 953 o Packet classification - determine whether an incoming packet is 954 part of a previously admitted flow, by using a filter spec (eg 955 DSCP, source and destination addresses and port numbers). 957 o Traffic conditioning - police, by dropping or downgrading, any 958 packets received with a DSCP indicating PCN transport that do not 959 belong to an admitted flow. (A prospective PCN-flow that is 960 rejected could be blocked or admitted into a lower priority 961 behaviour aggregate.) Similarly, police packets that are part of 962 a previously admitted flow, to check that the flow keeps to the 963 agreed rate or flowspec (eg [RFC1633] for a microflow and its NSIS 964 equivalent). 
966 o PCN-colour - set the DSCP and ECN fields appropriately for the 967 PCN-domain, for example as in [I-D.ietf-pcn-baseline-encoding]. 969 o Meter - some approaches to flow termination require the PCN- 970 ingress-node to measure the (aggregate) rate of PCN-traffic 971 towards a particular PCN-egress-node. 973 The first two are policing functions, needed to make sure that PCN- 974 packets admitted into the PCN-domain belong to a flow that has been 975 admitted and to ensure that the flow keeps to the flowspec agreed (eg 976 doesn't exceed an agreed maximum rate and is inelastic traffic). 977 Installing the filter spec will typically be done by the signalling 978 protocol, as will re-installing the filter, for example after a re- 979 route that changes the PCN-ingress-node (see 980 [I-D.briscoe-tsvwg-cl-architecture] for an example using RSVP). PCN- 981 colouring allows the rest of the PCN-domain to recognise PCN-packets. 983 7.3. PCN-egress-node functions 985 Each egress link of the PCN-domain is configured with the following 986 functionality: 988 o Packet classify - determine which PCN-ingress-node a PCN-packet 989 has come from. 991 o Meter - "measure PCN-traffic" or "monitor PCN-marks". 993 o PCN-colour - for PCN-packets, set the DSCP and ECN fields to the 994 appropriate values for use outside the PCN-domain. 996 The metering functionality of course depends on whether it is 997 targeted at admission control or flow termination. Alternative 998 proposals involve the PCN-egress-node "measuring" as an aggregate (ie 999 not per flow) all PCN-packets from a particular PCN-ingress-node, or 1000 "monitoring" the PCN-traffic and reacting to one (or several) PCN- 1001 marked packets. For PCN-colouring, [I-D.ietf-pcn-baseline-encoding] 1002 specifies that the PCN-egress-node re-sets the ECN field to 00; other 1003 encodings may define different behaviour. 1005 7.4. 
Admission control functions 1007 As well as the functions covered above, other specific admission 1008 control functions need to be performed (others might be possible): 1010 o Make decision about admission - based on the output of the PCN- 1011 egress-node's PCN meter function. In the case where it "measures 1012 PCN-traffic", the measured traffic on the ingress-egress-aggregate 1013 is compared with some reference level. In the case where it 1014 "monitors PCN-marks", then the decision is based on whether one 1015 (or several) packets is (are) PCN-marked or not (eg the RSVP PATH 1016 message). In either case, the admission decision also takes 1017 account of policy and application layer requirements [RFC2753]. 1019 o Communicate decision about admission - signal the decision to the 1020 node making the admission control request (which may be outside 1021 the PCN-domain), and to the policer (PCN-ingress-node function) 1022 for enforcement of the decision. 1024 There are various possibilities for how the functionality could be 1025 distributed (we assume the operator would configure which is used): 1027 o The decision is made at the PCN-egress-node and the decision 1028 (admit or block) is signalled to the PCN-ingress-node. 1030 o The decision is recommended by the PCN-egress-node (admit or 1031 block) but the decision is definitively made by the PCN-ingress- 1032 node. The rationale is that the PCN-egress-node naturally has the 1033 necessary information about PCN-marking on the ingress-egress- 1034 aggregate, but the PCN-ingress-node is the policy enforcement 1035 point [RFC2753], which polices incoming traffic to ensure it is 1036 part of an admitted PCN-flow. 1038 o The decision is made at the PCN-ingress-node, which requires that 1039 the PCN-egress-node signals PCN-feedback-information to the PCN- 1040 ingress-node. For example, it could signal the current fraction 1041 of PCN-traffic that is PCN-marked. 
1043 o The decision is made at a centralised node (see Appendix; beyond 1044 scope of current PCN WG charter). 1046 Note: Admission control functionality is not performed by normal PCN- 1047 interior-nodes. 1049 7.5. Flow termination functions 1051 As well as the functions covered above, other specific termination 1052 control functions need to be performed (others might be possible): 1054 o PCN-meter at PCN-egress-node - similarly to flow admission, there 1055 are two types of proposals: to "measure PCN-traffic" on the 1056 ingress-egress-aggregate, and to "monitor PCN-marks" and react to 1057 one (or several) PCN-marks. 1059 o (if required) PCN-meter at PCN-ingress-node - make "measurements 1060 of PCN-traffic" being sent towards a particular PCN-egress-node; 1061 again, this is done for the ingress-egress-aggregate and not per 1062 flow. 1064 o (if required) Communicate PCN-feedback-information to the node 1065 that makes the flow termination decision. For example, as in 1066 [I-D.briscoe-tsvwg-cl-architecture], communicate the PCN-egress- 1067 node's measurements to the PCN-ingress-node. 1069 o Make decision about flow termination - use the information from 1070 the PCN-meter(s) to decide which PCN-flow or PCN-flows to 1071 terminate. The decision takes account of policy and application 1072 layer requirements [RFC2753]. 1074 o Communicate decision about flow termination - signal the decision 1075 to the node that is able to terminate the flow (which may be 1076 outside the PCN-domain), and to the policer (PCN-ingress-node 1077 function) for enforcement of the decision. 1079 There are various possibilities for how the functionality could be 1080 distributed, similar to those discussed above in the Admission 1081 control section. 1083 7.6. Addressing 1085 PCN-nodes may need to know the address of other PCN-nodes. 
Note: in 1086 all cases PCN-interior-nodes don't need to know the address of any 1087 other PCN-nodes (except as normal their next hop neighbours, for 1088 routing purposes). 1090 The PCN-egress-node needs to know the address of the PCN-ingress-node 1091 associated with a flow, at a minimum so that the PCN-ingress-node can 1092 be informed to enforce the admission decision (and any flow 1093 termination decision) through policing. There are various 1094 possibilities for how the PCN-egress-node can do this, ie associate 1095 the received packet to the correct ingress-egress-aggregate. It is 1096 not the intention of this document to mandate a particular mechanism. 1098 o The addressing information can be gathered from signalling. For 1099 example, regular processing of an RSVP Path message, as the PCN- 1100 ingress-node is the previous RSVP hop (PHOP) 1101 ([I-D.lefaucheur-rsvp-ecn]). Or the PCN-ingress-node could signal 1102 its address to the PCN-egress-node. 1104 o Always tunnel PCN-traffic across the PCN-domain. Then the PCN- 1105 ingress-node's address is simply the source address of the outer 1106 packet header. The PCN-ingress-node needs to learn the address of 1107 the PCN-egress-node, either by manual configuration or by one of 1108 the automated tunnel endpoint discovery mechanisms (such as 1109 signalling or probing over the data route, interrogating routing 1110 or using a centralised broker). 1112 7.7. Tunnelling 1114 Tunnels may originate and/or terminate within a PCN-domain (eg IP 1115 over IP, IP over MPLS). It is important that the PCN-marking of any 1116 packet can potentially influence PCN's flow admission control and 1117 termination - it shouldn't matter whether the packet happens to be 1118 tunnelled at the PCN-node that PCN-marks the packet, or indeed 1119 whether it's decapsulated or encapsulated by a subsequent PCN-node. 
1120 This suggests that the "uniform conceptual model" described in 1121 [RFC2983] should be re-applied in the PCN context. In line with this 1122 and the approach of [RFC4303] and [I-D.briscoe-tsvwg-ecn-tunnel], the 1123 following rule is applied if encapsulation is done within the PCN- 1124 domain: 1126 o any PCN-marking is copied into the outer header. 1128 Note: A tunnel will not provide this behaviour if it complies with 1129 [RFC3168] tunnelling in either mode, but it will if it complies with 1130 [RFC4301] IPsec tunnelling. 1132 Similarly, in line with the "uniform conceptual model" of [RFC2983], 1133 the "full-functionality option" of [RFC3168], and [RFC4301], the 1134 following rule is applied if decapsulation is done within the PCN- 1135 domain: 1137 o if the outer header's marking state is more severe then it is 1138 copied onto the inner header. 1140 Note: the order of increasing severity is: not PCN-marked; threshold- 1141 marked; excess-traffic-marked. 1143 An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to 1144 PCN-egress-nodes. The PCN-marks shouldn't be visible outside the 1145 PCN-domain, which can be achieved by the PCN-egress-node doing the 1146 PCN-colouring function (Section 7.3) after all the other (PCN and 1147 tunnelling) functions. The potential reasons for doing such 1148 tunnelling are: the PCN-egress-node then automatically knows the 1149 address of the relevant PCN-ingress-node for a flow; and, even if ECMP is 1150 running, all PCN-packets on a particular ingress-egress-aggregate 1151 follow the same path. But it also has drawbacks, for example the 1152 additional overhead in terms of bandwidth and processing, and the 1153 cost of setting up a mesh of tunnels between PCN-boundary-nodes 1154 (there is an N^2 scaling issue). 1156 Potential issues arise for a "partially PCN-capable tunnel", ie where 1157 only one tunnel endpoint is in the PCN-domain: 1159 1. The tunnel originates outside a PCN-domain and ends inside it.
1160 If the packet arrives at the tunnel ingress with the same 1161 encoding as used within the PCN-domain to indicate PCN-marking, 1162 then this could lead the PCN-egress-node to falsely measure pre- 1163 congestion. 1165 2. The tunnel originates inside a PCN-domain and ends outside it. 1166 If the packet arrives at the tunnel ingress already PCN-marked, 1167 then it will still have the same encoding when it's decapsulated 1168 which could potentially confuse nodes beyond the tunnel egress. 1170 In line with the solution for partially capable DiffServ tunnels in 1171 [RFC2983], the following rules are applied: 1173 o For case (1), the tunnel egress node clears any PCN-marking on the 1174 inner header. This rule is applied before the 'copy on 1175 decapsulation' rule above. 1177 o For case (2), the tunnel ingress node clears any PCN-marking on 1178 the inner header. This rule is applied after the 'copy on 1179 encapsulation' rule above. 1181 Note that the above implies that one has to know, or determine, the 1182 characteristics of the other end of the tunnel as part of 1183 establishing it. 1185 Tunnelling constraints were a major factor in the choice of the 1186 baseline encoding. As explained in [I-D.ietf-pcn-baseline-encoding], 1187 with current tunnelling endpoints only the 11 codepoint of the ECN 1188 field survives decapsulation, and hence the baseline encoding only 1189 uses the 11 codepoint to indicate PCN-marking. Extended encoding 1190 schemes need to explain their interactions with (or assumptions 1191 about) tunnelling. A lengthy discussion of all the issues associated 1192 with layered encapsulation of congestion notification (for ECN as 1193 well as PCN) is in [I-D.briscoe-tsvwg-ecn-tunnel]. 1195 7.8. Fault handling 1197 If a PCN-interior-node (or one of its links) fails, then lower layer 1198 protection mechanisms or the regular IP routing protocol will 1199 eventually re-route around it. 
If the new route can carry all the 1200 admitted traffic, flows will gracefully continue. If instead the 1201 re-routing causes early warning of pre-congestion on the new route, then 1202 admission control based on pre-congestion notification will ensure that 1203 new flows are not admitted until enough existing flows have 1204 departed. Re-routing may result in heavy (pre-)congestion, in which case the 1205 flow termination mechanism will kick in. 1207 If a PCN-boundary-node fails then we would like the regular QoS 1208 signalling protocol to be responsible for taking appropriate action. 1209 As an example, [I-D.briscoe-tsvwg-cl-architecture] considers what 1210 happens if RSVP is the QoS signalling protocol. 1212 8. Challenges 1214 Prior work on PCN and similar mechanisms has raised a number of 1215 considerations about PCN's design goals (things PCN should be good 1216 at) [I-D.chan-pcn-problem-statement] and some issues that have been 1217 hard to solve in a fully satisfactory manner. Taken as a whole, they 1218 represent a list of trade-offs (it is unlikely that they can all be 1219 100% achieved) and perhaps serve as evaluation criteria to help an operator 1220 (or the IETF) decide between options. 1222 The following are open issues. They are mainly taken from 1223 [I-D.briscoe-tsvwg-cl-architecture], which also describes some 1224 possible solutions. Note that some may be considered unimportant in 1225 general or in specific deployment scenarios or by some operators. 1227 NOTE: Potential solutions are out of scope for this document. 1229 o ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion 1230 is measured on a specific ingress-egress-aggregate. However, if 1231 the PCN-domain runs ECMP, then traffic on this ingress-egress- 1232 aggregate may follow several different paths - some of the paths 1233 could be pre-congested whilst others are not. There are three 1234 potential problems: 1236 1.
over-admission: a new flow is admitted (because the pre- 1237 congestion level measured by the PCN-egress-node is 1238 sufficiently diluted by unmarked packets from non-congested 1239 paths that a new flow is admitted), but its packets travel 1240 through a pre-congested PCN-node. 1242 2. under-admission: a new flow is blocked (because the pre- 1243 congestion level measured by the PCN-egress-node is 1244 sufficiently increased by PCN-marked packets from pre- 1245 congested paths that a new flow is blocked), but its packets 1246 travel along an uncongested path. 1248 3. ineffective termination: a flow is terminated, but its path 1249 doesn't travel through the (pre-)congested router(s). Since 1250 flow termination is a 'last resort', which protects the 1251 network should over-admission occur, this problem is probably 1252 more important to solve than the other two. 1254 o ECMP and signalling: It is possible that, in a PCN-domain running 1255 ECMP, the signalling packets (eg RSVP, NSIS) follow a different 1256 path than the data packets, which could matter if the signalling 1257 packets are used as probes. Whether this is an issue depends on 1258 which fields the ECMP algorithm uses; if the ECMP algorithm is 1259 restricted to the source and destination IP addresses, then it 1260 will not be an issue. ECMP and signalling interactions are a 1261 specific instance of a general issue for non-traditional routing 1262 combined with resource management along a path [Hancock]. 1264 o Tunnelling: There are scenarios where tunnelling makes it 1265 difficult to determine the path in the PCN-domain. The problem, 1266 its impact, and the potential solutions are similar to those for 1267 ECMP. 1269 o Scenarios with only one tunnel endpoint in the PCN domain may make 1270 it harder for the PCN-egress-node to gather from the signalling 1271 messages (eg RSVP, NSIS) the identity of the PCN-ingress-node. 
1273 o Bi-Directional Sessions: Many applications have bi-directional 1274 sessions - hence there are two microflows that should be admitted 1275 (or terminated) as a pair - for instance a bi-directional voice 1276 call only makes sense if microflows in both directions are 1277 admitted. However, the PCN mechanisms concern admission and 1278 termination of a single flow, and coordination of the decision for 1279 both flows is a matter for the signalling protocol and out of 1280 scope of PCN. One possible example would use SIP pre-conditions. 1281 However, there are others. 1283 o Global Coordination: PCN makes its admission decision based on 1284 PCN-markings on a particular ingress-egress-aggregate. Decisions 1285 about flows through a different ingress-egress-aggregate are made 1286 independently. However, one can imagine network topologies and 1287 traffic matrices where, from a global perspective, it would be 1288 better to make a coordinated decision across all the ingress- 1289 egress-aggregates for the whole PCN-domain. For example, to block 1290 (or even terminate) flows on one ingress-egress-aggregate so that 1291 more important flows through a different ingress-egress-aggregate 1292 could be admitted. The problem may well be relatively 1293 insignificant. 1295 o Aggregate Traffic Characteristics: Even when the number of flows 1296 is stable, the traffic level through the PCN-domain will vary 1297 because the sources vary their traffic rates. PCN works best when 1298 there is not too much variability in the total traffic level at a 1299 PCN-node's interface (ie in the aggregate traffic from all 1300 sources). Too much variation means that a node may (at one 1301 moment) not be doing any PCN-marking and then (at another moment) 1302 drop packets because it is overloaded. This makes it hard to tune 1303 the admission control scheme to stop admitting new flows at the 1304 right time. Therefore the problem is more likely with fewer, 1305 burstier flows. 
1307 o Flash crowds and Speed of Reaction: PCN is a measurement-based 1308 mechanism and so there is an inherent delay between packet marking 1309 by PCN-interior-nodes and any admission control reaction at PCN- 1310 boundary-nodes. For example, potentially if a big burst of 1311 admission requests occurs in a very short space of time (eg 1312 prompted by a televote), they could all get admitted before enough 1313 PCN-marks are seen to block new flows. In other words, any 1314 additional load offered within the reaction time of the mechanism 1315 must not move the PCN-domain directly from a no congestion state 1316 to overload. This 'vulnerability period' may have an impact at 1317 the signalling level, for instance QoS requests should be rate 1318 limited to bound the number of requests able to arrive within the 1319 vulnerability period. 1321 o Silent at start: after a successful admission request the source 1322 may wait some time before sending data (eg waiting for the called 1323 party to answer). Then the risk is that, in some circumstances, 1324 PCN's measurements underestimate what the pre-congestion level 1325 will be when the source does start sending data. 1327 9. Operations and Management 1329 This Section considers operations and management issues, under the 1330 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance 1331 and Security. Provisioning is discussed with performance. 1333 9.1. Configuration OAM 1335 Threshold-marking and excess-traffic-marking are standardised in 1336 [I-D.ietf-pcn-marking-behaviour]. However, more diversity in PCN- 1337 boundary-node behaviours is expected, in order to interface with 1338 diverse industry architectures. It may be possible to have different 1339 PCN-boundary-node behaviours for different ingress-egress-aggregates 1340 within the same PCN-domain. 
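Section 8 (flash crowds) suggests rate limiting QoS requests so that only a bounded number can arrive within the vulnerability period. If an operator chose to do this, the limiter's settings might become further configurable values at whichever node handles admission requests. The following is a minimal sketch of one common approach, a token bucket, and is not taken from any PCN document; the class name, its parameters and the explicit `now` argument are all assumptions made for illustration.

```python
class RequestLimiter:
    """Token bucket bounding admission requests (illustrative, not from the draft)."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self.rate = rate_per_s        # sustained requests per second (assumed parameter)
        self.burst = burst            # most requests accepted back-to-back (assumed parameter)
        self.tokens = float(burst)    # bucket starts full
        self.last = 0.0               # time of the latest call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may be processed."""
        elapsed = max(0.0, now - self.last)
        self.last = max(self.last, now)
        # Refill in proportion to elapsed time, never beyond the bucket size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In such a scheme, `burst` would presumably be chosen so that the flows it lets through within one reaction time cannot by themselves move a link from no pre-congestion to overload.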
1342 A PCN marking behaviour (threshold-marking, excess-traffic-marking) 1343 is enabled on either the egress or the ingress interfaces of PCN- 1344 nodes. A consistent choice must be made across the PCN-domain to 1345 ensure that the PCN mechanisms protect all links. 1347 PCN configuration control variables fall into the following 1348 categories: 1350 o system options (enabling or disabling behaviours) 1352 o parameters (setting levels, addresses etc) 1354 One possibility is that all configurable variables sit within an SNMP 1355 management framework [RFC3411], being structured within a defined 1356 management information base (MIB) on each node, and being remotely 1357 readable and settable via a suitably secure management protocol 1358 (SNMPv3). 1360 Some configuration options and parameters have to be set once to 1361 'globally' control the whole PCN-domain. Where possible, these are 1362 identified below. This may affect operational complexity and the 1363 chances of interoperability problems between equipment from different 1364 vendors. 1366 It may be possible for an operator to configure some PCN-interior- 1367 nodes so that they don't run the PCN mechanisms, if it knows that 1368 these links will never become (pre-)congested. 1370 9.1.1. System options 1372 On PCN-interior-nodes there will be very few system options: 1374 o Whether two PCN-markings (threshold-marked and excess-traffic- 1375 marked) are enabled or only one. Typically all nodes throughout a 1376 PCN-domain will be configured the same in this respect. However, 1377 exceptions could be made. For example, if most PCN-nodes used 1378 both markings, but some legacy hardware was incapable of running 1379 two algorithms, an operator might be willing to configure these 1380 legacy nodes solely for excess-traffic-marking to enable flow 1381 termination as a back-stop. 
It would be sensible to place such 1382 nodes where they could be provisioned with a greater leeway over 1383 expected traffic levels. 1385 o In the case where only one PCN-marking is enabled, all nodes must 1386 be configured to generate PCN-marks from the same meter (ie either 1387 the threshold meter or the excess traffic meter). 1389 PCN-boundary-nodes (ingress and egress) will have more system 1390 options: 1392 o Which of admission and flow termination are enabled. If any PCN- 1393 interior-node is configured to generate a marking, all PCN- 1394 boundary-nodes must be able to interpret that marking (which 1395 includes understanding, in a PCN-domain that uses only one type of 1396 PCN-marking, whether they are generated by PCN-interior-nodes' 1397 threshold meters or the excess traffic meters). Therefore all 1398 PCN-boundary-nodes must be configured the same in this respect. 1400 o Where flow admission and termination decisions are made: at PCN- 1401 ingress-nodes or at PCN-egress-nodes (or at a centralised node, 1402 see Appendix). Theoretically, this configuration choice could be 1403 negotiated for each pair of PCN-boundary-nodes, but we cannot 1404 imagine why such complexity would be required, except perhaps in 1405 future inter-domain scenarios. 1407 o How PCN-markings are translated into admission control and flow 1408 termination decisions (see Section 6.1 and Section 6.2). 1410 PCN-egress-nodes will have further system options: 1412 o How the mapping should be established between each packet and its 1413 aggregate, eg by MPLS label, by IP packet filterspec; and how to 1414 take account of ECMP. 1416 o If an equipment vendor provides a choice, there may be options to 1417 select which smoothing algorithm to use for measurements. 1419 9.1.2. Parameters 1421 Like any DiffServ domain, every node within a PCN-domain will need to 1422 be configured with the DSCP(s) used to identify PCN-packets. 
On each 1423 interior link the main configuration parameters are the PCN- 1424 threshold-rate and PCN-excess-rate. A larger PCN-threshold-rate 1425 enables more PCN-traffic to be admitted on a link, hence improving 1426 capacity utilisation. A PCN-excess-rate set further above the PCN- 1427 threshold-rate allows greater increases in traffic (whether due to 1428 natural fluctuations or some unexpected event) before any flows are 1429 terminated, ie minimises the chances of unnecessarily triggering the 1430 termination mechanism. For instance, an operator may want to design 1431 their network so that it can cope with a failure of any single PCN- 1432 node without terminating any flows. 1434 Setting these rates on first deployment of PCN will be very similar 1435 to the traditional process for sizing an admission controlled 1436 network, depending on: the operator's requirements for minimising 1437 flow blocking (grade of service), the expected PCN traffic load on 1438 each link and its statistical characteristics (the traffic matrix), 1439 contingency for re-routing the PCN traffic matrix in the event of 1440 single or multiple failures, and the expected load from other classes 1441 relative to link capacities [Menth]. But once a domain is in 1442 operation, a PCN design goal is to be able to determine growth in 1443 these configured rates much more simply, by monitoring PCN-marking 1444 rates from actual rather than expected traffic (see Section 9.2 on 1445 Performance & Provisioning). 1447 Operators may also wish to configure a rate greater than the PCN- 1448 excess-rate that is the absolute maximum rate that a link allows for 1449 PCN-traffic. This may simply be the physical link rate, but some 1450 operators may wish to configure a logical limit to prevent starvation 1451 of other traffic classes during any brief period after PCN-traffic 1452 exceeds the PCN-excess-rate but before flow termination brings it 1453 back below this rate. 
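The ordering of the per-link rates described above (PCN-threshold-rate, then PCN-excess-rate, then an optional maximum rate for PCN-traffic, which may simply be the physical link rate) lends itself to a simple consistency check in a management system. The sketch below is only an illustration: the type and field names are hypothetical, rates are assumed to be in bits per second, and equality between adjacent rates is tolerated even though the text speaks of each rate being set above the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPcnRates:
    """Per-link PCN rate settings in bits per second (hypothetical names)."""
    threshold_rate: float   # PCN-threshold-rate
    excess_rate: float      # PCN-excess-rate
    max_pcn_rate: float     # optional logical cap on PCN-traffic
    link_rate: float        # physical link rate


def check_rates(r: LinkPcnRates) -> list[str]:
    """Return a list of problems; an empty list means the ordering holds."""
    problems = []
    if r.threshold_rate <= 0:
        problems.append("PCN-threshold-rate must be positive")
    if r.excess_rate < r.threshold_rate:
        problems.append("PCN-excess-rate is below PCN-threshold-rate")
    if r.max_pcn_rate < r.excess_rate:
        problems.append("maximum PCN rate is below PCN-excess-rate")
    if r.max_pcn_rate > r.link_rate:
        problems.append("maximum PCN rate exceeds the physical link rate")
    return problems
```

An operator with no logical limit would set `max_pcn_rate` equal to `link_rate`.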
1455 Threshold-marking requires a threshold token bucket depth to be 1456 configured, excess-traffic-marking needs a value for the MTU (maximum 1457 size of a PCN-packet on the link) and both require setting a maximum 1458 size of their token buckets. It will be preferable for there to be 1459 rules to set defaults for these parameters, but then allow operators 1460 to change them, for instance if average traffic characteristics 1461 change over time. 1463 The PCN-egress-node may allow configuration of the following: 1465 o how it smooths metering of PCN-markings (eg EWMA parameters) 1467 Whichever node makes admission and flow termination decisions will 1468 contain algorithms for converting PCN-marking levels into admission 1469 or flow termination decisions. These will also require configurable 1470 parameters, for instance: 1472 o an admission control algorithm that is based on the fraction of 1473 marked packets will at least require a marking threshold setting 1474 above which it denies admission to new flows; 1476 o flow termination algorithms will probably require a parameter to 1477 delay termination of any flows until it is more certain that an 1478 anomalous event is not transient; 1480 o a parameter to control the trade-off between how quickly excess 1481 flows are terminated, and over-termination. 1483 One particular proposal, [I-D.charny-pcn-single-marking] would 1484 require a global parameter to be defined on all PCN-nodes, but only 1485 needs one PCN marking rate to be configured on each link. The global 1486 parameter is a scaling factor between admission and termination (the 1487 PCN-traffic rate on a link up to which flows are admitted vs the rate 1488 above which flows are terminated). [I-D.charny-pcn-single-marking] 1489 discusses in full the impact of this particular proposal on the 1490 operation of PCN. 1492 9.2. 
Performance & Provisioning OAM 1494 Monitoring of performance factors measurable from *outside* the PCN 1495 domain will be no different with PCN than with any other packet-based 1496 flow admission control system, both at the flow level (blocking 1497 probability etc) and the packet level (jitter [RFC3393], [Y.1541], 1498 loss rate [RFC4656], mean opinion score [P.800], etc). The 1499 difference is that PCN is intentionally designed to indicate 1500 *internally* which exact resource(s) are the cause of performance 1501 problems and by how much. 1503 Even better, PCN indicates which resources will probably cause 1504 problems if they are not upgraded soon. This can be achieved by the 1505 management system monitoring the total amount (in bytes) of PCN- 1506 marking generated by each queue over a period. Given possibly long 1507 provisioning lead times, pre-congestion volume is the best metric to 1508 reveal whether sufficient persistent demand has occurred to warrant 1509 an upgrade, because even before utilisation becomes problematic, 1510 the statistical variability of traffic will cause occasional bursts 1511 of pre-congestion. This 'early warning system' decouples the process 1512 of adding customers from the provisioning process. This should cut 1513 the time to add a customer when compared against admission control 1514 provided over native DiffServ [RFC2998], because it saves having to 1515 verify the capacity planning process before adding each customer. 1517 Alternatively, before triggering an upgrade, the long term pre- 1518 congestion volume on each link can be used to balance traffic load 1519 across the PCN-domain by adjusting the link weights of the routing 1520 system. When an upgrade to a link's configured PCN-rates is 1521 required, it may also be necessary to upgrade the physical capacity 1522 available to other classes. But usually there will be sufficient 1523 physical capacity for the upgrade to go ahead as a simple 1524 configuration change.
Alternatively, [Songhurst] has proposed an 1525 adaptive rather than preconfigured system, where the configured PCN- 1526 threshold-rate is replaced with a high and low water mark and the 1527 marking algorithm automatically optimises how physical capacity is 1528 shared using the relative loads from PCN and other traffic classes. 1530 All the above processes require just three extra counters associated 1531 with each PCN queue: threshold-markings, excess-traffic-markings and 1532 drop. Every time a PCN packet is marked or dropped its size in bytes 1533 should be added to the appropriate counter. Then the management 1534 system can read the counters at any time and subtract a previous 1535 reading to establish the incremental volume of each type of 1536 (pre-)congestion. Readings should be taken frequently, so that 1537 anomalous events (eg re-routes) can be distinguished from regular 1538 fluctuating demand if required. 1540 9.3. Accounting OAM 1542 Accounting is only done at trust boundaries so it is out of scope of 1543 the initial charter of the PCN WG, which is confined to intra-domain 1544 issues. Use of PCN internal to a domain makes no difference to the 1545 flow signalling events crossing trust boundaries outside the PCN- 1546 domain, which are typically used for accounting. 1548 9.4. Fault OAM 1550 Fault OAM is about preventing faults, telling the management system 1551 (or manual operator) that the system has recovered (or not) from a 1552 failure, and about maintaining information to aid fault diagnosis. 1554 Admission blocking and particularly flow termination mechanisms 1555 should rarely be needed in practice. It would be unfortunate if they 1556 didn't work after an option had been accidentally disabled. 1557 Therefore it will be necessary to regularly test that the live system 1558 works as intended (devising a meaningful test is left as an exercise 1559 for the operator). 
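The counter handling described in Section 9.2 (three byte counters per PCN queue, read periodically and differenced) can be sketched as follows. This is a minimal illustration, not a specified interface: the type and field names are hypothetical, and the counters are assumed neither to wrap nor to be reset between the two readings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueCounters:
    """Cumulative byte counters for one PCN queue (hypothetical field names)."""
    threshold_marked: int   # bytes of threshold-marked packets
    excess_marked: int      # bytes of excess-traffic-marked packets
    dropped: int            # bytes of dropped PCN packets


def volume_since(previous: QueueCounters, current: QueueCounters) -> QueueCounters:
    """Bytes marked or dropped between two readings of the same queue."""
    return QueueCounters(
        current.threshold_marked - previous.threshold_marked,
        current.excess_marked - previous.excess_marked,
        current.dropped - previous.dropped,
    )
```

A management system would keep the previous reading for each queue and call `volume_since` at every polling interval; a shorter interval makes it easier to separate anomalous events such as re-routes from regular fluctuating demand.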
1561 Section 7 describes how the PCN architecture has been designed to 1562 ensure admitted flows continue gracefully after recovering 1563 automatically from link or node failures. The need to record and 1564 monitor re-routing events affecting signalling is unchanged by the 1565 addition of PCN to a DiffServ domain. Similarly, re-routing events 1566 within the PCN-domain will be recorded and monitored just as they 1567 would be without PCN. 1569 PCN-marking does make it possible to record 'near-misses'. For 1570 instance, at the PCN-egress-node a 'reporting threshold' could be set 1571 to monitor how often - and for how long - the system comes close to 1572 triggering flow blocking without actually doing so. Similarly, 1573 bursts of flow termination marking could be recorded even if they are 1574 not sufficiently sustained to trigger flow termination. Such 1575 statistics could be correlated with per-queue counts of marking 1576 volume (Section 9.2) to upgrade resources in danger of causing 1577 service degradation, or to trigger manual tracing of intermittent 1578 incipient errors that would otherwise have gone unnoticed. 1580 Finally, of course, many faults are caused by failings in the 1581 management process ('human error'): a wrongly configured address in a 1582 node, a wrong address given in a signalling protocol, a wrongly 1583 configured parameter in a queueing algorithm, a node set into a 1584 different mode from other nodes, and so on. Generally, a clean 1585 design with few configurable options ensures this class of faults can 1586 be traced more easily and prevented more often. Sound management 1587 practice at run-time also helps. 
For instance: a management system 1588 should be used that constrains configuration changes within system 1589 rules (eg preventing an option setting inconsistent with other 1590 nodes); configuration options should also be recorded in an offline 1591 database; and regular automatic consistency checks between live 1592 systems and the database should be performed. PCN adds nothing 1593 specific to this class of problems. 1595 9.5. Security OAM 1597 Security OAM is about using secure operational practices as well as 1598 being able to track security breaches or near-misses at run-time. 1599 PCN adds few specifics to the general good practice required in this 1600 field [RFC4778], other than those below. The correct functions of 1601 the system should be monitored (Section 9.2) in multiple independent 1602 ways and correlated to detect possible security breaches. Persistent 1603 (pre-)congestion marking should raise an alarm (both on the node 1604 doing the marking and on the PCN-egress-node metering it). 1605 Similarly, persistently poor external QoS metrics such as jitter or 1606 MOS should raise an alarm. The following are examples of symptoms 1607 that may be the result of innocent faults, rather than attacks, but 1608 until diagnosed they should be logged and trigger a security alarm: 1610 o Anomalous patterns of non-conforming incoming signals and packets 1611 rejected at the PCN-ingress-nodes (eg packets already marked PCN- 1612 capable, or traffic persistently starving token bucket policers). 1614 o PCN-capable packets arriving at a PCN-egress-node with no 1615 associated state for mapping them to a valid ingress-egress- 1616 aggregate. 1618 o A PCN-ingress-node receiving feedback signals about the pre- 1619 congestion level on a non-existent aggregate, or that are 1620 inconsistent with other signals (eg unexpected sequence numbers, 1621 inconsistent addressing, conflicting reports of the pre-congestion 1622 level, etc). 
1624 o PCN-packets arriving at a PCN-egress-node with 1625 (pre-)congestion markings focused on particular flows, rather than 1626 randomly distributed throughout the aggregate. 1628 10. IANA Considerations 1630 This memo includes no request to IANA. 1632 11. Security considerations 1634 Security considerations essentially come from the Trust Assumption 1635 (Section 5.1), ie that all PCN-nodes are PCN-enabled and are trusted 1636 for truthful PCN-marking and transport. PCN splits functionality 1637 between PCN-interior-nodes and PCN-boundary-nodes, and the security 1638 considerations are somewhat different for each, mainly because PCN- 1639 boundary-nodes are flow-aware and PCN-interior-nodes are not. 1641 o Because the PCN-boundary-nodes are flow-aware, they are trusted to 1642 use that awareness correctly. The degree of trust required 1643 depends on the kinds of decisions they have to make and the kinds 1644 of information they need to make them. There is nothing specific 1645 to PCN. 1647 o The PCN-ingress-nodes police packets to ensure a PCN-flow stays 1648 within its agreed limit, and to ensure that only PCN-flows that 1649 have been admitted contribute PCN-traffic into the PCN-domain. 1650 The policer must drop (or perhaps downgrade to a different DSCP) 1651 any PCN-packets received that are outside this remit. This is 1652 similar to the existing IntServ behaviour. Between them, the PCN- 1653 boundary-nodes must encircle the PCN-domain; otherwise PCN-packets 1654 could enter the PCN-domain without being subject to admission 1655 control, which would potentially destroy the QoS of existing 1656 flows. 1658 o PCN-interior-nodes are not flow-aware. This prevents some 1659 security attacks where an attacker targets specific flows in the 1660 data plane - for instance for DoS or eavesdropping. 1662 o The PCN-boundary-nodes rely on correct PCN-marking by the PCN- 1663 interior-nodes.
For instance a rogue PCN-interior-node could PCN- 1664 mark all packets so that no flows were admitted. Another 1665 possibility is that it doesn't PCN-mark any packets, even when it 1666 is pre-congested. More subtly, the rogue PCN-interior-node could 1667 perform these attacks selectively on particular flows, or it could 1668 PCN-mark the correct fraction overall, but carefully choose which 1669 flows it marked. 1671 o the PCN-boundary-nodes should be able to deal with DoS attacks and 1672 state exhaustion attacks based on fast changes in per flow 1673 signalling. 1675 o the signalling between the PCN-boundary-nodes must be protected 1676 from attacks. For example the recipient needs to validate that 1677 the message is indeed from the node that claims to have sent it. 1678 Possible measures include digest authentication and protection 1679 against replay and man-in-the-middle attacks. For the specific 1680 protocol RSVP, hop-by-hop authentication is in [RFC2747], and 1681 [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be 1682 useful. 1684 Operational security advice is given in Section 9.5. 1686 12. Conclusions 1688 The document describes a general architecture for flow admission and 1689 termination based on pre-congestion information in order to protect 1690 the quality of service of established inelastic flows within a single 1691 DiffServ domain. The main topic is the functional architecture. It 1692 also mentions other topics like the assumptions and open issues. 1694 13. Acknowledgements 1696 This document is a revised version of [I-D.eardley-pcn-architecture]. 1697 Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R. 1698 Geib, G. Karagiannis, M. Menth, T. Tsou. They are therefore 1699 contributors to this document. 
1701 Thanks to those who have made comments on 1702 [I-D.eardley-pcn-architecture] and on earlier versions of this draft: 1703 Lachlan Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, 1704 Bob Briscoe, Jason Canon, Ken Carlberg, Anna Charny, Joachim 1705 Charzinski, Andras Csaszar, Lars Eggert, Ruediger Geib, Wei Gengyu, 1706 Robert Hancock, Fortune Huang, Christian Hublet, Ingemar Johansson, 1707 Georgios Karagiannis, Hein Mekkes, Michael Menth, Toby Moncaster, 1708 Daisuke Satoh, Ben Strulo, Tom Taylor, Hannes Tschofenig, Tina Tsou, 1709 Lars Westberg, Magnus Westerlund, Delei Yu. Thanks to Bob Briscoe, 1710 who extensively revised the Operations and Management section. 1712 This document is the result of discussions in the PCN WG and 1713 forerunner activity in the TSVWG. A number of previous drafts were 1714 presented to TSVWG: [I-D.chan-pcn-problem-statement], 1715 [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb], 1716 [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap], 1717 [I-D.lefaucheur-rsvp-ecn], [I-D.westberg-pcn-load-control]. 1718 Their authors were: B. Briscoe, P. Eardley, D. Songhurst, F. Le 1719 Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley, G. Karagiannis, 1720 A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G. Liu, A. Bhargava. 1722 14. Comments Solicited 1724 Comments and questions are encouraged and very welcome. They can be 1725 addressed to the IETF PCN working group mailing list . 1727 15. Changes 1729 15.1. Changes from -07 to -08 1731 Small changes from second WG last call: 1733 o Section 2: added definition for PCN-admissible-rate and PCN- 1734 supportable-rate. Small changes to use these terms as follows: 1735 Section 3, bullets 2 & 9; S6.1 para 1; S6.2 para 1; S6.3 bullet 3; 1736 added to Figs 1 & 2. 1738 o added the phrase "(others might be possible)" before the list of 1739 approaches in Section 6.3, 7.4 & 7.5.
1741 o added references to RFC2753 (A framework for policy-based 1742 admission control) in S7.4 & S7.5. 1744 o throughout, updated references now that marking behaviour & 1745 baseline encoding are WG drafts. 1747 o a few typos corrected 1749 15.2. Changes from -06 to -07 1751 References re-formatted to pass ID nits. No other changes. 1753 15.3. Changes from -05 to -06 1755 Minor clarifications throughout, the least insignificant are as 1756 follows: 1758 o Section 1: added to the list of encoding states in an 'extended' 1759 scheme: "or perhaps further encoding states as suggested in 1760 draft-westberg-pcn-load-control" 1762 o Section 2: added definition for PCN-colouring (to clarify that the 1763 term is used consistently differently from 'PCN-marking') 1765 o Section 6.1 and 6.2: added "(others might be possible)" before the 1766 list of high level approaches for making flow admission 1767 (termination) decisions. 1769 o Section 6.2: corrected a significant typo in 2nd bullet (more -> 1770 less) 1772 o Section 6.3: corrected a couple of significant typos in Figure 2 1774 o Section 6.5 (PCN-traffic) re-written for clarity. Non PCN-traffic 1775 contributing to PCN meters is now given as an example (there may 1776 be cases where don't need to meter it). 1778 o Section 7.7: added to the text about encapsulation being done 1779 within the PCN-domain: "Note: A tunnel will not provide this 1780 behaviour if it complies with [RFC3168] tunnelling in either mode, 1781 but it will if it complies with [RFC4301] IPSec tunnelling." 1783 o Section 7.7: added mention of [RFC4301] to the text about 1784 decapsulation being done within the PCN-domain. 1786 o Section 8: deleted the text about design goals, since this is 1787 already covered adequately earlier eg in S3. 1789 o Section 11: replaced the last sentence of bullet 1 by "There is 1790 nothing specific to PCN." 1792 o Appendix: added to open issues: possibility of automatically and 1793 periodically probing. 
1795 o References: Split out Normative references (RFC2474 & RFC3246). 1797 15.4. Changes from -04 to -05 1799 Minor nits removed as follows: 1801 o Further minor changes to reflect that baseline encoding is 1802 consensus, standards track document, whilst there can be 1803 (experimental track) encoding extensions 1805 o Traffic conditioning updated to reflect discussions in Dublin, 1806 mainly that PCN-interior-nodes don't police PCN-traffic (so 1807 deleted bullet in S7.1) and that it is not advised to have non 1808 PCN-traffic that shares the same capacity (on a link) as PCN- 1809 traffic (so added bullet in S6.5) 1811 o Probing moved into Appendix A and deleted the 'third viewpoint' 1812 (admission control based on the marking of a single packet like an 1813 RSVP PATH message) - since this isn't really probing, and in any 1814 case is already mentioned in S6.1. 1816 o Minor changes to S9 Operations and management - mainly to reflect 1817 that consensus on marking behaviour has simplified things so eg 1818 there are fewer parameters to configure. 1820 o A few terminology-related errors expunged, and two pictures added 1821 to help. 1823 o Re-phrased the claim about the natural decision point in S7.4 1825 o Clarified that extended encoding schemes need to explain their 1826 interactions with (or assumptions about) tunnelling (S7.7) and how 1827 they meet the guidelines of BCP124 (S6.6) 1829 o Corrected the third bullet in S6.2 (to reflect consensus about 1830 PCN-marking) 1832 15.5. Changes from -03 to -04 1834 o Minor changes throughout to reflect the consensus call about PCN- 1835 marking (as reflected in [I-D.eardley-pcn-marking-behaviour]). 1837 o Minor changes throughout to reflect the current decisions about 1838 encoding (as reflected in [I-D.moncaster-pcn-baseline-encoding] 1839 and [I-D.moncaster-pcn-3-state-encoding]). 1841 o Introduction: re-structured to create new sections on Benefits, 1842 Deployment scenarios and Assumptions. 
1844 o Introduction: Added pointers to other PCN documents. 1846 o Terminology: changed PCN-lower-rate to PCN-threshold-rate and PCN- 1847 upper-rate to PCN-excess-rate; excess-rate-marking to excess- 1848 traffic-marking. 1850 o Benefits: added bullet about SRLGs. 1852 o Deployment scenarios: new section combining material from various 1853 places within the document. 1855 o S6 (high level functional architecture): re-structured and edited 1856 to improve clarity, and reflect the latest PCN-marking and 1857 encoding drafts. 1859 o S6.4: added claim that the most natural place to make an admission 1860 decision is a PCN-egress-node. 1862 o S6.5: updated the bullet about non-PCN-traffic that uses the same 1863 DSCP as PCN-traffic. 1865 o S6.6: added a section about backwards compatibility with respect 1866 to [RFC4774]. 1868 o Appendix A: added bullet about end-to-end PCN. 1870 o Probing: moved to Appendix B. 1872 o Other minor clarifications, typos etc. 1874 15.6. Changes from -02 to -03 1876 o Abstract: Clarified by removing the term 'aggregated'. Follow-up 1877 clarifications later in draft: S1: expanded PCN-egress-nodes 1878 bullet to mention case where the PCN-feedback-information is about 1879 one (or a few) PCN-marks, rather than aggregated information; S3 1880 clarified PCN-meter; S5 minor changes; conclusion. 1882 o S1: added a paragraph about how the PCN-domain looks to the 1883 outside world (essentially it looks like a DiffServ domain). 
   o  S2: tweaked the PCN-traffic terminology bullet: changed PCN
      traffic classes to PCN behaviour aggregates, to be more in line
      with traditional DiffServ jargon (-> follow-up changes later in
      draft); included a definition of PCN-flows (and corrected a
      couple of 'PCN microflows' to 'PCN-flows' later in draft)

   o  S3.5: added possibility of downgrading to best effort, where
      PCN-packets arrive at PCN-ingress-node already ECN marked (CE or
      ECN nonce)

   o  S4: added note about whether to talk about PCN operating on an
      interface or on a link.  In S8.1 (OAM) mentioned that PCN
      functionality needs to be configured consistently on either the
      ingress or the egress interface of PCN-nodes in a PCN-domain.

   o  S5.2: clarified that signalling protocol installs flow filter
      spec at PCN-ingress-node (& updates after possible re-route)

   o  S5.6: addressing: clarified

   o  S5.7: added tunnelling issue of N^2 scaling if you set up a mesh
      of tunnels between PCN-boundary-nodes

   o  S7.3: Clarified the "third viewpoint" of probing (always probe).

   o  S8.1: clarified that SNMP is only an example; added note that an
      operator may be able to not run PCN on some PCN-interior-nodes,
      if it knows that these links will never become (pre-)congested;
      added note that it may be possible to have different
      PCN-boundary-node behaviours for different
      ingress-egress-aggregates within the same PCN-domain.

   o  Appendix: Created an Appendix about "Possible work items beyond
      the scope of the current PCN WG Charter".  Material moved from
      near start of S3 and elsewhere throughout draft.  Moved text
      about centralised decision node to Appendix.

   o  Other minor clarifications.

15.7.  Changes from -01 to -02

   o  S1: Benefits: provisioning bullet extended to stress that PCN
      does not use RFC2475-style traffic conditioning.

   o  S1: Deployment models: mentioned, as variant of PCN-domain
      extending to end nodes, that may extend to LAN edge switch.

   o  S3.1: Trust Assumption: added note about not needing PCN-marking
      capability if known that an interface cannot become
      pre-congested.

   o  S4: now divided into sub-sections

   o  S4.1: Admission control: added second proposed method for how to
      decide to block new flows (PCN-egress-node receives one (or
      several) PCN-marked packets).

   o  S5: Probing sub-section removed.  Material now in new S7.

   o  S5.6: Addressing: clarified how PCN-ingress-node can discover
      address of PCN-egress-node

   o  S5.6: Addressing: centralised node case, added that
      PCN-ingress-node may need to know address of PCN-egress-node

   o  S5.8: Tunnelling: added case of "partially PCN-capable tunnel"
      and degraded bullet on this in S6 (Open Issues)

   o  S7: Probing: new section.  Much more comprehensive than old S5.5.

   o  S8: Operations and Management: substantially revised.

   o  other minor changes not affecting semantics

15.8.  Changes from -00 to -01

   In addition to clarifications and nit squashing, the main changes
   are:

   o  S1: Benefits: added one about provisioning (and contrast with
      DiffServ SLAs)

   o  S1: Benefits: clarified that the objective is also to stop
      PCN-packets being significantly delayed (previously only
      mentioned not dropping packets)

   o  S1: Deployment models: added one where policing is done at
      ingress of access network and not at ingress of PCN-domain
      (assume trust between networks)

   o  S1: Deployment models: corrected MPLS-TE to MPLS

   o  S2: Terminology: adjusted definition of PCN-domain

   o  S3.5: Other assumptions: corrected, so that two assumptions
      (PCN-nodes not performing ECN and PCN-ingress-node discarding
      arriving CE packet) only apply if the PCN WG decides to encode
      PCN-marking in the ECN-field.

   o  S4 & S5: changed PCN-marking algorithm to marking behaviour

   o  S4: clarified that PCN-interior-node functionality applies for
      each outgoing interface, and added clarification: "The
      functionality is also done by PCN-ingress-nodes for their
      outgoing interfaces (ie those 'inside' the PCN-domain)."

   o  S4 (near end): altered to say that a PCN-node "should" dedicate
      some capacity to lower priority traffic so that it isn't starved
      (was "may")

   o  S5: clarified to say that PCN functionality is done on an
      'interface' (rather than on a 'link')

   o  S5.2: deleted erroneous mention of service level agreement

   o  S5.5: Probing: re-written, especially to distinguish probing to
      test the ingress-egress-aggregate from probing to test a
      particular ECMP path.

   o  S5.7: Addressing: added mention of probing; added a note that, in
      the case where traffic is always tunnelled across the
      PCN-domain, the PCN-ingress-node needs to know the address of
      the PCN-egress-node.

   o  S5.8: Tunnelling: re-written, especially to provide a clearer
      description of copying on tunnel entry/exit, by adding
      explanation (keeping tunnel encaps/decaps and PCN-marking
      orthogonal), deleting one bullet ("if the inner header's marking
      state is more severe then it is preserved" - shouldn't happen),
      and better referencing of other IETF documents.

   o  S6: Open issues: stressed that "NOTE: Potential solutions are out
      of scope for this document" and edited a couple of sentences
      that were close to solution space.

   o  S6: Open issues: added one about scenarios with only one tunnel
      endpoint in the PCN-domain.

   o  S6: Open issues: ECMP: added under-admission as another potential
      risk

   o  S6: Open issues: added one about "Silent at start"

   o  S10: Conclusions: a small conclusions section added

16.  Appendix: Possible work items beyond the scope of the current PCN
     WG charter

   This section mentions some topics that are outside the PCN WG's
   current charter, but which have been mentioned as areas of interest.
   They might be work items for: the PCN WG after a future
   re-chartering; some other IETF WG; another standards body; an
   operator-specific usage that is not standardised.

   NOTE: it should be crystal clear that this section discusses
   possibilities only.

   The first set of possibilities relates to the restrictions on scope
   imposed by the PCN WG charter (see Section 5):

   o  a single PCN-domain encompasses several autonomous systems that
      do not trust each other, perhaps by using a mechanism like re-PCN
      [I-D.briscoe-re-pcn-border-cheat].

   o  not all the nodes run PCN.  For example, the PCN-domain is a
      multi-site enterprise network.  The sites are connected by a VPN
      tunnel; although PCN doesn't operate inside the tunnel, the PCN
      mechanisms still work properly because of the good QoS on the
      virtual link (the tunnel).  Another example is that PCN is
      deployed on the general Internet (ie widely but not universally
      deployed).

   o  applying the PCN mechanisms to other types of traffic, ie beyond
      inelastic traffic.  For instance, applying the PCN mechanisms to
      traffic scheduled with the Assured Forwarding per-hop behaviour.
      One example could be flow-rate adaptation by elastic applications
      that adapt according to the pre-congestion information.

   o  the aggregation assumption doesn't hold, because the link
      capacity is too low.  Measurement-based admission control is less
      accurate, with a greater risk of over-admission for instance.

   o  the applicability of PCN mechanisms for emergency use (911, GETS,
      WPS, MLPP, etc.)

   Other possibilities include:

   o  Probing.  This is discussed in Section 16.1 below.

   o  The PCN-domain extends to the end users.  The scenario is
      described in [I-D.babiarz-pcn-sip-cap].  The end users need to be
      trusted to do their own policing.  This scenario is in the scope
      of the PCN WG charter if there is sufficient traffic for the
      aggregation assumption to hold.  A variant is that the PCN-domain
      extends out as far as the LAN edge switch.

   o  indicating pre-congestion through signalling messages rather than
      in-band (in the form of PCN-marked packets)

   o  the decision-making functionality is at a centralised node rather
      than at the PCN-boundary-nodes.  This requires that the
      PCN-egress-node signals PCN-feedback-information to the
      centralised node, and that the centralised node signals to the
      PCN-ingress-node the decision about admission (or termination).
      It may need the centralised node and the PCN-boundary-nodes to be
      configured with each other's addresses.  The centralised case is
      described further in [I-D.tsou-pcn-racf-applic].

   o  Signalling extensions for specific protocols (eg RSVP, NSIS).
      For example: the details of how the signalling protocol installs
      the flowspec at the PCN-ingress-node for an admitted PCN-flow;
      and how the signalling protocol carries the
      PCN-feedback-information.  Perhaps also for other functions such
      as: coping with failure of a PCN-boundary-node
      ([I-D.briscoe-tsvwg-cl-architecture] considers what happens if
      RSVP is the QoS signalling protocol); establishing a tunnel
      across the PCN-domain if it is necessary to carry ECN marks
      transparently.

   o  Policing by the PCN-ingress-node may not be needed if the
      PCN-domain can trust that the upstream network has already
      policed the traffic on its behalf.

   o  PCN for Pseudowire: PCN may be used as a congestion avoidance
      mechanism for edge to edge pseudowire emulations
      [I-D.ietf-pwe3-congestion-frmwk].

   o  PCN for MPLS: [RFC3270] defines how to support the DiffServ
      architecture in MPLS networks (Multi-protocol label switching).
      [RFC5129] describes how to add PCN for admission control of
      microflows into a set of MPLS aggregates.  PCN-marking is done in
      MPLS's EXP field (which [I-D.ietf-mpls-cosfield-def] proposes to
      re-name to the Class of Service (CoS) field).

   o  PCN for Ethernet: Similarly, it may be possible to extend PCN
      into Ethernet networks, where PCN-marking is done in the Ethernet
      header.  NOTE: Specific consideration of this extension is
      outside the IETF's remit.

16.1.  Probing

16.1.1.  Introduction

   Probing is a potential mechanism to assist admission control.

   PCN's admission control, as described so far, is essentially a
   reactive mechanism where the PCN-egress-node monitors the
   pre-congestion level for traffic from each PCN-ingress-node; if the
   level rises then it blocks new flows on that
   ingress-egress-aggregate.  However, it's possible that an
   ingress-egress-aggregate carries no traffic, and so the
   PCN-egress-node can't make an admission decision using the usual
   method described earlier.

   One approach is to be "optimistic" and simply admit the new flow.
   However it's possible to envisage a scenario where the traffic
   levels on other ingress-egress-aggregates are already so high that
   they're blocking new PCN-flows, and admitting a new flow onto this
   'empty' ingress-egress-aggregate adds extra traffic onto a link that
   is already pre-congested - which may 'tip the balance' so that PCN's
   flow termination mechanism is activated or some packets are dropped.
   This risk could be lessened by configuring on each link sufficient
   'safety margin' above the PCN-threshold-rate.

   An alternative approach is to make PCN a more proactive mechanism.
   The PCN-ingress-node explicitly determines, before admitting the
   prospective new flow, whether the ingress-egress-aggregate can
   support it.  This can be seen as a "pessimistic" approach, in
   contrast to the "optimism" of the approach above.  It involves
   probing: a PCN-ingress-node generates and sends probe packets in
   order to test the pre-congestion level that the flow would
   experience.

   One possibility is that a probe packet is just a dummy data packet,
   generated by the PCN-ingress-node and addressed to the
   PCN-egress-node.

16.1.2.  Probing functions

   The probing functions are:

   o  Make decision that probing is needed.  As described above, this
      is when the ingress-egress-aggregate (or the ECMP path -
      Section 8) carries no PCN-traffic.  An alternative is always to
      probe, ie probe before admitting every PCN-flow.

   o  (if required) Communicate the request that probing is needed -
      the PCN-egress-node signals to the PCN-ingress-node that probing
      is needed

   o  (if required) Generate probe traffic - the PCN-ingress-node
      generates the probe traffic.  The appropriate number (or rate) of
      probe packets will depend on the PCN-marking algorithm; for
      example an excess-traffic-marking algorithm generates fewer
      PCN-marks than a threshold-marking algorithm, and so will need
      more probe packets.

   o  Forward probe packets - as far as PCN-interior-nodes are
      concerned, probe packets are handled the same as (ordinary data)
      PCN-packets, in terms of routing, scheduling and PCN-marking.

   o  Consume probe packets - the PCN-egress-node consumes probe
      packets to ensure that they don't travel beyond the PCN-domain.

16.1.3.  Discussion of rationale for probing, its downsides and open
         issues

   It is an unresolved question whether probing is really needed, but
   two viewpoints have been put forward as to why it is useful.
   The first is perhaps the most obvious: there is no PCN-traffic on
   the ingress-egress-aggregate.  The second assumes that multipath
   routing (ECMP) is running in the PCN-domain.  We now consider each
   in turn.

   The first viewpoint assumes the following:

   o  There is no PCN-traffic on the ingress-egress-aggregate (so a
      normal admission decision cannot be made).

   o  Simply admitting the new flow has a significant risk of leading
      to overload: packets dropped or flows terminated.

   On the former bullet, [PCN-email-traffic-empty-aggregates] suggests
   that, during the future busy hour of a national network with about
   100 PCN-boundary-nodes, there are likely to be significant numbers
   of aggregates with very few flows under nearly all circumstances.

   The latter bullet could occur if new flows start on many of the
   empty ingress-egress-aggregates, which together overload a link in
   the PCN-domain.  To be a problem this would probably have to happen
   in a short time period (flash crowd) because, after the reaction
   time of the system, other (non-empty) ingress-egress-aggregates that
   pass through the link will measure pre-congestion and so block new
   flows.  Also, flows naturally end anyway.

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the
      pre-congestion level of the ingress-egress-aggregate.  But the
      probing traffic itself may cause pre-congestion, causing other
      PCN-flows to be blocked or even terminated - and in the flash
      crowd scenario there will be probing on many
      ingress-egress-aggregates.

   The second viewpoint applies in the case where there is multipath
   routing (ECMP) in the PCN-domain.  Note that ECMP is often used on
   core networks.
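
   As an illustration only (it is not part of the PCN architecture),
   the following Python sketch shows how a typical ECMP algorithm
   might pin each flow to one of several equal-cost paths by hashing
   header fields.  The function names, the use of SHA-256 and the
   example addresses and ports are assumptions made for this sketch;
   real routers use their own, usually vendor-specific, hash
   functions.

```python
import hashlib
from collections import Counter


def ecmp_path(src_ip, dst_ip, src_port, dst_port, protocol, dscp, num_paths):
    """Map a flow's header fields to a path index in [0, num_paths).

    Hypothetical stand-in for a router's ECMP hash; SHA-256 is an
    assumption, chosen only because it is deterministic and in the stdlib.
    """
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}|{dscp}"
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def path_load(flows, num_paths):
    """Count how many flows land on each path (list index = path number)."""
    counts = Counter(ecmp_path(*flow, num_paths) for flow in flows)
    return [counts[p] for p in range(num_paths)]


# Example values (assumed): 1000 UDP flows between one pair of
# PCN-boundary-nodes, spread over 4 equal-cost paths.
flows = [("192.0.2.1", "198.51.100.1", 16384 + i, 5004, 17, 46)
         for i in range(1000)]
load = path_load(flows, num_paths=4)
```

   Because the path is chosen per flow, every packet of a flow follows
   the same path, but nothing forces the flows of one
   ingress-egress-aggregate to spread evenly across the paths;
   path_load() simply makes any imbalance visible.
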
   There are two possibilities:

   (1) If admission control is based on measurements of the
   ingress-egress-aggregate, then the viewpoint that probing is useful
   assumes:

   o  there's a significant chance that the traffic is unevenly
      balanced across the ECMP paths, and hence there's a significant
      risk of admitting a flow that should be blocked (because it
      follows an ECMP path that is pre-congested) or blocking a flow
      that should be admitted.

   o  Note: [PCN-email-ECMP] suggests unbalanced traffic is quite
      possible, even with quite a large number of flows on a PCN-link
      (eg 1000) when Assumption 3 (aggregation) is likely to be
      satisfied.

   (2) If admission control is based on measurements of pre-congestion
   on specific ECMP paths, then the viewpoint that probing is useful
   assumes:

   o  There is no PCN-traffic on the ECMP path on which to base an
      admission decision.

   o  Simply admitting the new flow has a significant risk of leading
      to overload.

   o  The PCN-egress-node can match a packet to an ECMP path.

   o  Note: This is similar to the first viewpoint and so similarly
      could occur in a flash crowd if a new flow starts more-or-less
      simultaneously on many of the empty ECMP paths.  Because there
      are several (sometimes many) ECMP paths between each pair of
      PCN-boundary-nodes, it's presumably more likely that an ECMP path
      is 'empty' than an ingress-egress-aggregate is.  To constrain the
      number of ECMP paths, a few tunnels could be set up between each
      pair of PCN-boundary-nodes.  Tunnelling also solves the issue in
      the bullet immediately above (which is otherwise hard because an
      ECMP routing decision is made independently on each node).

   The downsides of probing for this viewpoint are:

   o  Probing adds delay to the admission control process.

   o  Sufficient probing traffic has to be generated to test the
      pre-congestion level of the ECMP path.
      But there's the risk that the probing traffic itself may cause
      pre-congestion, causing other PCN-flows to be blocked or even
      terminated.

   o  The PCN-egress-node needs to consume the probe packets to ensure
      they don't travel beyond the PCN-domain, since they might confuse
      the destination end node.  This is non-trivial, since probe
      packets are addressed to the destination end node, in order to
      test the relevant ECMP path (ie they are not addressed to the
      PCN-egress-node, unlike the first viewpoint above).

   The open issues associated with this viewpoint include:

   o  What rate and pattern of probe packets does the PCN-ingress-node
      need to generate, so that there's enough traffic to make the
      admission decision?

   o  What difficulty does the delay (whilst probing is done), and
      possible packet drops, cause applications?

   o  Can the delay be alleviated by automatically and periodically
      probing on the ingress-egress-aggregate?  Or does this add too
      much overhead?

   o  Are there other ways of dealing with the flash crowd scenario?
      For instance, by limiting the rate at which new flows are
      admitted; or perhaps by a PCN-egress-node blocking new flows on
      its empty ingress-egress-aggregates when its non-empty ones are
      pre-congested.

   o  (Second viewpoint only) How does the PCN-egress-node disambiguate
      probe packets from data packets (so it can consume the former)?
      The PCN-egress-node must match the characteristic setting of
      particular bits in the probe packet's header or body - but these
      bits must not be used by any PCN-interior-node's ECMP algorithm.
      In the general case this isn't possible, but it should be
      possible for a typical ECMP algorithm (which examines: the source
      and destination IP addresses and port numbers, the protocol ID,
      and the DSCP).

17.  References

17.1.  Normative References

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
              J., Courtney, W., Davari, S., Firoiu, V., and D.
              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
              Behavior)", RFC 3246, March 2002.

17.2.  Informative References

   [RFC1633]  Braden, B., Clark, D., and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, June 1994.

   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
              Network Element Service", RFC 2211, September 1997.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP
              Cryptographic Authentication", RFC 2747, January 2000.

   [RFC2753]  Yavatkar, R., Pendarakis, D., and R. Guerin, "A Framework
              for Policy-based Admission Control", RFC 2753,
              January 2000.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, October 2000.

   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
              Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
              Felstaine, "A Framework for Integrated Services Operation
              over Diffserv Networks", RFC 2998, November 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
              P., Krishnan, R., Cheval, P., and J. Heinanen,
              "Multi-Protocol Label Switching (MPLS) Support of
              Differentiated Services", RFC 3270, May 2002.

   [RFC3393]  Demichelis, C. and P.
              Chimento, "IP Packet Delay Variation Metric for IP
              Performance Metrics (IPPM)", RFC 3393, November 2002.

   [RFC3411]  Harrington, D., Presuhn, R., and B. Wijnen, "An
              Architecture for Describing Simple Network Management
              Protocol (SNMP) Management Frameworks", STD 62, RFC 3411,
              December 2002.

   [RFC4216]  Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous System
              (AS) Traffic Engineering (TE) Requirements", RFC 4216,
              November 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
              Guidelines for DiffServ Service Classes", RFC 4594,
              August 2006.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
              Explicit Congestion Notification (ECN) Field", BCP 124,
              RFC 4774, November 2006.

   [RFC4778]  Kaeo, M., "Operational Security Current Practices in
              Internet Service Provider Environments", RFC 4778,
              January 2007.

   [RFC5129]  Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
              Marking in MPLS", RFC 5129, January 2008.

   [P.800]    "Methods for subjective determination of transmission
              quality", ITU-T Recommendation P.800, August 1996.

   [Y.1541]   "Network Performance Objectives for IP-based Services",
              ITU-T Recommendation Y.1541, February 2006.

   [I-D.ietf-mpls-cosfield-def]
              Andersson, L. and R. Asati, ""EXP field" renamed to
              "Traffic Class field"", draft-ietf-mpls-cosfield-def-05
              (work in progress), October 2008.

   [I-D.ietf-pcn-baseline-encoding]
              Moncaster, T., Briscoe, B., and M.
              Menth, "Baseline Encoding and Transport of
              Pre-Congestion Information",
              draft-ietf-pcn-baseline-encoding-01 (work in progress),
              October 2008.

   [I-D.ietf-pcn-marking-behaviour]
              Eardley, P., "Marking behaviour of PCN-nodes",
              draft-ietf-pcn-marking-behaviour-00 (work in progress),
              October 2008.

   [I-D.ietf-pwe3-congestion-frmwk]
              Bryant, S., Davie, B., Martini, L., and E. Rosen,
              "Pseudowire Congestion Control Framework",
              draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
              May 2008.

   [I-D.babiarz-pcn-sip-cap]
              Babiarz, J., "SIP Controlled Admission and Preemption",
              draft-babiarz-pcn-sip-cap-00 (work in progress),
              October 2006.

   [I-D.behringer-tsvwg-rsvp-security-groupkeying]
              Behringer, M. and F. Faucheur, "Applicability of Keying
              Methods for RSVP Security",
              draft-behringer-tsvwg-rsvp-security-groupkeying-01 (work
              in progress), November 2007.

   [I-D.briscoe-re-pcn-border-cheat]
              Briscoe, B., "Emulating Border Flow Policing using Re-PCN
              on Bulk Data", draft-briscoe-re-pcn-border-cheat-02 (work
              in progress), September 2008.

   [I-D.briscoe-tsvwg-cl-architecture]
              Briscoe, B., "An edge-to-edge Deployment Model for
              Pre-Congestion Notification: Admission Control over a
              DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04
              (work in progress), October 2006.

   [I-D.briscoe-tsvwg-cl-phb]
              Briscoe, B., "Pre-Congestion Notification marking",
              draft-briscoe-tsvwg-cl-phb-03 (work in progress),
              October 2006.

   [I-D.briscoe-tsvwg-ecn-tunnel]
              Briscoe, B., "Layered Encapsulation of Congestion
              Notification", draft-briscoe-tsvwg-ecn-tunnel-01 (work in
              progress), July 2008.

   [I-D.chan-pcn-problem-statement]
              Chan, K., "Pre-Congestion Notification Problem
              Statement", draft-chan-pcn-problem-statement-01 (work in
              progress), October 2006.

   [I-D.charny-pcn-comparison]
              Charny, A., "Comparison of Proposed PCN Approaches",
              draft-charny-pcn-comparison-00 (work in progress),
              November 2007.

   [I-D.charny-pcn-single-marking]
              Charny, A., Zhang, X., Faucheur, F., and V. Liatsos,
              "Pre-Congestion Notification Using Single Marking for
              Admission and Termination",
              draft-charny-pcn-single-marking-03 (work in progress),
              November 2007.

   [I-D.eardley-pcn-architecture]
              Eardley, P., "Pre-Congestion Notification Architecture",
              draft-eardley-pcn-architecture-00 (work in progress),
              June 2007.

   [I-D.eardley-pcn-marking-behaviour]
              Eardley, P., "Marking behaviour of PCN-nodes",
              draft-eardley-pcn-marking-behaviour-01 (work in
              progress), June 2008.

   [I-D.lefaucheur-rsvp-ecn]
              Faucheur, F., "RSVP Extensions for Admission Control over
              Diffserv using Pre-congestion Notification (PCN)",
              draft-lefaucheur-rsvp-ecn-01 (work in progress),
              June 2006.

   [I-D.menth-pcn-emft]
              Menth, M., Lehrieder, F., Eardley, P., Charny, A., and J.
              Babiarz, "Edge-Assisted Marked Flow Termination",
              draft-menth-pcn-emft-00 (work in progress),
              February 2008.

   [I-D.menth-pcn-psdm-encoding]
              Menth, M., Babiarz, J., Moncaster, T., and B. Briscoe,
              "PCN Encoding for Packet-Specific Dual Marking (PSDM)",
              draft-menth-pcn-psdm-encoding-00 (work in progress),
              July 2008.

   [I-D.moncaster-pcn-3-state-encoding]
              Moncaster, T., Briscoe, B., and M. Menth, "A three state
              extended PCN encoding scheme",
              draft-moncaster-pcn-3-state-encoding-00 (work in
              progress), June 2008.

   [I-D.moncaster-pcn-baseline-encoding]
              Moncaster, T., Briscoe, B., and M. Menth, "Baseline
              Encoding and Transport of Pre-Congestion Information",
              draft-moncaster-pcn-baseline-encoding-02 (work in
              progress), July 2008.

   [I-D.sarker-pcn-ecn-pcn-usecases]
              Sarker, Z. and I.
              Johansson, "Usecases and Benefits of end to end ECN
              support in PCN Domains",
              draft-sarker-pcn-ecn-pcn-usecases-01 (work in progress),
              May 2008.

   [I-D.tsou-pcn-racf-applic]
              Tsou, T. and T. Taylor, "Applicability Statement for the
              Use of Pre-Congestion Notification in a
              Resource-Controlled Network",
              draft-tsou-pcn-racf-applic-00 (work in progress),
              February 2008.

   [I-D.westberg-pcn-load-control]
              Westberg, L., Bhargava, A., Bader, A., Karagiannis, G.,
              and H. Mekkes, "LC-PCN: The Load Control PCN Solution",
              draft-westberg-pcn-load-control-04 (work in progress),
              July 2008.

   [Hancock]  "Slide 14 of 'NSIS: An Outline Framework for QoS
              Signalling'", May 2002, .

   [Iyer]     "An approach to alleviate link overload as observed on an
              IP backbone", IEEE INFOCOM , 2003, .

   [Menth]    "PCN-Based Resilient Network Admission Control: The
              Impact of a Single Bit", Technical Report , 2007, .

   [Menth08]  "PCN-Based Admission Control and Flow Termination",
              2008, .

   [PCN-email-ECMP]
              "Email to PCN WG mailing list", November 2007, .

   [PCN-email-SRLG]
              "Email to PCN WG mailing list", March 2008, .

   [PCN-email-traffic-empty-aggregates]
              "Email to PCN WG mailing list", October 2007, .

   [Songhurst]
              "Guaranteed QoS Synthesis for Admission Control with
              Shared Capacity", BT Technical Report TR-CXR9-2006-001,
              February 2006, .

   [Style]    "Guardian Style", Note: This document uses the
              abbreviations 'ie' and 'eg' (not 'i.e.' and 'e.g.'), as
              in many style guides, eg, 2007, .

Author's Address

   Philip Eardley
   BT
   B54/77, Sirius House Adastral Park Martlesham Heath
   Ipswich, Suffolk  IP5 3RE
   United Kingdom

   Email: philip.eardley@bt.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).