Congestion and Pre-Congestion                    Philip Eardley (Editor)
Notification Working Group                                            BT
Internet-Draft                                        September 10, 2008
Intended status: Informational
Expires: March 14, 2009


             Pre-Congestion Notification (PCN) Architecture
                     draft-ietf-pcn-architecture-06

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 14, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes a general architecture for flow admission and
   termination based on pre-congestion information in order to protect
   the quality of service of established inelastic flows within a single
   DiffServ domain.

Status

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Benefits
   4.  Deployment scenarios
   5.  Assumptions and constraints on scope
     5.1.  Assumption 1: Trust and support of PCN - controlled
           environment
     5.2.  Assumption 2: Real-time applications
     5.3.  Assumption 3: Many flows and additional load
     5.4.  Assumption 4: Emergency use out of scope
   6.  High-level functional architecture
     6.1.  Flow admission
     6.2.  Flow termination
     6.3.  Flow admission and/or flow termination when there are
           only two PCN encoding states
     6.4.  Information transport
     6.5.  PCN-traffic
     6.6.  Backwards compatibility
   7.  Detailed Functional architecture
     7.1.  PCN-interior-node functions
     7.2.  PCN-ingress-node functions
     7.3.  PCN-egress-node functions
     7.4.  Admission control functions
     7.5.  Flow termination functions
     7.6.  Addressing
     7.7.  Tunnelling
     7.8.  Fault handling
   8.  Challenges
   9.  Operations and Management
     9.1.  Configuration OAM
       9.1.1.  System options
       9.1.2.  Parameters
     9.2.  Performance & Provisioning OAM
     9.3.  Accounting OAM
     9.4.  Fault OAM
     9.5.  Security OAM
   10. IANA Considerations
   11. Security considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. Changes
     15.1. Changes from -05 to -06
     15.2. Changes from -04 to -05
     15.3. Changes from -03 to -04
     15.4. Changes from -02 to -03
     15.5. Changes from -01 to -02
     15.6. Changes from -00 to -01
   16. Appendix: Possible work items beyond the scope of the
       current PCN WG charter
     16.1. Probing
       16.1.1. Introduction
       16.1.2. Probing functions
       16.1.3. Discussion of rationale for probing, its downsides
               and open issues
   17. References
     17.1. Normative References
     17.2. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The purpose of this document is to describe a general architecture
   for flow admission and termination based on (pre-) congestion
   information in order to protect the quality of service of flows
   within a DiffServ domain [RFC2475].  This document defines an
   architecture for implementing two mechanisms to protect the quality
   of service of established inelastic flows within a single DiffServ
   domain, where all boundary and interior nodes are PCN-enabled and are
   trusted for correct PCN operation.  Flow admission control determines
   whether a new flow should be admitted, in order to protect the QoS of
   existing PCN-flows in normal circumstances.  However, in abnormal
   circumstances, for instance a disaster affecting multiple nodes and
   causing traffic re-routes, the QoS on existing PCN-flows may degrade
   even though care was exercised when admitting those flows.
   Therefore we also propose a mechanism for flow termination, which
   removes enough traffic to protect the QoS of the remaining PCN-flows.

   As a fundamental building block to enable these two mechanisms, PCN-
   interior-nodes generate, encode and transport pre-congestion
   information towards the PCN-egress-nodes.  Two rates, a PCN-
   threshold-rate and a PCN-excess-rate, are associated with each link
   of the PCN-domain.  Each rate is used by a marking behaviour that
   determines how and when PCN-packets are marked, and how the markings
   are encoded in packet headers.  Overall the aim is to enable PCN-
   nodes to give an "early warning" of potential congestion before there
   is any significant build-up of PCN-packets in the queue.

   PCN-boundary-nodes convert measurements of these PCN-markings into
   decisions about flow admission and termination.  In a PCN-domain with
   both threshold marking and excess traffic marking enabled, the
   admission control mechanism limits the PCN-traffic on each link to
   *roughly* its PCN-threshold-rate and the flow termination mechanism
   limits the PCN-traffic on each link to *roughly* its PCN-excess-rate.
   Other scenarios are discussed later.

   The behaviour of PCN-interior-nodes is standardised in other
   documents, which are summarised in this document:

   o  Marking behaviour: threshold marking and excess traffic marking
      [I-D.eardley-pcn-marking-behaviour].  Threshold marking marks all
      PCN-packets if the PCN traffic rate is greater than a first
      configured rate, "PCN-threshold-rate".  Excess traffic marking
      marks a proportion of PCN-packets, such that the amount marked
      equals the traffic rate in excess of a second configured rate,
      "PCN-excess-rate".

   o  Encoding: a combination of the DSCP field and ECN field in the IP
      header indicates that a packet is a PCN-packet and whether it is
      PCN-marked.
      The "baseline" encoding is standardised in
      [I-D.moncaster-pcn-baseline-encoding], which standardises two PCN
      encoding states (PCN-marked and not PCN-marked), whilst
      (experimental) extensions to the baseline encoding can provide
      three encoding states (threshold-marked, excess-traffic-marked,
      not PCN-marked), or perhaps further encoding states as suggested
      in [I-D.westberg-pcn-load-control].  PCN therefore defines
      semantics for the ECN field different from the default semantics
      of [RFC3168], and so its encoding needs to meet the guidelines of
      BCP 124 [RFC4774].

   The behaviour of PCN-boundary-nodes is described in Informational
   documents.  Several possibilities are outlined in this document;
   detailed descriptions and comparisons are in
   [I-D.charny-pcn-comparison] and [Menth08].

   This document describes the PCN architecture at a high level (Section
   6) and in more detail (Section 7).  It also defines some terminology
   and outlines some benefits, deployment scenarios, and assumptions of
   PCN (Sections 2-5).  Finally it outlines some challenges, operations
   and management, and security considerations, and some potential
   future work items (Sections 8, 9, 11 and Appendix).

2.  Terminology

   o  PCN-domain: a PCN-capable domain; a contiguous set of PCN-enabled
      nodes that perform DiffServ scheduling [RFC2474]; the complete set
      of PCN-nodes whose PCN-marking can in principle influence
      decisions about flow admission and termination for the PCN-domain,
      including the PCN-egress-nodes, which measure these PCN-marks.

   o  PCN-boundary-node: a PCN-node that connects one PCN-domain to a
      node either in another PCN-domain or in a non-PCN-domain.

   o  PCN-interior-node: a node in a PCN-domain that is not a PCN-
      boundary-node.

   o  PCN-node: a PCN-boundary-node or a PCN-interior-node.

   o  PCN-egress-node: a PCN-boundary-node in its role in handling
      traffic as it leaves a PCN-domain.

   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
      traffic as it enters a PCN-domain.

   o  PCN-traffic, PCN-packets, PCN-BA: a PCN-domain carries traffic of
      different DiffServ behaviour aggregates (BAs) [RFC2474].  The
      PCN-BA uses the PCN mechanisms to carry PCN-traffic and the
      corresponding packets are PCN-packets.  The same network will
      carry traffic of other DiffServ BAs.  The PCN-BA is distinguished
      by a combination of the DiffServ codepoint (DSCP) and ECN fields.

   o  PCN-flow: the unit of PCN-traffic that the PCN-boundary-node
      admits (or terminates); the unit could be a single microflow (as
      defined in [RFC2474]) or some identifiable collection of
      microflows.

   o  Ingress-egress-aggregate: the collection of PCN-packets from all
      PCN-flows that travel in one direction between a specific pair of
      PCN-boundary-nodes.

   o  PCN-threshold-rate: a reference rate configured for each link in
      the PCN-domain, which is lower than the PCN-excess-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a first encoding, "threshold-marked".

   o  PCN-excess-rate: a reference rate configured for each link in the
      PCN-domain, which is higher than the PCN-threshold-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a second encoding, "excess-traffic-
      marked".

   o  Threshold-marking: a PCN-marking behaviour with the objective that
      all PCN-traffic is marked if the PCN-traffic exceeds the PCN-
      threshold-rate.

   o  Excess-traffic-marking: a PCN-marking behaviour with the objective
      that the amount of PCN-traffic that is PCN-marked is equal to the
      amount that exceeds the PCN-excess-rate.

   o  Pre-congestion: a condition of a link within a PCN-domain such
      that the PCN-node performs PCN-marking, in order to provide an
      "early warning" of potential congestion before there is any
      significant build-up of PCN-packets in the real queue.  (Hence, by
      analogy with ECN we call our mechanism Pre-Congestion
      Notification.)

   o  PCN-marking: the process of setting the header in a PCN-packet
      based on defined rules, in reaction to pre-congestion; either
      threshold-marking or excess-traffic-marking.

   o  PCN-colouring: the process of setting the header in a PCN-packet
      by a PCN-boundary-node; performed by a PCN-ingress-node so that
      PCN-nodes can easily identify PCN-packets; performed by a PCN-
      egress-node so that the header is appropriate for nodes beyond the
      PCN-domain.

   o  PCN-feedback-information: information signalled by a PCN-egress-
      node to a PCN-ingress-node (or a central control node), which is
      needed for the flow admission and flow termination mechanisms.

3.  Benefits

   We believe that the key benefits of the PCN mechanisms described in
   this document are that they are simple, scalable, and robust because:

   o  Per flow state is only required at the PCN-ingress-nodes
      ("stateless core").  This is required for policing purposes (to
      prevent non-admitted PCN traffic from entering the PCN-domain) and
      so on.  It is not generally required that other network entities
      are aware of individual flows (although they may be in particular
      deployment scenarios).

   o  Admission control is resilient: with PCN, QoS is decoupled from
      the routing system.  Hence in general admitted flows can survive
      capacity, routing or topology changes without additional
      signalling.  The PCN-threshold-rate on each link can be chosen to
      be small enough that admitted traffic can still be carried after a
      rerouting in most failure cases [Menth].
      This is an important
      feature as QoS violations in core networks due to link failures
      are more likely than QoS violations due to increased traffic
      volume [Iyer].

   o  The PCN-marking behaviours only operate on the overall PCN-traffic
      on the link, not per flow.

   o  The information from these measurements is signalled to the PCN-
      egress-nodes by the PCN-marks in the packet headers, ie [Style]
      "in-band".  No additional signalling protocol is required for
      transporting the PCN-marks.  Therefore no secure binding is
      required between data packets and separate congestion messages.

   o  The PCN-egress-nodes make separate measurements, operating on the
      aggregate PCN-traffic from each PCN-ingress-node, ie not per flow.
      Similarly, signalling by the PCN-egress-node of PCN-feedback-
      information (which is used for flow admission and termination
      decisions) is at the granularity of the ingress-egress-aggregate.
      An alternative approach is that the PCN-egress-nodes monitor the
      PCN-traffic and signal PCN-feedback-information (which is used for
      flow admission and termination decisions) at the granularity of
      one (or a few) PCN-marks.

   o  The admitted PCN-load is controlled dynamically.  Therefore it
      adapts as the traffic matrix changes, and also if the network
      topology changes (eg after a link failure).  Hence an operator can
      be less conservative when deploying network capacity, and less
      accurate in their prediction of the PCN-traffic matrix.

   o  The termination mechanism complements admission control.  It
      allows the network to recover from sudden unexpected surges of
      PCN-traffic on some links, thus restoring QoS to the remaining
      flows.  Such scenarios are expected to be rare but not impossible.
      They can be caused by large network failures that redirect lots of
      admitted PCN-traffic to other links, or by malfunction of the
      measurement-based admission control in the presence of admitted
      flows that send for a while with an atypically low rate and then
      increase their rates in a correlated way.

   o  Flow termination can also enable an operator to be less
      conservative when deploying network capacity.  It is an
      alternative to running links at low utilisation in order to
      protect against link or node failures.  This is especially the
      case with SRLGs (shared risk link groups, which are links that
      share a resource, such as a fibre, whose failure affects all those
      links [RFC4216]).  Fully protecting traffic against a single SRLG
      failure requires low utilisation (~10%) of the link bandwidth on
      some links before failure [PCN-email-SRLG].

   o  The PCN-excess-rate may be set below the maximum rate that PCN-
      traffic can be transmitted on a link, in order to trigger
      termination of some PCN-flows before loss (or excessive delay) of
      PCN-packets occurs, or to keep the maximum PCN-load on a link
      below a level configured by the operator.

   o  Provisioning of the network is decoupled from the process of
      adding new customers.  By contrast, with the DiffServ architecture
      [RFC2475] operators rely on subscription-time Service Level
      Agreements, which statically define the parameters of the traffic
      that will be accepted from a customer, and so the operator has to
      run the provisioning process each time a new customer is added to
      check that the Service Level Agreement can be fulfilled.  A PCN-
      domain doesn't need such traffic conditioning.

4.  Deployment scenarios

   Operators of networks will want to use the PCN mechanisms in various
   arrangements, for instance depending on how they are performing
   admission control outside the PCN-domain (users after all are
   concerned about QoS end-to-end), what their particular goals and
   assumptions are, how many PCN encoding states are available, and so
   on.

   From the perspective of the outside world, a PCN-domain essentially
   looks like a DiffServ domain.  PCN-traffic is either transported
   across it transparently or policed at the PCN-ingress-node (ie
   dropped or carried at a lower QoS).  One difference is that PCN-
   traffic has better QoS guarantees than normal DiffServ traffic,
   because the PCN mechanisms better protect the QoS of admitted flows.
   Another difference may occur in the rare circumstance when there is a
   failure: on the one hand some PCN-flows may get terminated, but on
   the other hand other flows will get their QoS restored.  Non-PCN-
   traffic is treated transparently, ie the PCN-domain is a normal
   DiffServ domain.

   An operator may choose to deploy either admission control or flow
   termination or both.  Although designed to work together, they are
   independent mechanisms, and the use of one does not require or
   prevent the use of the other.

   A PCN-domain may have three encoding states (or pedantically, an
   operator may choose to use up three encoding states for PCN): not
   PCN-marked, threshold-marked, excess-traffic-marked.  Then both PCN
   admission control and flow termination can be supported.  As
   illustrated in Figure 1, admission control accepts new flows until
   the PCN-traffic rate on the bottleneck link rises above the PCN-
   threshold-rate, whilst if necessary the flow termination mechanism
   terminates flows down to the PCN-excess-rate on the bottleneck link.
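
   The behaviour described in the preceding paragraph can be sketched as
   follows.  This is an idealised illustration only: it assumes the true
   PCN-traffic rate on the bottleneck link is known, whereas in practice
   PCN-boundary-nodes infer it (roughly) from PCN-markings.  The names
   LinkConfig and pcn_actions are hypothetical and do not appear in any
   PCN document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkConfig:
    """Reference rates configured for one link (hypothetical helper type)."""
    threshold_rate: float  # PCN-threshold-rate, bits per second
    excess_rate: float     # PCN-excess-rate, bits per second


def pcn_actions(pcn_rate: float, link: LinkConfig) -> tuple[bool, float]:
    """Return (admit_new_flows, rate_to_terminate) for a bottleneck link.

    Idealised: new flows are admitted while the PCN-traffic rate does not
    exceed the PCN-threshold-rate, and flows are terminated down to the
    PCN-excess-rate when the rate rises above it.
    """
    if link.threshold_rate >= link.excess_rate:
        raise ValueError("PCN-threshold-rate must be below PCN-excess-rate")
    admit_new_flows = pcn_rate <= link.threshold_rate
    rate_to_terminate = max(0.0, pcn_rate - link.excess_rate)
    return admit_new_flows, rate_to_terminate
```

   For example, with a PCN-threshold-rate of 60 Mbit/s and a PCN-excess-
   rate of 80 Mbit/s, a PCN-traffic rate of 70 Mbit/s blocks new flows
   without terminating any, whilst 90 Mbit/s blocks new flows and calls
   for 10 Mbit/s of admitted traffic to be terminated.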

                        ==Marking behaviour==     ==PCN mechanisms==
            Rate of ^
     PCN-traffic on |
    bottleneck link |                             (as below and also)
                    |       (as below)            Drop some PCN-pkts
                    |
    scheduler rate -|------------------------------------------------
  (for PCN-traffic) |
                    |      Some pkts              Terminate some
                    |  excess-traffic-marked      admitted flows
                    |          &                        &
                    |     Rest of pkts            Block new flows
                    |   threshold-marked
                    |
   PCN-excess-rate -|------------------------------------------------
                    |
                    |       All pkts              Block new flows
                    |   threshold-marked
                    |
PCN-threshold-rate -|------------------------------------------------
                    |
                    |       No pkts               Admit new flows
                    |      PCN-marked
                    |

   Figure 1: Schematic of how the PCN admission control and flow
   termination mechanisms operate as the rate of PCN-traffic increases,
   for a PCN-domain with three encoding states.

   On the other hand, a PCN-domain may have two encoding states (as in
   [I-D.moncaster-pcn-baseline-encoding]) (or pedantically, an operator
   may choose to use up two encoding states for PCN): not PCN-marked,
   PCN-marked.  Then there are three possibilities, as discussed in the
   following paragraphs (see also Section 6.3).

   First, an operator could just use PCN's admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting' - as
   sessions end, PCN-traffic naturally reduces, and meanwhile the
   admission control mechanism will prevent admission of new flows that
   use the affected links.  So the PCN-domain will naturally return to
   normal operation, but with reduced capacity.  The drawback of this
   approach would be that, until sufficient sessions have ended to
   relieve the congestion, all PCN-flows as well as lower priority
   services will be adversely affected.

   Second, an operator could just rely for admission control on
   statically provisioned capacity per PCN-ingress-node (regardless of
   the PCN-egress-node of a flow), as is typical in the hose model of
   the DiffServ architecture [RFC2475].  Such traffic conditioning
   agreements can lead to focused overload: many flows happen to focus
   on a particular link and then all flows through the congested link
   fail catastrophically.  PCN's flow termination mechanism could then
   be used to counteract such a problem.

   Third, both admission control and flow termination can be triggered
   from the single type of PCN-marking; the main downside is that
   admission control is less accurate [I-D.charny-pcn-single-marking].

   Within the PCN-domain there is some flexibility about how the
   decision making functionality is distributed.  These possibilities
   are outlined in Section 7.4 and also discussed elsewhere, such as in
   [Menth08].

   The flow admission and termination decisions need to be enforced
   through per flow policing by the PCN-ingress-nodes.  If there are
   several PCN-domains on the end-to-end path, then each needs to police
   at its PCN-ingress-nodes.  One exception is if the operator runs both
   the access network (not a PCN-domain) and the core network (a PCN-
   domain); per flow policing could be devolved to the access network
   and not done at the PCN-ingress-node.  Note: to aid readability, the
   rest of this draft assumes that policing is done by the PCN-ingress-
   nodes.

   PCN admission control has to fit with the overall approach to
   admission control.  For instance [I-D.briscoe-tsvwg-cl-architecture]
   describes the case where RSVP signalling runs end-to-end.  The PCN-
   domain is a single RSVP hop, ie only the PCN-boundary-nodes process
   RSVP messages, with RSVP messages processed on each hop outside the
   PCN-domain, as in IntServ over DiffServ [RFC2998].
   It would also be
   possible for the RSVP signalling to be originated and/or terminated
   by proxies, with application-layer signalling between the end user
   and the proxy (eg SIP signalling with a home hub).  A similar example
   would use NSIS signalling instead of RSVP.

   It is possible that a user wants its inelastic traffic to use the PCN
   mechanisms but also react to ECN marking outside the PCN-domain
   [I-D.sarker-pcn-ecn-pcn-usecases].  Two possible ways to do this are
   to tunnel all PCN-packets across the PCN-domain, so that the ECN
   marks are carried transparently across the PCN-domain, or to use an
   encoding like [I-D.moncaster-pcn-3-state-encoding].  Tunnelling is
   discussed further in Section 7.7.

   Some possible deployment models that are outside the current PCN WG
   charter are outlined in the Appendix.

5.  Assumptions and constraints on scope

   The scope of PCN is, at least initially (see Appendix), restricted by
   the following assumptions:

   1.  these components are deployed in a single DiffServ domain, within
       which all PCN-nodes are PCN-enabled and are trusted for truthful
       PCN-marking and transport.

   2.  all flows handled by these mechanisms are inelastic and
       constrained to a known peak rate through policing or shaping.

   3.  the number of PCN-flows across any potential bottleneck link is
       sufficiently large that stateless, statistical mechanisms can be
       effective.  To put it another way, the aggregate bit rate of PCN-
       traffic across any potential bottleneck link needs to be
       sufficiently large relative to the maximum additional bit rate
       added by one flow.  This is the basic assumption of measurement-
       based admission control.

   4.  PCN-flows may have different precedence, but the applicability of
       the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
       is out of scope.

5.1.  Assumption 1: Trust and support of PCN - controlled environment

   We assume that the PCN-domain is a controlled environment, ie all the
   nodes in a PCN-domain run PCN and are trusted.  There are several
   reasons for proposing this assumption:

   o  The PCN-domain has to be encircled by a ring of PCN-boundary-
      nodes, otherwise traffic could enter a PCN-BA without being
      subject to admission control, which would potentially degrade the
      QoS of existing PCN-flows.

   o  Similarly, a PCN-boundary-node has to trust that all the PCN-nodes
      mark PCN-traffic consistently.  A node not performing PCN-marking
      wouldn't be able to raise an alert when it suffered pre-
      congestion, which potentially would lead to too many PCN-flows
      being admitted (or too few being terminated).  Worse, a rogue node
      could perform various attacks, as discussed in the Security
      Considerations section.

   One way of assuring the above two points is that the entire PCN-
   domain is run by a single operator.  Another possibility is that
   there are several operators that trust each other in their handling
   of PCN-traffic.

   Note: All PCN-nodes need to be trustworthy.  However if it is known
   that an interface cannot become pre-congested then it is not strictly
   necessary for it to be capable of PCN-marking.  But this must be
   known even in unusual circumstances, eg after the failure of some
   links.

5.2.  Assumption 2: Real-time applications

   We assume that any variation of source bit rate is independent of the
   level of pre-congestion.  We assume that PCN-packets come from real-
   time applications generating inelastic traffic, ie sending packets at
   the rate the codec produces them, regardless of the availability of
   capacity [RFC4594].  Examples are voice and video requiring low
   delay, jitter and packet loss, the Controlled Load Service
   [RFC2211], and the Telephony service class [RFC4594].
   This
   assumption is to help focus the effort where it looks like PCN would
   be most useful, ie the sorts of applications where per flow QoS is a
   known requirement.  In other words we focus on PCN providing a
   benefit to inelastic traffic (PCN may or may not provide a benefit to
   other types of traffic).

   As a consequence, it is assumed that PCN-marking is being applied to
   traffic scheduled with the expedited forwarding per-hop behaviour
   [RFC3246], or a per-hop behaviour with similar characteristics.

5.3.  Assumption 3: Many flows and additional load

   We assume that there are many PCN-flows on any bottleneck link in the
   PCN-domain (or, to put it another way, the aggregate bit rate of PCN-
   traffic across any potential bottleneck link is sufficiently large
   relative to the maximum additional bit rate added by one PCN-flow).
   Measurement-based admission control assumes that the present is a
   reasonable prediction of the future: the network conditions are
   measured at the time of a new flow request, but the actual network
   performance must still be acceptable during the call some time
   later.  One issue is that if there are only a few variable rate
   flows, then the aggregate traffic level may vary a lot, perhaps
   enough to cause some packets to get dropped.  If there are many flows
   then the aggregate traffic level should be statistically smoothed.
   How many flows is enough depends on a number of factors such as the
   variation in each flow's rate, the total rate of PCN-traffic, and the
   size of the "safety margin" between the traffic level at which we
   start threshold-marking and at which packets are dropped or
   significantly delayed.

   We do not make explicit assumptions on how many PCN-flows are in each
   ingress-egress-aggregate.
   Performance evaluation work may clarify
   whether it is necessary to make any additional assumption on
   aggregation at the ingress-egress-aggregate level.

5.4.  Assumption 4: Emergency use out of scope

   PCN-flows may have different precedence, but the applicability of the
   PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc) is out
   of scope for consideration by the PCN WG.

6.  High-level functional architecture

   The high-level approach is to split functionality between:

   o  PCN-interior-nodes 'inside' the PCN-domain, which monitor their
      own state of pre-congestion and mark PCN-packets as appropriate.
      They are not flow-aware, nor aware of ingress-egress-aggregates.
      The functionality is also done by PCN-ingress-nodes for their
      outgoing interfaces (ie those 'inside' the PCN-domain).

   o  PCN-boundary-nodes at the edge of the PCN-domain, which control
      admission of new PCN-flows and termination of existing PCN-flows,
      based on information from PCN-interior-nodes. This information is
      in the form of the PCN-marked data packets (which are intercepted
      by the PCN-egress-nodes) and not signalling messages. Generally
      PCN-ingress-nodes are flow-aware.

   The aim of this split is to keep the bulk of the network simple,
   scalable and robust, whilst confining policy, application-level and
   security interactions to the edge of the PCN-domain. For example the
   lack of flow awareness means that the PCN-interior-nodes don't care
   about the flow information associated with PCN-packets, nor do the
   PCN-boundary-nodes care about which PCN-interior-nodes their ingress-
   egress-aggregates traverse.

   In order to generate information about the current state of the PCN-
   domain, each PCN-node PCN-marks packets if it is "pre-congested".
   Exactly when a PCN-node decides if it is "pre-congested" (the
   algorithm) and exactly how packets are "PCN-marked" (the encoding)
   will be defined in separate standards-track documents, but at a high
   level it is as follows:

   o  the algorithms: a PCN-node meters the amount of PCN-traffic on
      each one of its outgoing (or incoming) links. The measurement is
      made as an aggregate of all PCN-packets, and not per flow. There
      are two algorithms, one for threshold-marking and one for excess-
      traffic-marking.

   o  the encoding(s): a PCN-node PCN-marks a PCN-packet by modifying a
      combination of the DSCP and ECN fields. In the "baseline"
      encoding [I-D.moncaster-pcn-baseline-encoding], the ECN field is
      set to 11 and the DSCP is not altered. Extension encodings may be
      defined that, at most, use a second DSCP (eg as in
      [I-D.moncaster-pcn-3-state-encoding]) and/or set the ECN field to
      values other than 11 (eg as in [I-D.menth-pcn-psdm-encoding]).

   In a PCN-domain the operator may have two or three encoding states
   available. The baseline encoding provides two encoding states (not
   PCN-marked, PCN-marked), whilst extended encodings can provide three
   encoding states (not PCN-marked, threshold-marked, excess-traffic-
   marked).

   The PCN-boundary-nodes monitor the PCN-marked packets in order to
   extract information about the current state of the PCN-domain. Based
   on this monitoring, a distributed decision is made about whether to
   admit a prospective new flow or whether to terminate existing
   flow(s). Sections 7.4 and 7.5 mention various possibilities for how
   the functionality could be distributed.

   PCN-marking needs to be configured on all (potentially pre-congested)
   links in the PCN-domain to ensure that the PCN mechanisms protect all
   links.
   The actual functionality can be configured on the outgoing or
   incoming interfaces of PCN-nodes - or one algorithm could be
   configured on the outgoing interface and the other on the incoming
   interface. The important point is that a consistent choice is made
   across the PCN-domain to ensure that the PCN mechanisms protect all
   links. See [I-D.eardley-pcn-marking-behaviour] for further
   discussion.

   The objective of the threshold-marking algorithm is to threshold-mark
   all PCN-packets whenever the rate of PCN-packets is greater than some
   configured rate, the PCN-threshold-rate. The objective of the
   excess-traffic-marking algorithm is to excess-traffic-mark PCN-
   packets at a rate equal to the difference between the bit rate of
   PCN-packets and some configured rate, the PCN-excess-rate. Note that
   this description reflects the overall intent of the algorithm rather
   than its instantaneous behaviour, since the rate measured at a
   particular moment depends on the detailed algorithm, its
   implementation, and the traffic's variance as well as its rate (eg
   marking may well continue after a recent overload even after the
   instantaneous rate has dropped). The algorithms are specified in
   [I-D.eardley-pcn-marking-behaviour].

   All the presently proposed admission and termination approaches are
   detailed and compared in [I-D.charny-pcn-comparison] and [Menth08].
   The discussion below is just a brief summary. It initially assumes
   there are three encoding states available.

6.1.  Flow admission

   The objective of PCN's flow admission control mechanism is to limit
   the PCN-traffic on each link in the PCN-domain to *roughly* its PCN-
   threshold-rate, by admitting or blocking prospective new flows, in
   order to protect the QoS of existing PCN-flows.
   The PCN-threshold-rate is a parameter that can be configured by the
   operator and will be set lower than the traffic rate at which the
   link becomes congested and the node drops packets.

   Exactly how the admission control decision is made will be defined
   separately in informational documents. At a high level two
   approaches are proposed (others might be possible):

   o  the PCN-egress-node measures (possibly as a moving average) the
      fraction of the PCN-traffic that is threshold-marked. The
      fraction is measured for a specific ingress-egress-aggregate. If
      the fraction is below a threshold value then the new flow is
      admitted, and if the fraction is above the threshold value then it
      is blocked. In [I-D.eardley-pcn-architecture] the fraction is
      measured as an EWMA (exponentially weighted moving average) and
      termed the "congestion level estimate".

   o  the PCN-egress-node monitors PCN-traffic and if it receives one
      (or several) threshold-marked packets, then the new flow is
      blocked, otherwise it is admitted. One possibility may be to
      react to the marking state of an initial flow set-up packet (eg
      RSVP PATH). Another is that after one (or several) threshold-
      marks then all flows are blocked until after a specific period of
      no congestion.

   Note that the admission control decision is made for a particular
   pair of PCN-boundary-nodes. So it is quite possible for a new flow
   to be admitted between one pair of PCN-boundary-nodes, whilst at the
   same time another admission request is blocked between a different
   pair of PCN-boundary-nodes.

6.2.  Flow termination

   The objective of PCN's flow termination mechanism is to limit the
   PCN-traffic on each link to *roughly* its PCN-excess-rate, by
   terminating some existing PCN-flows, in order to protect the QoS of
   the remaining PCN-flows.
   The PCN-excess-rate is a parameter that can
   be configured by the operator and may be set lower than the traffic
   rate at which the link becomes congested and the node drops packets.

   Exactly how the flow termination decision is made will be defined
   separately in informational documents. At a high level several
   approaches are proposed (others might be possible):

   o  In one approach the PCN-egress-node measures the rate of PCN-
      traffic that is not excess-traffic-marked, which is the amount of
      PCN-traffic that can actually be supported, and communicates this
      to the PCN-ingress-node. Also the PCN-ingress-node measures the
      rate of PCN-traffic that is destined for this specific PCN-egress-
      node, and hence it can calculate the excess amount that should be
      terminated.

   o  Another approach instead measures the rate of excess-traffic-
      marked traffic and terminates this amount of traffic. This
      terminates less traffic than the previous bullet if some nodes are
      dropping PCN-traffic.

   o  Another approach monitors PCN-packets and terminates some of the
      PCN-flows that have an excess-traffic-marked packet. (If all such
      flows were terminated, far too much traffic would be terminated,
      so a random selection needs to be made from those with an excess-
      traffic-marked packet, [I-D.menth-pcn-emft].)

   Since flow termination is designed for "abnormal" circumstances, it
   is quite likely that some PCN-nodes are congested and hence packets
   are being dropped and/or significantly queued. The flow termination
   mechanism must accommodate this.

   Note also that the termination control decision is made for a
   particular pair of PCN-boundary-nodes. So it is quite possible for
   PCN-flows to be terminated between one pair of PCN-boundary-nodes,
   whilst at the same time none are terminated between a different pair
   of PCN-boundary-nodes.

6.3.  Flow admission and/or flow termination when there are only two PCN
      encoding states

   If a PCN-domain has only two encoding states available (PCN-marked
   and not PCN-marked), ie it is using the baseline encoding
   [I-D.moncaster-pcn-baseline-encoding], then an operator has three
   options:

   o  admission control only: PCN-marking means threshold-marking, ie
      only the threshold-marking algorithm writes PCN-marks. Only PCN
      admission control is available.

   o  flow termination only: PCN-marking means excess-traffic-marking,
      ie only the excess-traffic-marking algorithm writes PCN-marks.
      Only PCN termination control is available.

   o  both admission control and flow termination: only the excess-
      traffic-marking algorithm writes PCN-marks, however the configured
      rate (PCN-excess-rate) is set at the rate the admission control
      mechanism needs to limit PCN-traffic to, as shown in Figure 2.
      [I-D.charny-pcn-single-marking] describes how both admission
      control and flow termination can be triggered in this case and
      also gives some of the pros and cons of this approach. The main
      downside is that admission control is less accurate.

                      ==Marking behaviour==       ==PCN mechanisms==
          Rate of    ^
      PCN-traffic on |
     bottleneck link |                             Terminate some
                     |  Further pkts               admitted flows
                     |  excess-traffic-marked            &
                     |                             Block new flows
                     |
                     |
   U*PCN-excess-rate -|-----------------------------------------------
                     |
                     |  Some pkts                  Block new flows
                     |  excess-traffic-marked
                     |
     PCN-excess-rate -|-----------------------------------------------
                     |
                     |  No pkts                    Admit new flows
                     |  PCN-marked
                     |

   Figure 2: Schematic of how the PCN admission control and flow
   termination mechanisms operate as the rate of PCN-traffic increases,
   for a PCN-domain with two encoding states and using the approach of
   [I-D.charny-pcn-single-marking]. Note: U is a global parameter for
   all the PCN-links.
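
   As a rough illustration of Figure 2, the sketch below maps the rate
   of PCN-traffic on a bottleneck link to the three regimes of the
   figure. It is only a sketch under stated assumptions, not the
   mechanism of [I-D.charny-pcn-single-marking]: the function names are
   hypothetical, the rate is treated as a long-run average rather than
   an instantaneous measurement, U is assumed to be at least 1, and the
   handling of a rate exactly equal to a boundary is an arbitrary
   choice.

```python
def excess_marked_rate(pcn_rate, pcn_excess_rate):
    """Intended long-run rate of excess-traffic-marked traffic.

    This is the difference between the PCN-traffic rate and the
    configured PCN-excess-rate, and zero below that rate.
    """
    return max(0.0, pcn_rate - pcn_excess_rate)


def single_marking_regime(pcn_rate, pcn_excess_rate, u):
    """Return the Figure 2 regime for one link.

    Assumption: u >= 1, and a rate equal to a boundary is assigned to
    the lower regime. Neither detail is specified by the figure.
    """
    if u < 1.0:
        raise ValueError("U is assumed to be >= 1")
    if pcn_rate <= pcn_excess_rate:
        # No packets are PCN-marked.
        return "admit new flows"
    if pcn_rate <= u * pcn_excess_rate:
        # Some packets are excess-traffic-marked.
        return "block new flows"
    # Further packets are excess-traffic-marked.
    return "terminate some admitted flows and block new flows"
```

   For instance, with a PCN-excess-rate of 100 units and U = 1.2, a
   link carrying 110 units falls in the "block new flows" regime, with
   roughly 10 units of its traffic excess-traffic-marked.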
6.4.  Information transport

   The transport of pre-congestion information from a PCN-node to a PCN-
   egress-node is through PCN-markings in data packet headers, ie "in-
   band": no signalling protocol messaging is needed. Signalling is
   needed to transport PCN-feedback-information between the PCN-
   boundary-nodes, for example to convey the fraction of PCN-marked
   traffic from a PCN-egress-node to the relevant PCN-ingress-node.
   Exactly what information needs to be transported will be described in
   the future documents about possible boundary mechanisms. The
   signalling could be done by an extension of RSVP or NSIS, for
   instance; protocol work will be done by the relevant WG, but for
   example [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for
   RSVP.

6.5.  PCN-traffic

   The following are some high-level points about how PCN works:

   o  There needs to be a way for a PCN-node to distinguish PCN-traffic
      from other traffic. This is through a combination of the DSCP
      field and/or ECN field.

   o  It is not advised to have non PCN-traffic that competes for the
      same capacity as PCN-traffic but, if there is such traffic, there
      needs to be a mechanism to limit it. "Capacity" means the
      forwarding bandwidth on a link; "competes" means that non PCN-
      packets will delay PCN-packets in the queue for the link. Hence
      more non PCN-traffic results in poorer QoS for PCN. Further, the
      unpredictable amount of non PCN-traffic makes the PCN mechanisms
      less accurate and so reduces PCN's ability to protect the QoS of
      admitted PCN-flows.

   o  Two examples of such non PCN-traffic (ie that competes for the
      same capacity as PCN-traffic) are:

      1.  traffic that is priority scheduled over PCN (perhaps a
          particular application or an operator's control messages).

      2.  traffic that is scheduled at the same priority as PCN (for
          example if the Voice-Admit codepoint is used for PCN-traffic
          [I-D.moncaster-pcn-baseline-encoding] and there is voice-admit
          traffic in the PCN-domain).

   o  If there is such non PCN-traffic (ie that competes for the same
      capacity as PCN-traffic), then PCN's mechanisms should take
      account of it, in order to improve the accuracy of the decision
      about whether to admit (or terminate) a PCN-flow. For example,
      one mechanism is that such non PCN-traffic contributes to the PCN
      meters (ie is metered by the threshold-marking and excess-traffic-
      marking algorithms).

   o  There will be non PCN-traffic that doesn't compete for the same
      capacity as PCN-traffic, because it is forwarded at lower
      priority. Hence it shouldn't contribute to the PCN meters.
      Examples are best effort and assured forwarding traffic. However,
      a PCN-node should dedicate some capacity to lower priority traffic
      so that it isn't starved.

   o  The document assumes that the PCN mechanisms are applied to a
      single behaviour aggregate in the PCN-domain. However, it would
      also be possible to apply them independently to more than one
      behaviour aggregate, which are distinguished by DSCP.

6.6.  Backwards compatibility

   PCN specifies semantics for the ECN field that differ from the
   default semantics of [RFC3168]. A particular PCN encoding scheme
   needs to describe how it meets the guidelines of BCP 124 [RFC4774]
   for specifying alternative semantics for the ECN field. In summary
   the approach is to:

   o  use a DSCP to allow PCN-nodes to distinguish PCN-traffic that uses
      the alternative ECN semantics;

   o  define these semantics for use within a controlled region, the
      PCN-domain;

   o  take appropriate action if ECN capable, non-PCN traffic arrives at
      a PCN-ingress-node with the DSCP used by PCN.
   For the baseline encoding [I-D.moncaster-pcn-baseline-encoding], the
   'appropriate action' is to block ECN-capable traffic that uses the
   same DSCP as PCN from entering the PCN-domain directly. Blocking
   means it is dropped or downgraded to a lower priority behaviour
   aggregate, or alternatively such traffic may be tunnelled through the
   PCN-domain. The reason that blocking is needed is that the PCN-
   egress-node clears the ECN field to 00.

   Extended encoding schemes may take different 'appropriate action'.

7.  Detailed Functional architecture

   This section is intended to provide a systematic summary of the new
   functional architecture in the PCN-domain. First it describes
   functions needed at the three specific types of PCN-node; these are
   data plane functions and are in addition to their normal router
   functions. Then it describes further functionality needed for both
   flow admission control and flow termination; these are signalling and
   decision-making functions, and there are various possibilities for
   where the functions are physically located. The section is split
   into:

   1.  functions needed at PCN-interior-nodes

   2.  functions needed at PCN-ingress-nodes

   3.  functions needed at PCN-egress-nodes

   4.  other functions needed for flow admission control

   5.  other functions needed for flow termination control

   Note: Probing is covered in the Appendix.

   The section then discusses some other detailed topics:

   1.  addressing

   2.  tunnelling

   3.  fault handling

7.1.  PCN-interior-node functions

   Each link of the PCN-domain is configured with the following
   functionality:

   o  Behaviour aggregate classification - determine whether an incoming
      packet is a PCN-packet or not.

   o  Meter - measure the 'amount of PCN-traffic'. The measurement is
      made as an aggregate of all PCN-packets, and not per flow.

   o  PCN-mark - algorithms determine whether to PCN-mark PCN-packets
      and what packet encoding is used.

   The functions are defined in [I-D.eardley-pcn-marking-behaviour] and
   the baseline encoding in [I-D.moncaster-pcn-baseline-encoding]
   (extended encodings are to be defined in other documents).

7.2.  PCN-ingress-node functions

   Each ingress link of the PCN-domain is configured with the following
   functionality:

   o  Packet classification - determine whether an incoming packet is
      part of a previously admitted flow, by using a filter spec (eg
      DSCP, source and destination addresses and port numbers).

   o  Traffic conditioning - police, by dropping or downgrading, any
      packets received with a DSCP indicating PCN transport that do not
      belong to an admitted flow. (A prospective PCN-flow that is
      rejected could be blocked or admitted into a lower priority
      behaviour aggregate.) Similarly, police packets that are part of
      a previously admitted flow, to check that the flow keeps to the
      agreed rate or flowspec (eg RFC 1633 [RFC1633] for a microflow and
      its NSIS equivalent).

   o  PCN-colour - set the DSCP and ECN fields appropriately for the
      PCN-domain, for example as in
      [I-D.moncaster-pcn-baseline-encoding].

   o  Meter - some approaches to flow termination require the PCN-
      ingress-node to measure the (aggregate) rate of PCN-traffic
      towards a particular PCN-egress-node.

   The first two are policing functions, needed to make sure that PCN-
   packets admitted into the PCN-domain belong to a flow that has been
   admitted and to ensure that the flow keeps to the flowspec agreed (eg
   doesn't exceed an agreed maximum rate and is inelastic traffic).
   Installing the filter spec will typically be done by the signalling
   protocol, as will re-installing the filter, for example after a re-
   route that changes the PCN-ingress-node (see
   [I-D.briscoe-tsvwg-cl-architecture] for an example using RSVP). PCN-
   colouring allows the rest of the PCN-domain to recognise PCN-packets.

7.3.  PCN-egress-node functions

   Each egress link of the PCN-domain is configured with the following
   functionality:

   o  Packet classify - determine which PCN-ingress-node a PCN-packet
      has come from.

   o  Meter - "measure PCN-traffic" or "monitor PCN-marks".

   o  PCN-colour - for PCN-packets, set the DSCP and ECN fields to the
      appropriate values for use outside the PCN-domain.

   The metering functionality of course depends on whether it is
   targeted at admission control or flow termination. Alternative
   proposals involve the PCN-egress-node "measuring" as an aggregate (ie
   not per flow) all PCN-packets from a particular PCN-ingress-node, or
   "monitoring" the PCN-traffic and reacting to one (or several) PCN-
   marked packets. For PCN-colouring,
   [I-D.moncaster-pcn-baseline-encoding] specifies that the PCN-egress-
   node re-sets the ECN field to 00; other encodings may define
   different behaviour.

7.4.  Admission control functions

   As well as the functions covered above, other specific admission
   control functions need to be performed:

   o  Make decision about admission - based on the output of the PCN-
      egress-node's PCN meter function. In the case where it "measures
      PCN-traffic", the measured traffic on the ingress-egress-aggregate
      is compared with some reference level. In the case where it
      "monitors PCN-marks", then the decision is based on whether one
      (or several) packets is (are) PCN-marked or not (eg the RSVP PATH
      message). In either case, the admission decision also takes
      account of policy and application layer requirements.

   o  Communicate decision about admission - signal the decision to the
      node making the admission control request (which may be outside
      the PCN-domain), and to the policer (PCN-ingress-node function)
      for enforcement of the decision.

   There are various possibilities for how the functionality could be
   distributed (we assume the operator would configure which is used):

   o  The decision is made at the PCN-egress-node and the decision
      (admit or block) is signalled to the PCN-ingress-node.

   o  The decision is recommended by the PCN-egress-node (admit or
      block) but the decision is definitively made by the PCN-ingress-
      node. The rationale is that the PCN-egress-node naturally has the
      necessary information about PCN-marking on the ingress-egress-
      aggregate, but the PCN-ingress-node is the policy enforcement
      point, which polices incoming traffic to ensure it is part of an
      admitted PCN-flow.

   o  The decision is made at the PCN-ingress-node, which requires that
      the PCN-egress-node signals PCN-feedback-information to the PCN-
      ingress-node. For example, it could signal the current fraction
      of PCN-traffic that is PCN-marked.

   o  The decision is made at a centralised node (see Appendix; beyond
      scope of current PCN WG charter).

   Note: Admission control functionality is not performed by normal PCN-
   interior-nodes.

7.5.  Flow termination functions

   As well as the functions covered above, other specific termination
   control functions need to be performed:

   o  PCN-meter at PCN-egress-node - similarly to flow admission, there
      are two types of proposals: to "measure PCN-traffic" on the
      ingress-egress-aggregate, and to "monitor PCN-marks" and react to
      one (or several) PCN-marks.

   o  (if required) PCN-meter at PCN-ingress-node - make "measurements
      of PCN-traffic" being sent towards a particular PCN-egress-node;
      again, this is done for the ingress-egress-aggregate and not per
      flow.

   o  (if required) Communicate PCN-feedback-information to the node
      that makes the flow termination decision. For example, as in
      [I-D.briscoe-tsvwg-cl-architecture], communicate the PCN-egress-
      node's measurements to the PCN-ingress-node.

   o  Make decision about flow termination - use the information from
      the PCN-meter(s) to decide which PCN-flow or PCN-flows to
      terminate. The decision takes account of policy and application
      layer requirements.

   o  Communicate decision about flow termination - signal the decision
      to the node that is able to terminate the flow (which may be
      outside the PCN-domain), and to the policer (PCN-ingress-node
      function) for enforcement of the decision.

   There are various possibilities for how the functionality could be
   distributed, similar to those discussed above in the Admission
   control section.

7.6.  Addressing

   PCN-nodes may need to know the address of other PCN-nodes. Note: in
   all cases PCN-interior-nodes don't need to know the address of any
   other PCN-nodes (except as normal their next hop neighbours, for
   routing purposes).

   The PCN-egress-node needs to know the address of the PCN-ingress-node
   associated with a flow, at a minimum so that the PCN-ingress-node can
   be informed to enforce the admission decision (and any flow
   termination decision) through policing. There are various
   possibilities for how the PCN-egress-node can do this, ie associate
   the received packet to the correct ingress-egress-aggregate. It is
   not the intention of this document to mandate a particular mechanism.

   o  The addressing information can be gathered from signalling. For
      example, regular processing of an RSVP Path message, as the PCN-
      ingress-node is the previous RSVP hop (PHOP)
      ([I-D.lefaucheur-rsvp-ecn]). Or the PCN-ingress-node could signal
      its address to the PCN-egress-node.

   o  Always tunnel PCN-traffic across the PCN-domain. Then the PCN-
      ingress-node's address is simply the source address of the outer
      packet header. The PCN-ingress-node needs to learn the address of
      the PCN-egress-node, either by manual configuration or by one of
      the automated tunnel endpoint discovery mechanisms (such as
      signalling or probing over the data route, interrogating routing
      or using a centralised broker).

7.7.  Tunnelling

   Tunnels may originate and/or terminate within a PCN-domain (eg IP
   over IP, IP over MPLS). It is important that the PCN-marking of any
   packet can potentially influence PCN's flow admission control and
   termination - it shouldn't matter whether the packet happens to be
   tunnelled at the PCN-node that PCN-marks the packet, or indeed
   whether it's decapsulated or encapsulated by a subsequent PCN-node.
   This suggests that the "uniform conceptual model" described in
   [RFC2983] should be re-applied in the PCN context. In line with this
   and the approach of [RFC4303] and [I-D.briscoe-tsvwg-ecn-tunnel], the
   following rule is applied if encapsulation is done within the PCN-
   domain:

   o  any PCN-marking is copied into the outer header

   Note: A tunnel will not provide this behaviour if it complies with
   [RFC3168] tunnelling in either mode, but it will if it complies with
   [RFC4301] IPsec tunnelling.

   Similarly, in line with the "uniform conceptual model" of [RFC2983],
   the "full-functionality option" of [RFC3168], and [RFC4301], the
   following rule is applied if decapsulation is done within the PCN-
   domain:

   o  if the outer header's marking state is more severe then it is
      copied onto the inner header.

   Note: the order of increasing severity is: not PCN-marked; threshold-
   marking; excess-traffic-marking.

   An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to
   PCN-egress-nodes. The PCN-marks shouldn't be visible outside the
   PCN-domain, which can be achieved by the PCN-egress-node doing the
   PCN-colouring function (Section 7.3) after all the other (PCN and
   tunnelling) functions. The potential reasons for doing such
   tunnelling are: the PCN-egress-node then automatically knows the
   address of the relevant PCN-ingress-node for a flow; even if ECMP is
   running, all PCN-packets on a particular ingress-egress-aggregate
   follow the same path. But it also has drawbacks, for example the
   additional overhead in terms of bandwidth and processing, and the
   cost of setting up a mesh of tunnels between PCN-boundary-nodes
   (there is an N^2 scaling issue).

   Potential issues arise for a "partially PCN-capable tunnel", ie where
   only one tunnel endpoint is in the PCN domain:

   1.  The tunnel originates outside a PCN-domain and ends inside it.
       If the packet arrives at the tunnel ingress with the same
       encoding as used within the PCN-domain to indicate PCN-marking,
       then this could lead the PCN-egress-node to falsely measure pre-
       congestion.

   2.  The tunnel originates inside a PCN-domain and ends outside it.
       If the packet arrives at the tunnel ingress already PCN-marked,
       then it will still have the same encoding when it's decapsulated
       which could potentially confuse nodes beyond the tunnel egress.
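
   Returning briefly to the encapsulation and decapsulation rules given
   earlier in this section for tunnels wholly within the PCN-domain,
   they can be sketched as below. This is only an illustration under
   stated assumptions: the PcnMark type and the two function names are
   hypothetical, marking states are modelled as abstract values rather
   than DSCP/ECN codepoints, and three encoding states are assumed to
   be available.

```python
from enum import IntEnum


class PcnMark(IntEnum):
    """Hypothetical marking states, in increasing order of severity."""
    NOT_MARKED = 0
    THRESHOLD_MARKED = 1
    EXCESS_TRAFFIC_MARKED = 2


def encapsulate(inner):
    """Copy any PCN-marking into the outer header.

    Returns the (outer, inner) marking states after encapsulation.
    """
    return inner, inner


def decapsulate(outer, inner):
    """Return the inner header's marking state after decapsulation.

    The outer marking is copied onto the inner header only if it is
    more severe; otherwise the inner marking is kept.
    """
    return max(outer, inner)
```

   Taking the more severe of the two states means that a mark applied
   to the outer header by a PCN-node along the tunnel is not lost, and
   nor is a mark that the inner header already carried.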

   In line with the solution for partially capable DiffServ tunnels in
   [RFC2983], the following rules are applied:

   o  For case (1), the tunnel egress node clears any PCN-marking on the
      inner header. This rule is applied before the 'copy on
      decapsulation' rule above.

   o  For case (2), the tunnel ingress node clears any PCN-marking on
      the inner header. This rule is applied after the 'copy on
      encapsulation' rule above.

   Note that the above implies that one has to know, or determine, the
   characteristics of the other end of the tunnel as part of
   establishing it.

   Tunnelling constraints were a major factor in the choice of the
   baseline encoding. As explained in
   [I-D.moncaster-pcn-baseline-encoding], with current tunnelling
   endpoints only the 11 codepoint of the ECN field survives
   decapsulation, and hence the baseline encoding only uses the 11
   codepoint to indicate PCN-marking. Extended encoding schemes need to
   explain their interactions with (or assumptions about) tunnelling. A
   lengthy discussion of all the issues associated with layered
   encapsulation of congestion notification (for ECN as well as PCN) is
   in [I-D.briscoe-tsvwg-ecn-tunnel].

7.8.  Fault handling

   If a PCN-interior-node (or one of its links) fails, then lower layer
   protection mechanisms or the regular IP routing protocol will
   eventually re-route around it. If the new route can carry all the
   admitted traffic, flows will gracefully continue. If instead this
   causes early warning of pre-congestion on the new route, then
   admission control based on pre-congestion notification will ensure
   new flows will not be admitted until enough existing flows have
   departed. Re-routing may result in heavy (pre-)congestion, when the
   flow termination mechanism will kick in.

   If a PCN-boundary-node fails then we would like the regular QoS
   signalling protocol to be responsible for taking appropriate action.
   As an example [I-D.briscoe-tsvwg-cl-architecture] considers what
   happens if RSVP is the QoS signalling protocol.

8.  Challenges

   Prior work on PCN and similar mechanisms has thrown up a number of
   considerations about PCN's design goals (things PCN should be good
   at) [I-D.chan-pcn-problem-statement] and some issues that have been
   hard to solve in a fully satisfactory manner. Taken as a whole they
   represent a list of trade-offs (it is unlikely that they can all be
   100% achieved) and can perhaps serve as evaluation criteria to help
   an operator (or the IETF) decide between options.

   The following are open issues. They are mainly taken from
   [I-D.briscoe-tsvwg-cl-architecture], which also describes some
   possible solutions. Note that some may be considered unimportant in
   general or in specific deployment scenarios or by some operators.

   NOTE: Potential solutions are out of scope for this document.

   o  ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion
      is measured on a specific ingress-egress-aggregate. However, if
      the PCN-domain runs ECMP, then traffic on this ingress-egress-
      aggregate may follow several different paths - some of the paths
      could be pre-congested whilst others are not. There are three
      potential problems:

      1.  over-admission: a new flow is admitted (because the pre-
          congestion level measured by the PCN-egress-node is
          sufficiently diluted by unmarked packets from non-congested
          paths that a new flow is admitted), but its packets travel
          through a pre-congested PCN-node.

      2.  under-admission: a new flow is blocked (because the pre-
          congestion level measured by the PCN-egress-node is
          sufficiently increased by PCN-marked packets from pre-
          congested paths that a new flow is blocked), but its packets
          travel along an uncongested path.

      3.  ineffective termination: a flow is terminated, but its path
          doesn't travel through the (pre-)congested router(s). Since
          flow termination is a 'last resort', which protects the
          network should over-admission occur, this problem is probably
          more important to solve than the other two.

   o  ECMP and signalling: It is possible that, in a PCN-domain running
      ECMP, the signalling packets (eg RSVP, NSIS) follow a different
      path than the data packets, which could matter if the signalling
      packets are used as probes. Whether this is an issue depends on
      which fields the ECMP algorithm uses; if the ECMP algorithm is
      restricted to the source and destination IP addresses, then it
      will not be an issue. ECMP and signalling interactions are a
      specific instance of a general issue for non-traditional routing
      combined with resource management along a path [Hancock].

   o  Tunnelling: There are scenarios where tunnelling makes it
      difficult to determine the path in the PCN-domain. The problem,
      its impact, and the potential solutions are similar to those for
      ECMP.

   o  Scenarios with only one tunnel endpoint in the PCN domain may make
      it harder for the PCN-egress-node to gather from the signalling
      messages (eg RSVP, NSIS) the identity of the PCN-ingress-node.

   o  Bi-Directional Sessions: Many applications have bi-directional
      sessions - hence there are two microflows that should be admitted
      (or terminated) as a pair - for instance a bi-directional voice
      call only makes sense if microflows in both directions are
      admitted.
However, the PCN mechanisms concern admission and 1272 termination of a single flow, and coordination of the decision for 1273 both flows is a matter for the signalling protocol and out of 1274 scope of PCN. One possible example would use SIP pre-conditions. 1275 However, there are others. 1277 o Global Coordination: PCN makes its admission decision based on 1278 PCN-markings on a particular ingress-egress-aggregate. Decisions 1279 about flows through a different ingress-egress-aggregate are made 1280 independently. However, one can imagine network topologies and 1281 traffic matrices where, from a global perspective, it would be 1282 better to make a coordinated decision across all the ingress- 1283 egress-aggregates for the whole PCN-domain. For example, to block 1284 (or even terminate) flows on one ingress-egress-aggregate so that 1285 more important flows through a different ingress-egress-aggregate 1286 could be admitted. The problem may well be relatively 1287 insignificant. 1289 o Aggregate Traffic Characteristics: Even when the number of flows 1290 is stable, the traffic level through the PCN-domain will vary 1291 because the sources vary their traffic rates. PCN works best when 1292 there is not too much variability in the total traffic level at a 1293 PCN-node's interface (ie in the aggregate traffic from all 1294 sources). Too much variation means that a node may (at one 1295 moment) not be doing any PCN-marking and then (at another moment) 1296 drop packets because it is overloaded. This makes it hard to tune 1297 the admission control scheme to stop admitting new flows at the 1298 right time. Therefore the problem is more likely with fewer, 1299 burstier flows. 1301 o Flash crowds and Speed of Reaction: PCN is a measurement-based 1302 mechanism and so there is an inherent delay between packet marking 1303 by PCN-interior-nodes and any admission control reaction at PCN- 1304 boundary-nodes. 
For example, potentially if a big burst of 1305 admission requests occurs in a very short space of time (eg 1306 prompted by a televote), they could all get admitted before enough 1307 PCN-marks are seen to block new flows. In other words, any 1308 additional load offered within the reaction time of the mechanism 1309 must not move the PCN-domain directly from a no congestion state 1310 to overload. This 'vulnerability period' may have an impact at 1311 the signalling level, for instance QoS requests should be rate 1312 limited to bound the number of requests able to arrive within the 1313 vulnerability period. 1315 o Silent at start: after a successful admission request the source 1316 may wait some time before sending data (eg waiting for the called 1317 party to answer). Then the risk is that, in some circumstances, 1318 PCN's measurements underestimate what the pre-congestion level 1319 will be when the source does start sending data. 1321 9. Operations and Management 1323 This Section considers operations and management issues, under the 1324 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance 1325 and Security. Provisioning is discussed with performance. 1327 9.1. Configuration OAM 1329 Threshold-marking and excess-traffic-marking are standardised in 1330 [I-D.eardley-pcn-marking-behaviour]. However, more diversity in PCN- 1331 boundary-node behaviours is expected, in order to interface with 1332 diverse industry architectures. It may be possible to have different 1333 PCN-boundary-node behaviours for different ingress-egress-aggregates 1334 within the same PCN-domain. 1336 A PCN marking behaviour (threshold-marking, excess-traffic-marking) 1337 is enabled on either the egress or the ingress interfaces of PCN- 1338 nodes. A consistent choice must be made across the PCN-domain to 1339 ensure that the PCN mechanisms protect all links. 
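
   As an illustration only, a management system could check the
   consistency requirement above along the following lines. This is a
   minimal sketch under stated assumptions, not part of the PCN
   architecture: the node names, the "ingress"/"egress" labels and the
   function name inconsistent_nodes are all hypothetical.

```python
def inconsistent_nodes(node_sides, domain_side):
    """Return the names of nodes whose marking side differs from the domain choice.

    node_sides:  dict mapping a (hypothetical) node name to "ingress" or
                 "egress", ie the interfaces on which that node runs its
                 PCN marking behaviour.
    domain_side: the single choice the operator intends for the PCN-domain.
    """
    valid = ("ingress", "egress")
    if domain_side not in valid:
        raise ValueError(f"domain_side must be one of {valid}")
    # Any node configured differently leaves some link unprotected or
    # metered twice, so report it by name.
    return sorted(name for name, side in node_sides.items() if side != domain_side)


if __name__ == "__main__":
    sides = {"interior-1": "egress", "interior-2": "ingress", "edge-1": "egress"}
    print(inconsistent_nodes(sides, "egress"))
```

   Passing the intended domain-wide choice explicitly, rather than
   inferring it from the majority of nodes, is a design choice of this
   sketch; it avoids guessing when the domain is evenly split.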

   PCN configuration control variables fall into the following
   categories:

   o  system options (enabling or disabling behaviours)

   o  parameters (setting levels, addresses etc)

   One possibility is that all configurable variables sit within an SNMP
   management framework [RFC3411], being structured within a defined
   management information base (MIB) on each node, and being remotely
   readable and settable via a suitably secure management protocol
   (SNMPv3).

   Some configuration options and parameters have to be set once to
   'globally' control the whole PCN-domain. Where possible, these are
   identified below. This may affect operational complexity and the
   chances of interoperability problems between equipment from different
   vendors.

   It may be possible for an operator to configure some PCN-interior-
   nodes so that they don't run the PCN mechanisms, if it knows that
   these links will never become (pre-)congested.

9.1.1. System options

   On PCN-interior-nodes there will be very few system options:

   o  Whether two PCN-markings (threshold-marked and excess-traffic-
      marked) are enabled or only one. Typically all nodes throughout a
      PCN-domain will be configured the same in this respect. However,
      exceptions could be made. For example, if most PCN-nodes used
      both markings, but some legacy hardware was incapable of running
      two algorithms, an operator might be willing to configure these
      legacy nodes solely for excess-traffic-marking to enable flow
      termination as a back-stop. It would be sensible to place such
      nodes where they could be provisioned with a greater leeway over
      expected traffic levels.

   o  In the case where only one PCN-marking is enabled, all nodes must
      be configured to generate PCN-marks from the same meter (ie either
      the threshold meter or the excess traffic meter).
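
   The single-meter rule in the last bullet could be checked as
   sketched below. This is a hedged illustration, not a definitive
   implementation: the InteriorOptions type, its field names and the
   function single_meter_conflict are hypothetical. The sketch also
   assumes that a domain containing at least one node with both
   markings enabled is a two-marking domain, so the rule does not
   apply to it; that reading is an interpretation made here, not
   something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteriorOptions:
    """Hypothetical per-node system options for a PCN-interior-node."""
    threshold_marking: bool
    excess_traffic_marking: bool


def single_meter_conflict(options_by_node):
    """Return True if a single-marking domain mixes the two meters.

    options_by_node maps a (hypothetical) node name to InteriorOptions.
    """
    meters = set()
    for opt in options_by_node.values():
        if opt.threshold_marking and opt.excess_traffic_marking:
            # Assumption: a node running both markings means this is a
            # two-marking domain, so the single-meter rule is not applied.
            return False
        if opt.threshold_marking:
            meters.add("threshold")
        elif opt.excess_traffic_marking:
            meters.add("excess-traffic")
        # Nodes with neither marking enabled are ignored; an operator may
        # choose not to run PCN where links never become (pre-)congested.
    return len(meters) > 1


if __name__ == "__main__":
    domain = {
        "interior-1": InteriorOptions(True, False),
        "interior-2": InteriorOptions(False, True),
    }
    print(single_meter_conflict(domain))
```

   The legacy-hardware exception from the first bullet (most nodes
   with both markings, a few with excess-traffic-marking only) is not
   reported as a conflict by this sketch.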

   PCN-boundary-nodes (ingress and egress) will have more system
   options:

   o  Which of admission and flow termination are enabled. If any PCN-
      interior-node is configured to generate a marking, all PCN-
      boundary-nodes must be able to interpret that marking (which
      includes understanding, in a PCN-domain that uses only one type of
      PCN-marking, whether the marks are generated by the PCN-interior-
      nodes' threshold meters or their excess traffic meters).
      Therefore all PCN-boundary-nodes must be configured the same in
      this respect.

   o  Where flow admission and termination decisions are made: at PCN-
      ingress-nodes or at PCN-egress-nodes (or at a centralised node,
      see Appendix). Theoretically, this configuration choice could be
      negotiated for each pair of PCN-boundary-nodes, but we cannot
      imagine why such complexity would be required, except perhaps in
      future inter-domain scenarios.

   o  How PCN-markings are translated into admission control and flow
      termination decisions (see Section 6.1 and Section 6.2).

   PCN-egress-nodes will have further system options:

   o  How the mapping should be established between each packet and its
      aggregate, eg by MPLS label, by IP packet filterspec; and how to
      take account of ECMP.

   o  If an equipment vendor provides a choice, there may be options to
      select which smoothing algorithm to use for measurements.

9.1.2. Parameters

   Like any DiffServ domain, every node within a PCN-domain will need to
   be configured with the DSCP(s) used to identify PCN-packets. On each
   interior link the main configuration parameters are the PCN-
   threshold-rate and PCN-excess-rate. A larger PCN-threshold-rate
   enables more PCN-traffic to be admitted on a link, hence improving
   capacity utilisation. A PCN-excess-rate set further above the PCN-
   threshold-rate allows greater increases in traffic (whether due to
   natural fluctuations or some unexpected event) before any flows are
   terminated, ie minimises the chances of unnecessarily triggering the
   termination mechanism. For instance, an operator may want to design
   their network so that it can cope with a failure of any single PCN-
   node without terminating any flows.

   Setting these rates on first deployment of PCN will be very similar
   to the traditional process for sizing an admission controlled
   network, depending on: the operator's requirements for minimising
   flow blocking (grade of service), the expected PCN traffic load on
   each link and its statistical characteristics (the traffic matrix),
   contingency for re-routing the PCN traffic matrix in the event of
   single or multiple failures, and the expected load from other classes
   relative to link capacities [Menth]. But once a domain is in
   operation, a PCN design goal is to be able to determine growth in
   these configured rates much more simply, by monitoring PCN-marking
   rates from actual rather than expected traffic (see Section 9.2 on
   Performance & Provisioning).

   Operators may also wish to configure a rate greater than the PCN-
   excess-rate that is the absolute maximum rate that a link allows for
   PCN-traffic. This may simply be the physical link rate, but some
   operators may wish to configure a logical limit to prevent starvation
   of other traffic classes during any brief period after PCN-traffic
   exceeds the PCN-excess-rate but before flow termination brings it
   back below this rate.

   Threshold-marking requires a threshold token bucket depth to be
   configured, excess-traffic-marking needs a value for the MTU (maximum
   size of a PCN-packet on the link) and both require setting a maximum
   size of their token buckets. It will be preferable for there to be
   rules to set defaults for these parameters, but then allow operators
   to change them, for instance if average traffic characteristics
   change over time.

   The PCN-egress-node may allow configuration of the following:

   o  how it smooths metering of PCN-markings (eg EWMA parameters)

   Whichever node makes admission and flow termination decisions will
   contain algorithms for converting PCN-marking levels into admission
   or flow termination decisions. These will also require configurable
   parameters, for instance:

   o  an admission control algorithm that is based on the fraction of
      marked packets will at least require a marking threshold setting
      above which it denies admission to new flows;

   o  flow termination algorithms will probably require a parameter to
      delay termination of any flows until it is more certain that an
      anomalous event is not transient;

   o  a parameter to control the trade-off between how quickly excess
      flows are terminated, and over-termination.

   One particular proposal, [I-D.charny-pcn-single-marking], would
   require a global parameter to be defined on all PCN-nodes, but only
   needs one PCN marking rate to be configured on each link. The global
   parameter is a scaling factor between admission and termination (the
   PCN-traffic rate on a link up to which flows are admitted vs the rate
   above which flows are terminated). [I-D.charny-pcn-single-marking]
   discusses in full the impact of this particular proposal on the
   operation of PCN.

9.2. Performance & Provisioning OAM

   Monitoring of performance factors measurable from *outside* the PCN
   domain will be no different with PCN than with any other packet-based
   flow admission control system, both at the flow level (blocking
   probability etc) and the packet level (jitter [RFC3393], [Y.1541],
   loss rate [RFC4656], mean opinion score [P.800], etc). The
   difference is that PCN is intentionally designed to indicate
   *internally* which exact resource(s) are the cause of performance
   problems and by how much.

   Even better, PCN indicates which resources will probably cause
   problems if they are not upgraded soon. This can be achieved by the
   management system monitoring the total amount (in bytes) of PCN-
   marking generated by each queue over a period. Given possible long
   provisioning lead times, pre-congestion volume is the best metric to
   reveal whether sufficient persistent demand has occurred to warrant
   an upgrade, because even before utilisation becomes problematic,
   the statistical variability of traffic will cause occasional bursts
   of pre-congestion. This 'early warning system' decouples the process
   of adding customers from the provisioning process. This should cut
   the time to add a customer when compared against admission control
   provided over native DiffServ [RFC2998], because it saves having to
   re-run the capacity planning process before adding each customer.

   Alternatively, before triggering an upgrade, the long term pre-
   congestion volume on each link can be used to balance traffic load
   across the PCN-domain by adjusting the link weights of the routing
   system. When an upgrade to a link's configured PCN-rates is
   required, it may also be necessary to upgrade the physical capacity
   available to other classes. But usually there will be sufficient
   physical capacity for the upgrade to go ahead as a simple
   configuration change. Alternatively, [Songhurst] has proposed an
   adaptive rather than preconfigured system, where the configured PCN-
   threshold-rate is replaced with a high and low water mark and the
   marking algorithm automatically optimises how physical capacity is
   shared using the relative loads from PCN and other traffic classes.
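
   The pre-congestion volume monitoring described earlier in this
   section could be sketched as follows. This is a minimal
   illustration under stated assumptions: each queue exposes a
   cumulative count of PCN-marked bytes, readings arrive as plain
   dictionaries keyed by a hypothetical queue name, counters do not
   wrap between readings, and the min_bytes trigger is an
   operator-chosen value that is not defined by PCN.

```python
def precongestion_volume(previous, current):
    """Return bytes PCN-marked per queue between two cumulative readings.

    A queue that is absent from the previous reading is treated as
    having started from zero (an assumption of this sketch).
    """
    return {queue: current[queue] - previous.get(queue, 0) for queue in current}


def upgrade_candidates(previous, current, min_bytes):
    """List (queue, volume) pairs at or above min_bytes, largest volume first."""
    volume = precongestion_volume(previous, current)
    ranked = sorted(volume.items(), key=lambda item: item[1], reverse=True)
    return [(queue, marked) for queue, marked in ranked if marked >= min_bytes]


if __name__ == "__main__":
    earlier = {"link-A": 100, "link-B": 50}
    later = {"link-A": 1100, "link-B": 60, "link-C": 500}
    for queue, marked in upgrade_candidates(earlier, later, min_bytes=100):
        print(queue, marked)
```

   Ranking by marked volume over a long period, rather than by
   instantaneous marking rate, follows the text's point that
   persistent demand, not occasional bursts, should drive an upgrade.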

   All the above processes require just three extra counters associated
   with each PCN queue: threshold-markings, excess-traffic-markings and
   drop. Every time a PCN packet is marked or dropped its size in bytes
   should be added to the appropriate counter. Then the management
   system can read the counters at any time and subtract a previous
   reading to establish the incremental volume of each type of
   (pre-)congestion. Readings should be taken frequently, so that
   anomalous events (eg re-routes) can be separated from regular
   fluctuating demand if required.

9.3. Accounting OAM

   Accounting is only done at trust boundaries so it is out of scope of
   the initial charter of the PCN WG, which is confined to intra-domain
   issues. Use of PCN internal to a domain makes no difference to the
   flow signalling events crossing trust boundaries outside the PCN-
   domain, which are typically used for accounting.

9.4. Fault OAM

   Fault OAM is about preventing faults, telling the management system
   (or manual operator) that the system has recovered (or not) from a
   failure, and about maintaining information to aid fault diagnosis.

   Admission blocking and particularly flow termination mechanisms
   should rarely be needed in practice. It would be unfortunate if they
   didn't work after an option had been accidentally disabled.
   Therefore it will be necessary to regularly test that the live system
   works as intended (devising a meaningful test is left as an exercise
   for the operator).

   Section 7 describes how the PCN architecture has been designed to
   ensure admitted flows continue gracefully after recovering
   automatically from link or node failures. The need to record and
   monitor re-routing events affecting signalling is unchanged by the
   addition of PCN to a DiffServ domain. Similarly, re-routing events
   within the PCN-domain will be recorded and monitored just as they
   would be without PCN.

   PCN-marking does make it possible to record 'near-misses'. For
   instance, at the PCN-egress-node a 'reporting threshold' could be set
   to monitor how often - and for how long - the system comes close to
   triggering flow blocking without actually doing so. Similarly,
   bursts of flow termination marking could be recorded even if they are
   not sufficiently sustained to trigger flow termination. Such
   statistics could be correlated with per-queue counts of marking
   volume (Section 9.2) to upgrade resources in danger of causing
   service degradation, or to trigger manual tracing of intermittent
   incipient errors that would otherwise have gone unnoticed.

   Finally, of course, many faults are caused by failings in the
   management process ('human error'): a wrongly configured address in a
   node, a wrong address given in a signalling protocol, a wrongly
   configured parameter in a queueing algorithm, a node set into a
   different mode from other nodes, and so on. Generally, a clean
   design with few configurable options ensures this class of faults can
   be traced more easily and prevented more often. Sound management
   practice at run-time also helps. For instance: a management system
   should be used that constrains configuration changes within system
   rules (eg preventing an option setting inconsistent with other
   nodes); configuration options should also be recorded in an offline
   database; and regular automatic consistency checks between live
   systems and the database should be performed. PCN adds nothing
   specific to this class of problems.

9.5. Security OAM

   Security OAM is about using secure operational practices as well as
   being able to track security breaches or near-misses at run-time.
   PCN adds few specifics to the general good practice required in this
   field [RFC4778], other than those below. The correct functions of
   the system should be monitored (Section 9.2) in multiple independent
   ways and correlated to detect possible security breaches. Persistent
   (pre-)congestion marking should raise an alarm (both on the node
   doing the marking and on the PCN-egress-node metering it).
   Similarly, persistently poor external QoS metrics such as jitter or
   MOS should raise an alarm. The following are examples of symptoms
   that may be the result of innocent faults, rather than attacks, but
   until diagnosed they should be logged and trigger a security alarm:

   o  Anomalous patterns of non-conforming incoming signals and packets
      rejected at the PCN-ingress-nodes (eg packets already marked PCN-
      capable, or traffic persistently starving token bucket policers).

   o  PCN-capable packets arriving at a PCN-egress-node with no
      associated state for mapping them to a valid ingress-egress-
      aggregate.

   o  A PCN-ingress-node receiving feedback signals about the pre-
      congestion level on a non-existent aggregate, or that are
      inconsistent with other signals (eg unexpected sequence numbers,
      inconsistent addressing, conflicting reports of the pre-congestion
      level, etc).

   o  (Pre-)congestion markings arriving at a PCN-egress-node that are
      focused on particular flows, rather than randomly distributed
      throughout the aggregate.

10. IANA Considerations

   This memo includes no request to IANA.

11. Security considerations

   Security considerations essentially come from the Trust Assumption
   (Section 5.1), ie that all PCN-nodes are PCN-enabled and are trusted
   for truthful PCN-marking and transport. PCN splits functionality
   between PCN-interior-nodes and PCN-boundary-nodes, and the security
   considerations are somewhat different for each, mainly because PCN-
   boundary-nodes are flow-aware and PCN-interior-nodes are not.

   o  Because the PCN-boundary-nodes are flow-aware, they are trusted to
      use that awareness correctly. The degree of trust required
      depends on the kinds of decisions they have to make and the kinds
      of information they need to make them. There is nothing specific
      to PCN.

   o  The PCN-ingress-nodes police packets to ensure a PCN-flow sticks
      within its agreed limit, and to ensure that only PCN-flows that
      have been admitted contribute PCN-traffic into the PCN-domain.
      The policer must drop (or perhaps downgrade to a different DSCP)
      any PCN-packets received that are outside this remit. This is
      similar to the existing IntServ behaviour. Between them the PCN-
      boundary-nodes must encircle the PCN-domain, otherwise PCN-packets
      could enter the PCN-domain without being subject to admission
      control, which would potentially destroy the QoS of existing
      flows.

   o  PCN-interior-nodes are not flow-aware. This prevents some
      security attacks where an attacker targets specific flows in the
      data plane - for instance for DoS or eavesdropping.

   o  The PCN-boundary-nodes rely on correct PCN-marking by the PCN-
      interior-nodes. For instance a rogue PCN-interior-node could PCN-
      mark all packets so that no flows were admitted. Another
      possibility is that it doesn't PCN-mark any packets, even when it
      is pre-congested. More subtly, the rogue PCN-interior-node could
      perform these attacks selectively on particular flows, or it could
      PCN-mark the correct fraction overall, but carefully choose which
      flows it marked.

   o  The PCN-boundary-nodes should be able to deal with DoS attacks and
      state exhaustion attacks based on fast changes in per flow
      signalling.

   o  The signalling between the PCN-boundary-nodes must be protected
      from attacks. For example the recipient needs to validate that
      the message is indeed from the node that claims to have sent it.
      Possible measures include digest authentication and protection
      against replay and man-in-the-middle attacks. For the specific
      protocol RSVP, hop-by-hop authentication is in [RFC2747], and
      [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be
      useful.

   Operational security advice is given in Section 9.5.

12. Conclusions

   The document describes a general architecture for flow admission and
   termination based on pre-congestion information in order to protect
   the quality of service of established inelastic flows within a single
   DiffServ domain. The main topic is the functional architecture. It
   also mentions other topics like the assumptions and open issues.

13. Acknowledgements

   This document is a revised version of [I-D.eardley-pcn-architecture].
   Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R.
   Geib, G. Karagiannis, M. Menth, T. Tsou. They are therefore
   contributors to this document.

   Thanks to those who have made comments on
   [I-D.eardley-pcn-architecture] and on earlier versions of this draft:
   Lachlan Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake,
   Bob Briscoe, Jason Canon, Ken Carlberg, Anna Charny, Joachim
   Charzinski, Andras Csaszar, Lars Eggert, Ruediger Geib, Wei Gengyu,
   Robert Hancock, Fortune Huang, Christian Hublet, Ingemar Johansson,
   Georgios Karagiannis, Hein Mekkes, Michael Menth, Toby Moncaster, Ben
   Strulo, Tom Taylor, Hannes Tschofenig, Tina Tsou, Lars Westberg,
   Magnus Westerlund, Delei Yu. Thanks to Bob Briscoe who extensively
   revised the Operations and Management section.

   This document is the result of discussions in the PCN WG and
   forerunner activity in the TSVWG. A number of previous drafts were
   presented to TSVWG: [I-D.chan-pcn-problem-statement],
   [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb],
   [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap],
   [I-D.lefaucheur-rsvp-ecn], [I-D.westberg-pcn-load-control]. The
   authors of them were: B. Briscoe, P. Eardley, D. Songhurst, F. Le
   Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley, G. Karagiannis,
   A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G. Liu, A. Bhargava.

14. Comments Solicited

   Comments and questions are encouraged and very welcome. They can be
   addressed to the IETF PCN working group mailing list.

15. Changes

15.1. Changes from -05 to -06

   Minor clarifications throughout; the most significant are as
   follows:

   o  Section 1: added to the list of encoding states in an 'extended'
      scheme: "or perhaps further encoding states as suggested in
      [LC-PCN]"

   o  Section 2: added definition for PCN-colouring (to clarify that the
      term is used consistently differently from 'PCN-marking')

   o  Section 6.1 and 6.2: added "(others might be possible)" before the
      list of high level approaches for making flow admission
      (termination) decisions.

   o  Section 6.2: corrected a significant typo in 2nd bullet (more ->
      less)

   o  Section 6.3: corrected a couple of significant typos in Figure 2

   o  Section 6.5 (PCN-traffic) re-written for clarity. Non PCN-traffic
      contributing to PCN meters is now given as an example (there may
      be cases where there is no need to meter it).

   o  Section 7.7: added to the text about encapsulation being done
      within the PCN-domain: "Note: A tunnel will not provide this
      behaviour if it complies with [RFC3168] tunnelling in either mode,
      but it will if it complies with [RFC4301] IPSec tunnelling."

   o  Section 7.7: added mention of [RFC4301] to the text about
      decapsulation being done within the PCN-domain.

   o  Section 8: deleted the text about design goals, since this is
      already covered adequately earlier eg in S3.

   o  Section 11: replaced the last sentence of bullet 1 by "There is
      nothing specific to PCN."

   o  Appendix: added to open issues: possibility of automatically and
      periodically probing.

   o  References: Split out Normative references (RFC2474 & RFC3246).

15.2. Changes from -04 to -05

   Minor nits removed as follows:

   o  Further minor changes to reflect that baseline encoding is
      consensus, standards track document, whilst there can be
      (experimental track) encoding extensions

   o  Traffic conditioning updated to reflect discussions in Dublin,
      mainly that PCN-interior-nodes don't police PCN-traffic (so
      deleted bullet in S7.1) and that it is not advised to have non
      PCN-traffic that shares the same capacity (on a link) as PCN-
      traffic (so added bullet in S6.5)

   o  Probing moved into Appendix A and deleted the 'third viewpoint'
      (admission control based on the marking of a single packet like an
      RSVP PATH message) - since this isn't really probing, and in any
      case is already mentioned in S6.1.

   o  Minor changes to S9 Operations and management - mainly to reflect
      that consensus on marking behaviour has simplified things so eg
      there are fewer parameters to configure.

   o  A few terminology-related errors expunged, and two pictures added
      to help.

   o  Re-phrased the claim about the natural decision point in S7.4

   o  Clarified that extended encoding schemes need to explain their
      interactions with (or assumptions about) tunnelling (S7.7) and how
      they meet the guidelines of BCP124 (S6.6)

   o  Corrected the third bullet in S6.2 (to reflect consensus about
      PCN-marking)

15.3. Changes from -03 to -04

   o  Minor changes throughout to reflect the consensus call about PCN-
      marking (as reflected in [I-D.eardley-pcn-marking-behaviour]).

   o  Minor changes throughout to reflect the current decisions about
      encoding (as reflected in [I-D.moncaster-pcn-baseline-encoding]
      and [I-D.moncaster-pcn-3-state-encoding]).

   o  Introduction: re-structured to create new sections on Benefits,
      Deployment scenarios and Assumptions.

   o  Introduction: Added pointers to other PCN documents.

   o  Terminology: changed PCN-lower-rate to PCN-threshold-rate and PCN-
      upper-rate to PCN-excess-rate; excess-rate-marking to excess-
      traffic-marking.

   o  Benefits: added bullet about SRLGs.

   o  Deployment scenarios: new section combining material from various
      places within the document.

   o  S6 (high level functional architecture): re-structured and edited
      to improve clarity, and reflect the latest PCN-marking and
      encoding drafts.

   o  S6.4: added claim that the most natural place to make an admission
      decision is a PCN-egress-node.

   o  S6.5: updated the bullet about non-PCN-traffic that uses the same
      DSCP as PCN-traffic.

   o  S6.6: added a section about backwards compatibility with respect
      to [RFC4774].

   o  Appendix A: added bullet about end-to-end PCN.

   o  Probing: moved to Appendix B.

   o  Other minor clarifications, typos etc.

15.4. Changes from -02 to -03

   o  Abstract: Clarified by removing the term 'aggregated'. Follow-up
      clarifications later in draft: S1: expanded PCN-egress-nodes
      bullet to mention case where the PCN-feedback-information is about
      one (or a few) PCN-marks, rather than aggregated information; S3
      clarified PCN-meter; S5 minor changes; conclusion.

   o  S1: added a paragraph about how the PCN-domain looks to the
      outside world (essentially it looks like a DiffServ domain).

   o  S2: tweaked the PCN-traffic terminology bullet: changed PCN
      traffic classes to PCN behaviour aggregates, to be more in line
      with traditional DiffServ jargon (-> follow-up changes later in
      draft); included a definition of PCN-flows (and corrected a couple
      of 'PCN microflows' to 'PCN-flows' later in draft)

   o  S3.5: added possibility of downgrading to best effort, where PCN-
      packets arrive at PCN-ingress-node already ECN marked (CE or ECN
      nonce)

   o  S4: added note about whether to talk about PCN operating on an
      interface or on a link. In S8.1 (OAM) mentioned that PCN
      functionality needs to be configured consistently on either the
      ingress or the egress interface of PCN-nodes in a PCN-domain.

   o  S5.2: clarified that signalling protocol installs flow filter spec
      at PCN-ingress-node (& updates after possible re-route)

   o  S5.6: addressing: clarified

   o  S5.7: added tunnelling issue of N^2 scaling if you set up a mesh
      of tunnels between PCN-boundary-nodes

   o  S7.3: Clarified the "third viewpoint" of probing (always probe).

   o  S8.1: clarified that SNMP is only an example; added note that an
      operator may be able to not run PCN on some PCN-interior-nodes, if
      it knows that these links will never become (pre-)congested; added
      note that it may be possible to have different PCN-boundary-node
      behaviours for different ingress-egress-aggregates within the same
      PCN-domain.
1887 o Appendix: Created an Appendix about "Possible work items beyond 1888 the scope of the current PCN WG Charter". Material moved from 1889 near start of S3 and elsewhere throughout draft. Moved text about 1890 centralised decision node to Appendix. 1892 o Other minor clarifications. 1894 15.5. Changes from -01 to -02 1896 o S1: Benefits: provisioning bullet extended to stress that PCN does 1897 not use RFC2475-style traffic conditioning. 1899 o S1: Deployment models: mentioned, as variant of PCN-domain 1900 extending to end nodes, that may extend to LAN edge switch. 1902 o S3.1: Trust Assumption: added note about not needing PCN-marking 1903 capability if known that an interface cannot become pre-congested. 1905 o S4: now divided into sub-sections 1906 o S4.1: Admission control: added second proposed method for how to 1907 decide to block new flows (PCN-egress-node receives one (or 1908 several) PCN-marked packets). 1910 o S5: Probing sub-section removed. Material now in new S7. 1912 o S5.6: Addressing: clarified how PCN-ingress-node can discover 1913 address of PCN-egress-node 1915 o S5.6: Addressing: centralised node case, added that PCN-ingress- 1916 node may need to know address of PCN-egress-node 1918 o S5.8: Tunnelling: added case of "partially PCN-capable tunnel" and 1919 degraded bullet on this in S6 (Open Issues) 1921 o S7: Probing: new section. Much more comprehensive than old S5.5. 1923 o S8: Operations and Management: substantially revised. 1925 o other minor changes not affecting semantics 1927 15.6. 
Changes from -00 to -01

1929     In addition to clarifications and nit squashing, the main changes
1930     are:

1932     o  S1: Benefits: added one about provisioning (and contrast with
1933        DiffServ SLAs)

1935     o  S1: Benefits: clarified that the objective is also to stop PCN-
1936        packets being significantly delayed (previously only mentioned not
1937        dropping packets)

1939     o  S1: Deployment models: added one where policing is done at ingress
1940        of access network and not at ingress of PCN-domain (assume trust
1941        between networks)

1943     o  S1: Deployment models: corrected MPLS-TE to MPLS

1945     o  S2: Terminology: adjusted definition of PCN-domain

1947     o  S3.5: Other assumptions: corrected, so that two assumptions (PCN-
1948        nodes not performing ECN and PCN-ingress-node discarding arriving
1949        CE packet) only apply if the PCN WG decides to encode PCN-marking
1950        in the ECN-field.

1952     o  S4 & S5: changed PCN-marking algorithm to marking behaviour
1953     o  S4: clarified that PCN-interior-node functionality applies for
1954        each outgoing interface, and added clarification: "The
1955        functionality is also done by PCN-ingress-nodes for their outgoing
1956        interfaces (ie those 'inside' the PCN-domain)."

1958     o  S4 (near end): altered to say that a PCN-node "should" dedicate
1959        some capacity to lower priority traffic so that it isn't starved
1960        (was "may")

1962     o  S5: clarified to say that PCN functionality is done on an
1963        'interface' (rather than on a 'link')

1965     o  S5.2: deleted erroneous mention of service level agreement

1967     o  S5.5: Probing: re-written, especially to distinguish probing to
1968        test the ingress-egress-aggregate from probing to test a
1969        particular ECMP path.

1971     o  S5.7: Addressing: added mention of probing; added a note that, in
1972        the case where traffic is always tunnelled across the PCN-domain,
1973        the PCN-ingress-node needs to know the address of the
1974        PCN-egress-node.

1976     o  S5.8: Tunnelling: re-written, especially to provide a clearer
1977        description of copying on tunnel entry/exit, by adding explanation
1978        (keeping tunnel encaps/decaps and PCN-marking orthogonal),
1979        deleting one bullet ("if the inner header's marking state is more
1980        severe then it is preserved" - shouldn't happen), and better
1981        referencing of other IETF documents.

1983     o  S6: Open issues: stressed that "NOTE: Potential solutions are out
1984        of scope for this document" and edited a couple of sentences that
1985        were close to solution space.

1987     o  S6: Open issues: added one about scenarios with only one tunnel
1988        endpoint in the PCN domain.

1990     o  S6: Open issues: ECMP: added under-admission as another potential
1991        risk

1993     o  S6: Open issues: added one about "Silent at start"

1995     o  S10: Conclusions: a small conclusions section added

1997  16. Appendix: Possible work items beyond the scope of the current PCN
1998      WG charter

2000     This section mentions some topics that are outside the PCN WG's
2001     current charter, but which have been mentioned as areas of interest.
2002     They might be work items for: the PCN WG after a future re-
2003     chartering; some other IETF WG; another standards body; an operator-
2004     specific usage that is not standardised.

2006     NOTE: it should be crystal clear that this section discusses
2007     possibilities only.

2009     The first set of possibilities relates to the restrictions on scope
2010     imposed by the PCN WG charter (see Section 5):

2012     o  a single PCN-domain encompasses several autonomous systems that do
2013        not trust each other, perhaps by using a mechanism like re-ECN,
2014        [I-D.briscoe-re-pcn-border-cheat].

2016     o  not all the nodes run PCN. For example, the PCN-domain is a
2017        multi-site enterprise network. The sites are connected by a VPN
2018        tunnel; although PCN doesn't operate inside the tunnel, the PCN
2019        mechanisms still work properly because of the good QoS on the
2020        virtual link (the tunnel).
Another example is that PCN is
2021        deployed on the general Internet (ie widely but not universally
2022        deployed).

2024     o  applying the PCN mechanisms to other types of traffic, ie beyond
2025        inelastic traffic. For instance, applying the PCN mechanisms to
2026        traffic scheduled with the Assured Forwarding per-hop behaviour.
2027        One example could be flow-rate adaptation by elastic applications
2028        that adapt according to the pre-congestion information.

2030     o  the aggregation assumption doesn't hold, because the link capacity
2031        is too low. Measurement-based admission control is less accurate,
2032        with a greater risk of over-admission for instance.

2034     o  the applicability of PCN mechanisms for emergency use (911, GETS,
2035        WPS, MLPP, etc.)

2037     Other possibilities include:

2039     o  Probing. This is discussed in Section 16.1 below.

2041     o  The PCN-domain extends to the end users. The scenario is
2042        described in [I-D.babiarz-pcn-sip-cap]. The end users need to be
2043        trusted to do their own policing. This scenario is in the scope
2044        of the PCN WG charter if there is sufficient traffic for the
2045        aggregation assumption to hold. A variant is that the PCN-domain
2046        extends out as far as the LAN edge switch.

2048     o  indicating pre-congestion through signalling messages rather than
2049        in-band (in the form of PCN-marked packets)

2051     o  the decision-making functionality is at a centralised node rather
2052        than at the PCN-boundary-nodes. This requires that the PCN-
2053        egress-node signals PCN-feedback-information to the centralised
2054        node, and that the centralised node signals to the PCN-ingress-
2055        node the decision about admission (or termination). It may need
2056        the centralised node and the PCN-boundary-nodes to be configured
2057        with each other's addresses. The centralised case is described
2058        further in [I-D.tsou-pcn-racf-applic].

2060     o  Signalling extensions for specific protocols (eg RSVP, NSIS).
For
2061        example: the details of how the signalling protocol installs the
2062        flowspec at the PCN-ingress-node for an admitted PCN-flow; and how
2063        the signalling protocol carries the PCN-feedback-information.
2064        Perhaps also for other functions such as: coping with failure of a
2065        PCN-boundary-node ([I-D.briscoe-tsvwg-cl-architecture] considers
2066        what happens if RSVP is the QoS signalling protocol); establishing
2067        a tunnel across the PCN-domain if it is necessary to carry ECN
2068        marks transparently.

2070     o  Policing by the PCN-ingress-node may not be needed if the PCN-
2071        domain can trust that the upstream network has already policed the
2072        traffic on its behalf.

2074     o  PCN for Pseudowire: PCN may be used as a congestion avoidance
2075        mechanism for edge to edge pseudowire emulations
2076        [I-D.ietf-pwe3-congestion-frmwk].

2078     o  PCN for MPLS: [RFC3270] defines how to support the DiffServ
2079        architecture in MPLS networks (Multi-protocol label switching).
2080        [RFC5129] describes how to add PCN for admission control of
2081        microflows into a set of MPLS aggregates. PCN-marking is done in
2082        MPLS's EXP field (which [I-D.ietf-mpls-cosfield-def] proposes to
2083        re-name to the Class of Service (CoS) field).

2085     o  PCN for Ethernet: Similarly, it may be possible to extend PCN into
2086        Ethernet networks, where PCN-marking is done in the Ethernet
2087        header. NOTE: Specific consideration of this extension is outside
2088        the IETF's remit.

2090  16.1. Probing

2092  16.1.1. Introduction

2094     Probing is a potential mechanism to assist admission control.

2096     PCN's admission control, as described so far, is essentially a
2097     reactive mechanism where the PCN-egress-node monitors the pre-
2098     congestion level for traffic from each PCN-ingress-node; if the level
2099     rises then it blocks new flows on that ingress-egress-aggregate.
2100     However, it's possible that an ingress-egress-aggregate carries no
2101     traffic, and so the PCN-egress-node can't make an admission decision
2102     using the usual method described earlier.

2104     One approach is to be "optimistic" and simply admit the new flow.
2105     However, it's possible to envisage a scenario where the traffic levels
2106     on other ingress-egress-aggregates are already so high that they're
2107     blocking new PCN-flows, and admitting a new flow onto this 'empty'
2108     ingress-egress-aggregate adds extra traffic onto a link that is
2109     already pre-congested - which may 'tip the balance' so that PCN's
2110     flow termination mechanism is activated or some packets are dropped.
2111     This risk could be lessened by configuring on each link a sufficient
2112     'safety margin' above the PCN-threshold-rate.

2114     An alternative approach is to make PCN a more proactive mechanism.
2115     The PCN-ingress-node explicitly determines, before admitting the
2116     prospective new flow, whether the ingress-egress-aggregate can
2117     support it. This can be seen as a "pessimistic" approach, in
2118     contrast to the "optimism" of the approach above. It involves
2119     probing: a PCN-ingress-node generates and sends probe packets in
2120     order to test the pre-congestion level that the flow would
2121     experience.

2123     One possibility is that a probe packet is just a dummy data packet,
2124     generated by the PCN-ingress-node and addressed to the PCN-egress-
2125     node.

2127  16.1.2. Probing functions

2129     The probing functions are:

2131     o  Make decision that probing is needed. As described above, this is
2132        when the ingress-egress-aggregate (or the ECMP path - Section 8)
2133        carries no PCN-traffic. An alternative is always to probe, ie
2134        probe before admitting every PCN-flow.

2136     o  (if required) Communicate the request that probing is needed - the
2137        PCN-egress-node signals to the PCN-ingress-node that probing is
2138        needed

2140     o  (if required) Generate probe traffic - the PCN-ingress-node
2141        generates the probe traffic. The appropriate number (or rate) of
2142        probe packets will depend on the PCN-marking algorithm; for
2143        example an excess-traffic-marking algorithm generates fewer PCN-
2144        marks than a threshold-marking algorithm, and so will need more
2145        probe packets.

2147     o  Forward probe packets - as far as PCN-interior-nodes are
2148        concerned, probe packets are handled the same as (ordinary data)
2149        PCN-packets, in terms of routing, scheduling and PCN-marking.

2151     o  Consume probe packets - the PCN-egress-node consumes probe packets
2152        to ensure that they don't travel beyond the PCN-domain.

2154  16.1.3. Discussion of rationale for probing, its downsides and open
2155          issues

2157     It is an unresolved question whether probing is really needed, but
2158     two viewpoints have been put forward as to why it is useful. The
2159     first is perhaps the most obvious: there is no PCN-traffic on the
2160     ingress-egress-aggregate. The second assumes that multipath routing
2161     (ECMP) is running in the PCN-domain. We now consider each in turn.

2163     The first viewpoint assumes the following:

2165     o  There is no PCN-traffic on the ingress-egress-aggregate (so a
2166        normal admission decision cannot be made).

2168     o  Simply admitting the new flow has a significant risk of leading to
2169        overload: packets dropped or flows terminated.

2171     On the former bullet, [PCN-email-traffic-empty-aggregates] suggests
2172     that, during the future busy hour of a national network with about
2173     100 PCN-boundary-nodes, there are likely to be significant numbers of
2174     aggregates with very few flows under nearly all circumstances.

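The choice that the first viewpoint describes (admit "optimistically", or probe an empty ingress-egress-aggregate before admitting) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not a behaviour specified by this document: the function 'admit_flow', the callback 'send_probes', the probe count and the marked-fraction limit are all hypothetical names and values introduced here.

```python
from typing import Callable, List, Optional


def admit_flow(measured_marked_fraction: Optional[float],
               send_probes: Callable[[int], List[bool]],
               marked_fraction_limit: float = 0.05,
               probe_count: int = 20,
               optimistic: bool = False) -> bool:
    """Decide whether to admit a new PCN-flow on one ingress-egress-aggregate.

    measured_marked_fraction: fraction of PCN-marked traffic reported for the
        aggregate, or None if it carries no PCN-traffic (hypothetical input).
    send_probes: hypothetical callback that sends n probe packets and returns
        one boolean per probe received at the PCN-egress-node
        (True = the probe arrived PCN-marked).
    """
    if measured_marked_fraction is not None:
        # Normal, reactive case: base the decision on measured pre-congestion.
        return measured_marked_fraction <= marked_fraction_limit
    if optimistic:
        # "Optimistic" approach: nothing has been measured, so simply admit.
        return True
    # "Pessimistic" approach: probe the empty aggregate before admitting.
    results = send_probes(probe_count)
    if not results:
        # Assumption made for this sketch: if no probe arrives, block.
        return False
    return sum(results) / len(results) <= marked_fraction_limit
```

For example, 'admit_flow(None, lambda n: [False] * n)' probes and admits, whereas the same call with every probe returned as marked blocks the flow. The delay and extra traffic that the probing branch introduces are exactly the downsides discussed in this section.
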
2176     The latter bullet could occur if new flows start on many of the empty
2177     ingress-egress-aggregates, which together overload a link in the PCN-
2178     domain. To be a problem this would probably have to happen in a
2179     short time period (flash crowd) because, after the reaction time of
2180     the system, other (non-empty) ingress-egress-aggregates that pass
2181     through the link will measure pre-congestion and so block new flows.
2182     Also, flows naturally end anyway.

2184     The downsides of probing for this viewpoint are:

2186     o  Probing adds delay to the admission control process.

2188     o  Sufficient probing traffic has to be generated to test the pre-
2189        congestion level of the ingress-egress-aggregate. But the probing
2190        traffic itself may cause pre-congestion, causing other PCN-flows
2191        to be blocked or even terminated - and in the flash crowd scenario
2192        there will be probing on many ingress-egress-aggregates.

2194     The second viewpoint applies in the case where there is multipath
2195     routing (ECMP) in the PCN-domain. Note that ECMP is often used on
2196     core networks. There are two possibilities:

2198     (1) If admission control is based on measurements of the ingress-
2199     egress-aggregate, then the viewpoint that probing is useful assumes:

2201     o  there's a significant chance that the traffic is unevenly balanced
2202        across the ECMP paths, and hence there's a significant risk of
2203        admitting a flow that should be blocked (because it follows an
2204        ECMP path that is pre-congested) or blocking a flow that should be
2205        admitted.

2207     o  Note: [PCN-email-ECMP] suggests unbalanced traffic is quite
2208        possible, even with quite a large number of flows on a PCN-link
2209        (eg 1000) when Assumption 3 (aggregation) is likely to be
2210        satisfied.

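Possibility (1) can be illustrated with a small worked example. The numbers below are made up, and the model is a deliberate simplification assumed only for this sketch: each ECMP path is given its own PCN-threshold-rate, and every packet on a path whose rate exceeds it is treated as PCN-marked. The function names and the blocking limits are hypothetical.

```python
from typing import List


def aggregate_marked_fraction(path_rates: List[float],
                              threshold_rate: float) -> float:
    """Marked fraction seen for the whole ingress-egress-aggregate, assuming
    all traffic on a path above the threshold is marked and none below."""
    total = sum(path_rates)
    if total == 0:
        return 0.0
    marked = sum(rate for rate in path_rates if rate > threshold_rate)
    return marked / total


def pre_congested_paths(path_rates: List[float],
                        threshold_rate: float) -> List[int]:
    """Indices of the ECMP paths whose rate exceeds the threshold."""
    return [i for i, rate in enumerate(path_rates) if rate > threshold_rate]


# Made-up example: four ECMP paths, one carrying most of the traffic.
rates = [90.0, 20.0, 20.0, 20.0]   # rate per ECMP path (arbitrary units)
threshold = 80.0                   # per-path PCN-threshold-rate (made up)

fraction = aggregate_marked_fraction(rates, threshold)   # 90 / 150
hot_paths = pre_congested_paths(rates, threshold)        # path 0 only

# With a hypothetical blocking limit of 0.7 the aggregate looks acceptable,
# yet a new flow routed onto path 0 should have been blocked.
admit_if_limit_high = fraction <= 0.7
# With a limit of 0.5 every new flow is blocked, although three of the four
# paths are below their threshold.
admit_if_limit_low = fraction <= 0.5
```

Whichever limit is chosen, a single figure for the aggregate cannot tell the pre-congested path from the others, which is why this viewpoint sees probing of the specific ECMP path as useful.
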
2212     (2) If admission control is based on measurements of pre-congestion
2213     on specific ECMP paths, then the viewpoint that probing is useful
2214     assumes:

2216     o  There is no PCN-traffic on the ECMP path on which to base an
2217        admission decision.

2219     o  Simply admitting the new flow has a significant risk of leading to
2220        overload.

2222     o  The PCN-egress-node can match a packet to an ECMP path.

2224     o  Note: This is similar to the first viewpoint and so similarly
2225        could occur in a flash crowd if a new flow starts more-or-less
2226        simultaneously on many of the empty ECMP paths. Because there are
2227        several (sometimes many) ECMP paths between each pair of PCN-
2228        boundary-nodes, it's presumably more likely that an ECMP path is
2229        'empty' than an ingress-egress-aggregate is. To constrain the
2230        number of ECMP paths, a few tunnels could be set up between each
2231        pair of PCN-boundary-nodes. Tunnelling also solves the issue in
2232        the bullet immediately above (which is otherwise hard because an
2233        ECMP routing decision is made independently on each node).

2235     The downsides of probing for this viewpoint are:

2237     o  Probing adds delay to the admission control process.

2239     o  Sufficient probing traffic has to be generated to test the pre-
2240        congestion level of the ECMP path. But there's the risk that the
2241        probing traffic itself may cause pre-congestion, causing other
2242        PCN-flows to be blocked or even terminated.

2244     o  The PCN-egress-node needs to consume the probe packets to ensure
2245        they don't travel beyond the PCN-domain, since they might confuse
2246        the destination end node. This is non-trivial, since probe
2247        packets are addressed to the destination end node, in order to
2248        test the relevant ECMP path (ie they are not addressed to the PCN-
2249        egress-node, unlike the first viewpoint above).

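The last downside above can be made concrete with a short sketch. It assumes a hypothetical ECMP hash computed over the source and destination addresses and ports, the protocol ID and the DSCP, and a hypothetical probe indicator ('is_probe') carried somewhere outside that hash input. Neither the hash nor the indicator is specified by this document, and real routers use their own algorithms. Under those assumptions a probe addressed to the destination end node takes the same ECMP path as the flow's data packets, and the PCN-egress-node can still recognise and consume it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    dscp: int
    is_probe: bool = False   # hypothetical marker, not part of the hash input


def ecmp_path(pkt: Packet, n_paths: int) -> int:
    """Pick one of n_paths from the header fields only (assumed hash)."""
    key = f"{pkt.src}|{pkt.dst}|{pkt.sport}|{pkt.dport}|{pkt.proto}|{pkt.dscp}"
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def leaves_pcn_domain(pkt: Packet) -> bool:
    """PCN-egress-node behaviour in this sketch: data packets are forwarded
    towards the destination end node, probe packets are consumed."""
    return not pkt.is_probe


data = Packet("192.0.2.1", "198.51.100.7", 5004, 5006, 17, 46)
probe = Packet("192.0.2.1", "198.51.100.7", 5004, 5006, 17, 46, is_probe=True)

same_path = ecmp_path(data, 4) == ecmp_path(probe, 4)   # identical hash input
```

If the probe indicator were instead carried in a field that some PCN-interior-node's ECMP algorithm does examine, 'same_path' could be false and the probe would no longer test the path that the flow will use.
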
2251     The open issues associated with this viewpoint include:

2253     o  What rate and pattern of probe packets does the PCN-ingress-node
2254        need to generate, so that there's enough traffic to make the
2255        admission decision?

2257     o  What difficulty does the delay (whilst probing is done), and
2258        possible packet drops, cause applications?

2260     o  Can the delay be alleviated by automatically and periodically
2261        probing on the ingress-egress-aggregate? Or does this add too
2262        much overhead?

2264     o  Are there other ways of dealing with the flash crowd scenario?
2265        For instance, by limiting the rate at which new flows are
2266        admitted; or perhaps by a PCN-egress-node blocking new flows on
2267        its empty ingress-egress-aggregates when its non-empty ones are
2268        pre-congested.

2270     o  (Second viewpoint only) How does the PCN-egress-node disambiguate
2271        probe packets from data packets (so it can consume the former)?
2272        The PCN-egress-node must match the characteristic setting of
2273        particular bits in the probe packet's header or body - but these
2274        bits must not be used by any PCN-interior-node's ECMP algorithm.
2275        In the general case this isn't possible, but it should be possible
2276        for a typical ECMP algorithm (which examines: the source and
2277        destination IP addresses and port numbers, the protocol ID, and
2278        the DSCP).

2280  17. References
2281  17.1. Normative References

2283     [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
2284                "Definition of the Differentiated Services Field (DS
2285                Field) in the IPv4 and IPv6 Headers", RFC 2474,
2286                December 1998.

2288     [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
2289                J., Courtney, W., Davari, S., Firoiu, V., and D.
2290                Stiliadis, "An Expedited Forwarding PHB (Per-Hop
2291                Behavior)", RFC 3246, March 2002.

2293  17.2. Informative References

2295     [RFC1633]  Braden, B., Clark, D., and S.
Shenker, "Integrated
2296                Services in the Internet Architecture: an Overview",
2297                RFC 1633, June 1994.

2299     [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
2300                Network Element Service", RFC 2211, September 1997.

2302     [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
2303                and W. Weiss, "An Architecture for Differentiated
2304                Services", RFC 2475, December 1998.

2306     [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic
2307                Authentication", RFC 2747, January 2000.

2309     [RFC2983]  Black, D., "Differentiated Services and Tunnels",
2310                RFC 2983, October 2000.

2312     [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
2313                Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
2314                Felstaine, "A Framework for Integrated Services Operation
2315                over Diffserv Networks", RFC 2998, November 2000.

2317     [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
2318                of Explicit Congestion Notification (ECN) to IP",
2319                RFC 3168, September 2001.

2321     [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
2322                P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
2323                Protocol Label Switching (MPLS) Support of Differentiated
2324                Services", RFC 3270, May 2002.

2326     [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
2327                Metric for IP Performance Metrics (IPPM)", RFC 3393,
2328                November 2002.

2330     [RFC3411]  Harrington, D., Presuhn, R., and B. Wijnen, "An
2331                Architecture for Describing Simple Network Management
2332                Protocol (SNMP) Management Frameworks", STD 62, RFC 3411,
2333                December 2002.

2335     [RFC4216]  Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous System
2336                (AS) Traffic Engineering (TE) Requirements", RFC 4216,
2337                November 2005.

2339     [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
2340                Internet Protocol", RFC 4301, December 2005.

2342     [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
2343                RFC 4303, December 2005.

2345     [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
2346                Guidelines for DiffServ Service Classes", RFC 4594,
2347                August 2006.

2349     [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
2350                Zekauskas, "A One-way Active Measurement Protocol
2351                (OWAMP)", RFC 4656, September 2006.

2353     [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
2354                Explicit Congestion Notification (ECN) Field", BCP 124,
2355                RFC 4774, November 2006.

2357     [RFC4778]  Kaeo, M., "Operational Security Current Practices in
2358                Internet Service Provider Environments", RFC 4778,
2359                January 2007.

2361     [RFC5129]  "Explicit Congestion Marking in MPLS", RFC 5129,
2362                January 2008.

2364     [P.800]    "Methods for subjective determination of transmission
2365                quality", ITU-T Recommendation P.800, August 1996.

2367     [Y.1541]   "Network Performance Objectives for IP-based Services",
2368                ITU-T Recommendation Y.1541, February 2006.

2370     [I-D.ietf-mpls-cosfield-def]
2371                "'EXP field' renamed to 'CoS Field'", July 2008.

2374     [I-D.ietf-pwe3-congestion-frmwk]
2375                "Pseudowire Congestion Control Framework", May 2008.

2379     [I-D.babiarz-pcn-sip-cap]
2380                Babiarz, J., "SIP Controlled Admission and Preemption",
2381                draft-babiarz-pcn-sip-cap-00 (work in progress),
2382                October 2006.

2384     [I-D.behringer-tsvwg-rsvp-security-groupkeying]
2385                "Applicability of Keying Methods for RSVP Security",
2386                November 2007.

2389     [I-D.briscoe-re-pcn-border-cheat]
2390                "Emulating Border Flow Policing using Re-ECN on Bulk
2391                Data", February 2008.

2394     [I-D.briscoe-tsvwg-cl-architecture]
2395                Briscoe, B., "An edge-to-edge Deployment Model for Pre-
2396                Congestion Notification: Admission Control over a
2397                DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04
2398                (work in progress), October 2006.

2400     [I-D.briscoe-tsvwg-cl-phb]
2401                Briscoe, B., "Pre-Congestion Notification marking",
2402                draft-briscoe-tsvwg-cl-phb-03 (work in progress),
2403                October 2006.

2405     [I-D.briscoe-tsvwg-ecn-tunnel]
2406                "Layered Encapsulation of Congestion Notification",
2407                July 2008.

2410     [I-D.chan-pcn-problem-statement]
2411                Chan, K., "Pre-Congestion Notification Problem Statement",
2412                draft-chan-pcn-problem-statement-01 (work in progress),
2413                October 2006.

2415     [I-D.charny-pcn-comparison]
2416                "Pre-Congestion Notification Using Single Marking for
2417                Admission and Termination", November 2007.

2421     [I-D.charny-pcn-single-marking]
2422                "Pre-Congestion Notification Using Single Marking for
2423                Admission and Termination", November 2007.

2427     [I-D.eardley-pcn-architecture]
2428                "Pre-Congestion Notification Architecture", June 2007.

2432     [I-D.eardley-pcn-marking-behaviour]
2433                "Marking behaviour of PCN-nodes", June 2008.

2437     [I-D.lefaucheur-rsvp-ecn]
2438                Le Faucheur, F., "RSVP Extensions for Admission Control over
2439                Diffserv using Pre-congestion Notification (PCN)",
2440                draft-lefaucheur-rsvp-ecn-01 (work in progress),
2441                June 2006.

2443     [I-D.menth-pcn-emft]
2444                "Edge-Assisted Marked Flow Termination", February 2008.

2447     [I-D.menth-pcn-psdm-encoding]
2448                "PCN Encoding for Packet-Specific Dual Marking (PSDM)",
2449                July 2008.

2452     [I-D.moncaster-pcn-3-state-encoding]
2453                "A three state extended PCN encoding scheme", June 2008, <
2454                http://www.ietf.org/internet-drafts/
2455                draft-moncaster-pcn-3-state-encoding-00.txt>.

2457     [I-D.moncaster-pcn-baseline-encoding]
2458                "Baseline Encoding and Transport of Pre-Congestion
2459                Information", July 2008.

2463     [I-D.tsou-pcn-racf-applic]
2464                "Applicability Statement for the Use of Pre-Congestion
2465                Notification in a Resource-Controlled Network",
2466                February 2008.

2469     [I-D.sarker-pcn-ecn-pcn-usecases]
2470                "Usecases and Benefits of end to end ECN support in PCN
2471                Domains", May 2008.

2474     [I-D.westberg-pcn-load-control]
2475                "LC-PCN: The Load Control PCN Solution", July 2008.

2479     [Hancock]  "Slide 14 of 'NSIS: An Outline Framework for QoS
2480                Signalling'", May 2002.

2483     [Iyer]     "An approach to alleviate link overload as observed on an
2484                IP backbone", IEEE INFOCOM, 2003.

2487     [Menth]    "PCN-Based Resilient Network Admission Control: The Impact
2488                of a Single Bit", Technical Report, 2007.

2492     [Menth08]  "PCN-Based Admission Control and Flow Termination", 2008.

2496     [PCN-email-ECMP]
2497                "Email to PCN WG mailing list", November 2007.

2500     [PCN-email-SRLG]
2501                "Email to PCN WG mailing list", March 2008.

2504     [PCN-email-traffic-empty-aggregates]
2505                "Email to PCN WG mailing list", October 2007.

2508     [Songhurst]
2509                "Guaranteed QoS Synthesis for Admission Control with
2510                Shared Capacity", BT Technical Report TR-CXR9-2006-001,
2511                February 2006.

2514     [Style]    "Guardian Style", Note: This document uses the
2515                abbreviations 'ie' and 'eg' (not 'i.e.' and 'e.g.'), as in
2516                many style guides, 2007.

2519  Author's Address

2521     Philip Eardley
2522     BT
2523     B54/77, Sirius House Adastral Park Martlesham Heath
2524     Ipswich, Suffolk IP5 3RE
2525     United Kingdom

2527     Email: philip.eardley@bt.com

2529  Full Copyright Statement

2531     Copyright (C) The IETF Trust (2008).

2533     This document is subject to the rights, licenses and restrictions
2534     contained in BCP 78, and except as set forth therein, the authors
2535     retain all their rights.

2537     This document and the information contained herein are provided on an
2538     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2539     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
2540     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
2541     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
2542     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2543     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2545  Intellectual Property

2547     The IETF takes no position regarding the validity or scope of any
2548     Intellectual Property Rights or other rights that might be claimed to
2549     pertain to the implementation or use of the technology described in
2550     this document or the extent to which any license under such rights
2551     might or might not be available; nor does it represent that it has
2552     made any independent effort to identify any such rights. Information
2553     on the procedures with respect to rights in RFC documents can be
2554     found in BCP 78 and BCP 79.

2556     Copies of IPR disclosures made to the IETF Secretariat and any
2557     assurances of licenses to be made available, or the result of an
2558     attempt made to obtain a general license or permission for the use of
2559     such proprietary rights by implementers or users of this
2560     specification can be obtained from the IETF on-line IPR repository at
2561     http://www.ietf.org/ipr.

2563     The IETF invites any interested party to bring to its attention any
2564     copyrights, patents or patent applications, or other proprietary
2565     rights that may cover technology that may be required to implement
2566     this standard. Please address the information to the IETF at
2567     ietf-ipr@ietf.org.

2569  Acknowledgment

2571     Funding for the RFC Editor function is provided by the IETF
2572     Administrative Support Activity (IASA).