Congestion and Pre-Congestion                 Philip Eardley (Editor)
Notification Working Group                                          BT
Internet-Draft                                      September 30, 2008
Intended status: Informational
Expires: April 3, 2009


             Pre-Congestion Notification (PCN) Architecture
                     draft-ietf-pcn-architecture-07

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 3, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes a general architecture for flow admission and
   termination based on pre-congestion information in order to protect
   the quality of service of established inelastic flows within a single
   DiffServ domain.

Status

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Benefits
   4.  Deployment scenarios
   5.  Assumptions and constraints on scope
     5.1.  Assumption 1: Trust and support of PCN - controlled
           environment
     5.2.  Assumption 2: Real-time applications
     5.3.  Assumption 3: Many flows and additional load
     5.4.  Assumption 4: Emergency use out of scope
   6.  High-level functional architecture
     6.1.  Flow admission
     6.2.  Flow termination
     6.3.  Flow admission and/or flow termination when there are
           only two PCN encoding states
     6.4.  Information transport
     6.5.  PCN-traffic
     6.6.  Backwards compatibility
   7.  Detailed Functional architecture
     7.1.  PCN-interior-node functions
     7.2.  PCN-ingress-node functions
     7.3.  PCN-egress-node functions
     7.4.  Admission control functions
     7.5.  Flow termination functions
     7.6.  Addressing
     7.7.  Tunnelling
     7.8.  Fault handling
   8.  Challenges
   9.  Operations and Management
     9.1.  Configuration OAM
       9.1.1.  System options
       9.1.2.  Parameters
     9.2.  Performance & Provisioning OAM
     9.3.  Accounting OAM
     9.4.  Fault OAM
     9.5.  Security OAM
   10. IANA Considerations
   11. Security considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. Changes
     15.1.  Changes from -06 to -07
     15.2.  Changes from -05 to -06
     15.3.  Changes from -04 to -05
     15.4.  Changes from -03 to -04
     15.5.  Changes from -02 to -03
     15.6.  Changes from -01 to -02
     15.7.  Changes from -00 to -01
   16. Appendix: Possible work items beyond the scope of the
       current PCN WG charter
     16.1.  Probing
       16.1.1.  Introduction
       16.1.2.  Probing functions
       16.1.3.  Discussion of rationale for probing, its downsides
                and open issues
   17. References
     17.1.  Normative References
     17.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The purpose of this document is to describe a general architecture
   for flow admission and termination based on (pre-) congestion
   information in order to protect the quality of service of flows
   within a DiffServ domain [RFC2475].  This document defines an
   architecture for implementing two mechanisms to protect the quality
   of service of established inelastic flows within a single DiffServ
   domain, where all boundary and interior nodes are PCN-enabled and are
   trusted for correct PCN operation.  Flow admission control determines
   whether a new flow should be admitted, in order to protect the QoS of
   existing PCN-flows in normal circumstances.  However, in abnormal
   circumstances, for instance a disaster affecting multiple nodes and
   causing traffic re-routes, the QoS of existing PCN-flows may
   degrade even though care was exercised when admitting those flows.
   Therefore we also propose a mechanism for flow termination, which
   removes enough traffic in order to protect the QoS of the remaining
   PCN-flows.

   As a fundamental building block to enable these two mechanisms, PCN-
   interior-nodes generate, encode and transport pre-congestion
   information towards the PCN-egress-nodes.  Two rates, a PCN-
   threshold-rate and a PCN-excess-rate, are associated with each link
   of the PCN-domain.  Each rate is used by a marking behaviour that
   determines how and when PCN-packets are marked, and how the markings
   are encoded in packet headers.  Overall the aim is to enable PCN-
   nodes to give an "early warning" of potential congestion before there
   is any significant build-up of PCN-packets in the queue.

   PCN-boundary-nodes convert measurements of these PCN-markings into
   decisions about flow admission and termination.  In a PCN-domain with
   both threshold marking and excess traffic marking enabled, the
   admission control mechanism limits the PCN-traffic on each link to
   *roughly* its PCN-threshold-rate and the flow termination mechanism
   limits the PCN-traffic on each link to *roughly* its PCN-excess-rate.
   Other scenarios are discussed later.

   The behaviour of PCN-interior-nodes is standardised in other
   documents, which are summarised in this document:

   o  Marking behaviour: threshold marking and excess traffic marking
      [I-D.eardley-pcn-marking-behaviour].  Threshold marking marks all
      PCN-packets if the PCN traffic rate is greater than a first
      configured rate, "PCN-threshold-rate".  Excess traffic marking
      marks a proportion of PCN-packets, such that the amount marked
      equals the traffic rate in excess of a second configured rate,
      "PCN-excess-rate".

   o  Encoding: a combination of the DSCP field and ECN field in the IP
      header indicates that a packet is a PCN-packet and whether it is
      PCN-marked.
      The "baseline" encoding is standardised in
      [I-D.moncaster-pcn-baseline-encoding], which standardises two PCN
      encoding states (PCN-marked and not PCN-marked), whilst
      (experimental) extensions to the baseline encoding can provide
      three encoding states (threshold-marked, excess-traffic-marked,
      not PCN-marked, or perhaps further encoding states as suggested in
      [I-D.westberg-pcn-load-control]).  PCN encoding therefore defines
      semantics for the ECN field that differ from the default
      semantics of [RFC3168], and so it needs to meet the guidelines
      of BCP 124 [RFC4774].

   The behaviour of PCN-boundary-nodes is described in Informational
   documents.  Several possibilities are outlined in this document;
   detailed descriptions and comparisons are in
   [I-D.charny-pcn-comparison] and [Menth08].

   This document describes the PCN architecture at a high level (Section
   6) and in more detail (Section 7).  It also defines some terminology
   and outlines some benefits, deployment scenarios, and assumptions of
   PCN (Sections 2-5).  Finally it outlines some challenges, operations
   and management, and security considerations, and some potential
   future work items (Sections 8, 9, 11 and Appendix).

2.  Terminology

   o  PCN-domain: a PCN-capable domain; a contiguous set of PCN-enabled
      nodes that perform DiffServ scheduling [RFC2474]; the complete set
      of PCN-nodes whose PCN-marking can in principle influence
      decisions about flow admission and termination for the PCN-domain,
      including the PCN-egress-nodes, which measure these PCN-marks.

   o  PCN-boundary-node: a PCN-node that connects one PCN-domain to a
      node either in another PCN-domain or in a non PCN-domain.

   o  PCN-interior-node: a node in a PCN-domain that is not a PCN-
      boundary-node.

   o  PCN-node: a PCN-boundary-node or a PCN-interior-node

   o  PCN-egress-node: a PCN-boundary-node in its role in handling
      traffic as it leaves a PCN-domain.

   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
      traffic as it enters a PCN-domain.

   o  PCN-traffic, PCN-packets, PCN-BA: a PCN-domain carries traffic of
      different DiffServ behaviour aggregates (BAs) [RFC2474].  The
      PCN-BA uses the PCN mechanisms to carry PCN-traffic and the
      corresponding packets are PCN-packets.  The same network will
      carry traffic of other DiffServ BAs.  The PCN-BA is distinguished
      by a combination of the DiffServ codepoint (DSCP) and ECN fields.

   o  PCN-flow: the unit of PCN-traffic that the PCN-boundary-node
      admits (or terminates); the unit could be a single microflow (as
      defined in [RFC2474]) or some identifiable collection of
      microflows.

   o  Ingress-egress-aggregate: The collection of PCN-packets from all
      PCN-flows that travel in one direction between a specific pair of
      PCN-boundary-nodes.

   o  PCN-threshold-rate: a reference rate configured for each link in
      the PCN-domain, which is lower than the PCN-excess-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a first encoding, "threshold-marked".

   o  PCN-excess-rate: a reference rate configured for each link in the
      PCN-domain, which is higher than the PCN-threshold-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a second encoding, "excess-traffic-
      marked".

   o  Threshold-marking: a PCN-marking behaviour with the objective that
      all PCN-traffic is marked if the PCN-traffic exceeds the PCN-
      threshold-rate.

   o  Excess-traffic-marking: a PCN-marking behaviour with the objective
      that the amount of PCN-traffic that is PCN-marked is equal to the
      amount that exceeds the PCN-excess-rate.

   o  Pre-congestion: a condition of a link within a PCN-domain such
      that the PCN-node performs PCN-marking, in order to provide an
      "early warning" of potential congestion before there is any
      significant build-up of PCN-packets in the real queue.  (Hence, by
      analogy with ECN we call our mechanism Pre-Congestion
      Notification.)

   o  PCN-marking: the process of setting the header in a PCN-packet
      based on defined rules, in reaction to pre-congestion; either
      threshold-marking or excess-traffic-marking.

   o  PCN-colouring: the process of setting the header in a PCN-packet
      by a PCN-boundary-node; performed by a PCN-ingress-node so that
      PCN-nodes can easily identify PCN-packets; performed by a PCN-
      egress-node so that the header is appropriate for nodes beyond the
      PCN-domain.

   o  PCN-feedback-information: information signalled by a PCN-egress-
      node to a PCN-ingress-node (or a central control node), which is
      needed for the flow admission and flow termination mechanisms.

3.  Benefits

   We believe that the key benefits of the PCN mechanisms described in
   this document are that they are simple, scalable, and robust because:

   o  Per flow state is only required at the PCN-ingress-nodes
      ("stateless core").  This is required for policing purposes (to
      prevent non-admitted PCN traffic from entering the PCN-domain) and
      so on.  It is not generally required that other network entities
      are aware of individual flows (although they may be in particular
      deployment scenarios).

   o  Admission control is resilient: with PCN, QoS is decoupled from
      the routing system.  Hence in general admitted flows can survive
      capacity, routing or topology changes without additional
      signalling.  The PCN-threshold-rate on each link can be chosen
      small enough that admitted traffic can still be carried after a
      rerouting in most failure cases [Menth].
      This is an important
      feature as QoS violations in core networks due to link failures
      are more likely than QoS violations due to increased traffic
      volume [Iyer].

   o  The PCN-marking behaviours only operate on the overall PCN-traffic
      on the link, not per flow.

   o  The information of these measurements is signalled to the PCN-
      egress-nodes by the PCN-marks in the packet headers, ie
      "in-band".  No additional signalling protocol is required for
      transporting the PCN-marks.  Therefore no secure binding is
      required between data packets and separate congestion messages.

   o  The PCN-egress-nodes make separate measurements, operating on the
      aggregate PCN-traffic from each PCN-ingress-node, ie not per flow.
      Similarly, signalling by the PCN-egress-node of PCN-feedback-
      information (which is used for flow admission and termination
      decisions) is at the granularity of the ingress-egress-aggregate.
      An alternative approach is that the PCN-egress-nodes monitor the
      PCN-traffic and signal PCN-feedback-information (which is used for
      flow admission and termination decisions) at the granularity of
      one (or a few) PCN-marks.

   o  The admitted PCN-load is controlled dynamically.  Therefore it
      adapts as the traffic matrix changes, and also if the network
      topology changes (eg after a link failure).  Hence an operator can
      be less conservative when deploying network capacity, and less
      accurate in their prediction of the PCN-traffic matrix.

   o  The termination mechanism complements admission control.  It
      allows the network to recover from sudden unexpected surges of
      PCN-traffic on some links, thus restoring QoS to the remaining
      flows.  Such scenarios are expected to be rare but not impossible.
      They can be caused by large network failures that redirect lots of
      admitted PCN-traffic to other links, or by malfunction of the
      measurement-based admission control in the presence of admitted
      flows that send for a while with an atypically low rate and then
      increase their rates in a correlated way.

   o  Flow termination can also enable an operator to be less
      conservative when deploying network capacity.  It is an
      alternative to running links at low utilisation in order to
      protect against link or node failures.  This is especially the
      case with SRLGs (shared risk link groups, which are links that
      share a resource, such as a fibre, whose failure affects all those
      links [RFC4216]).  A requirement to fully protect traffic against
      a single SRLG failure requires low utilisation (~10%) of the link
      bandwidth on some links before failure [PCN-email-SRLG].

   o  The PCN-excess-rate may be set below the maximum rate that PCN-
      traffic can be transmitted on a link, in order to trigger
      termination of some PCN-flows before loss (or excessive delay) of
      PCN-packets occurs, or to keep the maximum PCN-load on a link
      below a level configured by the operator.

   o  Provisioning of the network is decoupled from the process of
      adding new customers.  By contrast, with the DiffServ architecture
      [RFC2475] operators rely on subscription-time Service Level
      Agreements, which statically define the parameters of the traffic
      that will be accepted from a customer, and so the operator has to
      run the provisioning process each time a new customer is added to
      check that the Service Level Agreement can be fulfilled.  A PCN-
      domain doesn't need such traffic conditioning.

4.  Deployment scenarios

   Operators of networks will want to use the PCN mechanisms in various
   arrangements, for instance depending on how they are performing
   admission control outside the PCN-domain (users after all are
   concerned about QoS end-to-end), what their particular goals and
   assumptions are, how many PCN encoding states are available, and so
   on.

   From the perspective of the outside world, a PCN-domain essentially
   looks like a DiffServ domain.  PCN-traffic is either transported
   across it transparently or policed at the PCN-ingress-node (ie
   dropped or carried at a lower QoS).  One difference is that PCN-
   traffic has better QoS guarantees than normal DiffServ traffic,
   because the PCN mechanisms better protect the QoS of admitted flows.
   Another difference may occur in the rare circumstance when there is a
   failure: on the one hand some PCN-flows may get terminated, but on
   the other hand other flows will get their QoS restored.  Non PCN-
   traffic is treated transparently, ie the PCN-domain is a normal
   DiffServ domain.

   An operator may choose to deploy either admission control or flow
   termination or both.  Although designed to work together, they are
   independent mechanisms, and the use of one does not require or
   prevent the use of the other.

   A PCN-domain may have three encoding states (or pedantically, an
   operator may choose to use up three encoding states for PCN): not
   PCN-marked, threshold-marked, excess-traffic-marked.  Then both PCN
   admission control and flow termination can be supported.  As
   illustrated in Figure 1, admission control accepts new flows until
   the PCN-traffic rate on the bottleneck link rises above the PCN-
   threshold-rate, whilst if necessary the flow termination mechanism
   terminates flows down to the PCN-excess-rate on the bottleneck link.
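
   The regions of Figure 1 can also be read as a simple lookup from the
   PCN-traffic rate on the bottleneck link to the marking behaviour and
   the PCN mechanisms that apply.  The following is only an illustrative
   sketch for a PCN-domain with three encoding states: the function name
   and the returned strings are hypothetical, and the treatment of a
   rate exactly equal to a configured rate is an assumption that the
   architecture does not specify.

```python
def pcn_behaviour(rate, threshold_rate, excess_rate, scheduler_rate):
    """Return (marking behaviour, PCN mechanisms) for one bottleneck link.

    All four arguments are rates in the same unit (eg bit/s).  The
    boundary cases (rate exactly equal to a configured rate) are an
    assumption made for this sketch.
    """
    if not threshold_rate <= excess_rate <= scheduler_rate:
        raise ValueError("expected threshold <= excess <= scheduler rate")

    if rate <= threshold_rate:
        # Below the PCN-threshold-rate: no pre-congestion.
        return "no pkts PCN-marked", ["admit new flows"]

    if rate <= excess_rate:
        # Between the two configured rates: admission control blocks.
        return "all pkts threshold-marked", ["block new flows"]

    # Above the PCN-excess-rate: flow termination is also triggered.
    mechanisms = ["block new flows", "terminate some admitted flows"]
    if rate > scheduler_rate:
        # Above the scheduler rate for PCN-traffic, packets are also lost.
        mechanisms.append("drop some PCN-pkts")
    return "some pkts excess-traffic-marked, rest threshold-marked", mechanisms
```

   For example, with a PCN-threshold-rate of 100, a PCN-excess-rate of
   200 and a scheduler rate of 300 (arbitrary units), a PCN-traffic rate
   of 150 falls in the region where all packets are threshold-marked and
   new flows are blocked.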

                          ==Marking behaviour==    ==PCN mechanisms==
            Rate of ^
     PCN-traffic on |
    bottleneck link |                              (as below and also)
                    |       (as below)             Drop some PCN-pkts
                    |
     scheduler rate -|-------------------------------------------------
   (for PCN-traffic)|
                    |       Some pkts              Terminate some
                    | excess-traffic-marked        admitted flows
                    |           &                        &
                    |      Rest of pkts            Block new flows
                    |    threshold-marked
                    |
    PCN-excess-rate -|-------------------------------------------------
                    |
                    |       All pkts               Block new flows
                    |    threshold-marked
                    |
 PCN-threshold-rate -|-------------------------------------------------
                    |
                    |       No pkts                Admit new flows
                    |      PCN-marked
                    |

   Figure 1: Schematic of how the PCN admission control and flow
   termination mechanisms operate as the rate of PCN-traffic increases,
   for a PCN-domain with three encoding states.

   On the other hand, a PCN-domain may have two encoding states (as in
   [I-D.moncaster-pcn-baseline-encoding]) (or pedantically, an operator
   may choose to use up two encoding states for PCN): not PCN-marked,
   PCN-marked.  Then there are three possibilities, as discussed in the
   following paragraphs (see also Section 6.3).

   First, an operator could just use PCN's admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting' - as
   sessions end, PCN-traffic naturally reduces, and meanwhile the
   admission control mechanism will prevent admission of new flows that
   use the affected links.  So the PCN-domain will naturally return to
   normal operation, but with reduced capacity.  The drawback of this
   approach would be that, until sufficient sessions have ended to
   relieve the congestion, all PCN-flows as well as lower priority
   services will be adversely affected.

   Second, an operator could just rely for admission control on
   statically provisioned capacity per PCN-ingress-node (regardless of
   the PCN-egress-node of a flow), as is typical in the hose model of
   the DiffServ architecture [RFC2475].  Such traffic conditioning
   agreements can lead to focused overload: many flows happen to focus
   on a particular link and then all flows through the congested link
   fail catastrophically.  PCN's flow termination mechanism could then
   be used to counteract such a problem.

   Third, both admission control and flow termination can be triggered
   from the single type of PCN-marking; the main downside is that
   admission control is less accurate [I-D.charny-pcn-single-marking].

   Within the PCN-domain there is some flexibility about how the
   decision making functionality is distributed.  These possibilities
   are outlined in Section 7.4 and also discussed elsewhere, such as in
   [Menth08].

   The flow admission and termination decisions need to be enforced
   through per flow policing by the PCN-ingress-nodes.  If there are
   several PCN-domains on the end-to-end path, then each needs to police
   at its PCN-ingress-nodes.  One exception is if the operator runs both
   the access network (not a PCN-domain) and the core network (a PCN-
   domain); per flow policing could be devolved to the access network
   and not done at the PCN-ingress-node.  Note: to aid readability, the
   rest of this draft assumes that policing is done by the PCN-ingress-
   nodes.

   PCN admission control has to fit with the overall approach to
   admission control.  For instance [I-D.briscoe-tsvwg-cl-architecture]
   describes the case where RSVP signalling runs end-to-end.  The PCN-
   domain is a single RSVP hop, ie only the PCN-boundary-nodes process
   RSVP messages, with RSVP messages processed on each hop outside the
   PCN-domain, as in IntServ over DiffServ [RFC2998].
   It would also be
   possible for the RSVP signalling to be originated and/or terminated
   by proxies, with application-layer signalling between the end user
   and the proxy (eg SIP signalling with a home hub).  A similar example
   would use NSIS signalling instead of RSVP.

   It is possible that a user wants its inelastic traffic to use the PCN
   mechanisms but also react to ECN marking outside the PCN-domain
   [I-D.sarker-pcn-ecn-pcn-usecases].  Two possible ways to do this are
   to tunnel all PCN-packets across the PCN-domain, so that the ECN
   marks are carried transparently across the PCN-domain, or to use an
   encoding like [I-D.moncaster-pcn-3-state-encoding].  Tunnelling is
   discussed further in Section 7.7.

   Some possible deployment models that are outside the current PCN WG
   charter are outlined in the Appendix.

5.  Assumptions and constraints on scope

   The scope of PCN is, at least initially (see Appendix), restricted by
   the following assumptions:

   1.  these components are deployed in a single DiffServ domain, within
       which all PCN-nodes are PCN-enabled and are trusted for truthful
       PCN-marking and transport

   2.  all flows handled by these mechanisms are inelastic and
       constrained to a known peak rate through policing or shaping

   3.  the number of PCN-flows across any potential bottleneck link is
       sufficiently large that stateless, statistical mechanisms can be
       effective.  To put it another way, the aggregate bit rate of PCN-
       traffic across any potential bottleneck link needs to be
       sufficiently large relative to the maximum additional bit rate
       added by one flow.  This is the basic assumption of measurement-
       based admission control.

   4.  PCN-flows may have different precedence, but the applicability of
       the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
       is out of scope.

5.1.  Assumption 1: Trust and support of PCN - controlled environment

   We assume that the PCN-domain is a controlled environment, ie all the
   nodes in a PCN-domain run PCN and are trusted.  There are several
   reasons for proposing this assumption:

   o  The PCN-domain has to be encircled by a ring of PCN-boundary-
      nodes, otherwise traffic could enter a PCN-BA without being
      subject to admission control, which would potentially degrade the
      QoS of existing PCN-flows.

   o  Similarly, a PCN-boundary-node has to trust that all the PCN-nodes
      mark PCN-traffic consistently.  A node not performing PCN-marking
      wouldn't be able to alert when it suffered pre-congestion, which
      potentially would lead to too many PCN-flows being admitted (or
      too few being terminated).  Worse, a rogue node could perform
      various attacks, as discussed in the Security Considerations
      section.

   One way of assuring the above two points is that the entire PCN-
   domain is run by a single operator.  Another possibility is that
   there are several operators that trust each other in their handling
   of PCN-traffic.

   Note: All PCN-nodes need to be trustworthy.  However if it is known
   that an interface cannot become pre-congested then it is not strictly
   necessary for it to be capable of PCN-marking.  But this must be
   known even in unusual circumstances, eg after the failure of some
   links.

5.2.  Assumption 2: Real-time applications

   We assume that any variation of source bit rate is independent of the
   level of pre-congestion.  We assume that PCN-packets come from real
   time applications generating inelastic traffic, ie sending packets at
   the rate the codec produces them, regardless of the availability of
   capacity [RFC4594].  Examples are voice and video requiring low
   delay, jitter and packet loss, the Controlled Load Service
   [RFC2211], and the Telephony service class [RFC4594].
   This
   assumption is to help focus the effort where it looks like PCN would
   be most useful, ie the sorts of applications where per flow QoS is a
   known requirement.  In other words we focus on PCN providing a
   benefit to inelastic traffic (PCN may or may not provide a benefit to
   other types of traffic).

   As a consequence, it is assumed that PCN-marking is being applied to
   traffic scheduled with the expedited forwarding per-hop behaviour,
   [RFC3246], or a per-hop behaviour with similar characteristics.

5.3.  Assumption 3: Many flows and additional load

   We assume that there are many PCN-flows on any bottleneck link in the
   PCN-domain (or, to put it another way, the aggregate bit rate of PCN-
   traffic across any potential bottleneck link is sufficiently large
   relative to the maximum additional bit rate added by one PCN-flow).
   Measurement-based admission control assumes that the present is a
   reasonable prediction of the future: the network conditions are
   measured at the time of a new flow request, but the actual
   network performance must be acceptable during the call some time
   later.  One issue is that if there are only a few variable rate
   flows, then the aggregate traffic level may vary a lot, perhaps
   enough to cause some packets to get dropped.  If there are many flows
   then the aggregate traffic level should be statistically smoothed.
   How many flows is enough depends on a number of factors such as the
   variation in each flow's rate, the total rate of PCN-traffic, and the
   size of the "safety margin" between the traffic level at which we
   start threshold-marking and at which packets are dropped or
   significantly delayed.

   We do not make explicit assumptions on how many PCN-flows are in each
   ingress-egress-aggregate.
   Performance evaluation work may clarify
   whether it is necessary to make any additional assumption on
   aggregation at the ingress-egress-aggregate level.

5.4. Assumption 4: Emergency use out of scope

   PCN-flows may have different precedence, but the applicability of the
   PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc) is out
   of scope for consideration by the PCN WG.

6. High-level functional architecture

   The high-level approach is to split functionality between:

   o  PCN-interior-nodes 'inside' the PCN-domain, which monitor their
      own state of pre-congestion and mark PCN-packets as appropriate.
      They are not flow-aware, nor aware of ingress-egress-aggregates.
      The functionality is also done by PCN-ingress-nodes for their
      outgoing interfaces (ie those 'inside' the PCN-domain).

   o  PCN-boundary-nodes at the edge of the PCN-domain, which control
      admission of new PCN-flows and termination of existing PCN-flows,
      based on information from PCN-interior-nodes. This information is
      in the form of the PCN-marked data packets (which are intercepted
      by the PCN-egress-nodes) and not signalling messages. Generally
      PCN-ingress-nodes are flow-aware.

   The aim of this split is to keep the bulk of the network simple,
   scalable and robust, whilst confining policy, application-level and
   security interactions to the edge of the PCN-domain. For example the
   lack of flow awareness means that the PCN-interior-nodes don't care
   about the flow information associated with PCN-packets, nor do the
   PCN-boundary-nodes care about which PCN-interior-nodes their ingress-
   egress-aggregates traverse.

   In order to generate information about the current state of the PCN-
   domain, each PCN-node PCN-marks packets if it is "pre-congested".
   Exactly when a PCN-node decides if it is "pre-congested" (the
   algorithm) and exactly how packets are "PCN-marked" (the encoding)
   will be defined in separate standards-track documents, but at a high
   level it is as follows:

   o  the algorithms: a PCN-node meters the amount of PCN-traffic on
      each one of its outgoing (or incoming) links. The measurement is
      made as an aggregate of all PCN-packets, and not per flow. There
      are two algorithms, one for threshold-marking and one for excess-
      traffic-marking.

   o  the encoding(s): a PCN-node PCN-marks a PCN-packet by modifying a
      combination of the DSCP and ECN fields. In the "baseline"
      encoding [I-D.moncaster-pcn-baseline-encoding], the ECN field is
      set to 11 and the DSCP is not altered. Extension encodings may be
      defined that, at most, use a second DSCP (eg as in
      [I-D.moncaster-pcn-3-state-encoding]) and/or set the ECN field to
      values other than 11 (eg as in [I-D.menth-pcn-psdm-encoding]).

   In a PCN-domain the operator may have two or three encoding states
   available. The baseline encoding provides two encoding states (not
   PCN-marked, PCN-marked), whilst extended encodings can provide three
   encoding states (not PCN-marked, threshold-marked, excess-traffic-
   marked).

   The PCN-boundary-nodes monitor the PCN-marked packets in order to
   extract information about the current state of the PCN-domain. Based
   on this monitoring, a distributed decision is made about whether to
   admit a prospective new flow or whether to terminate existing
   flow(s). Sections 7.4 and 7.5 mention various possibilities for how
   the functionality could be distributed.

   PCN-marking needs to be configured on all (potentially pre-congested)
   links in the PCN-domain to ensure that the PCN mechanisms protect all
   links.
   The actual functionality can be configured on the outgoing or
   incoming interfaces of PCN-nodes - or one algorithm could be
   configured on the outgoing interface and the other on the incoming
   interface. The important point is that a consistent choice is made
   across the PCN-domain to ensure that the PCN mechanisms protect all
   links. See [I-D.eardley-pcn-marking-behaviour] for further
   discussion.

   The objective of the threshold-marking algorithm is to threshold-mark
   all PCN-packets whenever the rate of PCN-packets is greater than some
   configured rate, the PCN-threshold-rate. The objective of the
   excess-traffic-marking algorithm is to excess-traffic-mark PCN-
   packets at a rate equal to the difference between the bit rate of
   PCN-packets and some configured rate, the PCN-excess-rate. Note that
   this description reflects the overall intent of the algorithm rather
   than its instantaneous behaviour, since the rate measured at a
   particular moment depends on the detailed algorithm, its
   implementation, and the traffic's variance as well as its rate (eg
   marking may well continue after a recent overload even after the
   instantaneous rate has dropped). The algorithms are specified in
   [I-D.eardley-pcn-marking-behaviour].

   All the presently proposed admission and termination approaches are
   detailed and compared in [I-D.charny-pcn-comparison] and [Menth08].
   The discussion below is just a brief summary. It initially assumes
   there are three encoding states available.

6.1. Flow admission

   The objective of PCN's flow admission control mechanism is to limit
   the PCN-traffic on each link in the PCN-domain to *roughly* its PCN-
   threshold-rate, by admitting or blocking prospective new flows, in
   order to protect the QoS of existing PCN-flows.
   The PCN-threshold-rate is a parameter that can be configured by the
   operator and will be set lower than the traffic rate at which the
   link becomes congested and the node drops packets.

   Exactly how the admission control decision is made will be defined
   separately in informational documents. At a high level two
   approaches are proposed (others might be possible):

   o  the PCN-egress-node measures (possibly as a moving average) the
      fraction of the PCN-traffic that is threshold-marked. The
      fraction is measured for a specific ingress-egress-aggregate. If
      the fraction is below a threshold value then the new flow is
      admitted, and if the fraction is above the threshold value then it
      is blocked. In [I-D.eardley-pcn-architecture] the fraction is
      measured as an EWMA (exponentially weighted moving average) and
      termed the "congestion level estimate".

   o  the PCN-egress-node monitors PCN-traffic and if it receives one
      (or several) threshold-marked packets, then the new flow is
      blocked, otherwise it is admitted. One possibility may be to
      react to the marking state of an initial flow set-up packet (eg
      RSVP PATH). Another is that after one (or several) threshold-
      marks then all flows are blocked until after a specific period of
      no congestion.

   Note that the admission control decision is made for a particular
   pair of PCN-boundary-nodes. So it is quite possible for a new flow
   to be admitted between one pair of PCN-boundary-nodes, whilst at the
   same time another admission request is blocked between a different
   pair of PCN-boundary-nodes.

6.2. Flow termination

   The objective of PCN's flow termination mechanism is to limit the
   PCN-traffic on each link to *roughly* its PCN-excess-rate, by
   terminating some existing PCN-flows, in order to protect the QoS of
   the remaining PCN-flows.
   The PCN-excess-rate is a parameter that can
   be configured by the operator and may be set lower than the traffic
   rate at which the link becomes congested and the node drops packets.

   Exactly how the flow termination decision is made will be defined
   separately in informational documents. At a high level several
   approaches are proposed (others might be possible):

   o  In one approach the PCN-egress-node measures the rate of PCN-
      traffic that is not excess-traffic-marked, which is the amount of
      PCN-traffic that can actually be supported, and communicates this
      to the PCN-ingress-node. Also the PCN-ingress-node measures the
      rate of PCN-traffic that is destined for this specific PCN-egress-
      node, and hence it can calculate the excess amount that should be
      terminated.

   o  Another approach instead measures the rate of excess-traffic-
      marked traffic and terminates this amount of traffic. This
      terminates less traffic than the previous approach if some nodes
      are dropping PCN-traffic.

   o  Another approach monitors PCN-packets and terminates some of the
      PCN-flows that have an excess-traffic-marked packet. (If all such
      flows were terminated, far too much traffic would be terminated,
      so a random selection needs to be made from those with an excess-
      traffic-marked packet, [I-D.menth-pcn-emft].)

   Since flow termination is designed for "abnormal" circumstances, it
   is quite likely that some PCN-nodes are congested and hence packets
   are being dropped and/or significantly queued. The flow termination
   mechanism must accommodate this.

   Note also that the termination control decision is made for a
   particular pair of PCN-boundary-nodes. So it is quite possible for
   PCN-flows to be terminated between one pair of PCN-boundary-nodes,
   whilst at the same time none are terminated between a different pair
   of PCN-boundary-nodes.

6.3. Flow admission and/or flow termination when there are only two PCN
     encoding states

   If a PCN-domain has only two encoding states available (PCN-marked
   and not PCN-marked), ie it is using the baseline encoding
   [I-D.moncaster-pcn-baseline-encoding], then an operator has three
   options:

   o  admission control only: PCN-marking means threshold-marking, ie
      only the threshold-marking algorithm writes PCN-marks. Only PCN
      admission control is available.

   o  flow termination only: PCN-marking means excess-traffic-marking,
      ie only the excess-traffic-marking algorithm writes PCN-marks.
      Only PCN termination control is available.

   o  both admission control and flow termination: only the excess-
      traffic-marking algorithm writes PCN-marks; however, the
      configured rate (PCN-excess-rate) is set at the rate the admission
      control mechanism needs to limit PCN-traffic to, as shown in
      Figure 2. [I-D.charny-pcn-single-marking] describes how both
      admission control and flow termination can be triggered in this
      case and also gives some of the pros and cons of this approach.
      The main downside is that admission control is less accurate.

                        ==Marking behaviour==       ==PCN mechanisms==
          Rate of    ^
      PCN-traffic on |
     bottleneck link |                               Terminate some
                     |  Further pkts                 admitted flows
                     |  excess-traffic-marked              &
                     |                               Block new flows
                     |
                     |
   U*PCN-excess-rate -|--------------------------------------------------
                     |
                     |  Some pkts                    Block new flows
                     |  excess-traffic-marked
                     |
     PCN-excess-rate -|--------------------------------------------------
                     |
                     |  No pkts                      Admit new flows
                     |  PCN-marked
                     |

   Figure 2: Schematic of how the PCN admission control and flow
   termination mechanisms operate as the rate of PCN-traffic increases,
   for a PCN-domain with two encoding states and using the approach of
   [I-D.charny-pcn-single-marking]. Note: U is a global parameter for
   all the PCN-links.
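
   As an illustration only, the three regions of Figure 2 can be read as
   a classification of the PCN-traffic rate on a bottleneck link. The
   sketch below is not taken from [I-D.charny-pcn-single-marking]: the
   function name, the Action labels and the handling of rates exactly on
   a boundary are assumptions made for this example, and it models only
   the schematic, not how PCN-boundary-nodes infer the region from
   PCN-marks.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the three regions of Figure 2."""
    ADMIT = "admit new flows"
    BLOCK = "block new flows"
    TERMINATE_AND_BLOCK = "terminate some admitted flows and block new flows"


def figure2_region(rate: float, pcn_excess_rate: float, u: float) -> Action:
    """Map a PCN-traffic rate to the region of Figure 2 it falls in.

    Assumptions (not from the draft): all rates use the same unit,
    u >= 1, and a rate exactly on a boundary belongs to the lower region.
    """
    if pcn_excess_rate <= 0 or u < 1:
        raise ValueError("expected pcn_excess_rate > 0 and u >= 1")
    if rate <= pcn_excess_rate:
        # No packets are PCN-marked.
        return Action.ADMIT
    if rate <= u * pcn_excess_rate:
        # Some packets are excess-traffic-marked.
        return Action.BLOCK
    # Further packets are excess-traffic-marked.
    return Action.TERMINATE_AND_BLOCK
```

   For example, with a PCN-excess-rate of 100 and U = 1.2 (both values
   chosen arbitrarily), a rate of 110 falls in the "block new flows"
   region and a rate of 130 in the "terminate" region.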
6.4. Information transport

   The transport of pre-congestion information from a PCN-node to a PCN-
   egress-node is through PCN-markings in data packet headers, ie "in-
   band": no signalling protocol messaging is needed. Signalling is
   needed to transport PCN-feedback-information between the PCN-
   boundary-nodes, for example to convey the fraction of PCN-marked
   traffic from a PCN-egress-node to the relevant PCN-ingress-node.
   Exactly what information needs to be transported will be described in
   the future documents about possible boundary mechanisms. The
   signalling could be done by an extension of RSVP or NSIS, for
   instance; protocol work will be done by the relevant WG, but for
   example [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for
   RSVP.

6.5. PCN-traffic

   The following are some high-level points about how PCN works:

   o  There needs to be a way for a PCN-node to distinguish PCN-traffic
      from other traffic. This is through a combination of the DSCP
      field and/or ECN field.

   o  It is not advised to have non PCN-traffic that competes for the
      same capacity as PCN-traffic but, if there is such traffic, there
      needs to be a mechanism to limit it. "Capacity" means the
      forwarding bandwidth on a link; "competes" means that non PCN-
      packets will delay PCN-packets in the queue for the link. Hence
      more non PCN-traffic results in poorer QoS for PCN. Further, the
      unpredictable amount of non PCN-traffic makes the PCN mechanisms
      less accurate and so reduces PCN's ability to protect the QoS of
      admitted PCN-flows.

   o  Two examples of such non PCN-traffic (ie that competes for the
      same capacity as PCN-traffic) are:

      1. traffic that is priority scheduled over PCN (perhaps a
         particular application or an operator's control messages).

      2. traffic that is scheduled at the same priority as PCN (for
         example if the Voice-Admit codepoint is used for PCN-traffic
         [I-D.moncaster-pcn-baseline-encoding] and there is voice-admit
         traffic in the PCN-domain).

   o  If there is such non PCN-traffic (ie that competes for the same
      capacity as PCN-traffic), then PCN's mechanisms should take
      account of it, in order to improve the accuracy of the decision
      about whether to admit (or terminate) a PCN-flow. For example,
      one mechanism is that such non PCN-traffic contributes to the PCN
      meters (ie is metered by the threshold-marking and excess-traffic-
      marking algorithms).

   o  There will be non PCN-traffic that doesn't compete for the same
      capacity as PCN-traffic, because it is forwarded at lower
      priority. Hence it shouldn't contribute to the PCN meters.
      Examples are best effort and assured forwarding traffic. However,
      a PCN-node should dedicate some capacity to lower priority traffic
      so that it isn't starved.

   o  The document assumes that the PCN mechanisms are applied to a
      single behaviour aggregate in the PCN-domain. However, it would
      also be possible to apply them independently to more than one
      behaviour aggregate, which are distinguished by DSCP.

6.6. Backwards compatibility

   PCN specifies semantics for the ECN field that differ from the
   default semantics of [RFC3168]. A particular PCN encoding scheme
   needs to describe how it meets the guidelines of BCP 124 [RFC4774]
   for specifying alternative semantics for the ECN field. In summary
   the approach is to:

   o  use a DSCP to allow PCN-nodes to distinguish PCN-traffic that uses
      the alternative ECN semantics;

   o  define these semantics for use within a controlled region, the
      PCN-domain;

   o  take appropriate action if ECN capable, non-PCN traffic arrives at
      a PCN-ingress-node with the DSCP used by PCN.
   For the baseline encoding [I-D.moncaster-pcn-baseline-encoding], the
   'appropriate action' is to block ECN-capable traffic that uses the
   same DSCP as PCN from entering the PCN-domain directly. Blocking
   means it is dropped or downgraded to a lower priority behaviour
   aggregate, or alternatively such traffic may be tunnelled through the
   PCN-domain. The reason that blocking is needed is that the PCN-
   egress-node clears the ECN field to 00.

   Extended encoding schemes may take different 'appropriate action'.

7. Detailed Functional architecture

   This section is intended to provide a systematic summary of the new
   functional architecture in the PCN-domain. First it describes
   functions needed at the three specific types of PCN-node; these are
   data plane functions and are in addition to their normal router
   functions. Then it describes further functionality needed for both
   flow admission control and flow termination; these are signalling and
   decision-making functions, and there are various possibilities for
   where the functions are physically located. The section is split
   into:

   1. functions needed at PCN-interior-nodes

   2. functions needed at PCN-ingress-nodes

   3. functions needed at PCN-egress-nodes

   4. other functions needed for flow admission control

   5. other functions needed for flow termination control

   Note: Probing is covered in the Appendix.

   The section then discusses some other detailed topics:

   1. addressing

   2. tunnelling

   3. fault handling

7.1. PCN-interior-node functions

   Each link of the PCN-domain is configured with the following
   functionality:

   o  Behaviour aggregate classification - determine whether an incoming
      packet is a PCN-packet or not.

   o  Meter - measure the 'amount of PCN-traffic'. The measurement is
      made as an aggregate of all PCN-packets, and not per flow.
   o  PCN-mark - algorithms determine whether to PCN-mark PCN-packets
      and what packet encoding is used.

   The functions are defined in [I-D.eardley-pcn-marking-behaviour] and
   the baseline encoding in [I-D.moncaster-pcn-baseline-encoding]
   (extended encodings are to be defined in other documents).

7.2. PCN-ingress-node functions

   Each ingress link of the PCN-domain is configured with the following
   functionality:

   o  Packet classification - determine whether an incoming packet is
      part of a previously admitted flow, by using a filter spec (eg
      DSCP, source and destination addresses and port numbers).

   o  Traffic conditioning - police, by dropping or downgrading, any
      packets received with a DSCP indicating PCN transport that do not
      belong to an admitted flow. (A prospective PCN-flow that is
      rejected could be blocked or admitted into a lower priority
      behaviour aggregate.) Similarly, police packets that are part of
      a previously admitted flow, to check that the flow keeps to the
      agreed rate or flowspec (eg RFC 1633 [RFC1633] for a microflow and
      its NSIS equivalent).

   o  PCN-colour - set the DSCP and ECN fields appropriately for the
      PCN-domain, for example as in
      [I-D.moncaster-pcn-baseline-encoding].

   o  Meter - some approaches to flow termination require the PCN-
      ingress-node to measure the (aggregate) rate of PCN-traffic
      towards a particular PCN-egress-node.

   The first two are policing functions, needed to make sure that PCN-
   packets admitted into the PCN-domain belong to a flow that has been
   admitted and to ensure that the flow keeps to the flowspec agreed (eg
   doesn't exceed an agreed maximum rate and is inelastic traffic).
   Installing the filter spec will typically be done by the signalling
   protocol, as will re-installing the filter, for example after a re-
   route that changes the PCN-ingress-node (see
   [I-D.briscoe-tsvwg-cl-architecture] for an example using RSVP). PCN-
   colouring allows the rest of the PCN-domain to recognise PCN-packets.

7.3. PCN-egress-node functions

   Each egress link of the PCN-domain is configured with the following
   functionality:

   o  Packet classify - determine which PCN-ingress-node a PCN-packet
      has come from.

   o  Meter - "measure PCN-traffic" or "monitor PCN-marks".

   o  PCN-colour - for PCN-packets, set the DSCP and ECN fields to the
      appropriate values for use outside the PCN-domain.

   The metering functionality of course depends on whether it is
   targeted at admission control or flow termination. Alternative
   proposals involve the PCN-egress-node "measuring" as an aggregate (ie
   not per flow) all PCN-packets from a particular PCN-ingress-node, or
   "monitoring" the PCN-traffic and reacting to one (or several) PCN-
   marked packets. For PCN-colouring,
   [I-D.moncaster-pcn-baseline-encoding] specifies that the PCN-egress-
   node re-sets the ECN field to 00; other encodings may define
   different behaviour.

7.4. Admission control functions

   As well as the functions covered above, other specific admission
   control functions need to be performed:

   o  Make decision about admission - based on the output of the PCN-
      egress-node's PCN meter function. In the case where it "measures
      PCN-traffic", the measured traffic on the ingress-egress-aggregate
      is compared with some reference level. In the case where it
      "monitors PCN-marks", then the decision is based on whether one
      (or several) packets is (are) PCN-marked or not (eg the RSVP PATH
      message). In either case, the admission decision also takes
      account of policy and application layer requirements.
   o  Communicate decision about admission - signal the decision to the
      node making the admission control request (which may be outside
      the PCN-domain), and to the policer (PCN-ingress-node function)
      for enforcement of the decision.

   There are various possibilities for how the functionality could be
   distributed (we assume the operator would configure which is used):

   o  The decision is made at the PCN-egress-node and the decision
      (admit or block) is signalled to the PCN-ingress-node.

   o  The decision is recommended by the PCN-egress-node (admit or
      block) but the decision is definitively made by the PCN-ingress-
      node. The rationale is that the PCN-egress-node naturally has the
      necessary information about PCN-marking on the ingress-egress-
      aggregate, but the PCN-ingress-node is the policy enforcement
      point, which polices incoming traffic to ensure it is part of an
      admitted PCN-flow.

   o  The decision is made at the PCN-ingress-node, which requires that
      the PCN-egress-node signals PCN-feedback-information to the PCN-
      ingress-node. For example, it could signal the current fraction
      of PCN-traffic that is PCN-marked.

   o  The decision is made at a centralised node (see Appendix; beyond
      scope of current PCN WG charter).

   Note: Admission control functionality is not performed by normal PCN-
   interior-nodes.

7.5. Flow termination functions

   As well as the functions covered above, other specific termination
   control functions need to be performed:

   o  PCN-meter at PCN-egress-node - similarly to flow admission, there
      are two types of proposals: to "measure PCN-traffic" on the
      ingress-egress-aggregate, and to "monitor PCN-marks" and react to
      one (or several) PCN-marks.
   o  (if required) PCN-meter at PCN-ingress-node - make "measurements
      of PCN-traffic" being sent towards a particular PCN-egress-node;
      again, this is done for the ingress-egress-aggregate and not per
      flow.

   o  (if required) Communicate PCN-feedback-information to the node
      that makes the flow termination decision. For example, as in
      [I-D.briscoe-tsvwg-cl-architecture], communicate the PCN-egress-
      node's measurements to the PCN-ingress-node.

   o  Make decision about flow termination - use the information from
      the PCN-meter(s) to decide which PCN-flow or PCN-flows to
      terminate. The decision takes account of policy and application
      layer requirements.

   o  Communicate decision about flow termination - signal the decision
      to the node that is able to terminate the flow (which may be
      outside the PCN-domain), and to the policer (PCN-ingress-node
      function) for enforcement of the decision.

   There are various possibilities for how the functionality could be
   distributed, similar to those discussed above in the Admission
   control section.

7.6. Addressing

   PCN-nodes may need to know the address of other PCN-nodes. Note: in
   all cases PCN-interior-nodes don't need to know the address of any
   other PCN-nodes (except as normal their next hop neighbours, for
   routing purposes).

   The PCN-egress-node needs to know the address of the PCN-ingress-node
   associated with a flow, at a minimum so that the PCN-ingress-node can
   be informed to enforce the admission decision (and any flow
   termination decision) through policing. There are various
   possibilities for how the PCN-egress-node can do this, ie associate
   the received packet to the correct ingress-egress-aggregate. It is
   not the intention of this document to mandate a particular mechanism.

   o  The addressing information can be gathered from signalling.
      For example, through regular processing of an RSVP Path message,
      since the PCN-ingress-node is the previous RSVP hop (PHOP)
      ([I-D.lefaucheur-rsvp-ecn]). Or the PCN-ingress-node could signal
      its address to the PCN-egress-node.

   o  Always tunnel PCN-traffic across the PCN-domain. Then the PCN-
      ingress-node's address is simply the source address of the outer
      packet header. The PCN-ingress-node needs to learn the address of
      the PCN-egress-node, either by manual configuration or by one of
      the automated tunnel endpoint discovery mechanisms (such as
      signalling or probing over the data route, interrogating routing
      or using a centralised broker).

7.7. Tunnelling

   Tunnels may originate and/or terminate within a PCN-domain (eg IP
   over IP, IP over MPLS). It is important that the PCN-marking of any
   packet can potentially influence PCN's flow admission control and
   termination - it shouldn't matter whether the packet happens to be
   tunnelled at the PCN-node that PCN-marks the packet, or indeed
   whether it's decapsulated or encapsulated by a subsequent PCN-node.
   This suggests that the "uniform conceptual model" described in
   [RFC2983] should be re-applied in the PCN context. In line with this
   and the approach of [RFC4303] and [I-D.briscoe-tsvwg-ecn-tunnel], the
   following rule is applied if encapsulation is done within the PCN-
   domain:

   o  any PCN-marking is copied into the outer header.

   Note: A tunnel will not provide this behaviour if it complies with
   [RFC3168] tunnelling in either mode, but it will if it complies with
   [RFC4301] IPsec tunnelling.
   Similarly, in line with the "uniform conceptual model" of [RFC2983],
   the "full-functionality option" of [RFC3168], and [RFC4301], the
   following rule is applied if decapsulation is done within the PCN-
   domain:

   o  if the outer header's marking state is more severe then it is
      copied onto the inner header.

   Note: the order of increasing severity is: not PCN-marked; threshold-
   marked; excess-traffic-marked.

   An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to
   PCN-egress-nodes. The PCN-marks shouldn't be visible outside the
   PCN-domain, which can be achieved by the PCN-egress-node doing the
   PCN-colouring function (Section 7.3) after all the other (PCN and
   tunnelling) functions. The potential reasons for doing such
   tunnelling are: the PCN-egress-node then automatically knows the
   address of the relevant PCN-ingress-node for a flow; even if ECMP is
   running, all PCN-packets on a particular ingress-egress-aggregate
   follow the same path. But it also has drawbacks, for example the
   additional overhead in terms of bandwidth and processing, and the
   cost of setting up a mesh of tunnels between PCN-boundary-nodes
   (there is an N^2 scaling issue).

   Potential issues arise for a "partially PCN-capable tunnel", ie where
   only one tunnel endpoint is in the PCN domain:

   1. The tunnel originates outside a PCN-domain and ends inside it.
      If the packet arrives at the tunnel ingress with the same
      encoding as used within the PCN-domain to indicate PCN-marking,
      then this could lead the PCN-egress-node to falsely measure pre-
      congestion.

   2. The tunnel originates inside a PCN-domain and ends outside it.
      If the packet arrives at the tunnel ingress already PCN-marked,
      then it will still have the same encoding when it's decapsulated
      which could potentially confuse nodes beyond the tunnel egress.
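
   For tunnels that lie wholly within the PCN-domain, the 'copy on
   encapsulation' and 'copy on decapsulation' rules given earlier in
   this section can be sketched as below. This is a minimal
   illustration, not a specification: the Marking type and the two
   function names are hypothetical, and the sketch models the abstract
   marking state rather than the DSCP and ECN bits of any particular
   encoding.

```python
from enum import IntEnum


class Marking(IntEnum):
    """PCN marking states, ordered by increasing severity."""
    NOT_MARKED = 0
    THRESHOLD_MARKED = 1
    EXCESS_TRAFFIC_MARKED = 2


def encapsulate(inner: Marking) -> Marking:
    """Return the outer-header marking: any PCN-marking is copied."""
    return inner


def decapsulate(outer: Marking, inner: Marking) -> Marking:
    """Return the inner-header marking after decapsulation.

    The outer marking is copied onto the inner header only if it is
    more severe than the marking the inner header already carries.
    """
    return outer if outer > inner else inner
```

   With these rules a mark applied to the outer header by a PCN-node
   inside the tunnel is still seen after decapsulation, while a mark
   already present on the inner header is never downgraded.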
   In line with the solution for partially capable DiffServ tunnels in
   [RFC2983], the following rules are applied:

   o  For case (1), the tunnel egress node clears any PCN-marking on the
      inner header. This rule is applied before the 'copy on
      decapsulation' rule above.

   o  For case (2), the tunnel ingress node clears any PCN-marking on
      the inner header. This rule is applied after the 'copy on
      encapsulation' rule above.

   Note that the above implies that one has to know, or determine, the
   characteristics of the other end of the tunnel as part of
   establishing it.

   Tunnelling constraints were a major factor in the choice of the
   baseline encoding. As explained in
   [I-D.moncaster-pcn-baseline-encoding], with current tunnelling
   endpoints only the 11 codepoint of the ECN field survives
   decapsulation, and hence the baseline encoding only uses the 11
   codepoint to indicate PCN-marking. Extended encoding schemes need to
   explain their interactions with (or assumptions about) tunnelling. A
   lengthy discussion of all the issues associated with layered
   encapsulation of congestion notification (for ECN as well as PCN) is
   in [I-D.briscoe-tsvwg-ecn-tunnel].

7.8. Fault handling

   If a PCN-interior-node (or one of its links) fails, then lower layer
   protection mechanisms or the regular IP routing protocol will
   eventually re-route around it. If the new route can carry all the
   admitted traffic, flows will gracefully continue. If instead this
   causes early warning of pre-congestion on the new route, then
   admission control based on pre-congestion notification will ensure
   new flows will not be admitted until enough existing flows have
   departed. Re-routing may result in heavy (pre-)congestion, when the
   flow termination mechanism will kick in.
   If a PCN-boundary-node fails then we would like the regular QoS
   signalling protocol to be responsible for taking appropriate action.
   As an example [I-D.briscoe-tsvwg-cl-architecture] considers what
   happens if RSVP is the QoS signalling protocol.

8. Challenges

   Prior work on PCN and similar mechanisms has thrown up a number of
   considerations about PCN's design goals (things PCN should be good
   at) [I-D.chan-pcn-problem-statement] and some issues that have been
   hard to solve in a fully satisfactory manner. Taken as a whole it
   represents a list of trade-offs (it is unlikely that they can all be
   100% achieved) and may perhaps serve as evaluation criteria to help
   an operator (or the IETF) decide between options.

   The following are open issues. They are mainly taken from
   [I-D.briscoe-tsvwg-cl-architecture], which also describes some
   possible solutions. Note that some may be considered unimportant in
   general or in specific deployment scenarios or by some operators.

   NOTE: Potential solutions are out of scope for this document.

   o  ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion
      is measured on a specific ingress-egress-aggregate. However, if
      the PCN-domain runs ECMP, then traffic on this ingress-egress-
      aggregate may follow several different paths - some of the paths
      could be pre-congested whilst others are not. There are three
      potential problems:

      1. over-admission: a new flow is admitted (because the pre-
         congestion level measured by the PCN-egress-node is
         sufficiently diluted by unmarked packets from non-congested
         paths that a new flow is admitted), but its packets travel
         through a pre-congested PCN-node.

      2. under-admission: a new flow is blocked (because the pre-
         congestion level measured by the PCN-egress-node is
         sufficiently increased by PCN-marked packets from pre-
         congested paths that a new flow is blocked), but its packets
         travel along an uncongested path.

      3. ineffective termination: a flow is terminated, but its path
         doesn't travel through the (pre-)congested router(s). Since
         flow termination is a 'last resort', which protects the
         network should over-admission occur, this problem is probably
         more important to solve than the other two.

   o  ECMP and signalling: It is possible that, in a PCN-domain running
      ECMP, the signalling packets (eg RSVP, NSIS) follow a different
      path than the data packets, which could matter if the signalling
      packets are used as probes. Whether this is an issue depends on
      which fields the ECMP algorithm uses; if the ECMP algorithm is
      restricted to the source and destination IP addresses, then it
      will not be an issue. ECMP and signalling interactions are a
      specific instance of a general issue for non-traditional routing
      combined with resource management along a path [Hancock].

   o  Tunnelling: There are scenarios where tunnelling makes it
      difficult to determine the path in the PCN-domain. The problem,
      its impact, and the potential solutions are similar to those for
      ECMP.

   o  Scenarios with only one tunnel endpoint in the PCN domain may make
      it harder for the PCN-egress-node to gather from the signalling
      messages (eg RSVP, NSIS) the identity of the PCN-ingress-node.

   o  Bi-Directional Sessions: Many applications have bi-directional
      sessions - hence there are two microflows that should be admitted
      (or terminated) as a pair - for instance a bi-directional voice
      call only makes sense if microflows in both directions are
      admitted.
However, the PCN mechanisms concern admission and 1273 termination of a single flow, and coordination of the decision for 1274 both flows is a matter for the signalling protocol and out of 1275 scope of PCN. One possible example would use SIP pre-conditions. 1276 However, there are others. 1278 o Global Coordination: PCN makes its admission decision based on 1279 PCN-markings on a particular ingress-egress-aggregate. Decisions 1280 about flows through a different ingress-egress-aggregate are made 1281 independently. However, one can imagine network topologies and 1282 traffic matrices where, from a global perspective, it would be 1283 better to make a coordinated decision across all the ingress- 1284 egress-aggregates for the whole PCN-domain. For example, to block 1285 (or even terminate) flows on one ingress-egress-aggregate so that 1286 more important flows through a different ingress-egress-aggregate 1287 could be admitted. The problem may well be relatively 1288 insignificant. 1290 o Aggregate Traffic Characteristics: Even when the number of flows 1291 is stable, the traffic level through the PCN-domain will vary 1292 because the sources vary their traffic rates. PCN works best when 1293 there is not too much variability in the total traffic level at a 1294 PCN-node's interface (ie in the aggregate traffic from all 1295 sources). Too much variation means that a node may (at one 1296 moment) not be doing any PCN-marking and then (at another moment) 1297 drop packets because it is overloaded. This makes it hard to tune 1298 the admission control scheme to stop admitting new flows at the 1299 right time. Therefore the problem is more likely with fewer, 1300 burstier flows. 1302 o Flash crowds and Speed of Reaction: PCN is a measurement-based 1303 mechanism and so there is an inherent delay between packet marking 1304 by PCN-interior-nodes and any admission control reaction at PCN- 1305 boundary-nodes. 
For example, if a big burst of 1306 admission requests occurs in a very short space of time (eg 1307 prompted by a televote), they could potentially all get admitted before enough 1308 PCN-marks are seen to block new flows. In other words, any 1309 additional load offered within the reaction time of the mechanism 1310 must not move the PCN-domain directly from a no-congestion state 1311 to overload. This 'vulnerability period' may have an impact at 1312 the signalling level; for instance, QoS requests should be rate 1313 limited to bound the number of requests able to arrive within the 1314 vulnerability period. 1316 o Silent at start: after a successful admission request the source 1317 may wait some time before sending data (eg waiting for the called 1318 party to answer). Then the risk is that, in some circumstances, 1319 PCN's measurements underestimate what the pre-congestion level 1320 will be when the source does start sending data. 1322 9. Operations and Management 1324 This section considers operations and management issues, under the 1325 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance 1326 and Security. Provisioning is discussed with performance. 1328 9.1. Configuration OAM 1330 Threshold-marking and excess-traffic-marking are standardised in 1331 [I-D.eardley-pcn-marking-behaviour]. However, more diversity in PCN- 1332 boundary-node behaviours is expected, in order to interface with 1333 diverse industry architectures. It may be possible to have different 1334 PCN-boundary-node behaviours for different ingress-egress-aggregates 1335 within the same PCN-domain. 1337 A PCN marking behaviour (threshold-marking, excess-traffic-marking) 1338 is enabled on either the egress or the ingress interfaces of PCN- 1339 nodes. A consistent choice must be made across the PCN-domain to 1340 ensure that the PCN mechanisms protect all links.
1342 PCN configuration control variables fall into the following 1343 categories: 1345 o system options (enabling or disabling behaviours) 1347 o parameters (setting levels, addresses etc) 1349 One possibility is that all configurable variables sit within an SNMP 1350 management framework [RFC3411], being structured within a defined 1351 management information base (MIB) on each node, and being remotely 1352 readable and settable via a suitably secure management protocol 1353 (SNMPv3). 1355 Some configuration options and parameters have to be set once to 1356 'globally' control the whole PCN-domain. Where possible, these are 1357 identified below. This may affect operational complexity and the 1358 chances of interoperability problems between equipment from different 1359 vendors. 1361 It may be possible for an operator to configure some PCN-interior- 1362 nodes so that they don't run the PCN mechanisms, if it knows that 1363 these links will never become (pre-)congested. 1365 9.1.1. System options 1367 On PCN-interior-nodes there will be very few system options: 1369 o Whether two PCN-markings (threshold-marked and excess-traffic- 1370 marked) are enabled or only one. Typically all nodes throughout a 1371 PCN-domain will be configured the same in this respect. However, 1372 exceptions could be made. For example, if most PCN-nodes used 1373 both markings, but some legacy hardware was incapable of running 1374 two algorithms, an operator might be willing to configure these 1375 legacy nodes solely for excess-traffic-marking to enable flow 1376 termination as a back-stop. It would be sensible to place such 1377 nodes where they could be provisioned with a greater leeway over 1378 expected traffic levels. 1380 o In the case where only one PCN-marking is enabled, all nodes must 1381 be configured to generate PCN-marks from the same meter (ie either 1382 the threshold meter or the excess traffic meter). 
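The consistency rule for PCN-interior-nodes (where only one PCN-marking is enabled, all such nodes must generate PCN-marks from the same meter) lends itself to an automated check. The following is a minimal sketch only, assuming a hypothetical management-system view of the domain as a mapping from node name to its enabled meters; the names InteriorNodeOptions and single_marking_conflicts are illustrative and are not defined by any PCN document.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Labels for the two meters named in the text (the string values are assumptions).
THRESHOLD = "threshold"
EXCESS_TRAFFIC = "excess-traffic"


@dataclass(frozen=True)
class InteriorNodeOptions:
    """Hypothetical per-node record of which PCN meters generate marks."""
    enabled_meters: FrozenSet[str]


def single_marking_conflicts(domain: Dict[str, InteriorNodeOptions]) -> List[str]:
    """Return the names of single-marking nodes whose meter differs from
    that of the first single-marking node (nodes are visited in name order)."""
    single = {
        name: next(iter(opts.enabled_meters))
        for name, opts in sorted(domain.items())
        if len(opts.enabled_meters) == 1
    }
    if not single:
        return []
    reference = next(iter(single.values()))
    return [name for name, meter in single.items() if meter != reference]
```

Nodes running both markings are ignored by this check, so the legacy-hardware exception described above (a few nodes configured solely for excess-traffic-marking) passes as long as every single-marking node uses the same meter.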
1384 PCN-boundary-nodes (ingress and egress) will have more system 1385 options: 1387 o Which of admission and flow termination are enabled. If any PCN- 1388 interior-node is configured to generate a marking, all PCN- 1389 boundary-nodes must be able to interpret that marking (which 1390 includes understanding, in a PCN-domain that uses only one type of 1391 PCN-marking, whether the marks are generated by the PCN-interior-nodes' 1392 threshold meters or their excess traffic meters). Therefore all 1393 PCN-boundary-nodes must be configured the same in this respect. 1395 o Where flow admission and termination decisions are made: at PCN- 1396 ingress-nodes or at PCN-egress-nodes (or at a centralised node, 1397 see Appendix). Theoretically, this configuration choice could be 1398 negotiated for each pair of PCN-boundary-nodes, but we cannot 1399 imagine why such complexity would be required, except perhaps in 1400 future inter-domain scenarios. 1402 o How PCN-markings are translated into admission control and flow 1403 termination decisions (see Section 6.1 and Section 6.2). 1405 PCN-egress-nodes will have further system options: 1407 o How the mapping should be established between each packet and its 1408 aggregate, eg by MPLS label, by IP packet filterspec; and how to 1409 take account of ECMP. 1411 o If an equipment vendor provides a choice, there may be options to 1412 select which smoothing algorithm to use for measurements. 1414 9.1.2. Parameters 1416 Like any DiffServ domain, every node within a PCN-domain will need to 1417 be configured with the DSCP(s) used to identify PCN-packets. On each 1418 interior link the main configuration parameters are the PCN- 1419 threshold-rate and PCN-excess-rate. A larger PCN-threshold-rate 1420 enables more PCN-traffic to be admitted on a link, hence improving 1421 capacity utilisation.
A PCN-excess-rate set further above the PCN- 1422 threshold-rate allows greater increases in traffic (whether due to 1423 natural fluctuations or some unexpected event) before any flows are 1424 terminated, ie minimises the chances of unnecessarily triggering the 1425 termination mechanism. For instance, an operator may want to design 1426 their network so that it can cope with a failure of any single PCN- 1427 node without terminating any flows. 1429 Setting these rates on first deployment of PCN will be very similar 1430 to the traditional process for sizing an admission controlled 1431 network, depending on: the operator's requirements for minimising 1432 flow blocking (grade of service), the expected PCN traffic load on 1433 each link and its statistical characteristics (the traffic matrix), 1434 contingency for re-routing the PCN traffic matrix in the event of 1435 single or multiple failures, and the expected load from other classes 1436 relative to link capacities [Menth]. But once a domain is in 1437 operation, a PCN design goal is to be able to determine growth in 1438 these configured rates much more simply, by monitoring PCN-marking 1439 rates from actual rather than expected traffic (see Section 9.2 on 1440 Performance & Provisioning). 1442 Operators may also wish to configure a rate greater than the PCN- 1443 excess-rate that is the absolute maximum rate that a link allows for 1444 PCN-traffic. This may simply be the physical link rate, but some 1445 operators may wish to configure a logical limit to prevent starvation 1446 of other traffic classes during any brief period after PCN-traffic 1447 exceeds the PCN-excess-rate but before flow termination brings it 1448 back below this rate. 1450 Threshold-marking requires a threshold token bucket depth to be 1451 configured, excess-traffic-marking needs a value for the MTU (maximum 1452 size of a PCN-packet on the link) and both require setting a maximum 1453 size of their token buckets. 
It will be preferable for there to be 1454 rules to set defaults for these parameters, but then allow operators 1455 to change them, for instance if average traffic characteristics 1456 change over time. 1458 The PCN-egress-node may allow configuration of the following: 1460 o how it smooths metering of PCN-markings (eg EWMA parameters) 1462 Whichever node makes admission and flow termination decisions will 1463 contain algorithms for converting PCN-marking levels into admission 1464 or flow termination decisions. These will also require configurable 1465 parameters, for instance: 1467 o an admission control algorithm that is based on the fraction of 1468 marked packets will at least require a marking threshold setting 1469 above which it denies admission to new flows; 1471 o flow termination algorithms will probably require a parameter to 1472 delay termination of any flows until it is more certain that an 1473 anomalous event is not transient; 1475 o a parameter to control the trade-off between how quickly excess 1476 flows are terminated, and over-termination. 1478 One particular proposal, [I-D.charny-pcn-single-marking] would 1479 require a global parameter to be defined on all PCN-nodes, but only 1480 needs one PCN marking rate to be configured on each link. The global 1481 parameter is a scaling factor between admission and termination (the 1482 PCN-traffic rate on a link up to which flows are admitted vs the rate 1483 above which flows are terminated). [I-D.charny-pcn-single-marking] 1484 discusses in full the impact of this particular proposal on the 1485 operation of PCN. 1487 9.2. Performance & Provisioning OAM 1489 Monitoring of performance factors measurable from *outside* the PCN 1490 domain will be no different with PCN than with any other packet-based 1491 flow admission control system, both at the flow level (blocking 1492 probability etc) and the packet level (jitter [RFC3393], [Y.1541], 1493 loss rate [RFC4656], mean opinion score [P.800], etc). 
The 1494 difference is that PCN is intentionally designed to indicate 1495 *internally* which exact resource(s) are the cause of performance 1496 problems and by how much. 1498 Even better, PCN indicates which resources will probably cause 1499 problems if they are not upgraded soon. This can be achieved by the 1500 management system monitoring the total amount (in bytes) of PCN- 1501 marking generated by each queue over a period. Given possibly long 1502 provisioning lead times, pre-congestion volume is the best metric to 1503 reveal whether sufficient persistent demand has occurred to warrant 1504 an upgrade, because even before utilisation becomes problematic, 1505 the statistical variability of traffic will cause occasional bursts 1506 of pre-congestion. This 'early warning system' decouples the process 1507 of adding customers from the provisioning process. This should cut 1508 the time to add a customer when compared against admission control 1509 provided over native DiffServ [RFC2998], because it saves having to 1510 re-run the capacity planning process before adding each customer. 1512 Alternatively, before triggering an upgrade, the long term pre- 1513 congestion volume on each link can be used to balance traffic load 1514 across the PCN-domain by adjusting the link weights of the routing 1515 system. When an upgrade to a link's configured PCN-rates is 1516 required, it may also be necessary to upgrade the physical capacity 1517 available to other classes. But usually there will be sufficient 1518 physical capacity for the upgrade to go ahead as a simple 1519 configuration change. Alternatively, [Songhurst] has proposed an 1520 adaptive rather than preconfigured system, where the configured PCN- 1521 threshold-rate is replaced with a high and low water mark and the 1522 marking algorithm automatically optimises how physical capacity is 1523 shared using the relative loads from PCN and other traffic classes.
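As an illustration of how a management system might obtain the pre-congestion volume of a queue over a period, the following minimal sketch keeps cumulative byte counters per PCN queue and derives the incremental volume by subtracting an earlier reading from a later one. It is a sketch under stated assumptions, not a specified interface: the class QueueCounters, its field names and the function volume_since are all hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass
class QueueCounters:
    """Cumulative byte counters for one PCN queue (hypothetical names)."""
    threshold_marked: int = 0
    excess_traffic_marked: int = 0
    dropped: int = 0

    def record(self, event: str, packet_bytes: int) -> None:
        """Add the packet size in bytes to the counter named by `event`."""
        if event not in ("threshold_marked", "excess_traffic_marked", "dropped"):
            raise ValueError(f"unknown event: {event}")
        setattr(self, event, getattr(self, event) + packet_bytes)


def volume_since(previous: QueueCounters, current: QueueCounters) -> QueueCounters:
    """Incremental volume between two readings of the same queue's counters."""
    return QueueCounters(
        *(c - p for p, c in zip(astuple(previous), astuple(current)))
    )
```

A management system would call volume_since with two successive readings of the same queue; the interval between the readings determines how finely anomalous events can be separated from regular fluctuations in demand.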
1525 All the above processes require just three extra counters associated 1526 with each PCN queue: threshold-markings, excess-traffic-markings and 1527 drop. Every time a PCN packet is marked or dropped its size in bytes 1528 should be added to the appropriate counter. Then the management 1529 system can read the counters at any time and subtract a previous 1530 reading to establish the incremental volume of each type of 1531 (pre-)congestion. Readings should be taken frequently, so that 1532 anomalous events (eg re-routes) can be separated from regular 1533 fluctuating demand if required. 1535 9.3. Accounting OAM 1537 Accounting is only done at trust boundaries so it is out of scope of 1538 the initial charter of the PCN WG, which is confined to intra-domain 1539 issues. Use of PCN internal to a domain makes no difference to the 1540 flow signalling events crossing trust boundaries outside the PCN- 1541 domain, which are typically used for accounting. 1543 9.4. Fault OAM 1545 Fault OAM is about preventing faults, telling the management system 1546 (or manual operator) that the system has recovered (or not) from a 1547 failure, and about maintaining information to aid fault diagnosis. 1549 Admission blocking and particularly flow termination mechanisms 1550 should rarely be needed in practice. It would be unfortunate if they 1551 didn't work after an option had been accidentally disabled. 1552 Therefore it will be necessary to regularly test that the live system 1553 works as intended (devising a meaningful test is left as an exercise 1554 for the operator). 1556 Section 7 describes how the PCN architecture has been designed to 1557 ensure admitted flows continue gracefully after recovering 1558 automatically from link or node failures. The need to record and 1559 monitor re-routing events affecting signalling is unchanged by the 1560 addition of PCN to a DiffServ domain. 
Similarly, re-routing events 1561 within the PCN-domain will be recorded and monitored just as they 1562 would be without PCN. 1564 PCN-marking does make it possible to record 'near-misses'. For 1565 instance, at the PCN-egress-node a 'reporting threshold' could be set 1566 to monitor how often - and for how long - the system comes close to 1567 triggering flow blocking without actually doing so. Similarly, 1568 bursts of flow termination marking could be recorded even if they are 1569 not sufficiently sustained to trigger flow termination. Such 1570 statistics could be correlated with per-queue counts of marking 1571 volume (Section 9.2) to upgrade resources in danger of causing 1572 service degradation, or to trigger manual tracing of intermittent 1573 incipient errors that would otherwise have gone unnoticed. 1575 Finally, of course, many faults are caused by failings in the 1576 management process ('human error'): a wrongly configured address in a 1577 node, a wrong address given in a signalling protocol, a wrongly 1578 configured parameter in a queueing algorithm, a node set into a 1579 different mode from other nodes, and so on. Generally, a clean 1580 design with few configurable options ensures this class of faults can 1581 be traced more easily and prevented more often. Sound management 1582 practice at run-time also helps. For instance: a management system 1583 should be used that constrains configuration changes within system 1584 rules (eg preventing an option setting inconsistent with other 1585 nodes); configuration options should also be recorded in an offline 1586 database; and regular automatic consistency checks between live 1587 systems and the database should be performed. PCN adds nothing 1588 specific to this class of problems. 1590 9.5. Security OAM 1592 Security OAM is about using secure operational practices as well as 1593 being able to track security breaches or near-misses at run-time. 
1594 PCN adds few specifics to the general good practice required in this 1595 field [RFC4778], other than those below. The correct functions of 1596 the system should be monitored (Section 9.2) in multiple independent 1597 ways and correlated to detect possible security breaches. Persistent 1598 (pre-)congestion marking should raise an alarm (both on the node 1599 doing the marking and on the PCN-egress-node metering it). 1600 Similarly, persistently poor external QoS metrics such as jitter or 1601 MOS should raise an alarm. The following are examples of symptoms 1602 that may be the result of innocent faults, rather than attacks, but 1603 until diagnosed they should be logged and trigger a security alarm: 1605 o Anomalous patterns of non-conforming incoming signals and packets 1606 rejected at the PCN-ingress-nodes (eg packets already marked PCN- 1607 capable, or traffic persistently starving token bucket policers). 1609 o PCN-capable packets arriving at a PCN-egress-node with no 1610 associated state for mapping them to a valid ingress-egress- 1611 aggregate. 1613 o A PCN-ingress-node receiving feedback signals about the pre- 1614 congestion level on a non-existent aggregate, or that are 1615 inconsistent with other signals (eg unexpected sequence numbers, 1616 inconsistent addressing, conflicting reports of the pre-congestion 1617 level, etc). 1619 o Pre-congestion marking arriving at a PCN-egress-node with 1620 (pre-)congestion markings focused on particular flows, rather than 1621 randomly distributed throughout the aggregate. 1623 10. IANA Considerations 1625 This memo includes no request to IANA. 1627 11. Security considerations 1629 Security considerations essentially come from the Trust Assumption 1630 (Section 5.1), ie that all PCN-nodes are PCN-enabled and are trusted 1631 for truthful PCN-marking and transport.
PCN splits functionality 1632 between PCN-interior-nodes and PCN-boundary-nodes, and the security 1633 considerations are somewhat different for each, mainly because PCN- 1634 boundary-nodes are flow-aware and PCN-interior-nodes are not. 1636 o Because the PCN-boundary-nodes are flow-aware, they are trusted to 1637 use that awareness correctly. The degree of trust required 1638 depends on the kinds of decisions they have to make and the kinds 1639 of information they need to make them. There is nothing specific 1640 to PCN. 1642 o the PCN-ingress-nodes police packets to ensure a PCN-flow sticks 1643 within its agreed limit, and to ensure that only PCN-flows that 1644 have been admitted contribute PCN-traffic into the PCN-domain. 1645 The policer must drop (or perhaps downgrade to a different DSCP) 1646 any PCN-packets received that are outside this remit. This is 1647 similar to the existing IntServ behaviour. Between them the PCN- 1648 boundary-nodes must encircle the PCN-domain, otherwise PCN-packets 1649 could enter the PCN-domain without being subject to admission 1650 control, which would potentially destroy the QoS of existing 1651 flows. 1653 o PCN-interior-nodes are not flow-aware. This prevents some 1654 security attacks where an attacker targets specific flows in the 1655 data plane - for instance for DoS or eavesdropping. 1657 o The PCN-boundary-nodes rely on correct PCN-marking by the PCN- 1658 interior-nodes. For instance a rogue PCN-interior-node could PCN- 1659 mark all packets so that no flows were admitted. Another 1660 possibility is that it doesn't PCN-mark any packets, even when it 1661 is pre-congested. More subtly, the rogue PCN-interior-node could 1662 perform these attacks selectively on particular flows, or it could 1663 PCN-mark the correct fraction overall, but carefully choose which 1664 flows it marked. 
1666 o the PCN-boundary-nodes should be able to deal with DoS attacks and 1667 state exhaustion attacks based on fast changes in per flow 1668 signalling. 1670 o the signalling between the PCN-boundary-nodes must be protected 1671 from attacks. For example the recipient needs to validate that 1672 the message is indeed from the node that claims to have sent it. 1673 Possible measures include digest authentication and protection 1674 against replay and man-in-the-middle attacks. For the specific 1675 protocol RSVP, hop-by-hop authentication is in [RFC2747], and 1676 [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be 1677 useful. 1679 Operational security advice is given in Section 9.5. 1681 12. Conclusions 1683 The document describes a general architecture for flow admission and 1684 termination based on pre-congestion information in order to protect 1685 the quality of service of established inelastic flows within a single 1686 DiffServ domain. The main topic is the functional architecture. It 1687 also mentions other topics like the assumptions and open issues. 1689 13. Acknowledgements 1691 This document is a revised version of [I-D.eardley-pcn-architecture]. 1692 Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R. 1693 Geib, G. Karagiannis, M. Menth, T. Tsou. They are therefore 1694 contributors to this document. 1696 Thanks to those who have made comments on 1697 [I-D.eardley-pcn-architecture] and on earlier versions of this draft: 1698 Lachlan Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, 1699 Bob Briscoe, Jason Canon, Ken Carlberg, Anna Charny, Joachim 1700 Charzinski, Andras Csaszar, Lars Eggert, Ruediger Geib, Wei Gengyu, 1701 Robert Hancock, Fortune Huang, Christian Hublet, Ingemar Johansson, 1702 Georgios Karagiannis, Hein Mekkes, Michael Menth, Toby Moncaster, Ben 1703 Strulo, Tom Taylor, Hannes Tschofenig, Tina Tsou, Lars Westberg, 1704 Magnus Westerlund, Delei Yu. 
Thanks to Bob Briscoe who extensively 1705 revised the Operations and Management section. 1707 This document is the result of discussions in the PCN WG and 1708 forerunner activity in the TSVWG. A number of previous drafts were 1709 presented to TSVWG: [I-D.chan-pcn-problem-statement], 1710 [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb], 1711 [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap], 1712 [I-D.lefaucheur-rsvp-ecn], [I-D.westberg-pcn-load-control]. Their 1713 authors were: B. Briscoe, P. Eardley, D. Songhurst, F. Le 1714 Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley, G. Karagiannis, 1715 A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G. Liu, A. Bhargava. 1717 14. Comments Solicited 1719 Comments and questions are encouraged and very welcome. They can be 1720 addressed to the IETF PCN working group mailing list . 1722 15. Changes 1724 15.1. Changes from -06 to -07 1726 References re-formatted to pass ID nits. No other changes. 1728 15.2. Changes from -05 to -06 1730 Minor clarifications throughout; the least insignificant are as 1731 follows: 1733 o Section 1: added to the list of encoding states in an 'extended' 1734 scheme: "or perhaps further encoding states as suggested in 1735 draft-westberg-pcn-load-control" 1737 o Section 2: added definition for PCN-colouring (to clarify that the 1738 term is used consistently differently from 'PCN-marking') 1740 o Section 6.1 and 6.2: added "(others might be possible)" before the 1741 list of high level approaches for making flow admission 1742 (termination) decisions. 1744 o Section 6.2: corrected a significant typo in 2nd bullet (more -> 1745 less) 1747 o Section 6.3: corrected a couple of significant typos in Figure 2 1749 o Section 6.5 (PCN-traffic) re-written for clarity. Non PCN-traffic 1750 contributing to PCN meters is now given as an example (there may 1751 be cases where there is no need to meter it).
1753 o Section 7.7: added to the text about encapsulation being done 1754 within the PCN-domain: "Note: A tunnel will not provide this 1755 behaviour if it complies with [RFC3168] tunnelling in either mode, 1756 but it will if it complies with [RFC4301] IPSec tunnelling." 1758 o Section 7.7: added mention of [RFC4301] to the text about 1759 decapsulation being done within the PCN-domain. 1761 o Section 8: deleted the text about design goals, since this is 1762 already covered adequately earlier eg in S3. 1764 o Section 11: replaced the last sentence of bullet 1 by "There is 1765 nothing specific to PCN." 1767 o Appendix: added to open issues: possibility of automatically and 1768 periodically probing. 1770 o References: Split out Normative references (RFC2474 & RFC3246). 1772 15.3. Changes from -04 to -05 1774 Minor nits removed as follows: 1776 o Further minor changes to reflect that baseline encoding is 1777 consensus, standards track document, whilst there can be 1778 (experimental track) encoding extensions 1780 o Traffic conditioning updated to reflect discussions in Dublin, 1781 mainly that PCN-interior-nodes don't police PCN-traffic (so 1782 deleted bullet in S7.1) and that it is not advised to have non 1783 PCN-traffic that shares the same capacity (on a link) as PCN- 1784 traffic (so added bullet in S6.5) 1786 o Probing moved into Appendix A and deleted the 'third viewpoint' 1787 (admission control based on the marking of a single packet like an 1788 RSVP PATH message) - since this isn't really probing, and in any 1789 case is already mentioned in S6.1. 1791 o Minor changes to S9 Operations and management - mainly to reflect 1792 that consensus on marking behaviour has simplified things so eg 1793 there are fewer parameters to configure. 1795 o A few terminology-related errors expunged, and two pictures added 1796 to help. 
1798 o Re-phrased the claim about the natural decision point in S7.4 1800 o Clarified that extended encoding schemes need to explain their 1801 interactions with (or assumptions about) tunnelling (S7.7) and how 1802 they meet the guidelines of BCP124 (S6.6) 1804 o Corrected the third bullet in S6.2 (to reflect consensus about 1805 PCN-marking) 1807 15.4. Changes from -03 to -04 1809 o Minor changes throughout to reflect the consensus call about PCN- 1810 marking (as reflected in [I-D.eardley-pcn-marking-behaviour]). 1812 o Minor changes throughout to reflect the current decisions about 1813 encoding (as reflected in [I-D.moncaster-pcn-baseline-encoding] 1814 and [I-D.moncaster-pcn-3-state-encoding]). 1816 o Introduction: re-structured to create new sections on Benefits, 1817 Deployment scenarios and Assumptions. 1819 o Introduction: Added pointers to other PCN documents. 1821 o Terminology: changed PCN-lower-rate to PCN-threshold-rate and PCN- 1822 upper-rate to PCN-excess-rate; excess-rate-marking to excess- 1823 traffic-marking. 1825 o Benefits: added bullet about SRLGs. 1827 o Deployment scenarios: new section combining material from various 1828 places within the document. 1830 o S6 (high level functional architecture): re-structured and edited 1831 to improve clarity, and reflect the latest PCN-marking and 1832 encoding drafts. 1834 o S6.4: added claim that the most natural place to make an admission 1835 decision is a PCN-egress-node. 1837 o S6.5: updated the bullet about non-PCN-traffic that uses the same 1838 DSCP as PCN-traffic. 1840 o S6.6: added a section about backwards compatibility with respect 1841 to [RFC4774]. 1843 o Appendix A: added bullet about end-to-end PCN. 1845 o Probing: moved to Appendix B. 1847 o Other minor clarifications, typos etc. 1849 15.5. Changes from -02 to -03 1851 o Abstract: Clarified by removing the term 'aggregated'. 
Follow-up 1852 clarifications later in draft: S1: expanded PCN-egress-nodes 1853 bullet to mention case where the PCN-feedback-information is about 1854 one (or a few) PCN-marks, rather than aggregated information; S3 1855 clarified PCN-meter; S5 minor changes; conclusion. 1857 o S1: added a paragraph about how the PCN-domain looks to the 1858 outside world (essentially it looks like a DiffServ domain). 1860 o S2: tweaked the PCN-traffic terminology bullet: changed PCN 1861 traffic classes to PCN behaviour aggregates, to be more in line 1862 with traditional DiffServ jargon (-> follow-up changes later in 1863 draft); included a definition of PCN-flows (and corrected a couple 1864 of 'PCN microflows' to 'PCN-flows' later in draft) 1866 o S3.5: added possibility of downgrading to best effort, where PCN- 1867 packets arrive at PCN-ingress-node already ECN marked (CE or ECN 1868 nonce) 1870 o S4: added note about whether talk about PCN operating on an 1871 interface or on a link. In S8.1 (OAM) mentioned that PCN 1872 functionality needs to be configured consistently on either the 1873 ingress or the egress interface of PCN-nodes in a PCN-domain. 1875 o S5.2: clarified that signalling protocol installs flow filter spec 1876 at PCN-ingress-node (& updates after possible re-route) 1878 o S5.6: addressing: clarified 1880 o S5.7: added tunnelling issue of N^2 scaling if you set up a mesh 1881 of tunnels between PCN-boundary-nodes 1883 o S7.3: Clarified the "third viewpoint" of probing (always probe). 1885 o S8.1: clarified that SNMP is only an example; added note that an 1886 operator may be able to not run PCN on some PCN-interior-nodes, if 1887 it knows that these links will never become (pre-)congested; added 1888 note that it may be possible to have different PCN-boundary-node 1889 behaviours for different ingress-egress-aggregates within the same 1890 PCN-domain. 
1892 o Appendix: Created an Appendix about "Possible work items beyond 1893 the scope of the current PCN WG Charter". Material moved from 1894 near start of S3 and elsewhere throughout draft. Moved text about 1895 centralised decision node to Appendix. 1897 o Other minor clarifications. 1899 15.6. Changes from -01 to -02 1901 o S1: Benefits: provisioning bullet extended to stress that PCN does 1902 not use RFC2475-style traffic conditioning. 1904 o S1: Deployment models: mentioned, as variant of PCN-domain 1905 extending to end nodes, that may extend to LAN edge switch. 1907 o S3.1: Trust Assumption: added note about not needing PCN-marking 1908 capability if known that an interface cannot become pre-congested. 1910 o S4: now divided into sub-sections 1912 o S4.1: Admission control: added second proposed method for how to 1913 decide to block new flows (PCN-egress-node receives one (or 1914 several) PCN-marked packets). 1916 o S5: Probing sub-section removed. Material now in new S7. 1918 o S5.6: Addressing: clarified how PCN-ingress-node can discover 1919 address of PCN-egress-node 1921 o S5.6: Addressing: centralised node case, added that PCN-ingress- 1922 node may need to know address of PCN-egress-node 1924 o S5.8: Tunnelling: added case of "partially PCN-capable tunnel" and 1925 degraded bullet on this in S6 (Open Issues) 1927 o S7: Probing: new section. Much more comprehensive than old S5.5. 1929 o S8: Operations and Management: substantially revised. 1931 o other minor changes not affecting semantics 1933 15.7. 
Changes from -00 to -01 1935 In addition to clarifications and nit squashing, the main changes 1936 are: 1938 o S1: Benefits: added one about provisioning (and contrast with 1939 DiffServ SLAs) 1941 o S1: Benefits: clarified that the objective is also to stop PCN- 1942 packets being significantly delayed (previously only mentioned not 1943 dropping packets) 1945 o S1: Deployment models: added one where policing is done at ingress 1946 of access network and not at ingress of PCN-domain (assume trust 1947 between networks) 1949 o S1: Deployment models: corrected MPLS-TE to MPLS 1951 o S2: Terminology: adjusted definition of PCN-domain 1953 o S3.5: Other assumptions: corrected, so that two assumptions (PCN- 1954 nodes not performing ECN and PCN-ingress-node discarding arriving 1955 CE packet) only apply if the PCN WG decides to encode PCN-marking 1956 in the ECN-field. 1958 o S4 & S5: changed PCN-marking algorithm to marking behaviour 1960 o S4: clarified that PCN-interior-node functionality applies for 1961 each outgoing interface, and added clarification: "The 1962 functionality is also done by PCN-ingress-nodes for their outgoing 1963 interfaces (ie those 'inside' the PCN-domain)." 1965 o S4 (near end): altered to say that a PCN-node "should" dedicate 1966 some capacity to lower priority traffic so that it isn't starved 1967 (was "may") 1969 o S5: clarified to say that PCN functionality is done on an 1970 'interface' (rather than on a 'link') 1972 o S5.2: deleted erroneous mention of service level agreement 1974 o S5.5: Probing: re-written, especially to distinguish probing to 1975 test the ingress-egress-aggregate from probing to test a 1976 particular ECMP path. 1978 o S5.7: Addressing: added mention of probing; added a note that, in the case 1979 where traffic is always tunnelled across the PCN-domain, the 1980 PCN-ingress-node needs to know the address of the 1981 PCN-egress-node.
1983 o S5.8: Tunnelling: re-written, especially to provide a clearer 1984 description of copying on tunnel entry/exit, by adding explanation 1985 (keeping tunnel encaps/decaps and PCN-marking orthogonal), 1986 deleting one bullet ("if the inner header's marking state is more 1987 severe then it is preserved" - shouldn't happen), and better 1988 referencing of other IETF documents. 1990 o S6: Open issues: stressed that "NOTE: Potential solutions are out 1991 of scope for this document" and edited a couple of sentences that 1992 were close to solution space. 1994 o S6: Open issues: added one about scenarios with only one tunnel 1995 endpoint in the PCN domain. 1997 o S6: Open issues: ECMP: added under-admission as another potential 1998 risk 2000 o S6: Open issues: added one about "Silent at start" 2001 o S10: Conclusions: a small conclusions section added 2003 16. Appendix: Possible work items beyond the scope of the current PCN 2004 WG charter 2006 This section mentions some topics that are outside the PCN WG's 2007 current charter, but which have been mentioned as areas of interest. 2008 They might be work items for: the PCN WG after a future re- 2009 chartering; some other IETF WG; another standards body; an operator- 2010 specific usage that is not standardised. 2012 NOTE: it should be crystal clear that this section discusses 2013 possibilities only. 2015 The first set of possibilities relates to the restrictions on scope 2016 imposed by the PCN WG charter (see Section 5): 2018 o a single PCN-domain encompasses several autonomous systems that do 2019 not trust each other, perhaps by using a mechanism like re-ECN 2020 [I-D.briscoe-re-pcn-border-cheat]. 2022 o not all the nodes run PCN. For example, the PCN-domain is a 2023 multi-site enterprise network. The sites are connected by a VPN 2024 tunnel; although PCN doesn't operate inside the tunnel, the PCN 2025 mechanisms still work properly because of the good QoS on the 2026 virtual link (the tunnel).
Another example is that PCN is 2027 deployed on the general Internet (ie widely but not universally 2028 deployed). 2030 o applying the PCN mechanisms to other types of traffic, ie beyond 2031 inelastic traffic. For instance, applying the PCN mechanisms to 2032 traffic scheduled with the Assured Forwarding per-hop behaviour. 2033 One example could be flow-rate adaptation by elastic applications 2034 that adapt according to the pre-congestion information. 2036 o the aggregation assumption doesn't hold, because the link capacity 2037 is too low. Measurement-based admission control is less accurate, 2038 with a greater risk of over-admission for instance. 2040 o the applicability of PCN mechanisms for emergency use (911, GETS, 2041 WPS, MLPP, etc.) 2043 Other possibilities include: 2045 o Probing. This is discussed in Section 16.1 below. 2047 o The PCN-domain extends to the end users. The scenario is 2048 described in [I-D.babiarz-pcn-sip-cap]. The end users need to be 2049 trusted to do their own policing. This scenario is in the scope 2050 of the PCN WG charter if there is sufficient traffic for the 2051 aggregation assumption to hold. A variant is that the PCN-domain 2052 extends out as far as the LAN edge switch. 2054 o indicating pre-congestion through signalling messages rather than 2055 in-band (in the form of PCN-marked packets) 2057 o the decision-making functionality is at a centralised node rather 2058 than at the PCN-boundary-nodes. This requires that the PCN- 2059 egress-node signals PCN-feedback-information to the centralised 2060 node, and that the centralised node signals to the PCN-ingress- 2061 node the decision about admission (or termination). It may need 2062 the centralised node and the PCN-boundary-nodes to be configured 2063 with each other's addresses. The centralised case is described 2064 further in [I-D.tsou-pcn-racf-applic]. 2066 o Signalling extensions for specific protocols (eg RSVP, NSIS). 
For 2067 example: the details of how the signalling protocol installs the 2068 flowspec at the PCN-ingress-node for an admitted PCN-flow; and how 2069 the signalling protocol carries the PCN-feedback-information. 2070 Perhaps also for other functions such as: coping with failure of a 2071 PCN-boundary-node ([I-D.briscoe-tsvwg-cl-architecture] considers 2072 what happens if RSVP is the QoS signalling protocol); establishing 2073 a tunnel across the PCN-domain if it is necessary to carry ECN 2074 marks transparently. 2076 o Policing by the PCN-ingress-node may not be needed if the PCN- 2077 domain can trust that the upstream network has already policed the 2078 traffic on its behalf. 2080 o PCN for Pseudowire: PCN may be used as a congestion avoidance 2081 mechanism for edge to edge pseudowire emulations 2082 [I-D.ietf-pwe3-congestion-frmwk]. 2084 o PCN for MPLS: [RFC3270] defines how to support the DiffServ 2085 architecture in MPLS networks (Multi-protocol label switching). 2086 [RFC5129] describes how to add PCN for admission control of 2087 microflows into a set of MPLS aggregates. PCN-marking is done in 2088 MPLS's EXP field (which [I-D.ietf-mpls-cosfield-def] proposes to 2089 re-name to the Class of Service (CoS) field). 2091 o PCN for Ethernet: Similarly, it may be possible to extend PCN into 2092 Ethernet networks, where PCN-marking is done in the Ethernet 2093 header. NOTE: Specific consideration of this extension is outside 2094 the IETF's remit. 2096 16.1. Probing 2098 16.1.1. Introduction 2100 Probing is a potential mechanism to assist admission control. 2102 PCN's admission control, as described so far, is essentially a 2103 reactive mechanism where the PCN-egress-node monitors the pre- 2104 congestion level for traffic from each PCN-ingress-node; if the level 2105 rises then it blocks new flows on that ingress-egress-aggregate. 
2106 However, it's possible that an ingress-egress-aggregate carries no 2107 traffic, and so the PCN-egress-node can't make an admission decision 2108 using the usual method described earlier. 2110 One approach is to be "optimistic" and simply admit the new flow. 2111 However it's possible to envisage a scenario where the traffic levels 2112 on other ingress-egress-aggregates are already so high that they're 2113 blocking new PCN-flows, and admitting a new flow onto this 'empty' 2114 ingress-egress-aggregate adds extra traffic onto a link that is 2115 already pre-congested - which may 'tip the balance' so that PCN's 2116 flow termination mechanism is activated or some packets are dropped. 2117 This risk could be lessened by configuring on each link sufficient 2118 'safety margin' above the PCN-threshold-rate. 2120 An alternative approach is to make PCN a more proactive mechanism. 2121 The PCN-ingress-node explicitly determines, before admitting the 2122 prospective new flow, whether the ingress-egress-aggregate can 2123 support it. This can be seen as a "pessimistic" approach, in 2124 contrast to the "optimism" of the approach above. It involves 2125 probing: a PCN-ingress-node generates and sends probe packets in 2126 order to test the pre-congestion level that the flow would 2127 experience. 2129 One possibility is that a probe packet is just a dummy data packet, 2130 generated by the PCN-ingress-node and addressed to the PCN-egress- 2131 node. 2133 16.1.2. Probing functions 2135 The probing functions are: 2137 o Make decision that probing is needed. As described above, this is 2138 when the ingress-egress-aggregate (or the ECMP path - Section 8) 2139 carries no PCN-traffic. An alternative is always to probe, ie 2140 probe before admitting every PCN-flow. 
2142 o (if required) Communicate the request that probing is needed - the 2143 PCN-egress-node signals to the PCN-ingress-node that probing is 2144 needed. 2146 o (if required) Generate probe traffic - the PCN-ingress-node 2147 generates the probe traffic. The appropriate number (or rate) of 2148 probe packets will depend on the PCN-marking algorithm; for 2149 example an excess-traffic-marking algorithm generates fewer PCN- 2150 marks than a threshold-marking algorithm, and so will need more 2151 probe packets. 2153 o Forward probe packets - as far as PCN-interior-nodes are 2154 concerned, probe packets are handled the same as (ordinary data) 2155 PCN-packets, in terms of routing, scheduling and PCN-marking. 2157 o Consume probe packets - the PCN-egress-node consumes probe packets 2158 to ensure that they don't travel beyond the PCN-domain. 2160 16.1.3. Discussion of rationale for probing, its downsides and open 2161 issues 2163 It is an unresolved question whether probing is really needed, but 2164 two viewpoints have been put forward as to why it is useful. The 2165 first is perhaps the most obvious: there is no PCN-traffic on the 2166 ingress-egress-aggregate. The second assumes that multipath routing 2167 (ECMP) is running in the PCN-domain. We now consider each in turn. 2169 The first viewpoint assumes the following: 2171 o There is no PCN-traffic on the ingress-egress-aggregate (so a 2172 normal admission decision cannot be made). 2174 o Simply admitting the new flow has a significant risk of leading to 2175 overload: packets dropped or flows terminated. 2177 On the former bullet, [PCN-email-traffic-empty-aggregates] suggests 2178 that, during the future busy hour of a national network with about 2179 100 PCN-boundary-nodes, there are likely to be significant numbers of 2180 aggregates with very few flows under nearly all circumstances.
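As an illustration only, the probing functions of Section 16.1.2 under the first viewpoint could be sketched roughly as follows. This is a minimal sketch, not a specified behaviour: the names Aggregate, probing_needed, send_probes and admit, the callback link_is_precongested, and the rule "admit only if the fraction of PCN-marked packets does not exceed max_marked_fraction" are all assumptions made for the example. The draft does not define how many probe packets are needed or how the admission decision is derived from them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Aggregate:
    """Hypothetical per ingress-egress-aggregate state at the PCN-egress-node."""
    pcn_packets_seen: int = 0   # ordinary PCN-packets in the measurement window
    pcn_marked_seen: int = 0    # of those, how many arrived PCN-marked


def probing_needed(agg: Aggregate, always_probe: bool = False) -> bool:
    """Decide that probing is needed: the aggregate is empty, or we always probe."""
    return always_probe or agg.pcn_packets_seen == 0


def send_probes(num_probes: int,
                link_is_precongested: Callable[[int], bool]) -> int:
    """Generate, forward and consume probes; return how many were PCN-marked.

    link_is_precongested(seq) stands in for the PCN-marking behaviour of the
    PCN-interior-nodes, which treat probes like ordinary PCN-packets.
    """
    marked = 0
    for seq in range(num_probes):
        if link_is_precongested(seq):
            marked += 1
    # The PCN-egress-node consumes the probes here, so none leave the domain.
    return marked


def admit(agg: Aggregate,
          num_probes: int,
          link_is_precongested: Callable[[int], bool],
          max_marked_fraction: float = 0.0,
          always_probe: bool = False) -> bool:
    """Assumed decision rule: admit if the marked fraction is low enough."""
    if probing_needed(agg, always_probe):
        if num_probes <= 0:
            raise ValueError("num_probes must be positive")
        fraction = send_probes(num_probes, link_is_precongested) / num_probes
    else:
        fraction = agg.pcn_marked_seen / agg.pcn_packets_seen
    return fraction <= max_marked_fraction
```

In this sketch an empty aggregate triggers probing before the decision, whereas an aggregate that already carries PCN-traffic is decided from its existing measurements and no probes are sent.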
2182 The latter bullet could occur if new flows start on many of the empty 2183 ingress-egress-aggregates, which together overload a link in the PCN- 2184 domain. To be a problem this would probably have to happen in a 2185 short time period (flash crowd) because, after the reaction time of 2186 the system, other (non-empty) ingress-egress-aggregates that pass 2187 through the link will measure pre-congestion and so block new flows. 2188 Also, flows naturally end anyway. 2190 The downsides of probing for this viewpoint are: 2192 o Probing adds delay to the admission control process. 2194 o Sufficient probing traffic has to be generated to test the pre- 2195 congestion level of the ingress-egress-aggregate. But the probing 2196 traffic itself may cause pre-congestion, causing other PCN-flows 2197 to be blocked or even terminated - and in the flash crowd scenario 2198 there will be probing on many ingress-egress-aggregates. 2200 The second viewpoint applies in the case where there is multipath 2201 routing (ECMP) in the PCN-domain. Note that ECMP is often used on 2202 core networks. There are two possibilities: 2204 (1) If admission control is based on measurements of the ingress- 2205 egress-aggregate, then the viewpoint that probing is useful assumes: 2207 o there's a significant chance that the traffic is unevenly balanced 2208 across the ECMP paths, and hence there's a significant risk of 2209 admitting a flow that should be blocked (because it follows an 2210 ECMP path that is pre-congested) or blocking a flow that should be 2211 admitted. 2213 o Note: [PCN-email-ECMP] suggests unbalanced traffic is quite 2214 possible, even with quite a large number of flows on a PCN-link 2215 (eg 1000) when Assumption 3 (aggregation) is likely to be 2216 satisfied. 
2218 (2) If admission control is based on measurements of pre-congestion 2219 on specific ECMP paths, then the viewpoint that probing is useful 2220 assumes: 2222 o There is no PCN-traffic on the ECMP path on which to base an 2223 admission decision. 2225 o Simply admitting the new flow has a significant risk of leading to 2226 overload. 2228 o The PCN-egress-node can match a packet to an ECMP path. 2230 o Note: This is similar to the first viewpoint and so similarly 2231 could occur in a flash crowd if a new flow starts more-or-less 2232 simultaneously on many of the empty ECMP paths. Because there are 2233 several (sometimes many) ECMP paths between each pair of PCN- 2234 boundary-nodes, it's presumably more likely that an ECMP path is 2235 'empty' than an ingress-egress-aggregate is. To constrain the 2236 number of ECMP paths, a few tunnels could be set-up between each 2237 pair of PCN-boundary-nodes. Tunnelling also solves the issue in 2238 the bullet immediately above (which is otherwise hard because an 2239 ECMP routing decision is made independently on each node). 2241 The downsides of probing for this viewpoint are: 2243 o Probing adds delay to the admission control process. 2245 o Sufficient probing traffic has to be generated to test the pre- 2246 congestion level of the ECMP path. But there's the risk that the 2247 probing traffic itself may cause pre-congestion, causing other 2248 PCN-flows to be blocked or even terminated. 2250 o The PCN-egress-node needs to consume the probe packets to ensure 2251 they don't travel beyond the PCN-domain, since they might confuse 2252 the destination end node. This is non-trivial, since probe 2253 packets are addressed to the destination end node, in order to 2254 test the relevant ECMP path (ie they are not addressed to the PCN- 2255 egress-node, unlike the first viewpoint above). 
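To illustrate why, under the second viewpoint, probe packets carry the address of the destination end node rather than that of the PCN-egress-node, the following minimal sketch models an ECMP routing decision. It assumes a typical ECMP algorithm that hashes the source and destination IP addresses and port numbers, the protocol ID and the DSCP. The Header type, the use of SHA-256 and the functions ecmp_next_hop and make_probe are hypothetical choices for the example; real routers use their own hash functions.

```python
import hashlib
from typing import NamedTuple


class Header(NamedTuple):
    """Fields assumed to be examined by the (hypothetical) ECMP algorithm."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    dscp: int


def ecmp_next_hop(header: Header, num_paths: int) -> int:
    """Pick one of num_paths equal-cost next hops from a hash of the header."""
    key = "|".join(str(field) for field in header).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def make_probe(flow: Header) -> dict:
    """Build a probe that reuses every hashed field of the prospective flow.

    The 'is_probe' entry stands in for some bit outside the hashed fields
    that the PCN-egress-node would match in order to consume the probe.
    """
    return {"header": flow, "is_probe": True}


flow = Header("192.0.2.1", "198.51.100.7", 5004, 5004, 17, 46)
probe = make_probe(flow)

# Same hashed fields, so the probe follows the same ECMP path as the flow.
same_path = ecmp_next_hop(probe["header"], 4) == ecmp_next_hop(flow, 4)
```

A probe addressed to the PCN-egress-node instead would present a different destination address to the hash, so it could be routed over a different ECMP path from the one the flow would take.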
2257 The open issues associated with this viewpoint include: 2259 o What rate and pattern of probe packets does the PCN-ingress-node 2260 need to generate, so that there's enough traffic to make the 2261 admission decision? 2263 o What difficulty does the delay (whilst probing is done), and 2264 possible packet drops, cause applications? 2266 o Can the delay be alleviated by automatically and periodically 2267 probing on the ingress-egress-aggregate? Or does this add too 2268 much overhead? 2270 o Are there other ways of dealing with the flash crowd scenario? 2271 For instance, by limiting the rate at which new flows are 2272 admitted; or perhaps by a PCN-egress-node blocking new flows on 2273 its empty ingress-egress-aggregates when its non-empty ones are 2274 pre-congested. 2276 o (Second viewpoint only) How does the PCN-egress-node disambiguate 2277 probe packets from data packets (so it can consume the former)? 2278 The PCN-egress-node must match the characteristic setting of 2279 particular bits in the probe packet's header or body - but these 2280 bits must not be used by any PCN-interior-node's ECMP algorithm. 2281 In the general case this isn't possible, but it should be possible 2282 for a typical ECMP algorithm (which examines: the source and 2283 destination IP addresses and port numbers, the protocol ID, and 2284 the DSCP). 2286 17. References 2287 17.1. Normative References 2289 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 2290 "Definition of the Differentiated Services Field (DS 2291 Field) in the IPv4 and IPv6 Headers", RFC 2474, 2292 December 1998. 2294 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 2295 J., Courtney, W., Davari, S., Firoiu, V., and D. 2296 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 2297 Behavior)", RFC 3246, March 2002. 2299 17.2. Informative References 2301 [RFC1633] Braden, B., Clark, D., and S. 
Shenker, "Integrated 2302 Services in the Internet Architecture: an Overview", 2303 RFC 1633, June 1994. 2305 [RFC2211] Wroclawski, J., "Specification of the Controlled-Load 2306 Network Element Service", RFC 2211, September 1997. 2308 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 2309 and W. Weiss, "An Architecture for Differentiated 2310 Services", RFC 2475, December 1998. 2312 [RFC2747] Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic 2313 Authentication", RFC 2747, January 2000. 2315 [RFC2983] Black, D., "Differentiated Services and Tunnels", 2316 RFC 2983, October 2000. 2318 [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., 2319 Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. 2320 Felstaine, "A Framework for Integrated Services Operation 2321 over Diffserv Networks", RFC 2998, November 2000. 2323 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 2324 of Explicit Congestion Notification (ECN) to IP", 2325 RFC 3168, September 2001. 2327 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, 2328 P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi- 2329 Protocol Label Switching (MPLS) Support of Differentiated 2330 Services", RFC 3270, May 2002. 2332 [RFC3393] Demichelis, C. and P. Chimento, "IP Packet Delay Variation 2333 Metric for IP Performance Metrics (IPPM)", RFC 3393, 2334 November 2002. 2336 [RFC3411] Harrington, D., Presuhn, R., and B. Wijnen, "An 2337 Architecture for Describing Simple Network Management 2338 Protocol (SNMP) Management Frameworks", STD 62, RFC 3411, 2339 December 2002. 2341 [RFC4216] Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous System 2342 (AS) Traffic Engineering (TE) Requirements", RFC 4216, 2343 November 2005. 2345 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 2346 Internet Protocol", RFC 4301, December 2005. 2348 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 2349 RFC 4303, December 2005. 
2351 [RFC4594] Babiarz, J., Chan, K., and F. Baker, "Configuration 2352 Guidelines for DiffServ Service Classes", RFC 4594, 2353 August 2006. 2355 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 2356 Zekauskas, "A One-way Active Measurement Protocol 2357 (OWAMP)", RFC 4656, September 2006. 2359 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 2360 Explicit Congestion Notification (ECN) Field", BCP 124, 2361 RFC 4774, November 2006. 2363 [RFC4778] Kaeo, M., "Operational Security Current Practices in 2364 Internet Service Provider Environments", RFC 4778, 2365 January 2007. 2367 [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion 2368 Marking in MPLS", RFC 5129, January 2008. 2370 [P.800] "Methods for subjective determination of transmission 2371 quality", ITU-T Recommendation P.800, August 1996. 2373 [Y.1541] "Network Performance Objectives for IP-based Services", 2374 ITU-T Recommendation Y.1541, February 2006. 2376 [I-D.ietf-mpls-cosfield-def] 2377 Andersson, L., ""EXP field" renamed to "CoS Field"", 2378 draft-ietf-mpls-cosfield-def-04 (work in progress), 2379 July 2008. 2381 [I-D.ietf-pwe3-congestion-frmwk] 2382 Bryant, S., Davie, B., Martini, L., and E. Rosen, 2383 "Pseudowire Congestion Control Framework", 2384 draft-ietf-pwe3-congestion-frmwk-01 (work in progress), 2385 May 2008. 2387 [I-D.babiarz-pcn-sip-cap] 2388 Babiarz, J., "SIP Controlled Admission and Preemption", 2389 draft-babiarz-pcn-sip-cap-00 (work in progress), 2390 October 2006. 2392 [I-D.behringer-tsvwg-rsvp-security-groupkeying] 2393 Behringer, M. and F. Faucheur, "Applicability of Keying 2394 Methods for RSVP Security", 2395 draft-behringer-tsvwg-rsvp-security-groupkeying-01 (work 2396 in progress), November 2007. 2398 [I-D.briscoe-re-pcn-border-cheat] 2399 Briscoe, B., "Emulating Border Flow Policing using Re-PCN 2400 on Bulk Data", draft-briscoe-re-pcn-border-cheat-02 (work 2401 in progress), September 2008. 
2403 [I-D.briscoe-tsvwg-cl-architecture] 2404 Briscoe, B., "An edge-to-edge Deployment Model for Pre- 2405 Congestion Notification: Admission Control over a 2406 DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04 2407 (work in progress), October 2006. 2409 [I-D.briscoe-tsvwg-cl-phb] 2410 Briscoe, B., "Pre-Congestion Notification marking", 2411 draft-briscoe-tsvwg-cl-phb-03 (work in progress), 2412 October 2006. 2414 [I-D.briscoe-tsvwg-ecn-tunnel] 2415 Briscoe, B., "Layered Encapsulation of Congestion 2416 Notification", draft-briscoe-tsvwg-ecn-tunnel-01 (work in 2417 progress), July 2008. 2419 [I-D.chan-pcn-problem-statement] 2420 Chan, K., "Pre-Congestion Notification Problem Statement", 2421 draft-chan-pcn-problem-statement-01 (work in progress), 2422 October 2006. 2424 [I-D.charny-pcn-comparison] 2425 Charny, A., "Comparison of Proposed PCN Approaches", 2426 draft-charny-pcn-comparison-00 (work in progress), 2427 November 2007. 2429 [I-D.charny-pcn-single-marking] 2430 Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre- 2431 Congestion Notification Using Single Marking for Admission 2432 and Termination", draft-charny-pcn-single-marking-03 2433 (work in progress), November 2007. 2435 [I-D.eardley-pcn-architecture] 2436 Eardley, P., "Pre-Congestion Notification Architecture", 2437 draft-eardley-pcn-architecture-00 (work in progress), 2438 June 2007. 2440 [I-D.eardley-pcn-marking-behaviour] 2441 Eardley, P., "Marking behaviour of PCN-nodes", 2442 draft-eardley-pcn-marking-behaviour-01 (work in progress), 2443 June 2008. 2445 [I-D.lefaucheur-rsvp-ecn] 2446 Faucheur, F., "RSVP Extensions for Admission Control over 2447 Diffserv using Pre-congestion Notification (PCN)", 2448 draft-lefaucheur-rsvp-ecn-01 (work in progress), 2449 June 2006. 2451 [I-D.menth-pcn-emft] 2452 Menth, M., Lehrieder, F., Eardley, P., Charny, A., and J. 2453 Babiarz, "Edge-Assisted Marked Flow Termination", 2454 draft-menth-pcn-emft-00 (work in progress), February 2008. 
2456 [I-D.menth-pcn-psdm-encoding] 2457 Menth, M., Babiarz, J., Moncaster, T., and B. Briscoe, 2458 "PCN Encoding for Packet-Specific Dual Marking (PSDM)", 2459 draft-menth-pcn-psdm-encoding-00 (work in progress), 2460 July 2008. 2462 [I-D.moncaster-pcn-3-state-encoding] 2463 Moncaster, T., Briscoe, B., and M. Menth, "A three state 2464 extended PCN encoding scheme", 2465 draft-moncaster-pcn-3-state-encoding-00 (work in 2466 progress), June 2008. 2468 [I-D.moncaster-pcn-baseline-encoding] 2469 Moncaster, T., Briscoe, B., and M. Menth, "Baseline 2470 Encoding and Transport of Pre-Congestion Information", 2471 draft-moncaster-pcn-baseline-encoding-02 (work in 2472 progress), July 2008. 2474 [I-D.sarker-pcn-ecn-pcn-usecases] 2475 Sarker, Z. and I. Johansson, "Usecases and Benefits of end 2476 to end ECN support in PCN Domains", 2477 draft-sarker-pcn-ecn-pcn-usecases-01 (work in progress), 2478 May 2008. 2480 [I-D.tsou-pcn-racf-applic] 2481 Tsou, T. and T. Taylor, "Applicability Statement for the 2482 Use of Pre-Congestion Notification in a Resource- 2483 Controlled Network", draft-tsou-pcn-racf-applic-00 (work 2484 in progress), February 2008. 2486 [I-D.westberg-pcn-load-control] 2487 Westberg, L., Bhargava, A., Bader, A., Karagiannis, G., 2488 and H. Mekkes, "LC-PCN: The Load Control PCN Solution", 2489 draft-westberg-pcn-load-control-04 (work in progress), 2490 July 2008. 2492 [Hancock] "Slide 14 of 'NSIS: An Outline Framework for QoS 2493 Signalling'", May 2002, . 2496 [Iyer] "An approach to alleviate link overload as observed on an 2497 IP backbone", IEEE INFOCOM, 2003, 2498 . 2500 [Menth] "PCN-Based Resilient Network Admission Control: The Impact 2501 of a Single Bit", Technical Report, 2007, . 2505 [Menth08] "PCN-Based Admission Control and Flow Termination", 2008, 2506 . 2509 [PCN-email-ECMP] 2510 "Email to PCN WG mailing list", November 2007, . 2513 [PCN-email-SRLG] 2514 "Email to PCN WG mailing list", March 2008, .
2517 [PCN-email-traffic-empty-aggregates] 2518 "Email to PCN WG mailing list", October 2007, . 2521 [Songhurst] 2522 "Guaranteed QoS Synthesis for Admission Control with 2523 Shared Capacity", BT Technical Report TR-CXR9-2006-001, 2524 February 2006, . 2527 [Style] "Guardian Style", Note: This document uses the 2528 abbreviations 'ie' and 'eg' (not 'i.e.' and 'e.g.'), as in 2529 many style guides, eg, 2007, 2530 . 2532 Author's Address 2534 Philip Eardley 2535 BT 2536 B54/77, Sirius House Adastral Park Martlesham Heath 2537 Ipswich, Suffolk IP5 3RE 2538 United Kingdom 2540 Email: philip.eardley@bt.com 2542 Full Copyright Statement 2544 Copyright (C) The IETF Trust (2008). 2546 This document is subject to the rights, licenses and restrictions 2547 contained in BCP 78, and except as set forth therein, the authors 2548 retain all their rights. 2550 This document and the information contained herein are provided on an 2551 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 2552 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 2553 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 2554 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 2555 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 2556 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 2558 Intellectual Property 2560 The IETF takes no position regarding the validity or scope of any 2561 Intellectual Property Rights or other rights that might be claimed to 2562 pertain to the implementation or use of the technology described in 2563 this document or the extent to which any license under such rights 2564 might or might not be available; nor does it represent that it has 2565 made any independent effort to identify any such rights. Information 2566 on the procedures with respect to rights in RFC documents can be 2567 found in BCP 78 and BCP 79.
2569 Copies of IPR disclosures made to the IETF Secretariat and any 2570 assurances of licenses to be made available, or the result of an 2571 attempt made to obtain a general license or permission for the use of 2572 such proprietary rights by implementers or users of this 2573 specification can be obtained from the IETF on-line IPR repository at 2574 http://www.ietf.org/ipr. 2576 The IETF invites any interested party to bring to its attention any 2577 copyrights, patents or patent applications, or other proprietary 2578 rights that may cover technology that may be required to implement 2579 this standard. Please address the information to the IETF at 2580 ietf-ipr@ietf.org. 2582 Acknowledgment 2584 Funding for the RFC Editor function is provided by the IETF 2585 Administrative Support Activity (IASA).