Congestion and Pre-Congestion                    Philip Eardley (Editor)
Notification Working Group                                            BT
Internet-Draft                                            August 7, 2008
Intended status: Informational
Expires: February 8, 2009

               Pre-Congestion Notification Architecture
                    draft-ietf-pcn-architecture-05

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 8, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes a general architecture for flow admission
   and termination based on pre-congestion information, in order to
   protect the quality of service of established inelastic flows
   within a single DiffServ domain.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Benefits
   4.  Deployment scenarios
   5.  Assumptions and constraints on scope
     5.1.  Assumption 1: Trust and support of PCN - controlled
           environment
     5.2.  Assumption 2: Real-time applications
     5.3.  Assumption 3: Many flows and additional load
     5.4.  Assumption 4: Emergency use out of scope
   6.  High-level functional architecture
     6.1.  Flow admission
     6.2.  Flow termination
     6.3.  Flow admission and/or flow termination when there are
           only two PCN encoding states
     6.4.  Information transport
     6.5.  PCN-traffic
     6.6.  Backwards compatibility
   7.  Detailed functional architecture
     7.1.  PCN-interior-node functions
     7.2.  PCN-ingress-node functions
     7.3.  PCN-egress-node functions
     7.4.  Admission control functions
     7.5.  Flow termination functions
     7.6.  Addressing
     7.7.  Tunnelling
     7.8.  Fault handling
   8.  Design goals and challenges
   9.  Operations and Management
     9.1.  Configuration OAM
       9.1.1.  System options
       9.1.2.  Parameters
     9.2.  Performance & Provisioning OAM
     9.3.  Accounting OAM
     9.4.  Fault OAM
     9.5.  Security OAM
   10. IANA Considerations
   11. Security considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. Changes
     15.1.  Changes from -04 to -05
     15.2.  Changes from -03 to -04
     15.3.  Changes from -02 to -03
     15.4.  Changes from -01 to -02
     15.5.  Changes from -00 to -01
   16. Appendix: Possible work items beyond the scope of the
       current PCN WG Charter
     16.1.  Probing
       16.1.1.  Introduction
       16.1.2.  Probing functions
       16.1.3.  Discussion of rationale for probing, its downsides
                and open issues
   17. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   This document describes a general architecture for flow admission
   and termination based on (pre-)congestion information, in order to
   protect the quality of service of flows within a DiffServ domain
   [RFC2475].  It defines an architecture for implementing two
   mechanisms to protect the quality of service of established
   inelastic flows within a single DiffServ domain, where all boundary
   and interior nodes are PCN-enabled and trust each other for correct
   PCN operation.  Flow admission control determines whether a new
   flow should be admitted, in order to protect the QoS of existing
   PCN-flows in normal circumstances.  However, in abnormal
   circumstances (for instance a disaster affecting multiple nodes and
   causing traffic re-routes), the QoS of existing PCN-flows may
   degrade even though care was exercised when admitting those flows.
   Therefore this document also proposes a mechanism for flow
   termination, which removes enough traffic to protect the QoS of the
   remaining PCN-flows.
   As a fundamental building block to enable these two mechanisms,
   PCN-interior-nodes generate, encode and transport pre-congestion
   information towards the PCN-egress-nodes.  Two rates, a
   PCN-threshold-rate and a PCN-excess-rate, are associated with each
   link of the PCN-domain.  Each rate is used by a marking behaviour
   that determines how and when PCN-packets are marked, and how the
   markings are encoded in packet headers.  Overall the aim is to
   enable PCN-nodes to give an "early warning" of potential congestion
   before there is any significant build-up of PCN-packets in the
   queue.

   PCN-boundary-nodes convert measurements of these PCN-markings into
   decisions about flow admission and termination.  In a PCN-domain
   with both threshold marking and excess traffic marking enabled, the
   admission control mechanism limits the PCN-traffic on each link to
   *roughly* its PCN-threshold-rate, and the flow termination
   mechanism limits the PCN-traffic on each link to *roughly* its
   PCN-excess-rate.  Other scenarios are discussed later.

   This document describes the PCN architecture and outlines some
   benefits, deployment scenarios, assumptions and terminology for
   PCN.  The behaviour of PCN-interior-nodes is standardised in two
   standards-track documents, which are summarised in this document.
   [I-D.eardley-pcn-marking-behaviour] standardises the two marking
   behaviours of PCN-nodes: threshold marking and excess traffic
   marking.  Threshold marking marks all PCN-packets if the
   PCN-traffic rate is greater than a first configured rate, the
   "PCN-threshold-rate".  Excess traffic marking marks a proportion of
   PCN-packets, such that the amount marked equals the traffic rate in
   excess of a second configured rate, the "PCN-excess-rate".
   The "baseline" encoding is standardised in
   [I-D.moncaster-pcn-baseline-encoding], which provides two PCN
   encoding states (PCN-marked and not PCN-marked), whilst
   (experimental) extensions to the baseline encoding can provide
   three encoding states (threshold-marked, excess-traffic-marked, not
   PCN-marked).  PCN encoding uses a combination of the DSCP field and
   the ECN field in the IP header to indicate that a packet is a
   PCN-packet and whether it is PCN-marked.  PCN therefore defines
   semantics for the ECN field different from the default semantics of
   [RFC3168]; PCN's encoding has been chosen to meet the guidelines of
   BCP 124 [RFC4774].  The behaviour of PCN-boundary-nodes is
   described in Informational documents.  Several possibilities are
   outlined in this document; detailed descriptions and comparisons
   are in [I-D.charny-pcn-comparison] and [Menth08].

2.  Terminology

   o  PCN-domain: a PCN-capable domain; a contiguous set of
      PCN-enabled nodes that perform DiffServ scheduling; the complete
      set of PCN-nodes whose PCN-marking can in principle influence
      decisions about flow admission and termination for the
      PCN-domain, including the PCN-egress-nodes, which measure these
      PCN-marks.

   o  PCN-boundary-node: a PCN-node that connects one PCN-domain to a
      node either in another PCN-domain or in a non-PCN-domain.

   o  PCN-interior-node: a node in a PCN-domain that is not a
      PCN-boundary-node.

   o  PCN-node: a PCN-boundary-node or a PCN-interior-node.

   o  PCN-egress-node: a PCN-boundary-node in its role in handling
      traffic as it leaves a PCN-domain.

   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
      traffic as it enters a PCN-domain.

   o  PCN-traffic, PCN-packets, PCN-BA: a PCN-domain carries traffic
      of different DiffServ behaviour aggregates (BAs) [RFC2475].  The
      PCN-BA uses the PCN mechanisms to carry PCN-traffic, and the
      corresponding packets are PCN-packets.
      The same network will carry traffic of other DiffServ BAs.  The
      PCN-BA is distinguished by a combination of the DiffServ
      codepoint (DSCP) and ECN fields.

   o  PCN-flow: the unit of PCN-traffic that the PCN-boundary-node
      admits (or terminates); the unit could be a single microflow (as
      defined in [RFC2475]) or some identifiable collection of
      microflows.

   o  Ingress-egress-aggregate: the collection of PCN-packets from all
      PCN-flows that travel in one direction between a specific pair
      of PCN-boundary-nodes.

   o  PCN-threshold-rate: a reference rate configured for each link in
      the PCN-domain, which is lower than the PCN-excess-rate.  It is
      used by a marking behaviour that determines whether a packet
      should be PCN-marked with a first encoding, "threshold-marked".

   o  PCN-excess-rate: a reference rate configured for each link in
      the PCN-domain, which is higher than the PCN-threshold-rate.  It
      is used by a marking behaviour that determines whether a packet
      should be PCN-marked with a second encoding,
      "excess-traffic-marked".

   o  Threshold-marking: a PCN-marking behaviour with the objective
      that all PCN-traffic is marked if the PCN-traffic exceeds the
      PCN-threshold-rate.

   o  Excess-traffic-marking: a PCN-marking behaviour with the
      objective that the amount of PCN-traffic that is PCN-marked is
      equal to the amount that exceeds the PCN-excess-rate.

   o  Pre-congestion: a condition of a link within a PCN-domain in
      which the PCN-node performs PCN-marking, in order to provide an
      "early warning" of potential congestion before there is any
      significant build-up of PCN-packets in the real queue.  (Hence,
      by analogy with ECN, we call our mechanism Pre-Congestion
      Notification.)

   o  PCN-marking: the process of setting the header of a PCN-packet
      based on defined rules, in reaction to pre-congestion; either
      threshold-marking or excess-traffic-marking.
   o  PCN-feedback-information: information signalled by a
      PCN-egress-node to a PCN-ingress-node or central control node,
      which is needed for the flow admission and flow termination
      mechanisms.

3.  Benefits

   We believe that the key benefits of the PCN mechanisms described in
   this document are that they are simple, scalable, and robust
   because:

   o  Per-flow state is required only at the PCN-ingress-nodes
      ("stateless core").  It is needed there for policing purposes
      (to prevent non-admitted PCN-traffic from entering the
      PCN-domain).  It is not generally required that other network
      entities are aware of individual flows (although they may be in
      particular deployment scenarios).

   o  Admission control is resilient: PCN's QoS is decoupled from the
      routing system.  Hence, in general, admitted flows can survive
      capacity, routing or topology changes without additional
      signalling, and they don't have to be told (or learn) about such
      changes.  The PCN-threshold-rate on each link can be chosen
      small enough that admitted traffic can still be carried after a
      re-routing in most failure cases [Menth].  This is an important
      feature, as QoS violations in core networks due to link failures
      are more likely than QoS violations due to increased traffic
      volume [Iyer].

   o  The PCN-marking behaviours only operate on the overall
      PCN-traffic on the link, not per flow.

   o  The information from these measurements is signalled to the
      PCN-egress-nodes by the PCN-marks in the packet headers, ie
      "in-band".  No additional signalling protocol is required for
      transporting the PCN-marks, and therefore no secure binding is
      required between data packets and separate congestion messages.

   o  The PCN-egress-nodes make separate measurements, operating on
      the aggregate PCN-traffic from each PCN-ingress-node, ie not per
      flow.
      Similarly, signalling by the PCN-egress-node of
      PCN-feedback-information (which is used for flow admission and
      termination decisions) is at the granularity of the
      ingress-egress-aggregate.  An alternative approach is for the
      PCN-egress-nodes to monitor the PCN-traffic and signal
      PCN-feedback-information at the granularity of one (or a few)
      PCN-marks.

   o  The admitted PCN-load is controlled dynamically.  Therefore it
      adapts as the traffic matrix changes, and also if the network
      topology changes (eg after a link failure).  Hence an operator
      can be less conservative when deploying network capacity, and
      less accurate in their prediction of the PCN-traffic matrix.

   o  The termination mechanism complements admission control.  It
      allows the network to recover from sudden unexpected surges of
      PCN-traffic on some links, thus restoring QoS to the remaining
      flows.  Such scenarios are expected to be rare but not
      impossible.  They can be caused by large network failures that
      redirect lots of admitted PCN-traffic to other links, or by
      malfunction of the measurement-based admission control in the
      presence of admitted flows that send for a while at an
      atypically low rate and then increase their rates in a
      correlated way.

   o  Flow termination can also enable an operator to be less
      conservative when deploying network capacity.  It is an
      alternative to running links at low utilisation in order to
      protect against link or node failures.  This is especially the
      case with SRLGs (shared risk link groups: links that share a
      resource, such as a fibre, whose failure affects all those links
      [RFC4216]).  A requirement to fully protect traffic against a
      single SRLG failure can require low utilisation (~10%) of the
      link bandwidth on some links before failure [PCN-email-SRLG].
   o  The PCN-excess-rate may be set below the maximum rate at which
      PCN-traffic can be transmitted on a link, in order to trigger
      termination of some PCN-flows before loss (or excessive delay)
      of PCN-packets occurs, or to keep the maximum PCN-load on a link
      below a level configured by the operator.

   o  Provisioning of the network is decoupled from the process of
      adding new customers.  By contrast, with the DiffServ
      architecture [RFC2475] operators rely on subscription-time
      Service Level Agreements that statically define the parameters
      of the traffic that will be accepted from a customer, and so the
      operator has to re-run the provisioning process each time a new
      customer is added, to check that the Service Level Agreement can
      be fulfilled.  A PCN-domain doesn't need such traffic
      conditioning.

4.  Deployment scenarios

   Operators of networks will want to use the PCN mechanisms in
   various arrangements, for instance depending on how they are
   performing admission control outside the PCN-domain (users, after
   all, are concerned about QoS end-to-end), what their particular
   goals and assumptions are, how many PCN encoding states are
   available, and so on.

   From the perspective of the outside world, a PCN-domain essentially
   looks like a DiffServ domain.  PCN-traffic is either transported
   across it transparently or policed at the PCN-ingress-node (ie
   dropped or carried at a lower QoS).  There are a couple of
   differences: PCN-traffic has better QoS guarantees than normal
   DiffServ traffic (because PCN's mechanisms better protect the QoS
   of admitted flows); and in rare circumstances (failures), some
   PCN-flows may get terminated while other flows get their QoS
   restored.  Non-PCN-traffic is treated transparently, ie the
   PCN-domain is a normal DiffServ domain.
   An operator may choose to deploy either admission control or flow
   termination or both.  Although designed to work together, they are
   independent mechanisms, and the use of one does not require or
   prevent the use of the other.

   A PCN-domain may have three encoding states (or, pedantically, an
   operator may choose to use up three encoding states for PCN): not
   PCN-marked, threshold-marked, and excess-traffic-marked.  Then both
   PCN admission control and flow termination can be supported.  As
   illustrated in Figure 1, admission control accepts new flows until
   the PCN-traffic rate on the bottleneck link rises above the
   PCN-threshold-rate, whilst if necessary the flow termination
   mechanism terminates flows down to the PCN-excess-rate on the
   bottleneck link.

                          ==Marking behaviour==   ==PCN mechanisms==
              Rate of ^
       PCN-traffic on |
      bottleneck link |                           (as below and also)
                      |   (as below)              Drop some PCN-pkts
                      |
       scheduler rate-|----------------------------------------------
    (for PCN-traffic) |
                      |   Some pkts               Terminate some
                      |   excess-traffic-marked   admitted flows
                      |   &                       &
                      |   Rest of pkts            Block new flows
                      |   threshold-marked
                      |
      PCN-excess-rate-|----------------------------------------------
                      |
                      |   All pkts                Block new flows
                      |   threshold-marked
                      |
   PCN-threshold-rate-|----------------------------------------------
                      |
                      |   No pkts                 Admit new flows
                      |   PCN-marked
                      |

       Figure 1: Schematic of how PCN's admission control and flow
     termination mechanisms kick in as the rate of PCN-traffic
     increases, for a PCN-domain with three encoding states.

   On the other hand, a PCN-domain may have two encoding states (as in
   [I-D.moncaster-pcn-baseline-encoding]) (or, pedantically, an
   operator may choose to use up two encoding states for PCN): not
   PCN-marked and PCN-marked.
   Then there are three possibilities, as discussed in the following
   paragraphs (see also Section 6.3).

   First, an operator could use just PCN's admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting': as
   sessions end, PCN-traffic naturally reduces, and meanwhile the
   admission control mechanism will prevent admission of new flows
   that use the affected links.  So the PCN-domain will naturally
   return to normal operation, but with reduced capacity.  The
   drawback of this approach is that until PCN-traffic naturally
   departs to relieve the congestion, all PCN-flows, as well as lower
   priority services, will be adversely affected.

   Second, an operator could rely for admission control just on
   statically provisioned capacity per PCN-ingress-node (regardless of
   the PCN-egress-node of a flow), as is typical in the hose model of
   the DiffServ architecture [RFC2475].  Such traffic conditioning
   agreements can lead to focused overload: many flows happen to focus
   on a particular link, and then all flows through the congested link
   fail catastrophically.  PCN's flow termination mechanism could then
   be used to counteract such a problem.

   Third, both admission control and flow termination can be triggered
   from the single type of PCN-marking; the main downside is that
   admission control is less accurate.

   Within the PCN-domain there is some flexibility about how the
   decision-making functionality is distributed.  These possibilities
   are outlined in Section 7.4 and also discussed elsewhere, such as
   in [Menth08].

   The flow admission and termination decisions need to be enforced
   through per-flow policing by the PCN-ingress-nodes.  If there are
   several PCN-domains on the end-to-end path, then each needs to
   police at its PCN-ingress-nodes.
   One exception is if the operator runs both the access network (not
   a PCN-domain) and the core network (a PCN-domain); per-flow
   policing could then be devolved to the access network and not done
   at the PCN-ingress-node.  Note: to aid readability, the rest of
   this draft assumes that policing is done by the PCN-ingress-nodes.

   PCN admission control has to fit with the overall approach to
   admission control.  For instance,
   [I-D.briscoe-tsvwg-cl-architecture] describes the case where RSVP
   signalling runs end-to-end.  The PCN-domain is a single RSVP hop,
   ie only the PCN-boundary-nodes process RSVP messages, with RSVP
   messages processed on each hop outside the PCN-domain, as in
   IntServ over DiffServ [RFC2998].  It would also be possible for the
   RSVP signalling to be originated and/or terminated by proxies, with
   application-layer signalling between the end user and the proxy (eg
   SIP signalling with a home hub).  A similar example would use NSIS
   signalling instead of RSVP.

   It is possible that a user wants its inelastic traffic to use the
   PCN mechanisms but also to react to ECN marking outside the
   PCN-domain [I-D.sarker-pcn-ecn-pcn-usecases].  Two possible ways to
   do this are to tunnel all PCN-packets across the PCN-domain, so
   that the ECN marks are carried transparently across the PCN-domain,
   or to use an encoding like [I-D.moncaster-pcn-3-state-encoding].
   Tunnelling is discussed further in Section 7.7.

   Some possible deployment models that are outside the current PCN WG
   Charter are outlined in the Appendix.

5.  Assumptions and constraints on scope

   The scope of PCN is, at least initially (see Appendix), restricted
   by the following assumptions:

   1.  these components are deployed in a single DiffServ domain,
       within which all PCN-nodes are PCN-enabled and trust each other
       for truthful PCN-marking and transport

   2.  all flows handled by these mechanisms are inelastic and
       constrained to a known peak rate through policing or shaping

   3.  the number of PCN-flows across any potential bottleneck link is
       sufficiently large that stateless, statistical mechanisms can
       be effective.  To put it another way, the aggregate bit rate of
       PCN-traffic across any potential bottleneck link needs to be
       sufficiently large relative to the maximum additional bit rate
       added by one flow.  This is the basic assumption of
       measurement-based admission control.

   4.  PCN-flows may have different precedence, but the applicability
       of the PCN mechanisms for emergency use (911, GETS, WPS, MLPP,
       etc.) is out of scope.

5.1.  Assumption 1: Trust and support of PCN - controlled environment

   We assume that the PCN-domain is a controlled environment, ie all
   the nodes in a PCN-domain run PCN and trust each other.  There are
   several reasons for proposing this assumption:

   o  The PCN-domain has to be encircled by a ring of
      PCN-boundary-nodes, otherwise traffic could enter a PCN-BA
      without being subject to admission control, which would
      potentially degrade the QoS of existing PCN-flows.

   o  Similarly, a PCN-boundary-node has to trust that all the
      PCN-nodes mark PCN-traffic consistently.  A node not doing
      PCN-marking wouldn't be able to alert when it suffered
      pre-congestion, which potentially would lead to too many
      PCN-flows being admitted (or too few being terminated).  Worse,
      a rogue node could perform various attacks, as discussed in the
      Security Considerations section.

   One way of assuring the above two points is for the entire
   PCN-domain to be run by a single operator.  Another possibility is
   that there are several operators that trust each other to a
   sufficient level in their handling of PCN-traffic.

   Note: all PCN-nodes need to be trustworthy.
   However, if it is known that an interface cannot become
   pre-congested, then it is not strictly necessary for it to be
   capable of PCN-marking.  But this must be known even in unusual
   circumstances, eg after the failure of some links.

5.2.  Assumption 2: Real-time applications

   We assume that any variation of source bit rate is independent of
   the level of pre-congestion.  We assume that PCN-packets come from
   real-time applications generating inelastic traffic, ie the
   application sends packets at the rate the codec produces them,
   regardless of the availability of capacity [RFC4594].  Examples are
   voice and video requiring low delay, jitter and packet loss, the
   Controlled Load Service [RFC2211], and the Telephony service class
   [RFC4594].  This assumption is to help focus the effort where it
   looks like PCN would be most useful, ie the sorts of applications
   where per-flow QoS is a known requirement.  In other words, we
   focus on PCN providing a benefit to inelastic traffic (PCN may or
   may not provide a benefit to other types of traffic).

   As a consequence, it is assumed that PCN-marking is applied to
   traffic scheduled with the Expedited Forwarding per-hop behaviour
   [RFC3246], or a per-hop behaviour with similar characteristics.

5.3.  Assumption 3: Many flows and additional load

   We assume that there are many PCN-flows on any bottleneck link in
   the PCN-domain (or, to put it another way, that the aggregate bit
   rate of PCN-traffic across any potential bottleneck link is
   sufficiently large relative to the maximum additional bit rate
   added by one PCN-flow).  Measurement-based admission control
   assumes that the present is a reasonable prediction of the future:
   the network conditions are measured at the time of a new flow
   request, but the actual network performance must be acceptable
   during the call some time later.
   One issue is that if there are only a few variable-rate flows, then
   the aggregate traffic level may vary a lot, perhaps enough to cause
   some packets to be dropped.  If there are many flows, then the
   aggregate traffic level should be statistically smoothed.  How many
   flows is enough depends on a number of things, such as the
   variation in each flow's rate, the total rate of PCN-traffic, and
   the size of the "safety margin" between the traffic level at which
   we start admission-marking and that at which packets are dropped or
   significantly delayed.

   We do not make explicit assumptions on how many PCN-flows are in
   each ingress-egress-aggregate.  Performance evaluation work may
   clarify whether it is necessary to make any additional assumption
   on aggregation at the ingress-egress-aggregate level.

5.4.  Assumption 4: Emergency use out of scope

   PCN-flows may have different precedence, but the applicability of
   the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.)
   is out of scope for consideration by the PCN WG.

6.  High-level functional architecture

   The high-level approach is to split functionality between:

   o  PCN-interior-nodes 'inside' the PCN-domain, which monitor their
      own state of pre-congestion and mark PCN-packets if appropriate.
      They are not flow-aware, nor aware of ingress-egress-aggregates.
      This functionality is also performed by PCN-ingress-nodes for
      their outgoing interfaces (ie those 'inside' the PCN-domain).

   o  PCN-boundary-nodes at the edge of the PCN-domain, which control
      admission of new PCN-flows and termination of existing
      PCN-flows, based on information from PCN-interior-nodes.  This
      information is in the form of the PCN-marked data packets (which
      are intercepted by the PCN-egress-nodes) and not signalling
      messages.  Generally, PCN-ingress-nodes are flow-aware.
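   As an illustration of the boundary-node role just described, the
   sketch below shows a PCN-egress-node measuring, per
   ingress-egress-aggregate, the fraction of PCN-traffic that arrived
   PCN-marked, and an admission rule that blocks new flows once that
   fraction reaches a configured threshold.  This is only a hedged
   sketch: the class names, the fraction-based rule and the 1%
   threshold are illustrative assumptions, not part of this
   architecture; the actual edge behaviours are described in the
   Informational documents cited above.

```python
class EgressMeter:
    """Aggregate measurement at a PCN-egress-node for one
    ingress-egress-aggregate; no per-flow state is kept."""

    def __init__(self):
        self.total_octets = 0
        self.marked_octets = 0

    def packet_arrived(self, octets, pcn_marked):
        # Measurement operates on the aggregate, not per flow.
        self.total_octets += octets
        if pcn_marked:
            self.marked_octets += octets

    def marked_fraction(self):
        if self.total_octets == 0:
            return 0.0
        return self.marked_octets / self.total_octets


def admit_new_flow(meter, marking_threshold=0.01):
    # Hypothetical decision rule: block new flows once 1% or more of
    # the aggregate's octets arrive PCN-marked (threshold is
    # illustrative, not from the draft).
    return meter.marked_fraction() < marking_threshold
```

   The point of the sketch is the granularity: the
   PCN-feedback-information reduces to one number per
   ingress-egress-aggregate, so the measurement interval, smoothing
   and exact decision rule can be chosen by the edge behaviour without
   touching the interior nodes.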
The aim of this split is to keep the bulk of the network simple,
scalable and robust, whilst confining policy, application-level and
security interactions to the edge of the PCN-domain. For example,
the lack of flow awareness means that the PCN-interior-nodes don't
care about the flow information associated with the PCN-packets that
they carry, nor do the PCN-boundary-nodes care about which PCN-
interior-nodes their flows traverse. The objective is to standardise
PCN-marking behaviour, but potentially produce more than one
(informational) RFC describing how PCN-boundary-nodes react to PCN-
marks.

In order to generate information about the current state of the PCN-
domain, each PCN-node PCN-marks packets if it is "pre-congested".
Exactly how a PCN-node decides whether it is "pre-congested" (the
algorithm) and exactly how packets are "PCN-marked" (the encoding)
are defined in separate standards-track documents, but at a high
level it is as follows:

o  the algorithms: a PCN-node meters the amount of PCN-traffic on
   each one of its outgoing (or incoming) links. The measurement is
   made as an aggregate of all PCN-packets, and not per flow. There
   are two algorithms, one for threshold-marking and one for excess-
   traffic-marking.

o  the encoding(s): a PCN-node PCN-marks a PCN-packet by setting a
   combination of the DSCP and ECN fields. In the "baseline"
   encoding [I-D.moncaster-pcn-baseline-encoding], the ECN field is
   set to 11 and the DSCP is not altered. Extension encodings may be
   defined that (at most) use a second DSCP (eg as in
   [I-D.moncaster-pcn-3-state-encoding]) and/or set the ECN field to
   values other than 11 (eg as in [I-D.menth-pcn-psdm-encoding]).
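Concretely, the baseline encoding amounts to a bit operation on the 8-bit TOS / Traffic Class octet, whose top six bits carry the DSCP and bottom two bits the ECN field. A minimal sketch (the function names are ours, for illustration):

```python
ECN_MASK = 0b11  # low two bits of the IPv4 TOS / IPv6 Traffic Class octet

def baseline_pcn_mark(tos_byte: int) -> int:
    """Baseline encoding: set the ECN field to 11, leaving the DSCP
    (the top six bits) unchanged."""
    return tos_byte | ECN_MASK

def is_pcn_marked(tos_byte: int) -> bool:
    """A packet is PCN-marked iff its ECN field is 11."""
    return (tos_byte & ECN_MASK) == 0b11
```

For example, marking a packet whose octet is 0xB8 (DSCP 101110, ECN 00) yields 0xBB: the same DSCP with ECN set to 11.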
In a PCN-domain the operator may have two or three encoding states
available. The baseline encoding provides two encoding states (not
PCN-marked, PCN-marked), whilst extended encodings can provide three
(not PCN-marked, threshold-marked, excess-traffic-marked).

The PCN-boundary-nodes monitor the PCN-marked packets in order to
extract information about the current state of the PCN-domain. Based
on this monitoring, a decision is made about whether to admit a
prospective new flow or whether to terminate existing flow(s).

PCN-marking needs to be configured on all links in the PCN-domain so
that the PCN mechanisms protect all of them. The actual
functionality can be configured on the outgoing or incoming
interfaces of PCN-nodes - or one algorithm could be configured on the
outgoing interface and the other on the incoming interface. The
important thing is that a consistent choice is made across the PCN-
domain. See [I-D.eardley-pcn-marking-behaviour] for further
discussion.

The objective of the threshold-marking algorithm is to threshold-mark
all PCN-packets whenever the rate of PCN-packets is greater than some
configured rate, the PCN-threshold-rate. The objective of the
excess-traffic-marking algorithm is to excess-traffic-mark PCN-
packets at a rate equal to the difference between the bit rate of
PCN-packets and some configured rate, the PCN-excess-rate.
Note that this description reflects the overall intent of the
algorithm rather than its instantaneous behaviour, since the rate
measured at a particular moment depends on the detailed algorithm,
its implementation and the traffic's variance as well as its rate (eg
marking may well continue after a recent overload even after the
instantaneous rate has dropped). The algorithms are specified in
[I-D.eardley-pcn-marking-behaviour].

Whether two or three encoding states are available, the ECN field set
to 11 indicates PCN-marking. With two encoding states a single DSCP
is used; with three, a second DSCP allows threshold-marks and excess-
traffic-marks to be distinguished. The encodings are specified in
[I-D.moncaster-pcn-baseline-encoding] and
[I-D.moncaster-pcn-3-state-encoding].

All the various admission and termination approaches are detailed and
compared in [I-D.charny-pcn-comparison] and [Menth08]. The
discussion below is just a brief summary. It initially assumes that
three encoding states are available.

6.1. Flow admission

The objective of PCN's flow admission control mechanism is to limit
the PCN-traffic on each link in the PCN-domain to *roughly* its PCN-
threshold-rate, by admitting or blocking prospective new flows, in
order to protect the QoS of existing PCN-flows. The PCN-threshold-
rate is a parameter that can be configured by the operator and will
be set lower than the traffic rate at which the link becomes
congested and the node drops packets.

Exactly how the admission control decision is made will be defined
separately in informational documents. At a high level, two
approaches are proposed:

o  the PCN-egress-node measures (possibly as a moving average) the
   fraction of the PCN-traffic that is threshold-marked. The
   fraction is measured for a specific ingress-egress-aggregate.
If 677 the fraction is below a threshold value then the new flow is 678 admitted, and if the fraction is above the threshold value then it 679 is blocked. In [I-D.eardley-pcn-architecture] the fraction is 680 measured as an EWMA (exponentially weighted moving average) and 681 termed the "congestion level estimate". 683 o the PCN-egress-node monitors PCN-traffic and if it receives one 684 (or several) threshold-marked packets, then the new flow is 685 blocked, otherwise it is admitted. One possibility is to react to 686 the marking state of an initial flow set-up packet (eg RSVP PATH). 687 Another is that after one (or several) threshold-marks then all 688 flows are blocked until after a specific period of no congestion. 690 Note that the admission control decision is made for a particular 691 pair of PCN-boundary-nodes. So it is quite possible for a new flow 692 to be admitted between one pair of PCN-boundary-nodes, whilst at the 693 same time another admission request is blocked between a different 694 pair of PCN-boundary-nodes. 696 6.2. Flow termination 698 The objective of PCN's flow termination mechanism is to limit the 699 PCN-traffic on each link to *roughly* its PCN-excess-rate, by 700 terminating some existing PCN-flows, in order to protect the QoS of 701 the remaining PCN-flows. The PCN-excess-rate is a parameter that can 702 be configured by the operator and may be set lower than the traffic 703 rate at which the link becomes congested and the node drops packets. 705 Exactly how the flow termination decision is made will be defined 706 separately in informational documents. At a high level several 707 approaches are proposed: 709 o In one approach the PCN-egress-node measures the rate of PCN- 710 traffic that is not excess-traffic-marked, which is the amount of 711 PCN-traffic that can actually be supported. 
   Also the PCN-ingress-node measures the rate of PCN-traffic that is
   destined for this specific PCN-egress-node, and hence it can
   calculate the excess amount that should be terminated.

o  Another approach instead measures the rate of excess-traffic-
   marked traffic and terminates this amount of traffic. This
   terminates less traffic than the previous bullet if some nodes are
   dropping PCN-traffic, since dropped packets reduce the rate of
   marked traffic seen at the PCN-egress-node, whereas the previous
   approach accounts for them through the PCN-ingress-node's
   measurement.

o  Another approach monitors PCN-packets and terminates some of the
   PCN-flows that have an excess-traffic-marked packet. (If all such
   flows were terminated, far too much traffic would be terminated,
   so a random selection needs to be made from those with an excess-
   traffic-marked packet, [I-D.menth-pcn-emft].)

Since flow termination is designed for "abnormal" circumstances, it
is quite likely that some PCN-nodes are congested and hence packets
are being dropped and/or significantly queued. The flow termination
mechanism must bear this in mind.

Note also that the termination control decision is made for a
particular pair of PCN-boundary-nodes. So it is quite possible for
PCN-flows to be terminated between one pair of PCN-boundary-nodes,
whilst at the same time none are terminated between a different pair
of PCN-boundary-nodes.

6.3. Flow admission and/or flow termination when there are only two
     PCN encoding states

If a PCN-domain has only two encoding states available (PCN-marked
and not PCN-marked), ie it's using the baseline encoding
[I-D.moncaster-pcn-baseline-encoding], then an operator has three
options:

o  admission control only: PCN-marking means threshold-marking, ie
   only the threshold-marking algorithm writes PCN-marks. Only PCN
   admission control is available.

o  flow termination only: PCN-marking means excess-traffic-marking,
   ie only the excess-traffic-marking algorithm writes PCN-marks.
   Only PCN termination control is available.
o  both admission control and flow termination: only the excess-
   traffic-marking algorithm writes PCN-marks, however the configured
   rate (PCN-excess-rate) is set at the rate the admission control
   mechanism needs to limit PCN-traffic to, as shown in Figure 2.
   [I-D.charny-pcn-single-marking] describes how both admission
   control and flow termination can be triggered in this case and
   also gives some of the pros and cons of this approach. The main
   downside is that admission control is less accurate.

                        ==Marking behaviour==    ==PCN mechanisms==
   Rate of           ^
   PCN-traffic on    |
   bottleneck link   |  Some pkts                Terminate some
                     |  excess-traffic-marked    admitted flows
                     |  &                        &
                     |  Rest of pkts             Block new flows
                     |  threshold-marked
                     |
 U*PCN-excess-rate  -|-----------------------------------------------
                     |
                     |  All pkts                 Block new flows
                     |  threshold-marked
                     |
   PCN-excess-rate  -|-----------------------------------------------
                     |
                     |  No pkts                  Admit new flows
                     |  PCN-marked
                     |

   Figure 2: Schematic of how PCN's admission control and flow
   termination mechanisms kick in as the rate of PCN-traffic
   increases, for a PCN-domain with two encoding states and using the
   approach of [I-D.charny-pcn-single-marking]. Note: U is a global
   parameter for all the PCN-links.

6.4. Information transport

The transport of pre-congestion information from a PCN-node to a PCN-
egress-node is through PCN-markings in data packet headers, ie "in-
band": no signalling protocol messaging is needed. Signalling is
needed to transport PCN-feedback-information between the PCN-
boundary-nodes, for example to convey the fraction of PCN-marked
traffic from a PCN-egress-node to the relevant PCN-ingress-node.
Exactly what information needs to be transported will be described in
the future PCN WG document(s) about the boundary mechanisms.
The 799 signalling could be done by an extension of RSVP or NSIS, for 800 instance; protocol work will be done by the relevant WG, but for 801 example [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for 802 RSVP. 804 6.5. PCN-traffic 806 The following are some high-level points about how PCN works: 808 o There needs to be a way for a PCN-node to distinguish PCN-traffic 809 from other traffic. This is through a combination of the DSCP 810 field and/or ECN field. 812 o The PCN mechanisms may be applied to more than one behaviour 813 aggregate which are distinguished by DSCP. 815 o There may be traffic that is more important than PCN, perhaps a 816 particular application or an operator's control messages. A PCN- 817 node may dedicate capacity to such traffic or priority schedule it 818 over PCN. In the latter case its traffic needs to contribute to 819 the PCN meters (ie be metered by the threshold-marking and excess- 820 traffic-marking algorithms). 822 o There may be other traffic that uses the same capacity (on a link) 823 as PCN-traffic. The baseline encoding 824 [I-D.moncaster-pcn-baseline-encoding] states that: "To conserve 825 DSCPs, DiffServ Codepoints SHOULD be chosen that are already 826 defined for use with admission controlled traffic, such as the 827 Voice-Admit codepoint defined in [voice-admit]." So, for example 828 if the Voice-Admit codepoint is used for PCN-traffic and there is 829 voice-admit traffic in the PCN-domain, then they will share the 830 same capacity since scheduling behaviour is coupled with the DSCP 831 only. Such traffic needs to contribute to the PCN meters. 833 o It is not advised to have non PCN-traffic that shares the same 834 capacity (on a link) as PCN-traffic, since it makes the PCN 835 mechanisms less accurate and so reduces PCN's ability to protect 836 the QoS of admitted PCN-flows. If there is such traffic, there 837 needs to be a mechanism to limit it. 839 o There will be traffic less important than PCN. 
   For instance best effort or assured forwarding traffic. It will
   be scheduled at lower priority than PCN, and use a separate queue
   or queues. However, a PCN-node should dedicate some capacity to
   lower priority traffic so that it isn't starved. Such traffic
   doesn't contribute to the PCN meters.

6.6. Backwards compatibility

PCN specifies semantics for the ECN field that differ from the
default semantics of [RFC3168]. BCP 124 [RFC4774] gives guidelines
for specifying alternative semantics for the ECN field. These are
discussed in the baseline encoding document
[I-D.moncaster-pcn-baseline-encoding], which in summary meets these
guidelines by:

o  using a DSCP to allow PCN-nodes to distinguish PCN-traffic that
   uses the alternative ECN semantics;

o  defining these semantics for use within a controlled region, the
   PCN-domain;

o  taking appropriate action if ECN-capable, non-PCN traffic arrives
   at a PCN-ingress-node with the DSCP used by PCN.

The 'appropriate action' is to block ECN-capable traffic that uses
the same DSCP as PCN from entering the PCN-domain directly. Blocking
means it is dropped or downgraded to a lower priority behaviour
aggregate; alternatively such traffic may be tunnelled through the
PCN-domain. The reason that blocking is needed is that the PCN-
egress-node clears the ECN field to 00.

Extended encoding schemes may take different 'appropriate action'.
They need to describe how they meet the guidelines of BCP 124
[RFC4774].

7. Detailed functional architecture

This section is intended to provide a systematic summary of the new
functional architecture in the PCN-domain. First it describes
functions needed at the three specific types of PCN-node; these are
data plane functions and are in addition to their normal router
functions.
Then it describes further functionality needed for both 882 flow admission control and flow termination; these are signalling and 883 decision-making functions, and there are various possibilities for 884 where the functions are physically located. The section is split 885 into: 887 1. functions needed at PCN-interior-nodes 889 2. functions needed at PCN-ingress-nodes 891 3. functions needed at PCN-egress-nodes 893 4. other functions needed for flow admission control 895 5. other functions needed for flow termination control 897 Note: Probing is covered in the Appendix. 899 The section then discusses some other detailed topics: 901 1. addressing 903 2. tunnelling 905 3. fault handling 907 7.1. PCN-interior-node functions 909 Each link of the PCN-domain is configured with the following 910 functionality: 912 o Behaviour aggregate classify - decide whether an incoming packet 913 is a PCN-packet or not. 915 o Meter - measure the 'amount of PCN-traffic'. The measurement is 916 made as an aggregate of all PCN-packets, and not per flow. 918 o Mark - algorithms determine whether to PCN-mark PCN-packets and 919 what packet encoding is used. 921 The functions are standardised in [I-D.eardley-pcn-marking-behaviour] 922 and the baseline encoding in [I-D.moncaster-pcn-baseline-encoding] 923 (extended encodings are to be defined in other documents). 925 7.2. PCN-ingress-node functions 927 Each ingress link of the PCN-domain is configured with the following 928 functionality: 930 o Packet classify - decide whether an incoming packet is part of a 931 previously admitted flow, by using a filter spec (eg DSCP, source 932 and destination addresses and port numbers). 934 o Traffic condition - police, by dropping or downgrading, any 935 packets received with a DSCP demanding PCN transport that do not 936 belong to an admitted flow. (A prospective PCN-flow that is 937 rejected could be blocked or admitted into a lower priority 938 behaviour aggregate.) 
   Similarly, police packets that are part of a previously admitted
   flow, to check that the flow keeps to the agreed rate or flowspec
   (eg [RFC1633] for a microflow and its NSIS equivalent).

o  Packet colour - set the DSCP and ECN fields appropriately for the
   PCN-domain, for example as in
   [I-D.moncaster-pcn-baseline-encoding].

o  Meter - some approaches to flow termination require the PCN-
   ingress-node to measure the (aggregate) rate of PCN-traffic
   towards a particular PCN-egress-node.

The first two are policing functions, needed to make sure that PCN-
packets admitted into the PCN-domain belong to a flow that's been
admitted and to ensure that the flow keeps to the flowspec agreed (eg
doesn't go at a faster rate and is inelastic traffic). Installing
the filter spec will typically be done by the signalling protocol, as
will re-installing the filter, for example after a re-route that
changes the PCN-ingress-node (see [I-D.briscoe-tsvwg-cl-architecture]
for an example using RSVP). Packet colouring allows the rest of the
PCN-domain to recognise PCN-packets.

7.3. PCN-egress-node functions

Each egress link of the PCN-domain is configured with the following
functionality:

o  Packet classify - determine which PCN-ingress-node a PCN-packet
   has come from.

o  Meter - "measure PCN-traffic" or "monitor PCN-marks".

o  Packet colour - for PCN-packets, set the DSCP and ECN fields to
   the appropriate values for use outside the PCN-domain.

The metering functionality of course depends on whether it is
targeted at admission control or flow termination. Alternative
proposals involve the PCN-egress-node "measuring" as an aggregate (ie
not per flow) all PCN-packets from a particular PCN-ingress-node, or
"monitoring" the PCN-traffic and reacting to one (or several) PCN-
marked packets.
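As an illustration of the "measure PCN-traffic" style, the following sketches a per-ingress-egress-aggregate meter at a PCN-egress-node that computes the EWMA fraction of marked traffic, ie the "congestion level estimate" mentioned in Section 6.1. The interval boundaries, EWMA weight and CLE threshold are illustrative assumptions of ours, not values specified by any PCN document:

```python
class IngressEgressMeter:
    """Egress-side meter for one ingress-egress-aggregate (sketch).

    Aggregates all PCN-packets from one PCN-ingress-node (not per
    flow), and at the end of each measurement interval folds the
    fraction of PCN-marked octets into an EWMA, the congestion
    level estimate (CLE)."""

    def __init__(self, weight=0.3, cle_threshold=0.05):
        self.weight = weight                # EWMA weight (assumed)
        self.cle_threshold = cle_threshold  # admission threshold (assumed)
        self.cle = 0.0
        self.marked = 0
        self.total = 0

    def packet(self, size, pcn_marked):
        """Account one received PCN-packet of `size` octets."""
        self.total += size
        if pcn_marked:
            self.marked += size

    def end_interval(self):
        """Close the measurement interval and update the CLE."""
        if self.total:
            fraction = self.marked / self.total
            self.cle = self.weight * fraction + (1 - self.weight) * self.cle
        self.marked = self.total = 0
        return self.cle

    def admit_new_flow(self):
        """Recommend admitting a new flow while the CLE is low."""
        return self.cle < self.cle_threshold
```

The "monitor PCN-marks" style would instead dispense with the averaging and flip to blocking as soon as one (or several) marked packets arrive.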
For packet colouring, 980 [I-D.moncaster-pcn-baseline-encoding] specifies that the PCN-egress- 981 node sets the ECN field to 00; other encodings may define something 982 different. 984 7.4. Admission control functions 986 As well as the functions covered above, other specific admission 987 control functions can be performed at a PCN-boundary-node (PCN- 988 ingress-node or PCN-egress-node) or at a centralised node, but not at 989 normal PCN-interior-nodes. The functions are: 991 o Make decision about admission - based on the output of the PCN- 992 egress-node's PCN meter function. In the case where it "measures 993 PCN-traffic", the measured traffic on the ingress-egress-aggregate 994 is compared with some reference level. In the case where it 995 "monitors PCN-marks", then the decision is based on whether one 996 (or several) packets is (are) PCN-marked or not (eg the RSVP PATH 997 message). In either case, the admission decision also takes 998 account of policy and application layer requirements. 1000 o Communicate decision about admission - signal the decision to the 1001 node making the admission control request (which may be outside 1002 the PCN-domain), and to the policer (PCN-ingress-node function) 1003 for enforcement of the decision. 1005 There are various possibilities for how the functionality can be 1006 distributed (we assume the operator would configure which is used): 1008 o The decision is made at the PCN-egress-node and the decision 1009 (admit or block) is signalled to the PCN-ingress-node. 1011 o The decision is recommended by the PCN-egress-node (admit or 1012 block) but the decision is definitively made by the PCN-ingress- 1013 node. The rationale is that the PCN-egress-node naturally has the 1014 necessary information about PCN-marking on the ingress-egress- 1015 aggregate, but the PCN-ingress-node is the policy enforcement 1016 point which polices incoming traffic to ensure it's part of an 1017 admitted PCN-flow. 
1019 o The decision is made at the PCN-ingress-node, which requires that 1020 the PCN-egress-node signals PCN-feedback-information to the PCN- 1021 ingress-node. For example, it could signal the current fraction 1022 of PCN-traffic that is PCN-marked. 1024 o The decision is made at a centralised node (see Appendix). 1026 7.5. Flow termination functions 1028 As well as the functions covered above, other specific termination 1029 control functions can be performed at a PCN-boundary-node (PCN- 1030 ingress-node or PCN-egress-node) or at a centralised node, but not at 1031 normal PCN-interior-nodes. There are various possibilities for how 1032 the functionality can be distributed, similar to those discussed 1033 above in the Admission control section; the flow termination decision 1034 could be made at the PCN-ingress-node, the PCN-egress-node or at some 1035 centralised node. The functions are: 1037 o PCN-meter at PCN-egress-node - similarly to flow admission, there 1038 are two types of proposals: to "measure PCN-traffic" on the 1039 ingress-egress-aggregate, and to "monitor PCN-marks" and react to 1040 one (or several) PCN-marks. 1042 o (if required) PCN-meter at PCN-ingress-node - make "measurements 1043 of PCN-traffic" being sent towards a particular PCN-egress-node; 1044 again, this is done for the ingress-egress-aggregate and not per 1045 flow. 1047 o (if required) Communicate PCN-feedback-information to the node 1048 that makes the flow termination decision. For example, as in 1049 [I-D.briscoe-tsvwg-cl-architecture], communicate the PCN-egress- 1050 node's measurements to the PCN-ingress-node. 1052 o Make decision about flow termination - use the information from 1053 the PCN-meter(s) to decide which PCN-flow or PCN-flows to 1054 terminate. The decision takes account of policy and application 1055 layer requirements. 
1057 o Communicate decision about flow termination - signal the decision 1058 to the node that is able to terminate the flow (which may be 1059 outside the PCN-domain), and to the policer (PCN-ingress-node 1060 function) for enforcement of the decision. 1062 7.6. Addressing 1064 PCN-nodes may need to know the address of other PCN-nodes. Note: in 1065 all cases PCN-interior-nodes don't need to know the address of any 1066 other PCN-nodes (except as normal their next hop neighbours, for 1067 routing purposes). 1069 The PCN-egress-node needs to know the address of the PCN-ingress-node 1070 associated with a flow, at a minimum so that the PCN-ingress-node can 1071 be informed to enforce the admission decision (and any flow 1072 termination decision) through policing. There are various 1073 possibilities for how the PCN-egress-node can do this, ie associate 1074 the received packet to the correct ingress-egress-aggregate. It is 1075 not the intention of this document to mandate a particular mechanism. 1077 o The addressing information can be gathered from signalling. For 1078 example, regular processing of an RSVP Path message, as the PCN- 1079 ingress-node is the previous RSVP hop (PHOP) 1080 ([I-D.lefaucheur-rsvp-ecn]). Or the PCN-ingress-node could signal 1081 its address to the PCN-egress-node. 1083 o Always tunnel PCN-traffic across the PCN-domain. Then the PCN- 1084 ingress-node's address is simply the source address of the outer 1085 packet header. The PCN-ingress-node needs to learn the address of 1086 the PCN-egress-node, either by manual configuration or by one of 1087 the automated tunnel endpoint discovery mechanisms (such as 1088 signalling or probing over the data route, interrogating routing 1089 or using a centralised broker). 1091 7.7. Tunnelling 1093 Tunnels may originate and/or terminate within a PCN-domain. 
It is 1094 important that the PCN-marking of any packet can potentially 1095 influence PCN's flow admission control and termination - it shouldn't 1096 matter whether the packet happens to be tunnelled at the PCN-node 1097 that PCN-marks the packet, or indeed whether it's decapsulated or 1098 encapsulated by a subsequent PCN-node. This suggests that the 1099 "uniform conceptual model" described in [RFC2983] should be re- 1100 applied in the PCN context. In line with this and the approach of 1101 [RFC4303] and [I-D.briscoe-tsvwg-ecn-tunnel], the following rule is 1102 applied if encapsulation is done within the PCN-domain: 1104 o any PCN-marking is copied into the outer header 1106 Similarly, in line with the "uniform conceptual model" of [RFC2983] 1107 and the "full-functionality option" of [RFC3168], the following rule 1108 is applied if decapsulation is done within the PCN-domain: 1110 o if the outer header's marking state is more severe then it is 1111 copied onto the inner header 1113 o Note: the order of increasing severity is: not PCN-marked; 1114 threshold-marking; excess-traffic-marking. 1116 An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to 1117 PCN-egress-nodes. The PCN-marks shouldn't be visible outside the 1118 PCN-domain, which can be achieved by the PCN-egress-node doing the 1119 packet colouring function (Section 7.3) after all the other (PCN and 1120 tunnelling) functions. The potential reasons for doing such 1121 tunnelling are: the PCN-egress-node then automatically knows the 1122 address of the relevant PCN-ingress-node for a flow; even if ECMP is 1123 running, all PCN-packets on a particular ingress-egress-aggregate 1124 follow the same path. But it also has drawbacks, for example the 1125 additional overhead in terms of bandwidth and processing, and the 1126 cost of setting up a mesh of tunnels between PCN-boundary-nodes 1127 (there is an N^2 scaling issue). 
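The copy-on-encapsulation and copy-the-more-severe-on-decapsulation rules above can be sketched as follows (the marking-state names are ours, for illustration; the severity order is the one given in this section):

```python
# Severity order: not PCN-marked < threshold-marked < excess-traffic-marked.
SEVERITY = {"not-marked": 0, "threshold-marked": 1, "excess-traffic-marked": 2}

def encapsulate(inner_mark):
    """Encapsulation inside the PCN-domain: any PCN-marking is copied
    into the outer header ('uniform conceptual model')."""
    return inner_mark  # marking state of the new outer header

def decapsulate(outer_mark, inner_mark):
    """Decapsulation inside the PCN-domain: if the outer header's
    marking state is more severe, it is copied onto the inner header."""
    if SEVERITY[outer_mark] > SEVERITY[inner_mark]:
        return outer_mark
    return inner_mark  # resulting marking state of the inner header
```

So a mark applied to the outer header while the packet was tunnelled survives decapsulation, and a pre-existing inner mark is never downgraded.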
1129 Potential issues arise for a "partially PCN-capable tunnel", ie where 1130 only one tunnel endpoint is in the PCN domain: 1132 1. The tunnel starts outside a PCN-domain and finishes inside it. 1133 If the packet arrives at the tunnel ingress with the same 1134 encoding as used within the PCN-domain to indicate PCN-marking, 1135 then this could lead the PCN-egress-node to falsely measure pre- 1136 congestion. 1138 2. The tunnel starts inside a PCN-domain and finishes outside it. 1139 If the packet arrives at the tunnel ingress already PCN-marked, 1140 then it will still have the same encoding when it's decapsulated 1141 which could potentially confuse nodes beyond the tunnel egress. 1143 In line with the solution for partially capable DiffServ tunnels in 1144 [RFC2983], the following rules are applied: 1146 o For case (1), the tunnel egress node clears any PCN-marking on the 1147 inner header. This rule is applied before the 'copy on 1148 decapsulation' rule above. 1150 o For case (2), the tunnel ingress node clears any PCN-marking on 1151 the inner header. This rule is applied after the 'copy on 1152 encapsulation' rule above. 1154 Note that the above implies that one has to know, or figure out, the 1155 characteristics of the other end of the tunnel as part of setting it 1156 up. 1158 Tunnelling constraints were a major factor in the choice of the 1159 baseline encoding. As explained in 1160 [I-D.moncaster-pcn-baseline-encoding], with current tunnelling 1161 endpoints only the 11 codepoint of the ECN field survives 1162 decapsulation, and hence the baseline encoding only uses the 11 1163 codepoint to indicate PCN-marking. Extended encoding schemes need to 1164 explain their interactions with (or assumptions about) tunnelling. A 1165 lengthy discussion of all the issues associated with layered 1166 encapsulation of congestion notification (for ECN as well as PCN) is 1167 in [I-D.briscoe-tsvwg-ecn-tunnel]. 1169 7.8. 
Fault handling

If a PCN-interior-node (or one of its links) fails, then lower layer
protection mechanisms or the regular IP routing protocol will
eventually re-route around it. If the new route can carry all the
admitted traffic, flows will gracefully continue. If instead this
causes early warning of pre-congestion on the new route, then
admission control based on pre-congestion notification will ensure
that new flows are not admitted until enough existing flows have
departed. Re-routing may result in heavy (pre-)congestion, in which
case the flow termination mechanism will kick in.

If a PCN-boundary-node fails then we would like the regular QoS
signalling protocol to take care of things. As an example,
[I-D.briscoe-tsvwg-cl-architecture] considers what happens if RSVP is
the QoS signalling protocol.

8. Design goals and challenges

Prior work on PCN and similar mechanisms has thrown up a number of
considerations about PCN's design goals (things PCN should be good
at) and some issues that have been hard to solve in a fully
satisfactory manner. Taken as a whole, these represent a list of
trade-offs (it's unlikely that they can all be 100% achieved) and can
perhaps serve as evaluation criteria to help an operator (or the
IETF) decide between options.
1196 The following are key design goals for PCN (based on 1197 [I-D.chan-pcn-problem-statement]): 1199 o The PCN-enabled packet forwarding network should be simple, 1200 scalable and robust 1202 o Compatibility with other traffic (ie a proposed solution should 1203 work well when non-PCN traffic is also present in the network) 1205 o Support of different types of real-time traffic (eg should work 1206 well with CBR and VBR voice and video sources treated together) 1208 o Reaction time of the mechanisms should be commensurate with the 1209 desired application-level requirements (eg a termination mechanism 1210 needs to terminate flows before significant QoS issues are 1211 experienced by real-time traffic, and before most users hang up). 1213 o Compatibility with different precedence levels of real-time 1214 applications (eg preferential treatment of higher precedence calls 1215 over lower precedence calls, [ITU-MLPP]). 1217 The following are open issues. They are mainly taken from 1218 [I-D.briscoe-tsvwg-cl-architecture] which also describes some 1219 possible solutions. Note that some may be considered unimportant in 1220 general or in specific deployment scenarios or by some operators. 1222 NOTE: Potential solutions are out of scope for this document. 1224 o ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion 1225 is measured on a specific ingress-egress-aggregate. However, if 1226 the PCN-domain runs ECMP, then traffic on this ingress-egress- 1227 aggregate may follow several different paths - some of the paths 1228 could be pre-congested whilst others are not. There are three 1229 potential problems: 1231 1. over-admission: a new flow is admitted (because the pre- 1232 congestion level measured by the PCN-egress-node is 1233 sufficiently diluted by unmarked packets from non-congested 1234 paths that a new flow is admitted), but its packets travel 1235 through a pre-congested PCN-node 1237 2. 
under-admission: a new flow is blocked (because the pre- 1238 congestion level measured by the PCN-egress-node is 1239 sufficiently increased by PCN-marked packets from pre- 1240 congested paths that a new flow is blocked), but its packets 1241 travel along an uncongested path 1243 3. ineffective termination: flows are terminated, however their 1244 path doesn't travel through the (pre-)congested router(s). 1245 Since flow termination is a 'last resort' that protects the 1246 network should over-admission occur, this problem is probably 1247 more important to solve than the other two. 1249 o ECMP and signalling: It is possible that, in a PCN-domain running 1250 ECMP, the signalling packets (eg RSVP, NSIS) follow a different 1251 path than the data packets, which could matter if the signalling 1252 packets are used as probes. Whether this is an issue depends on 1253 which fields the ECMP algorithm uses; if the ECMP algorithm is 1254 restricted to the source and destination IP addresses, then it 1255 won't be. ECMP and signalling interactions are a specific 1256 instance of a general issue for non-traditional routing combined 1257 with resource management along a path [Hancock]. 1259 o Tunnelling: There are scenarios where tunnelling makes it hard to 1260 determine the path in the PCN-domain. The problem, its impact and 1261 the potential solutions are similar to those for ECMP. 1263 o Scenarios with only one tunnel endpoint in the PCN domain may make 1264 it harder for the PCN-egress-node to gather from the signalling 1265 messages (eg RSVP, NSIS) the identity of the PCN-ingress-node. 1267 o Bi-Directional Sessions: Many applications have bi-directional 1268 sessions - hence there are two microflows that should be admitted 1269 (or terminated) as a pair - for instance a bi-directional voice 1270 call only makes sense if microflows in both directions are 1271 admitted. 
However, PCN's mechanisms concern admission and 1272 termination of a single flow, and coordination of the decision for 1273 both flows is a matter for the signalling protocol and out of 1274 scope of PCN. One possible example would use SIP pre-conditions; 1275 there are others. 1277 o Global Coordination: PCN makes its admission decision based on 1278 PCN-markings on a particular ingress-egress-aggregate. Decisions 1279 about flows through a different ingress-egress-aggregate are made 1280 independently. However, one can imagine network topologies and 1281 traffic matrices where, from a global perspective, it would be 1282 better to make a coordinated decision across all the ingress- 1283 egress-aggregates for the whole PCN-domain. For example, to block 1284 (or even terminate) flows on one ingress-egress-aggregate so that 1285 more important flows through a different ingress-egress-aggregate 1286 could be admitted. The problem may well be second order. 1288 o Aggregate Traffic Characteristics: Even when the number of flows 1289 is stable, the traffic level through the PCN-domain will vary 1290 because the sources vary their traffic rates. PCN works best when 1291 there's not too much variability in the total traffic level at a 1292 PCN-node's interface (ie in the aggregate traffic from all 1293 sources). Too much variation means that a node may (at one 1294 moment) not be doing any PCN-marking and then (at another moment) 1295 drop packets because it's overloaded. This makes it hard to tune 1296 the admission control scheme to stop admitting new flows at the 1297 right time. Therefore the problem is more likely with fewer, 1298 burstier flows. 1300 o Flash crowds and Speed of Reaction: PCN is a measurement-based 1301 mechanism and so there is an inherent delay between packet marking 1302 by PCN-interior-nodes and any admission control reaction at PCN- 1303 boundary-nodes. 
For example, if a big burst of 1304 admission requests occurs in a very short space of time (eg 1305 prompted by a televote), they could potentially all get admitted before enough 1306 PCN-marks are seen to block new flows. In other words, any 1307 additional load offered within the reaction time of the mechanism 1308 mustn't move the PCN-domain directly from no congestion to 1309 overload. This 'vulnerability period' may need to be handled at the 1310 signalling level; for instance, QoS requests could be rate limited 1311 to bound the number of requests able to arrive within the 1312 vulnerability period. 1314 o Silent at start: after a successful admission request the source 1315 may wait some time before sending data (eg waiting for the called 1316 party to answer). Then the risk is that, in some circumstances, 1317 PCN's measurements underestimate what the pre-congestion level 1318 will be when the source does start sending data. 1320 9. Operations and Management 1322 This Section considers operations and management issues, under the 1323 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance 1324 and Security. Provisioning is discussed with performance. 1326 9.1. Configuration OAM 1328 Threshold-marking and excess-traffic-marking are standardised in 1329 [I-D.eardley-pcn-marking-behaviour]. However, more diversity in PCN- 1330 boundary-node behaviours is expected, in order to interface with 1331 diverse industry architectures. It may be possible to have different 1332 PCN-boundary-node behaviours for different ingress-egress-aggregates 1333 within the same PCN-domain. 1335 A PCN marking behaviour (threshold-marking, excess-traffic-marking) 1336 is enabled on either the egress or the ingress interfaces of PCN- 1337 nodes. A consistent choice must be made across the PCN-domain to 1338 ensure that the PCN mechanisms protect all links.
1340 PCN configuration control variables fall into the following 1341 categories: 1343 o system options (enabling or disabling behaviours) 1345 o parameters (setting levels, addresses etc) 1347 One possibility is that all configurable variables sit within an SNMP 1348 management framework [RFC3411], being structured within a defined 1349 management information base (MIB) on each node, and being remotely 1350 readable and settable via a suitably secure management protocol 1351 (SNMPv3). 1353 Some configuration options and parameters have to be set once to 1354 'globally' control the whole PCN-domain. Where possible, these are 1355 identified below. This may affect operational complexity and the 1356 chances of interoperability problems between kit from different 1357 vendors. 1359 It may be possible for an operator to configure some PCN-interior- 1360 nodes so they don't run the PCN mechanisms, if it knows that these 1361 links will never become (pre-)congested. 1363 9.1.1. System options 1365 On PCN-interior-nodes there will be very few system options: 1367 o Whether two PCN-markings (threshold-marked and excess-traffic- 1368 marked) are enabled or only one. Typically all nodes throughout a 1369 PCN-domain will be configured the same in this respect. However, 1370 exceptions could be made. For example, if most PCN-nodes used 1371 both markings, but some legacy hardware was incapable of running 1372 two algorithms, an operator might be willing to configure these 1373 legacy nodes solely for excess-traffic-marking to enable flow 1374 termination as a back-stop. It would be sensible to place such 1375 nodes where they could be provisioned with a greater leeway over 1376 expected traffic levels. 1378 o In the case where only one PCN-marking is enabled, all nodes must 1379 be configured to generate PCN-marks from the same meter (ie either 1380 the threshold meter or the excess traffic meter). 
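The single-marking meter rule above can be sketched as a simple configuration check. This is an illustrative sketch only; the node representation and field names are assumptions, not taken from any PCN specification.

```python
# Hedged sketch of the rule above: if only one PCN-marking is enabled
# throughout the domain, every node must generate PCN-marks from the
# same meter (the threshold meter or the excess traffic meter).
# The dict layout and names are illustrative assumptions.

def consistent_meter_choice(nodes):
    """nodes: list of dicts like {'markings': {...}, 'meter': '...'}."""
    if all(len(n["markings"]) == 1 for n in nodes):
        # Single-marking domain: all nodes must feed marks from one meter.
        return len({n["meter"] for n in nodes}) == 1
    # With both markings enabled, each meter drives its own marking.
    return True

ok = [{"markings": {"excess-traffic"}, "meter": "excess-traffic"},
      {"markings": {"excess-traffic"}, "meter": "excess-traffic"}]
bad = [{"markings": {"threshold"}, "meter": "threshold"},
       {"markings": {"threshold"}, "meter": "excess-traffic"}]
print(consistent_meter_choice(ok), consistent_meter_choice(bad))  # True False
```

A management system could run such a check before activating a configuration change, in line with the run-time consistency checks suggested in Section 9.4.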
1382 PCN-boundary-nodes (ingress and egress) will have more system 1383 options: 1385 o Which of admission and flow termination are enabled. If any PCN- 1386 interior-node is configured to generate a marking, all PCN- 1387 boundary-nodes must be able to handle that marking (which includes 1388 understanding, in a PCN-domain that uses only one type of PCN- 1389 marking, whether they are generated by PCN-interior-nodes' 1390 threshold meters or the excess traffic meters). Therefore all 1391 PCN-boundary-nodes must be configured the same in this respect. 1393 o Where flow admission and termination decisions are made: at the 1394 PCN-ingress-node, PCN-egress-node or at a centralised node (see 1395 Section 7). Theoretically, this configuration choice could be 1396 negotiated for each pair of PCN-boundary-nodes, but we cannot 1397 imagine why such complexity would be required, except perhaps in 1398 future inter-domain scenarios. 1400 o How PCN-markings are translated into admission control and flow 1401 termination decisions (see Section 6.1 and Section 6.2). 1403 PCN-egress-nodes will have further system options: 1405 o How the mapping should be established between each packet and its 1406 aggregate, eg by MPLS label, by IP packet filterspec; and how to 1407 take account of ECMP. 1409 o If an equipment vendor provides a choice, there may be options to 1410 select which smoothing algorithm to use for measurements. 1412 9.1.2. Parameters 1414 Like any DiffServ domain, every node within a PCN-domain will need to 1415 be configured with the DSCP(s) used to identify PCN-packets. On each 1416 interior link the main configuration parameters are the PCN- 1417 threshold-rate and PCN-excess-rate. A larger PCN-threshold-rate 1418 enables more PCN-traffic to be admitted on a link, hence improving 1419 capacity utilisation.
A PCN-excess-rate set further above the PCN- 1420 threshold-rate allows greater increases in traffic (whether due to 1421 natural fluctuations or some unexpected event) before any flows are 1422 terminated, ie minimises the chances of unnecessarily triggering the 1423 termination mechanism. For instance an operator may want to design 1424 their network so that it can cope with a failure of any single PCN- 1425 node without terminating any flows. 1427 Setting these rates on first deployment of PCN will be very similar 1428 to the traditional process for sizing an admission controlled 1429 network, depending on: the operator's requirements for minimising 1430 flow blocking (grade of service), the expected PCN traffic load on 1431 each link and its statistical characteristics (the traffic matrix), 1432 contingency for re-routing the PCN traffic matrix in the event of 1433 single or multiple failures and the expected load from other classes 1434 relative to link capacities [Menth]. But once a domain is up and 1435 running, a PCN design goal is to be able to determine growth in these 1436 configured rates much more simply, by monitoring PCN-marking rates 1437 from actual rather than expected traffic (see Section 9.2 on 1438 Performance & Provisioning). 1440 Operators may also wish to configure a rate greater than the PCN- 1441 excess-rate that is the absolute maximum rate that a link allows for 1442 PCN-traffic. This may simply be the physical link rate, but some 1443 operators may wish to configure a logical limit to prevent starvation 1444 of other traffic classes during any brief period after PCN-traffic 1445 exceeds the PCN-excess-rate but before flow termination brings it 1446 back below this rate. 1448 Threshold-marking requires a threshold token bucket depth to be 1449 configured; excess-traffic-marking needs a value for the MTU (maximum 1450 size of a PCN-packet on the link); and both require setting a maximum 1451 size of their token buckets.
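The token-bucket parameters above can be illustrated with a minimal sketch of excess-traffic-marking: tokens accumulate at the configured PCN-excess-rate up to the bucket size, and a packet finding insufficient tokens is PCN-marked. This is one plausible realisation for illustration only, not the standardised marking behaviour; the class, its parameters and the exact marking rule are assumptions.

```python
# Minimal token-bucket sketch of excess-traffic-marking (illustrative
# only): traffic within the configured PCN-excess-rate is left
# unmarked; traffic in excess of it is PCN-marked.

class ExcessTrafficMarker:
    def __init__(self, excess_rate_bps, bucket_bytes):
        self.rate = excess_rate_bps / 8.0   # token fill rate, bytes/second
        self.depth = bucket_bytes           # maximum token bucket size
        self.tokens = bucket_bytes          # start with a full bucket
        self.last = 0.0

    def packet(self, size_bytes, now):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return False   # within PCN-excess-rate: leave unmarked
        return True        # excess traffic: PCN-mark the packet

m = ExcessTrafficMarker(excess_rate_bps=8000, bucket_bytes=1500)  # 1000 B/s
# Back-to-back 1500-byte packets at t=0 and t=0.1 s: the second
# exceeds the configured rate and is marked.
print(m.packet(1500, 0.0), m.packet(1500, 0.1))  # False True
```

The bucket depth controls how much burstiness is tolerated before marking starts, which is why the draft treats it as an operator-tunable parameter.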
It will be preferable for there to be 1452 rules to set defaults for these parameters, but then allow operators 1453 to change them, for instance if average traffic characteristics 1454 change over time. 1456 The PCN-egress-node may allow configuration of the following: 1458 o how it smooths metering of PCN-markings (eg EWMA parameters) 1460 Whichever node makes admission and flow termination decisions will 1461 contain algorithms for converting PCN-marking levels into admission 1462 or flow termination decisions. These will also require configurable 1463 parameters, for instance: 1465 o an admission control algorithm that's based on the fraction of 1466 marked packets will at least require a marking threshold setting 1467 above which it denies admission to new flows; 1469 o flow termination algorithms will probably require a parameter to 1470 delay termination of any flows until it is more certain that an 1471 anomalous event is not transient; 1473 o a parameter to control the trade-off between how quickly excess 1474 flows are terminated and over-termination. 1476 One particular proposal, [I-D.charny-pcn-single-marking], would 1477 require a global parameter to be defined on all PCN-nodes, but only 1478 needs one PCN marking rate to be configured on each link. The global 1479 parameter is a scaling factor between admission and termination (the 1480 PCN-traffic rate on a link up to which flows are admitted vs the rate 1481 above which flows are terminated). [I-D.charny-pcn-single-marking] 1482 discusses in full the impact of this particular proposal on the 1483 operation of PCN. 1485 9.2. Performance & Provisioning OAM 1487 Monitoring of performance factors measurable from *outside* the PCN 1488 domain will be no different with PCN than with any other packet-based 1489 flow admission control system, both at the flow level (blocking 1490 probability etc) and the packet level (jitter [RFC3393], [Y.1541], 1491 loss rate [RFC4656], mean opinion score [P.800], etc).
The 1492 difference is that PCN is intentionally designed to indicate 1493 *internally* which exact resource(s) are the cause of performance 1494 problems and by how much. 1496 Even better, PCN indicates which resources will probably cause 1497 problems if they are not upgraded soon. This can be achieved by the 1498 management system monitoring the total amount (in bytes) of PCN- 1499 marking generated by each queue over a period. Given possible long 1500 provisioning lead times, pre-congestion volume is the best metric to 1501 reveal whether sufficient persistent demand has mounted up to warrant 1502 an upgrade. This is because, even before utilisation becomes problematic, 1503 the statistical variability of traffic will cause occasional bursts 1504 of pre-congestion. This 'early warning system' decouples the process 1505 of adding customers from the provisioning process. This should cut 1506 the time to add a customer when compared against admission control 1507 provided over native DiffServ [RFC2998], because it saves having to 1508 re-run the capacity planning process before adding each customer. 1510 Alternatively, before triggering an upgrade, the long term pre- 1511 congestion volume on each link can be used to balance traffic load 1512 across the PCN-domain by adjusting the link weights of the routing 1513 system. When an upgrade to a link's configured PCN-rates is 1514 required, it may also be necessary to upgrade the physical capacity 1515 available to other classes. But usually there will be sufficient 1516 physical capacity for the upgrade to go ahead as a simple 1517 configuration change. Alternatively, [Songhurst] has proposed an 1518 adaptive rather than preconfigured system, where the configured PCN- 1519 threshold-rate is replaced with a high and low water mark and the 1520 marking algorithm automatically optimises how physical capacity is 1521 shared using the relative loads from PCN and other traffic classes.
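The byte-volume monitoring described above can be sketched as follows: the management system periodically reads cumulative per-queue marking counters and subtracts the previous reading to obtain the incremental volume of each kind of (pre-)congestion. The counter names are illustrative assumptions, not standardised identifiers.

```python
# Sketch of the marking-volume monitoring described above. Each
# queue keeps cumulative byte counters; the management system
# differences successive readings to get per-interval volumes.
# Counter names are illustrative assumptions.

def marking_deltas(current, previous):
    """current/previous: dicts of cumulative byte counters per queue,
    eg {'threshold_marked': ..., 'excess_marked': ..., 'dropped': ...}.
    Returns the incremental volume since the previous reading."""
    return {k: current[k] - previous.get(k, 0) for k in current}

prev = {"threshold_marked": 10_000, "excess_marked": 0, "dropped": 0}
curr = {"threshold_marked": 250_000, "excess_marked": 4_000, "dropped": 0}
print(marking_deltas(curr, prev))
# {'threshold_marked': 240000, 'excess_marked': 4000, 'dropped': 0}
```

Frequent readings keep anomalous events (eg re-routes) distinguishable from regular fluctuating demand in the resulting time series.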
1523 All the above processes require just three extra counters associated 1524 with each PCN queue: threshold-markings, excess-traffic-markings and 1525 drop. Every time a PCN packet is marked or dropped its size in bytes 1526 should be added to the appropriate counter. Then the management 1527 system can read the counters at any time and subtract a previous 1528 reading to establish the incremental volume of each type of 1529 (pre-)congestion. Readings should be taken frequently, so that 1530 anomalous events (eg re-routes) can be separated from regular 1531 fluctuating demand if required. 1533 9.3. Accounting OAM 1535 Accounting is only done at trust boundaries so it is out of scope of 1536 the initial Charter of the PCN WG which is confined to intra-domain 1537 issues. Use of PCN internal to a domain makes no difference to the 1538 flow signalling events crossing trust boundaries outside the PCN- 1539 domain, which are typically used for accounting. 1541 9.4. Fault OAM 1543 Fault OAM is about preventing faults, telling the management system 1544 (or manual operator) that the system has recovered (or not) from a 1545 failure, and about maintaining information to aid fault diagnosis. 1547 Admission blocking and particularly flow termination mechanisms 1548 should rarely be needed in practice. It would be unfortunate if they 1549 didn't work after an option had been accidentally disabled. 1550 Therefore it will be necessary to regularly test that the live system 1551 works as intended (devising a meaningful test is left as an exercise 1552 for the operator). 1554 Section 7 describes how the PCN architecture has been designed to 1555 ensure admitted flows continue gracefully after recovering 1556 automatically from link or node failures. The need to record and 1557 monitor re-routing events affecting signalling is unchanged by the 1558 addition of PCN to a DiffServ domain. 
Similarly, re-routing events 1559 within the PCN-domain will be recorded and monitored just as they 1560 would be without PCN. 1562 PCN-marking does make it possible to record 'near-misses'. For 1563 instance, at the PCN-egress-node a 'reporting threshold' could be set 1564 to monitor how often - and for how long - the system comes close to 1565 triggering flow blocking without actually doing so. Similarly, 1566 bursts of flow termination marking could be recorded even if they are 1567 not sufficiently sustained to trigger flow termination. Such 1568 statistics could be correlated with per-queue counts of marking 1569 volume (Section 9.2) to upgrade resources in danger of causing 1570 service degradation, or to trigger manual tracing of intermittent 1571 incipient errors that would otherwise have gone unnoticed. 1573 Finally, of course, many faults are caused by failings in the 1574 management process ('human error'): a wrongly configured address in a 1575 node, a wrong address given in a signalling protocol, a wrongly 1576 configured parameter in a queueing algorithm, a node set into a 1577 different mode from other nodes, and so on. Generally, a clean 1578 design with few configurable options ensures this class of faults can 1579 be traced more easily and prevented more often. Sound management 1580 practice at run-time also helps. For instance: a management system 1581 should be used that constrains configuration changes within system 1582 rules (eg preventing an option setting inconsistent with other 1583 nodes); configuration options should also be recorded in an offline 1584 database; and regular automatic consistency checks should be run between live 1585 systems and the database. PCN adds nothing specific to this class of 1586 problems. 1588 9.5. Security OAM 1590 Security OAM is about using secure operational practices as well as 1591 being able to track security breaches or near-misses at run-time.
1592 PCN adds few specifics to the general good practice required in this 1593 field [RFC4778], other than those below. The correct functioning of 1594 the system should be monitored (Section 9.2) in multiple independent 1595 ways and correlated to detect possible security breaches. Persistent 1596 (pre-)congestion marking should raise an alarm (both on the node 1597 doing the marking and on the PCN-egress-node metering it). 1598 Similarly, persistently poor external QoS metrics such as jitter or 1599 MOS should raise an alarm. The following are examples of symptoms 1600 that may be the result of innocent faults, rather than attacks, but 1601 until diagnosed they should be logged and trigger a security alarm: 1603 o Anomalous patterns of non-conforming incoming signals and packets 1604 rejected at the PCN-ingress-nodes (eg packets already marked PCN- 1605 capable, or traffic persistently starving token bucket policers). 1607 o PCN-capable packets arriving at a PCN-egress-node with no 1608 associated state for mapping them to a valid ingress-egress- 1609 aggregate. 1611 o A PCN-ingress-node receiving feedback signals about the pre- 1612 congestion level on a non-existent aggregate, or that are 1613 inconsistent with other signals (eg unexpected sequence numbers, 1614 inconsistent addressing, conflicting reports of the pre-congestion 1615 level, etc). 1617 o Pre-congestion marking arriving at a PCN-egress-node with 1618 (pre-)congestion markings focused on particular flows, rather than 1619 randomly distributed throughout the aggregate. 1621 10. IANA Considerations 1623 This memo includes no request to IANA. 1625 11. Security considerations 1627 Security considerations essentially come from the Trust Assumption 1628 (Section 5.1), ie that all PCN-nodes are PCN-enabled and trust each 1629 other for truthful PCN-marking and transport.
PCN splits 1630 functionality between PCN-interior-nodes and PCN-boundary-nodes, and 1631 the security considerations are somewhat different for each, mainly 1632 because PCN-boundary-nodes are flow-aware and PCN-interior-nodes are 1633 not. 1635 o Because the PCN-boundary-nodes are flow-aware, they are trusted to 1636 use that awareness correctly. The degree of trust required 1637 depends on the kinds of decisions they have to make and the kinds 1638 of information they need to make them. For example when the PCN- 1639 boundary-node needs to know the contents of the sessions for 1640 making the admission and termination decisions, or when the 1641 contents are highly classified, then the security requirements for 1642 the PCN-boundary-nodes involved will also need to be high. 1644 o the PCN-ingress-nodes police packets to ensure a PCN-flow sticks 1645 within its agreed limit, and to ensure that only PCN-flows which 1646 have been admitted contribute PCN-traffic into the PCN-domain. 1647 The policer must drop (or perhaps downgrade to a different DSCP) 1648 any PCN-packets received that are outside this remit. This is 1649 similar to the existing IntServ behaviour. Between them the PCN- 1650 boundary-nodes must encircle the PCN-domain, otherwise PCN-packets 1651 could enter the PCN-domain without being subject to admission 1652 control, which would potentially destroy the QoS of existing 1653 flows. 1655 o PCN-interior-nodes aren't flow-aware. This prevents some security 1656 attacks where an attacker targets specific flows in the data plane 1657 - for instance for DoS or eavesdropping. 1659 o PCN-marking by the PCN-interior-nodes along the packet forwarding 1660 path needs to be trusted, because the PCN-boundary-nodes rely on 1661 this information. For instance a rogue PCN-interior-node could 1662 PCN-mark all packets so that no flows were admitted. Another 1663 possibility is that it doesn't PCN-mark any packets, even when 1664 it's pre-congested. 
More subtly, the rogue PCN-interior-node 1665 could perform these attacks selectively on particular flows, or it 1666 could PCN-mark the correct fraction overall, but carefully choose 1667 which flows it marked. 1669 o the PCN-boundary-nodes should be able to deal with DoS attacks and 1670 state exhaustion attacks based on fast changes in per flow 1671 signalling. 1673 o the signalling between the PCN-boundary-nodes (and possibly a 1674 central control node) must be protected from attacks. For example 1675 the recipient needs to validate that the message is indeed from 1676 the node that claims to have sent it. Possible measures include 1677 digest authentication and protection against replay and man-in- 1678 the-middle attacks. For the specific protocol RSVP, hop-by-hop 1679 authentication is in [RFC2747], and 1680 [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be 1681 useful. 1683 Operational security advice is given in Section 9.5. 1685 12. Conclusions 1687 The document describes a general architecture for flow admission and 1688 termination based on pre-congestion information in order to protect 1689 the quality of service of established inelastic flows within a single 1690 DiffServ domain. The main topic is the functional architecture. It 1691 also mentions other topics like the assumptions and open issues. 1693 13. Acknowledgements 1695 This document is a revised version of [I-D.eardley-pcn-architecture]. 1696 Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R. 1697 Geib, G. Karagiannis, M. Menth, T. Tsou. They are therefore 1698 contributors to this document. 
1700 Thanks to those who've made comments on 1701 [I-D.eardley-pcn-architecture] and on earlier versions of this draft: 1702 Lachlan Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, 1703 Bob Briscoe, Jason Canon, Ken Carlberg, Anna Charny, Joachim 1704 Charzinski, Andras Csaszar, Lars Eggert, Ruediger Geib, Wei Gengyu, 1705 Robert Hancock, Ingemar Johansson, Georgios Karagiannis, Michael 1706 Menth, Toby Moncaster, Ben Strulo, Tom Taylor, Hannes Tschofenig, 1707 Tina Tsou, Lars Westberg, Magnus Westerlund, Delei Yu. Thanks to Bob 1708 Briscoe who extensively revised the Operations and Management 1709 section. 1711 This document is the result of discussions in the PCN WG and 1712 forerunner activity in the TSVWG. A number of previous drafts were 1713 presented to TSVWG: [I-D.chan-pcn-problem-statement], 1714 [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb], 1716 [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap], 1717 [I-D.lefaucheur-rsvp-ecn], [I-D.westberg-pcn-load-control]. The 1718 authors of them were: B. Briscoe, P. Eardley, D. Songhurst, F. Le 1719 Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley, G. Karagiannis, 1720 A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G. Liu, A. Bhargava. 1722 14. Comments Solicited 1724 Comments and questions are encouraged and very welcome. They can be 1725 addressed to the IETF PCN working group mailing list . 1727 15. Changes 1729 15.1.
Changes from -04 to -05 1731 Minor nits removed as follows: 1733 o Further minor changes to reflect that the baseline encoding is a 1734 consensus, standards-track document, whilst there can be 1735 (experimental-track) encoding extensions 1737 o Traffic conditioning updated to reflect discussions in Dublin, 1738 mainly that PCN-interior-nodes don't police PCN-traffic (so 1739 deleted bullet in S7.1) and that it is not advised to have non 1740 PCN-traffic that shares the same capacity (on a link) as PCN- 1741 traffic (so added bullet in S6.5) 1743 o Probing moved into Appendix A and deleted the 'third viewpoint' 1744 (admission control based on the marking of a single packet like an 1745 RSVP PATH message) - since this isn't really probing, and in any 1746 case is already mentioned in S6.1. 1748 o Minor changes to S9 Operations and management - mainly to reflect 1749 that consensus on marking behaviour has simplified things so eg 1750 there are fewer parameters to configure. 1752 o A few terminology-related errors expunged, and two pictures added 1753 to help. 1755 o Re-phrased the claim about the natural decision point in S7.4 1757 o Clarified that extended encoding schemes need to explain their 1758 interactions with (or assumptions about) tunnelling (S7.7) and how 1759 they meet the guidelines of BCP124 (S6.6) 1761 o Corrected the third bullet in S6.2 (to reflect consensus about 1762 PCN-marking) 1764 15.2. Changes from -03 to -04 1766 o Minor changes throughout to reflect the consensus call about PCN- 1767 marking (as reflected in [I-D.eardley-pcn-marking-behaviour]). 1769 o Minor changes throughout to reflect the current decisions about 1770 encoding (as reflected in [I-D.moncaster-pcn-baseline-encoding] and 1771 [I-D.moncaster-pcn-3-state-encoding]). 1773 o Introduction: re-structured to create new sections on Benefits, 1774 Deployment scenarios and Assumptions. 1776 o Introduction: Added pointers to other PCN documents.
1778 o Terminology: changed PCN-lower-rate to PCN-threshold-rate and PCN- 1779 upper-rate to PCN-excess-rate; excess-rate-marking to excess- 1780 traffic-marking. 1782 o Benefits: added bullet about SRLGs. 1784 o Deployment scenarios: new section combining material from various 1785 places within the document. 1787 o S6 (high level functional architecture): re-structured and edited 1788 to improve clarity, and reflect the latest PCN-marking and 1789 encoding drafts. 1791 o S6.4: added claim that the most natural place to make an admission 1792 decision is a PCN-egress-node. 1794 o S6.5: updated the bullet about non-PCN-traffic that uses the same 1795 DSCP as PCN-traffic. 1797 o S6.6: added a section about backwards compatibility with respect 1798 to [RFC4774]. 1800 o Appendix A: added bullet about end-to-end PCN. 1802 o Probing: moved to Appendix B. 1804 o Other minor clarifications, typos etc. 1806 15.3. Changes from -02 to -03 1808 o Abstract: Clarified by removing the term 'aggregated'. Follow-up 1809 clarifications later in draft: S1: expanded PCN-egress-nodes 1810 bullet to mention case where the PCN-feedback-information is about 1811 one (or a few) PCN-marks, rather than aggregated information; S3 1812 clarified PCN-meter; S5 minor changes; conclusion. 1814 o S1: added a paragraph about how the PCN-domain looks to the 1815 outside world (essentially it looks like a DiffServ domain). 1817 o S2: tweaked the PCN-traffic terminology bullet: changed PCN 1818 traffic classes to PCN behaviour aggregates, to be more in line 1819 with traditional DiffServ jargon (-> follow-up changes later in 1820 draft); included a definition of PCN-flows (and corrected a couple 1821 of 'PCN microflows' to 'PCN-flows' later in draft) 1823 o S3.5: added possibility of downgrading to best effort, where PCN- 1824 packets arrive at PCN-ingress-node already ECN marked (CE or ECN 1825 nonce) 1827 o S4: added note about whether to talk about PCN operating on an 1828 interface or on a link.
In S8.1 (OAM) mentioned that PCN 1829 functionality needs to be configured consistently on either the 1830 ingress or the egress interface of PCN-nodes in a PCN-domain. 1832 o S5.2: clarified that signalling protocol installs flow filter spec 1833 at PCN-ingress-node (& updates after possible re-route) 1835 o S5.6: addressing: clarified 1837 o S5.7: added tunnelling issue of N^2 scaling if you set up a mesh 1838 of tunnels between PCN-boundary-nodes 1840 o S7.3: Clarified the "third viewpoint" of probing (always probe). 1842 o S8.1: clarified that SNMP is only an example; added note that an 1843 operator may be able to not run PCN on some PCN-interior-nodes, if 1844 it knows that these links will never become (pre-)congested; added 1845 note that it may be possible to have different PCN-boundary-node 1846 behaviours for different ingress-egress-aggregates within the same 1847 PCN-domain. 1849 o Appendix: Created an Appendix about "Possible work items beyond 1850 the scope of the current PCN WG Charter". Material moved from 1851 near start of S3 and elsewhere throughout draft. Moved text about 1852 centralised decision node to Appendix. 1854 o Other minor clarifications. 1856 15.4. Changes from -01 to -02 1858 o S1: Benefits: provisioning bullet extended to stress that PCN does 1859 not use RFC2475-style traffic conditioning. 1861 o S1: Deployment models: mentioned, as variant of PCN-domain 1862 extending to end nodes, that may extend to LAN edge switch. 1864 o S3.1: Trust Assumption: added note about not needing PCN-marking 1865 capability if known that an interface cannot become pre-congested. 1867 o S4: now divided into sub-sections 1869 o S4.1: Admission control: added second proposed method for how to 1870 decide to block new flows (PCN-egress-node receives one (or 1871 several) PCN-marked packets). 1873 o S5: Probing sub-section removed. Material now in new S7. 
1875 o S5.6: Addressing: clarified how PCN-ingress-node can discover 1876 address of PCN-egress-node 1878 o S5.6: Addressing: centralised node case, added that PCN-ingress- 1879 node may need to know address of PCN-egress-node 1881 o S5.8: Tunnelling: added case of "partially PCN-capable tunnel" and 1882 degraded bullet on this in S6 (Open Issues) 1884 o S7: Probing: new section. Much more comprehensive than old S5.5. 1886 o S8: Operations and Management: substantially revised. 1888 o other minor changes not affecting semantics 1890 15.5. Changes from -00 to -01 1892 In addition to clarifications and nit squashing, the main changes 1893 are: 1895 o S1: Benefits: added one about provisioning (and contrast with 1896 DiffServ SLAs) 1898 o S1: Benefits: clarified that the objective is also to stop PCN- 1899 packets being significantly delayed (previously only mentioned not 1900 dropping packets) 1902 o S1: Deployment models: added one where policing is done at ingress 1903 of access network and not at ingress of PCN-domain (assume trust 1904 between networks) 1906 o S1: Deployment models: corrected MPLS-TE to MPLS 1908 o S2: Terminology: adjusted definition of PCN-domain 1910 o S3.5: Other assumptions: corrected, so that two assumptions (PCN- 1911 nodes not performing ECN and PCN-ingress-node discarding arriving 1912 CE packet) only apply if the PCN WG decides to encode PCN-marking 1913 in the ECN-field. 1915 o S4 & S5: changed PCN-marking algorithm to marking behaviour 1917 o S4: clarified that PCN-interior-node functionality applies for 1918 each outgoing interface, and added clarification: "The 1919 functionality is also done by PCN-ingress-nodes for their outgoing 1920 interfaces (ie those 'inside' the PCN-domain)." 
1922 o S4 (near end): altered to say that a PCN-node "should" dedicate 1923 some capacity to lower priority traffic so that it isn't starved 1924 (was "may") 1926 o S5: clarified to say that PCN functionality is done on an 1927 'interface' (rather than on a 'link') 1929 o S5.2: deleted erroneous mention of service level agreement 1931 o S5.5: Probing: re-written, especially to distinguish probing to 1932 test the ingress-egress-aggregate from probing to test a 1933 particular ECMP path. 1935 o S5.7: Addressing: added mention of probing; added a note that, in 1936 the case where traffic is always tunnelled across the PCN-domain, 1937 the PCN-ingress-node needs to know the address of the 1938 PCN-egress-node. 1940 o S5.8: Tunnelling: re-written, especially to provide a clearer 1941 description of copying on tunnel entry/exit, by adding explanation 1942 (keeping tunnel encaps/decaps and PCN-marking orthogonal), 1943 deleting one bullet ("if the inner header's marking state is more 1944 severe then it is preserved" - shouldn't happen), and better 1945 referencing of other IETF documents. 1947 o S6: Open issues: stressed that "NOTE: Potential solutions are out 1948 of scope for this document" and edited a couple of sentences that 1949 were close to solution space. 1951 o S6: Open issues: added one about scenarios with only one tunnel 1952 endpoint in the PCN-domain. 1954 o S6: Open issues: ECMP: added under-admission as another potential 1955 risk 1957 o S6: Open issues: added one about "Silent at start" 1959 o S10: Conclusions: a small conclusions section added 1961 16. Appendix: Possible work items beyond the scope of the current PCN 1962 WG Charter 1964 This section mentions some topics that are outside the PCN WG's 1965 current Charter, but which have been mentioned as areas of interest.
1966 They might be work items for: the PCN WG after a future re- 1967 chartering; some other IETF WG; another standards body; or an operator- 1968 specific usage that's not standardised. 1970 NOTE: it should be crystal clear that this section discusses 1971 possibilities only. 1973 The first set of possibilities relates to the restrictions on scope 1974 imposed by the PCN WG Charter (see Section 5): 1976 o a single PCN-domain encompasses several autonomous systems that 1977 don't trust each other (perhaps handled by using a mechanism like 1978 re-ECN, [I-D.briscoe-re-pcn-border-cheat]). 1980 o not all the nodes run PCN. For example, the PCN-domain is a 1981 multi-site enterprise network. The sites are connected by a VPN 1982 tunnel; although PCN doesn't operate inside the tunnel, the PCN 1983 mechanisms still work properly because of the good QoS on the 1984 virtual link (the tunnel). Another example is that PCN is 1985 deployed on the general Internet (ie widely but not universally 1986 deployed). 1988 o applying the PCN mechanisms to other types of traffic, ie beyond 1989 inelastic traffic. For instance, applying the PCN mechanisms to 1990 traffic scheduled with the Assured Forwarding per-hop behaviour. 1991 One example could be flow-rate adaptation by elastic applications 1992 that adapt according to the pre-congestion information. 1994 o the aggregation assumption doesn't hold, because the link capacity 1995 is too low. Measurement-based admission control is then risky. 1997 o the applicability of PCN mechanisms for emergency use (911, GETS, 1998 WPS, MLPP, etc.) 2000 Other possibilities include: 2002 o Probing. This is discussed in Section 16.1 below. 2004 o The PCN-domain extends to the end users. The scenario is 2005 described in [I-D.babiarz-pcn-sip-cap]. The end users need to be 2006 trusted to do their own policing. This scenario is in the scope 2007 of the PCN WG charter if there is sufficient traffic for the 2008 aggregation assumption to hold.
A variant is that the PCN-domain 2009 extends out as far as the LAN edge switch. 2011 o indicating pre-congestion through signalling messages rather than 2012 in-band (in the form of PCN-marked packets) 2014 o the decision-making functionality is at a centralised node rather 2015 than at the PCN-boundary-nodes. This requires that the PCN- 2016 egress-node signals PCN-feedback-information to the centralised 2017 node, and that the centralised node signals to the PCN-ingress- 2018 node the decision about admission (or termination). It may also 2019 require the centralised node and the PCN-boundary-nodes to know each 2020 other's addresses. The centralised node could itself be one of the 2021 PCN-boundary-nodes, in which case some of the signalling 2022 would clearly be replaced by messages internal to the node. 2023 The centralised case is described further in 2024 [I-D.tsou-pcn-racf-applic]. 2026 o Signalling extensions for specific protocols (eg RSVP, NSIS). For 2027 example: the details of how the signalling protocol installs the 2028 flowspec at the PCN-ingress-node for an admitted PCN-flow; and how 2029 the signalling protocol carries the PCN-feedback-information. 2030 Perhaps also for other functions such as: coping with failure of a 2031 PCN-boundary-node ([I-D.briscoe-tsvwg-cl-architecture] considers 2032 what happens if RSVP is the QoS signalling protocol); establishing 2033 a tunnel across the PCN-domain if it is necessary to carry ECN 2034 marks transparently. 2036 o Policing by the PCN-ingress-node may not be needed if the PCN- 2037 domain can trust that the upstream network has already policed the 2038 traffic on its behalf. 2040 o PCN for Pseudowire: PCN may be used as a congestion avoidance 2041 mechanism for edge-to-edge pseudowire emulations 2042 [I-D.ietf-pwe3-congestion-frmwk]. 2044 o PCN for MPLS: [RFC3270] defines how to support the DiffServ 2045 architecture in MPLS (Multi-Protocol Label Switching) networks.
2046 [RFC5129] describes how to add PCN for admission control of 2047 microflows into a set of MPLS aggregates. PCN-marking is done in 2048 MPLS's EXP field (which [I-D.andersson-mpls-expbits-def] proposes 2049 to re-name to the Class of Service (CoS) bits). 2051 o PCN for Ethernet: Similarly, it may be possible to extend PCN into 2052 Ethernet networks, where PCN-marking is done in the Ethernet 2053 header. NOTE: Specific consideration of this extension is outside 2054 the IETF's remit. 2056 16.1. Probing 2058 16.1.1. Introduction 2060 Probing is a potential mechanism to assist admission control. 2062 PCN's admission control, as described so far, is essentially a 2063 reactive mechanism where the PCN-egress-node monitors the pre- 2064 congestion level for traffic from each PCN-ingress-node; if the level 2065 rises then it blocks new flows on that ingress-egress-aggregate. 2066 However, it's possible that an ingress-egress-aggregate carries no 2067 traffic, and so the PCN-egress-node can't make an admission decision 2068 using the usual method described earlier. 2070 One approach is to be "optimistic" and simply admit the new flow. 2071 However it's possible to envisage a scenario where the traffic levels 2072 on other ingress-egress-aggregates are already so high that they're 2073 blocking new PCN-flows, and admitting a new flow onto this 'empty' 2074 ingress-egress-aggregate adds extra traffic onto the link that's 2075 already pre-congested - which may 'tip the balance' so that PCN's 2076 flow termination mechanism is activated or some packets are dropped. 2077 This risk could be lessened by configuring on each link sufficient 2078 'safety margin' above the PCN-threshold-rate. 2080 An alternative approach is to make PCN a more proactive mechanism. 2081 The PCN-ingress-node explicitly determines, before admitting the 2082 prospective new flow, whether the ingress-egress-aggregate can 2083 support it. 
This can be seen as a "pessimistic" approach, in 2084 contrast to the "optimism" of the approach above. It involves 2085 probing: a PCN-ingress-node generates and sends probe packets in 2086 order to test the pre-congestion level that the flow would 2087 experience. 2089 One possibility is that a probe packet is just a dummy data packet, 2090 generated by the PCN-ingress-node and addressed to the PCN-egress- 2091 node. 2093 16.1.2. Probing functions 2095 The probing functions are: 2097 o Make decision that probing is needed. As described above, this is 2098 when the ingress-egress-aggregate (or the ECMP path - Section 8) 2099 carries no PCN-traffic. An alternative is always to probe, ie 2100 probe before admitting every PCN-flow. 2102 o (if required) Communicate the request that probing is needed - the 2103 PCN-egress-node signals to the PCN-ingress-node that probing is 2104 needed 2106 o (if required) Generate probe traffic - the PCN-ingress-node 2107 generates the probe traffic. The appropriate number (or rate) of 2108 probe packets will depend on the PCN-marking algorithm; for 2109 example an excess-traffic-marking algorithm generates fewer PCN- 2110 marks than a threshold-marking algorithm, and so will need more 2111 probe packets. 2113 o Forward probe packets - as far as PCN-interior-nodes are 2114 concerned, probe packets are handled the same as (ordinary data) 2115 PCN-packets, in terms of routing, scheduling and PCN-marking. 2117 o Consume probe packets - the PCN-egress-node consumes probe packets 2118 to ensure that they don't travel beyond the PCN-domain. 2120 16.1.3. Discussion of rationale for probing, its downsides and open 2121 issues 2123 It is an unresolved question whether probing is really needed, but 2124 two viewpoints have been put forward as to why it is useful. The 2125 first is perhaps the most obvious: there is no PCN-traffic on the 2126 ingress-egress-aggregate. 
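The relationship between the normal measurement-based admission decision and the probing fallback can be sketched as follows. This is an illustrative sketch only: the function names, the use of a congestion-level-estimate, and the threshold value are assumptions for illustration, not behaviour specified by this document.

```python
# Illustrative sketch of a PCN admission decision with a probing
# fallback.  The congestion-level-estimate (CLE) and the threshold
# value are assumptions for illustration, not normative behaviour.

def congestion_level_estimate(marked_bytes: int, total_bytes: int) -> float:
    """Fraction of PCN-marked traffic that the PCN-egress-node measured
    on one ingress-egress-aggregate during the last interval."""
    return marked_bytes / total_bytes

def admission_decision(marked_bytes: int, total_bytes: int,
                       cle_threshold: float = 0.5) -> str:
    """Decide what to do with a prospective new PCN-flow.

    Returns 'admit', 'block', or 'probe'.  'probe' corresponds to the
    proactive approach: the aggregate carries no traffic, so no
    measurement-based decision is possible.
    """
    if total_bytes == 0:
        # Empty aggregate: either admit optimistically or probe first.
        return "probe"
    cle = congestion_level_estimate(marked_bytes, total_bytes)
    return "block" if cle >= cle_threshold else "admit"
```

For example, a heavily marked aggregate (`admission_decision(600, 1000)`) blocks the new flow, while an empty one (`admission_decision(0, 0)`) triggers probing instead of a measurement-based decision.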
The second assumes that multipath 2127 routing (ECMP) is running in the PCN-domain. We now consider each in 2128 turn. 2129 The first viewpoint assumes the following: 2131 o There is no PCN-traffic on the ingress-egress-aggregate (so a 2132 normal admission decision cannot be made). 2134 o Simply admitting the new flow has a significant risk of leading to 2135 overload: packets dropped or flows terminated. 2137 On the former bullet, [PCN-email-traffic-empty-aggregates] suggests 2138 that, during the future busy hour of a national network with about 2139 100 PCN-boundary-nodes, there are likely to be significant numbers of 2140 aggregates with very few flows under nearly all circumstances. 2142 The latter bullet could occur if a new flow starts on many of the 2143 empty ingress-egress-aggregates and causes overload on a link in the 2144 PCN-domain. To be a problem this would probably have to happen in a 2145 short time period (flash crowd) because, after the reaction time of 2146 the system, other (non-empty) ingress-egress-aggregates that pass 2147 through the link will measure pre-congestion and so block new flows; 2148 also, flows naturally end anyway. 2150 The downsides of probing for this viewpoint are: 2152 o Probing adds delay to the admission control process. 2154 o Sufficient probing traffic has to be generated to test the pre- 2155 congestion level of the ingress-egress-aggregate. But the probing 2156 traffic itself may cause pre-congestion, causing other PCN-flows 2157 to be blocked or even terminated - and in the flash crowd scenario 2158 there will be probing on many ingress-egress-aggregates. 2160 The open issues associated with this viewpoint include: 2162 o What rate and pattern of probe packets does the PCN-ingress-node 2163 need to generate, so that there's enough traffic to make the 2164 admission decision? 2166 o What difficulty does the delay (whilst probing is done) cause 2167 applications, eg packets might be dropped?
2169 o Are there other ways of dealing with the flash crowd scenario? 2170 For instance limit the rate at which new flows are admitted; or 2171 perhaps for a PCN-egress-node to block new flows on its empty 2172 ingress-egress-aggregates when its non-empty ones are pre- 2173 congested. 2175 The second viewpoint applies in the case where there is multipath 2176 routing (ECMP) in the PCN-domain. Note that ECMP is often used on 2177 core networks. There are two possibilities: 2179 (1) If admission control is based on measurements of the ingress- 2180 egress-aggregate, then the viewpoint that probing is useful assumes: 2182 o there's a significant chance that the traffic is unevenly balanced 2183 across the ECMP paths, and hence there's a significant risk of 2184 admitting a flow that should be blocked (because it follows an 2185 ECMP path that is pre-congested) or blocking a flow that should be 2186 admitted. 2188 o Note: [PCN-email-ECMP] suggests unbalanced traffic is quite 2189 possible, even with quite a large number of flows on a PCN-link 2190 (eg 1000) when Assumption 3 (aggregation) is likely to be 2191 satisfied. 2193 (2) If admission control is based on measurements of pre-congestion 2194 on specific ECMP paths, then the viewpoint that probing is useful 2195 assumes: 2197 o There is no PCN-traffic on the ECMP path on which to base an 2198 admission decision. 2200 o Simply admitting the new flow has a significant risk of leading to 2201 overload. 2203 o The PCN-egress-node can match a packet to an ECMP path. 2205 o Note: This is similar to the first viewpoint and so similarly 2206 could occur in a flash crowd if a new flow starts more-or-less 2207 simultaneously on many of the empty ECMP paths. Because there are 2208 several (sometimes many) ECMP paths between each pair of PCN- 2209 boundary-nodes, it's presumably more likely that an ECMP path is 2210 'empty' than an ingress-egress-aggregate. 
To constrain the number 2211 of ECMP paths, a few tunnels could be set up between each pair of 2212 PCN-boundary-nodes. Tunnelling also addresses the issue in the bullet 2213 immediately above (which is otherwise hard because an ECMP routing 2214 decision is made independently on each node). 2216 The downsides of probing for this viewpoint are: 2218 o Probing adds delay to the admission control process. 2220 o Sufficient probing traffic has to be generated to test the pre- 2221 congestion level of the ECMP path. But there's the risk that the 2222 probing traffic itself may cause pre-congestion, causing other 2223 PCN-flows to be blocked or even terminated. 2225 o The PCN-egress-node needs to consume the probe packets to ensure 2226 they don't travel beyond the PCN-domain (eg they might confuse the 2227 destination end node). Hence somehow the PCN-egress-node has to 2228 be able to distinguish a probe packet from a data packet, via the 2229 characteristic setting of particular bit(s) in the packet's header 2230 or body - but these bit(s) mustn't be used by any PCN-interior- 2231 node's ECMP algorithm. In the general case this isn't possible, 2232 but it should be OK for a typical ECMP algorithm, which examines 2233 the source and destination IP addresses and port numbers, the 2234 protocol ID and the DSCP. 2236 17. Informative References 2238 [I-D.briscoe-tsvwg-cl-architecture] 2239 Briscoe, B., "An edge-to-edge Deployment Model for Pre- 2240 Congestion Notification: Admission Control over a 2241 DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04 2242 (work in progress), October 2006. 2244 [I-D.briscoe-tsvwg-cl-phb] 2245 Briscoe, B., "Pre-Congestion Notification marking", 2246 draft-briscoe-tsvwg-cl-phb-03 (work in progress), 2247 October 2006. 2249 [I-D.babiarz-pcn-sip-cap] 2250 Babiarz, J., "SIP Controlled Admission and Preemption", 2251 draft-babiarz-pcn-sip-cap-00 (work in progress), 2252 October 2006.
2254 [I-D.lefaucheur-rsvp-ecn] 2255 Faucheur, F., "RSVP Extensions for Admission Control over 2256 Diffserv using Pre-congestion Notification (PCN)", 2257 draft-lefaucheur-rsvp-ecn-01 (work in progress), 2258 June 2006. 2260 [I-D.chan-pcn-problem-statement] 2261 Chan, K., "Pre-Congestion Notification Problem Statement", 2262 draft-chan-pcn-problem-statement-01 (work in progress), 2263 October 2006. 2265 [I-D.ietf-pwe3-congestion-frmwk] 2266 "Pseudowire Congestion Control Framework", May 2008. 2270 [I-D.briscoe-tsvwg-ecn-tunnel] 2271 "Layered Encapsulation of Congestion Notification", 2272 July 2008. 2275 [I-D.charny-pcn-single-marking] 2276 "Pre-Congestion Notification Using Single Marking for 2277 Admission and Termination", November 2007. 2281 [I-D.eardley-pcn-architecture] 2282 "Pre-Congestion Notification Architecture", June 2007. 2286 [I-D.westberg-pcn-load-control] 2287 "LC-PCN: The Load Control PCN Solution", July 2008. 2291 [I-D.behringer-tsvwg-rsvp-security-groupkeying] 2292 "Applicability of Keying Methods for RSVP Security", 2293 November 2007. 2296 [I-D.briscoe-re-pcn-border-cheat] 2297 "Emulating Border Flow Policing using Re-ECN on Bulk 2298 Data", February 2008. 2301 [RFC5129] "Explicit Congestion Marking in MPLS", RFC 5129, 2302 January 2008. 2304 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 2305 RFC 4303, December 2005. 2307 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 2308 and W. Weiss, "An Architecture for Differentiated 2309 Services", RFC 2475, December 1998. 2311 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 2312 J., Courtney, W., Davari, S., Firoiu, V., and D. 2313 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 2314 Behavior)", RFC 3246, March 2002. 2316 [RFC4594] Babiarz, J., Chan, K., and F. Baker, "Configuration 2317 Guidelines for DiffServ Service Classes", RFC 4594, 2318 August 2006. 2320 [RFC3168] Ramakrishnan, K., Floyd, S., and D.
Black, "The Addition 2321 of Explicit Congestion Notification (ECN) to IP", 2322 RFC 3168, September 2001. 2324 [RFC2211] Wroclawski, J., "Specification of the Controlled-Load 2325 Network Element Service", RFC 2211, September 1997. 2327 [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., 2328 Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. 2329 Felstaine, "A Framework for Integrated Services Operation 2330 over Diffserv Networks", RFC 2998, November 2000. 2332 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, 2333 P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi- 2334 Protocol Label Switching (MPLS) Support of Differentiated 2335 Services", RFC 3270, May 2002. 2337 [RFC1633] Braden, B., Clark, D., and S. Shenker, "Integrated 2338 Services in the Internet Architecture: an Overview", 2339 RFC 1633, June 1994. 2341 [RFC2983] Black, D., "Differentiated Services and Tunnels", 2342 RFC 2983, October 2000. 2344 [RFC2747] Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic 2345 Authentication", RFC 2747, January 2000. 2347 [RFC3411] Harrington, D., Presuhn, R., and B. Wijnen, "An 2348 Architecture for Describing Simple Network Management 2349 Protocol (SNMP) Management Frameworks", STD 62, RFC 3411, 2350 December 2002. 2352 [RFC3393] Demichelis, C. and P. Chimento, "IP Packet Delay Variation 2353 Metric for IP Performance Metrics (IPPM)", RFC 3393, 2354 November 2002. 2356 [RFC4216] Zhang, R. and J. Vasseur, "MPLS Inter-Autonomous System 2357 (AS) Traffic Engineering (TE) Requirements", RFC 4216, 2358 November 2005. 2360 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 2361 Zekauskas, "A One-way Active Measurement Protocol 2362 (OWAMP)", RFC 4656, September 2006. 2364 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 2365 Explicit Congestion Notification (ECN) Field", BCP 124, 2366 RFC 4774, November 2006. 
2368 [RFC4778] Kaeo, M., "Operational Security Current Practices in 2369 Internet Service Provider Environments", RFC 4778, 2370 January 2007. 2372 [ITU-MLPP] 2373 "Multilevel Precedence and Pre-emption Service (MLPP)", 2374 ITU-T Recommendation I.255.3, 1990. 2376 [Iyer] "An approach to alleviate link overload as observed on an 2377 IP backbone", IEEE INFOCOM, 2003. 2380 [Y.1541] "Network Performance Objectives for IP-based Services", 2381 ITU-T Recommendation Y.1541, February 2006. 2383 [P.800] "Methods for subjective determination of transmission 2384 quality", ITU-T Recommendation P.800, August 1996. 2386 [Songhurst] 2387 "Guaranteed QoS Synthesis for Admission Control with 2388 Shared Capacity", BT Technical Report TR-CXR9-2006-001, 2389 February 2006. 2392 [Menth] "PCN-Based Resilient Network Admission Control: The Impact 2393 of a Single Bit", Technical Report, 2007. 2397 [PCN-email-ECMP] 2398 "Email to PCN WG mailing list", November 2007. 2401 [PCN-email-traffic-empty-aggregates] 2402 "Email to PCN WG mailing list", October 2007. 2405 [PCN-email-SRLG] 2406 "Email to PCN WG mailing list", March 2008. 2409 [I-D.eardley-pcn-marking-behaviour] 2410 "Marking behaviour of PCN-nodes", June 2008. 2414 [I-D.moncaster-pcn-baseline-encoding] 2415 "Baseline Encoding and Transport of Pre-Congestion 2416 Information", July 2008. 2420 [I-D.moncaster-pcn-3-state-encoding] 2421 "A three state extended PCN encoding scheme", June 2008, < 2422 http://www.ietf.org/internet-drafts/ 2423 draft-moncaster-pcn-3-state-encoding-00.txt>. 2425 [I-D.charny-pcn-comparison] 2426 "Pre-Congestion Notification Using Single Marking for 2427 Admission and Termination", November 2007. 2431 [I-D.tsou-pcn-racf-applic] 2432 "Applicability Statement for the Use of Pre-Congestion 2433 Notification in a Resource-Controlled Network", 2434 February 2008.
2437 [I-D.sarker-pcn-ecn-pcn-usecases] 2438 "Usecases and Benefits of end to end ECN support in PCN 2439 Domains", May 2008. 2442 [I-D.andersson-mpls-expbits-def] 2443 "MPLS EXP-bits definition", March 2008. 2447 [I-D.menth-pcn-psdm-encoding] 2448 "PCN Encoding for Packet-Specific Dual Marking (PSDM)", 2449 July 2008. 2452 [I-D.menth-pcn-emft] 2453 "Edge-Assisted Marked Flow Termination", February 2008. 2456 [Menth08] "PCN-Based Admission Control and Flow Termination", 2008. 2460 [Hancock] "Slide 14 of 'NSIS: An Outline Framework for QoS 2461 Signalling'", May 2002. 2464 Author's Address 2466 Philip Eardley 2467 BT 2468 B54/77, Sirius House, Adastral Park, Martlesham Heath 2469 Ipswich, Suffolk IP5 3RE 2470 United Kingdom 2472 Email: philip.eardley@bt.com 2474 Full Copyright Statement 2476 Copyright (C) The IETF Trust (2008). 2478 This document is subject to the rights, licenses and restrictions 2479 contained in BCP 78, and except as set forth therein, the authors 2480 retain all their rights. 2482 This document and the information contained herein are provided on an 2483 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 2484 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 2485 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 2486 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 2487 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 2488 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 2490 Intellectual Property 2492 The IETF takes no position regarding the validity or scope of any 2493 Intellectual Property Rights or other rights that might be claimed to 2494 pertain to the implementation or use of the technology described in 2495 this document or the extent to which any license under such rights 2496 might or might not be available; nor does it represent that it has 2497 made any independent effort to identify any such rights.
Information 2498 on the procedures with respect to rights in RFC documents can be 2499 found in BCP 78 and BCP 79. 2501 Copies of IPR disclosures made to the IETF Secretariat and any 2502 assurances of licenses to be made available, or the result of an 2503 attempt made to obtain a general license or permission for the use of 2504 such proprietary rights by implementers or users of this 2505 specification can be obtained from the IETF on-line IPR repository at 2506 http://www.ietf.org/ipr. 2508 The IETF invites any interested party to bring to its attention any 2509 copyrights, patents or patent applications, or other proprietary 2510 rights that may cover technology that may be required to implement 2511 this standard. Please address the information to the IETF at 2512 ietf-ipr@ietf.org. 2514 Acknowledgment 2516 Funding for the RFC Editor function is provided by the IETF 2517 Administrative Support Activity (IASA).