Congestion and Pre-Congestion                  Philip Eardley (Editor)
Notification Working Group                                          BT
Internet-Draft                                       November 19, 2007
Intended status: Informational
Expires: May 22, 2008

               Pre-Congestion Notification Architecture
                    draft-ietf-pcn-architecture-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 22, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The purpose of this document is to describe a general architecture
   for flow admission and termination based on aggregated pre-
   congestion information in order to protect the quality of service of
   established inelastic flows within a single DiffServ domain.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Assumptions and constraints on scope
     3.1.  Assumption 1: Trust - controlled environment
     3.2.  Assumption 2: Real-time applications
     3.3.  Assumption 3: Many flows and additional load
     3.4.  Assumption 4: Emergency use out of scope
     3.5.  Other assumptions
   4.  High-level functional architecture
     4.1.  Flow admission
     4.2.  Flow termination
     4.3.  Flow admission and flow termination
     4.4.  Information transport
     4.5.  PCN-traffic
   5.  Detailed functional architecture
     5.1.  PCN-interior-node functions
     5.2.  PCN-ingress-node functions
     5.3.  PCN-egress-node functions
     5.4.  Admission control functions
     5.5.  Flow termination functions
     5.6.  Addressing
     5.7.  Tunnelling
     5.8.  Fault handling
   6.  Design goals and challenges
   7.  Probing
     7.1.  Introduction
     7.2.  Probing functions
     7.3.  Discussion of rationale for probing, its downsides and open
           issues
   8.  Operations and Management
     8.1.  Configuration OAM
       8.1.1.  System options
       8.1.2.  Parameters
     8.2.  Performance & Provisioning OAM
     8.3.  Accounting OAM
     8.4.  Fault OAM
     8.5.  Security OAM
   9.  IANA Considerations
   10. Security considerations
   11. Conclusions
   12. Acknowledgements
   13. Comments Solicited
   14. Changes
   15. Informative References
       Author's Address
       Intellectual Property and Copyright Statements

1.
Introduction

   The purpose of this document is to describe a general architecture
   for flow admission and termination based on aggregated (pre-)
   congestion information in order to protect the quality of service of
   flows within a DiffServ domain [RFC2475].  This document defines an
   architecture for implementing two mechanisms to protect the quality
   of service of established inelastic flows within a single DiffServ
   domain, where all boundary and interior nodes are PCN-enabled and
   trust each other for correct PCN operation.  Flow admission control
   determines whether a new flow should be admitted, and protects the
   QoS of existing PCN-flows in normal circumstances by avoiding
   congestion.  However, in abnormal circumstances, for instance when a
   disaster affects multiple nodes and causes traffic re-routes, the
   QoS of existing PCN-flows may degrade even though care was exercised
   when admitting those flows.  Therefore we also propose a mechanism
   for flow termination, which removes enough traffic to protect the
   QoS of the remaining PCN-flows.

   As a fundamental building block to enable these two mechanisms, PCN-
   interior-nodes generate, encode and transport pre-congestion
   information towards the PCN-egress-nodes.  Two rates, a PCN-lower-
   rate and a PCN-upper-rate, can be associated with each link of the
   PCN-domain.  Each rate is used by a marking behaviour (specified in
   another document) that determines how and when PCN-packets are
   marked, and how the markings are encoded in packet headers.  PCN-
   egress-nodes measure the packet markings and send information as
   necessary to the nodes that decide, based on this information, which
   PCN-flows to accept/reject or terminate.  Another document will
   describe the decision-making behaviours.
   Overall the aim is to enable PCN-nodes to give an "early warning" of
   potential congestion before there is any significant build-up of
   PCN-packets in the queue; the admission control mechanism limits the
   PCN-traffic on each link to *roughly* its PCN-lower-rate, and the
   flow termination mechanism limits the PCN-traffic on each link to
   *roughly* its PCN-upper-rate.

   We believe that the key benefits of the PCN mechanisms described in
   this document are that they are simple, scalable, and robust
   because:

   o  Per-flow state is only required at the PCN-ingress-nodes
      ("stateless core").  It is needed there for policing purposes (to
      prevent non-admitted PCN-traffic from entering the PCN-domain).
      It is not generally required that other network entities are
      aware of individual flows (although they may be in particular
      deployment scenarios).

   o  Admission control is resilient: PCN's QoS is decoupled from the
      routing system; hence in general admitted flows can survive
      capacity, routing or topology changes without additional
      signalling, and they don't have to be told (or learn) about such
      changes.  The PCN-lower-rates can be chosen small enough that
      admitted traffic can still be carried after a re-routing in most
      failure cases.  This is an important feature, as QoS violations
      in core networks due to link failures are more likely than QoS
      violations due to increased traffic volume [Iyer].

   o  The PCN-marking behaviours only operate on the overall PCN-
      traffic on the link, not per flow.

   o  The metering results are signalled to the PCN-egress-nodes by the
      PCN-marks in the packet headers.  No additional signalling
      protocol is required for transporting the PCN-marks.  Therefore
      no secure binding is required between data packets and separate
      congestion messages.
   o  The PCN-egress-nodes make separate measurements, operating on the
      overall PCN-traffic, for each PCN-ingress-node, ie not per flow.
      Similarly, signalling by the PCN-egress-node of PCN-feedback-
      information (which is used for flow admission and termination
      decisions) is at the granularity of the ingress-egress-aggregate.

   o  The admitted PCN-load is controlled dynamically.  Therefore it
      adapts as the traffic matrix changes, and also if the network
      topology changes (eg after a link failure).  Hence an operator
      can be less conservative when deploying network capacity, and
      less accurate in their prediction of the PCN-traffic matrix.

   o  The termination mechanism complements admission control.  It
      allows the network to recover from sudden unexpected surges of
      PCN-traffic on some links, thus restoring QoS to the remaining
      flows.  Such scenarios are expected to be rare but not
      impossible.  They can be caused by large network failures that
      redirect lots of admitted PCN-traffic to other links, or by
      malfunction of measurement-based admission control in the
      presence of admitted flows that send for a while at an atypically
      low rate and then increase their rates in a correlated way.

   o  The PCN-upper-rate may be set below the maximum rate at which
      PCN-traffic can be transmitted on a link, in order to trigger
      termination of some PCN-flows before loss (or excessive delay) of
      PCN-packets occurs, or to keep the maximum PCN-load on a link
      below a level configured by the operator.

   o  Provisioning of the network is decoupled from the process of
      adding new customers.
      By contrast, with the DiffServ architecture [RFC2475] operators
      rely on subscription-time Service Level Agreements that
      statically define the parameters of the traffic that will be
      accepted from a customer, and so the operator has to run the
      provisioning process each time a new customer is added, to check
      that the Service Level Agreement can be fulfilled.  PCN does not
      use RFC2475-style traffic conditioning.

   Operators of networks will want to use the PCN mechanisms in various
   arrangements, for instance depending on how they perform admission
   control outside the PCN-domain (users after all are concerned about
   QoS end-to-end), what their particular goals and assumptions are,
   and so on.  Several deployment models are possible:

   o  An operator may choose to deploy either admission control or flow
      termination or both (see Section 4.3).

   o  IntServ over DiffServ [RFC2998].  The DiffServ region is PCN-
      enabled, RSVP signalling is used end-to-end and the PCN-domain is
      a single RSVP hop, ie only the PCN-boundary-nodes process RSVP
      messages.  Outside the PCN-domain, RSVP messages are processed on
      each hop.  This is described in
      [I-D.briscoe-tsvwg-cl-architecture].

   o  RSVP signalling is originated and/or terminated by proxies, with
      application-layer signalling between the end user and the proxy,
      for instance SIP signalling with a home hub.

   o  Similar to the previous bullets, but NSIS signalling is used
      instead of RSVP.

   o  NOTE: Consideration of signalling extensions for specific
      protocols is outside the scope of the PCN WG; however, it will
      produce a "Requirements for signalling" document as potential
      input for the appropriate WGs.

   o  Depending on the deployment scenario, the decision-making
      functionality (about flow admission and termination) could reside
      at the PCN-ingress-nodes or PCN-egress-nodes or at some central
      control node in the PCN-domain.
      NOTE: The Charter restricts us: the decision-making functionality
      is at the PCN-boundary-nodes.

   o  If the operator runs both the access network and the core
      network, one deployment scenario is that only the core network
      uses PCN admission control, but per-microflow policing is done at
      the ingress to the access network and not at the PCN-ingress-
      node.  Note: to aid readability, the rest of this draft assumes
      that policing is done by the PCN-ingress-nodes.

   o  There are several PCN-domains on the end-to-end path, each
      operating PCN mechanisms independently.  NOTE: The Charter
      restricts us to considering a single PCN-domain.  A possibility
      after re-chartering is to consider that the PCN-domain
      encompasses several autonomous systems that don't trust each
      other (ie this weakens Assumption 1 about trust, see
      Section 3.1).

   o  The PCN-domain extends to the end users.  NOTE: This isn't
      necessarily outside the Charter, because it may not break
      Assumption 3 (aggregation, see later) if it's known that there's
      sufficient aggregation at any bottleneck, and it doesn't
      necessarily break Assumption 1 (trust), because in some
      environments, eg corporate, the end user may have a controlled
      configuration and so be trusted.  The scenario is described in
      [I-D.babiarz-pcn-sip-cap].  A variant is that the PCN-domain
      extends out as far as the LAN edge switch.

   o  Pseudowire: PCN may be used as a congestion avoidance mechanism
      for edge-to-edge pseudowire emulations
      [I-D.ietf-pwe3-congestion-frmwk].  NOTE: Specific consideration
      of pseudowires is not in the PCN WG Charter.

   o  MPLS (Multi-Protocol Label Switching): [RFC3270] defines how to
      support the DiffServ architecture in MPLS networks.
      [I-D.ietf-tsvwg-ecn-mpls] describes how to add PCN for admission
      control of microflows into a set of MPLS aggregates.  PCN-marking
      is done in MPLS's EXP field.
   o  Similarly, it may be possible to extend PCN into Ethernet
      networks, where PCN-marking is done in the Ethernet header.
      NOTE: Specific consideration of this extension is outside the
      IETF's remit.

2.  Terminology

   o  PCN-domain: a PCN-capable domain; a contiguous set of PCN-enabled
      nodes that perform DiffServ scheduling; the complete set of PCN-
      nodes whose PCN-marking can in principle influence decisions
      about flow admission and termination for the PCN-domain,
      including the PCN-egress-nodes, which measure these PCN-marks.

   o  PCN-boundary-node: a PCN-node that connects one PCN-domain to a
      node either in another PCN-domain or in a non-PCN-domain.

   o  PCN-interior-node: a node in a PCN-domain that is not a PCN-
      boundary-node.

   o  PCN-node: a PCN-boundary-node or a PCN-interior-node.

   o  PCN-egress-node: a PCN-boundary-node in its role in handling
      traffic as it leaves a PCN-domain.

   o  PCN-ingress-node: a PCN-boundary-node in its role in handling
      traffic as it enters a PCN-domain.

   o  PCN-traffic: A PCN-domain carries traffic of different DiffServ
      classes [RFC4594].  Those using the PCN mechanisms are called
      PCN-classes (collectively called PCN-traffic) and the
      corresponding packets are PCN-packets.  The same network may
      carry traffic using other DiffServ classes.

   o  Ingress-egress-aggregate: the collection of PCN-packets from all
      PCN-flows that travel in one direction between a specific pair of
      PCN-boundary-nodes.

   o  PCN-lower-rate: a reference rate configured for each link in the
      PCN-domain, which is lower than the PCN-upper-rate.  It is used
      by a marking behaviour that determines whether a packet should be
      PCN-marked with a first encoding.

   o  PCN-upper-rate: a reference rate configured for each link in the
      PCN-domain, which is higher than the PCN-lower-rate.
      It is used by a marking behaviour that determines whether a
      packet should be PCN-marked with a second encoding.

   o  Threshold-marking: a PCN-marking behaviour such that all PCN-
      traffic is marked if the PCN-traffic exceeds a particular rate
      (either the PCN-lower-rate or the PCN-upper-rate).  NOTE: The
      definition reflects the overall intent rather than the
      instantaneous behaviour, since the rate measured at a particular
      moment depends on the behaviour, its implementation and the
      traffic's variance as well as its rate.

   o  Excess-rate-marking: a PCN-marking behaviour such that the amount
      of PCN-traffic that is PCN-marked is equal to the amount that
      exceeds a particular rate (either the PCN-lower-rate or the PCN-
      upper-rate).  NOTE: The definition reflects the overall intent
      rather than the instantaneous behaviour, since the rate measured
      at a particular moment depends on the behaviour, its
      implementation and the traffic's variance as well as its rate.

   o  Pre-congestion: a condition of a link within a PCN-domain in
      which the PCN-node performs PCN-marking, in order to provide an
      "early warning" of potential congestion before there is any
      significant build-up of PCN-packets in the real queue.

   o  PCN-marking: the process of setting the header in a PCN-packet
      based on defined rules, in reaction to pre-congestion.

   o  PCN-feedback-information: information signalled by a PCN-egress-
      node to a PCN-ingress-node or central control node, which is
      needed for the flow admission and flow termination mechanisms.

3.  Assumptions and constraints on scope

   The PCN WG's charter restricts the initial scope by a set of
   assumptions.  Here we list those assumptions and explain them.

   1.  These components are deployed in a single DiffServ domain,
       within which all PCN-nodes are PCN-enabled and trust each other
       for truthful PCN-marking and transport.

   2.
       All flows handled by these mechanisms are inelastic and
       constrained to a known peak rate through policing or shaping.

   3.  The number of PCN-flows across any potential bottleneck link is
       sufficiently large that stateless, statistical mechanisms can be
       effective.  To put it another way, the aggregate bit rate of
       PCN-traffic across any potential bottleneck link needs to be
       sufficiently large relative to the maximum additional bit rate
       added by one flow.

   4.  PCN-flows may have different precedence, but the applicability
       of the PCN mechanisms for emergency use (911, GETS, WPS, MLPP,
       etc.) is out of scope.

   After completion of the initial phase, the PCN WG may re-charter to
   develop solutions for specific scenarios where some of these
   restrictions are not in place.  It may also re-charter to consider
   applying the PCN mechanisms to additional deployment scenarios.  One
   possible example is where a single PCN-domain encompasses several
   DiffServ domains that don't trust each other (perhaps by using a
   mechanism like re-ECN, [I-D.briscoe-re-pcn-border-cheat]).  The WG
   may also re-charter to investigate additional response mechanisms
   that act on (pre-)congestion information.  One example could be
   flow-rate adaptation by elastic applications (rather than flow
   admission or termination).  The details of these work items are
   outside the scope of the initial phase, but the WG may consider
   their requirements in order to design components that are
   sufficiently general to support such extensions in the future.  The
   working assumption is that the standards developed in the initial
   phase should not need to be modified to satisfy the solutions for
   when these restrictions are removed.

3.1.  Assumption 1: Trust - controlled environment

   We assume that the PCN-domain is a controlled environment, i.e. all
   the nodes in a PCN-domain run PCN and trust each other.
   There are several reasons for proposing this assumption:

   o  The PCN-domain has to be encircled by a ring of PCN-boundary-
      nodes, otherwise PCN-packets could enter the PCN-domain without
      being subject to admission control, which would potentially
      destroy the QoS of existing flows.

   o  Similarly, a PCN-boundary-node has to trust that all the PCN-
      nodes are doing PCN-marking.  A non-PCN-node wouldn't be able to
      signal that it is suffering pre-congestion, which potentially
      would lead to too many PCN-flows being admitted (or too few being
      terminated).  Worse, a rogue node could perform various attacks,
      as discussed in the Security Considerations section.

   One way of assuring the above two points is for the entire PCN-
   domain to be run by a single operator.  Another possibility is that
   there are several operators but they trust each other to a
   sufficient level in their handling of PCN-traffic.

   Note: All PCN-nodes need to be trustworthy.  However, if it's known
   that an interface cannot become pre-congested then it's not strictly
   necessary for it to be capable of PCN-marking.  But this must be
   known even in unusual circumstances, eg after the failure of some
   links.

3.2.  Assumption 2: Real-time applications

   We assume that any variation of source bit rate is independent of
   the level of pre-congestion.  We assume that PCN-packets come from
   real-time applications generating inelastic traffic [Shenker], like
   voice and video requiring low delay, jitter and packet loss, for
   example the Controlled Load Service [RFC2211] and the Telephony
   service class [RFC4594].  This assumption is to help focus the
   effort where it looks like PCN would be most useful, ie the sorts of
   applications where per-flow QoS is a known requirement.  For
   instance, the impact of this assumption would be to guide
   simulations work.

3.3.
Assumption 3: Many flows and additional load

   We assume that there are many flows on any bottleneck link in the
   PCN-domain (or, to put it another way, the aggregate bit rate of
   PCN-traffic across any potential bottleneck link is sufficiently
   large relative to the maximum additional bit rate added by one
   flow).  Measurement-based admission control assumes that the present
   is a reasonable prediction of the future: the network conditions are
   measured at the time of a new flow request, but the actual network
   performance must be OK during the call some time later.  One issue
   is that if there are only a few variable-rate flows, then the
   aggregate traffic level may vary a lot, perhaps enough to cause some
   packets to get dropped.  If there are many flows then the aggregate
   traffic level should be statistically smoothed.  How many flows is
   enough depends on a number of things, such as the variation in each
   flow's rate, the total rate of PCN-traffic, and the size of the
   "safety margin" between the traffic level at which we start
   admission-marking and that at which packets are dropped or
   significantly delayed.

   We do not make explicit assumptions on how many PCN-flows are in
   each ingress-egress-aggregate.  Performance evaluation work may
   clarify whether it is necessary to make any additional assumption on
   aggregation at the ingress-egress-aggregate level.

3.4.  Assumption 4: Emergency use out of scope

   PCN-flows may have different precedence, but the applicability of
   the PCN mechanisms for emergency use (911, GETS, WPS, MLPP, etc.) is
   out of scope for consideration by the PCN WG.

3.5.  Other assumptions

   As a consequence of Assumption 2 above, it is assumed that PCN-
   marking is applied to traffic scheduled with the expedited
   forwarding per-hop behaviour [RFC3246], or to traffic with similar
   characteristics.
   The following two assumptions apply if the PCN WG decides to encode
   PCN-marking in the ECN field:

   o  It is assumed that PCN-nodes do not perform ECN [RFC3168] on PCN-
      packets.

   o  If a packet that is part of a PCN-flow arrives at a PCN-ingress-
      node with its CE (Congestion Experienced) codepoint set, then we
      assume that the PCN-ingress-node drops the packet.  After its
      initial Charter is complete, the WG may decide to work on a
      mechanism (such as a signalling extension) that enables ECN-
      marking to be carried transparently across the PCN-domain.

4.  High-level functional architecture

   The high-level approach is to split functionality between:

   o  PCN-interior-nodes 'inside' the PCN-domain, which monitor their
      own state of pre-congestion on each outgoing interface and mark
      PCN-packets if appropriate.  They are not flow-aware, nor aware
      of ingress-egress-aggregates.  This functionality is also
      performed by PCN-ingress-nodes for their outgoing interfaces (ie
      those 'inside' the PCN-domain).

   o  PCN-boundary-nodes at the edge of the PCN-domain, which control
      admission of new PCN-flows and termination of existing PCN-flows,
      based on information from PCN-interior-nodes.  This information
      is in the form of the PCN-marked data packets (which are
      intercepted by the PCN-egress-nodes), not signalling messages.
      Generally PCN-ingress-nodes are flow-aware, and in several
      deployment scenarios PCN-egress-nodes will also be flow-aware.

   The aim of this split is to keep the bulk of the network simple,
   scalable and robust, whilst confining policy, application-level and
   security interactions to the edge of the PCN-domain.  For example,
   the lack of flow awareness means that the PCN-interior-nodes don't
   care about the flow information associated with the PCN-packets that
   they carry, nor do the PCN-boundary-nodes care about which PCN-
   interior-nodes their flows traverse.
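   The egress side of this split can be sketched in a few lines.  The
   following is illustrative only and is not part of the architecture:
   the class `EgressMeter`, its method names and the octet-based
   counting are all hypothetical choices.  It shows a PCN-egress-node
   that merely accumulates PCN-marked versus total PCN-traffic per
   ingress-egress-aggregate, with no per-flow state:

```python
# Illustrative sketch (not from the draft): a PCN-egress-node keeps
# aggregate counters per PCN-ingress-node, ie per ingress-egress-
# aggregate, never per flow.
from collections import defaultdict

class EgressMeter:
    """Accumulates PCN-marked and total octets per ingress node."""

    def __init__(self):
        self.marked = defaultdict(int)  # octets, keyed by ingress node
        self.total = defaultdict(int)

    def observe(self, ingress_id, size_octets, pcn_marked):
        # Called for every PCN-packet leaving the domain at this egress.
        self.total[ingress_id] += size_octets
        if pcn_marked:
            self.marked[ingress_id] += size_octets

    def marked_fraction(self, ingress_id):
        # Fraction of PCN-traffic that arrived PCN-marked for this
        # ingress-egress-aggregate (0.0 if nothing seen yet).
        total = self.total[ingress_id]
        return self.marked[ingress_id] / total if total else 0.0
```

   A decision-making node could then, for instance, compare
   `marked_fraction()` against a configured threshold, along the lines
   of the first admission approach described in Section 4.1.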
4.1.  Flow admission

   At a high level, flow admission control works as follows.  In order
   to generate information about the current state of the PCN-domain,
   each PCN-node PCN-marks packets if it is "pre-congested".  Exactly
   how a PCN-node decides whether it is "pre-congested" (the algorithm)
   and exactly how packets are "PCN-marked" (the encoding) will be
   defined in a separate standards-track document, but at a high level
   it is expected to be as follows:

   o  the algorithm: a PCN-node meters the amount of PCN-traffic on
      each one of its outgoing links.  The measurement is made as an
      aggregate of all PCN-packets, and not per flow.  The algorithm
      has a configured parameter, the PCN-lower-rate.  If the amount of
      PCN-traffic exceeds the PCN-lower-rate, then PCN-packets are PCN-
      marked.  See the NOTE below for more explanation.

   o  the encoding: a PCN-node PCN-marks a PCN-packet (with a first
      encoding) by setting fields in the header to specific values.  It
      is expected that the ECN and/or DSCP fields will be used.

   NOTE: Two main categories of algorithm have been proposed: if the
   algorithm uses threshold-marking then all PCN-packets are marked
   whilst the current rate exceeds the PCN-lower-rate, whereas if the
   algorithm uses excess-rate-marking the amount marked is equal to the
   amount in excess of the PCN-lower-rate.  However, note that this
   description reflects the overall intent of the algorithm rather than
   its instantaneous behaviour, since the rate measured at a particular
   moment depends on the detailed algorithm, its implementation (eg
   virtual queue, token bucket...) and the traffic's variance as well
   as its rate (eg marking may well continue after a recent overload,
   even after the instantaneous rate has dropped).

   The PCN-boundary-nodes monitor the PCN-marked packets in order to
   extract information about the current state of the PCN-domain.
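   The NOTE above names two possible implementations (virtual queue,
   token bucket).  As a hedged sketch only, and not the behaviour that
   the standards-track document will define, a token-bucket excess-
   rate marker metering aggregate PCN-traffic against the PCN-lower-
   rate could look like this (all names are hypothetical):

```python
# Illustrative excess-rate-marking sketch: a token bucket refilled at
# the PCN-lower-rate; a packet that finds insufficient tokens is the
# "excess" and gets PCN-marked.  This operates on the aggregate
# PCN-traffic of one outgoing link, not per flow.

class ExcessRateMarker:
    """Token bucket filled at pcn_lower_rate (octets/second)."""

    def __init__(self, pcn_lower_rate, depth):
        self.rate = pcn_lower_rate  # configured PCN-lower-rate
        self.depth = depth          # bucket depth in octets
        self.tokens = depth         # start with a full bucket
        self.last = 0.0             # time of the previous packet

    def packet(self, now, size_octets):
        # Refill tokens for the elapsed time, capped at the depth.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_octets:
            self.tokens -= size_octets
            return False            # within the PCN-lower-rate: unmarked
        return True                 # excess traffic: PCN-marked
```

   A threshold-marking variant would instead mark every PCN-packet
   whilst the metered rate exceeds the configured rate, rather than
   only the excess.
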
   Based on this monitoring, a decision is made about whether to admit
   a prospective new flow.  Exactly how the admission control decision
   is made will be defined separately (at the moment the intention is
   that there will be one or more informational-track RFCs), but at a
   high level two approaches have been proposed to date:

   o  the PCN-egress-node measures (possibly as a moving average) the
      fraction of the PCN-traffic that is PCN-marked.  The fraction is
      measured for a specific ingress-egress-aggregate.  If the
      fraction is below a threshold value then the new flow is
      admitted.

   o  if the PCN-egress-node receives one (or several) PCN-marked
      packets, then a new flow is blocked.

   Note that the PCN-lower-rate is a parameter that can be configured
   by the operator.  It will be set lower than the traffic rate at
   which the link becomes congested and the node drops packets.
   (Hence, by analogy with ECN, we call our mechanism Pre-Congestion
   Notification.)

   Note also that the admission control decision is made for a
   particular ingress-egress-aggregate.  So it is quite possible for a
   new flow to be admitted between one pair of PCN-boundary-nodes,
   whilst at the same time another admission request is blocked between
   a different pair of PCN-boundary-nodes.

4.2.  Flow termination

   At a high level, flow termination control works as follows.  Each
   PCN-node PCN-marks packets in a similar fashion to the above.  An
   obvious approach is for the algorithm to use a second configured
   parameter, the PCN-upper-rate, and a second header encoding.
   However, there is also a proposal to use the same rate and the same
   encoding.
Several 572 approaches have been proposed to date for converting this 573 information into a flow termination decision; at a high level these 574 are as follows: 576 o One approach measures the rate of unmarked PCN-traffic (ie not 577 PCN-upper-rate-marked) at the PCN-egress-node, which is the amount 578 of PCN-traffic that can actually be supported; the PCN-ingress- 579 node measures the rate of PCN-traffic that is destined for this 580 specific PCN-egress-node, and hence can calculate the excess 581 amount that should be terminated. 583 o Another approach instead measures the rate of PCN-upper-rate- 584 marked traffic and from this calculates and selects the flows that 585 should be terminated. 587 o Another approach terminates any PCN-flow with a PCN-upper-rate- 588 marked packet. Compared with the approaches above, PCN-marking 589 needs to be done at a reduced rate; otherwise far too much traffic 590 would be terminated. 592 o Another approach uses only one sort of marking, which is based on 593 the PCN-lower-rate, to decide not only whether to admit more PCN- 594 flows but also whether any PCN-flows need to be terminated. It 595 assumes that the ratio of the (implicit) PCN-upper-rate and the 596 PCN-lower-rate is the same on all links. This approach measures 597 the rate of unmarked PCN-traffic at a PCN-egress-node. The PCN- 598 ingress-node uses this measurement to compute the implicit PCN- 599 upper-rate of the bottleneck link. It then measures the rate of 600 PCN-traffic that is destined for this specific PCN-egress-node and 601 hence can calculate the amount that should be terminated. 603 Since flow termination is designed for "abnormal" circumstances, it 604 is quite likely that some PCN-nodes are congested and hence packets 605 are being dropped and/or significantly queued. The flow termination 606 mechanism must bear this in mind. 608 Note also that the termination control decision is made for a 609 particular ingress-egress-aggregate.
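NOTE (non-normative illustration): the first termination approach above might be sketched as follows; the function names and the greedy flow selection are invented for illustration, and a real decision would also take account of policy and application layer requirements.

```python
# Illustrative sketch of the first flow termination approach: terminate
# the excess of the ingress sending rate over the rate of unmarked
# (ie supportable) PCN-traffic measured at the egress.

def excess_to_terminate(ingress_rate_bps, egress_unmarked_rate_bps):
    """Rate of PCN-traffic on this ingress-egress-aggregate that should
    be terminated; zero if the aggregate is fully supported."""
    return max(0, ingress_rate_bps - egress_unmarked_rate_bps)

def select_flows_to_terminate(flows, excess_bps):
    """Greedily pick flows (here, (flow_id, rate_bps) pairs) until at
    least the excess rate has been covered."""
    chosen, covered = [], 0
    for flow_id, rate in flows:
        if covered >= excess_bps:
            break
        chosen.append(flow_id)
        covered += rate
    return chosen
```

For example, if the ingress is sending 10 Mbit/s towards an egress that measures only 8 Mbit/s of unmarked traffic, flows totalling at least 2 Mbit/s would be selected for termination.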
So it is quite possible for 610 PCN-flows to be terminated between one pair of PCN-boundary-nodes, 611 whilst at the same time none are terminated between a different pair 612 of PCN-boundary-nodes. 614 4.3. Flow admission and flow termination 616 Although designed to work together, flow admission and flow 617 termination are independent mechanisms, and the use of one does not 618 require or prevent the use of the other. 620 For example, an operator could use just admission control, solving 621 heavy congestion (caused by re-routing) by 'just waiting' - as 622 sessions end, existing microflows naturally depart from the system 623 over time, and the admission control mechanism will prevent admission 624 of new microflows that use the affected links. So the PCN-domain 625 will naturally return to normal operation, but with reduced capacity. 626 The drawback of this approach would be that until PCN-flows naturally 627 depart to relieve the congestion, all PCN-flows as well as lower 628 priority services will be adversely affected. On the other hand, an 629 operator could just rely for admission control on statically 630 provisioned capacity per PCN-ingress-node (regardless of the PCN- 631 egress-node of a flow), as is typical in the hose model of the 632 DiffServ architecture [RFC2475]. Such traffic conditioning 633 agreements can lead to focused overload: many flows happen to focus 634 on a particular link and then all flows through the congested link 635 fail catastrophically. The flow termination mechanism could then be 636 used to counteract such a problem. 638 A different possibility is to configure only the PCN-lower-rate and 639 hence only do one type of PCN-marking, but generate admission and 640 flow termination responses from different levels of marking. This is 641 suggested in [I-D.charny-pcn-single-marking] which gives some of the 642 pros and cons of this approach. 644 4.4. 
Information transport 646 The transport of pre-congestion information from a PCN-node to a PCN- 647 egress-node is through PCN-markings in data packet headers; no 648 signalling protocol messaging is needed. However, signalling is 649 needed to transport PCN-feedback-information between the PCN- 650 boundary-nodes, for example to convey the fraction of PCN-marked 651 traffic from a PCN-egress-node to the relevant PCN-ingress-node. 652 Exactly what information needs to be transported will be described in 653 the future PCN WG document(s) about the boundary mechanisms. The 654 signalling could be done by an extension of RSVP or NSIS, for 655 instance; protocol work will be done by the relevant WG, but for 656 example [I-D.lefaucheur-rsvp-ecn] describes the extensions needed for 657 RSVP. 659 4.5. PCN-traffic 661 The following are some high-level points about how PCN works: 663 o There needs to be a way for a PCN-node to distinguish PCN-traffic 664 from non PCN-traffic. They may be distinguished using the DSCP 665 field and/or ECN field. 667 o The PCN mechanisms may be applied to more than one traffic class 668 (which are distinguished by DSCP). 670 o There may be traffic that is more important than PCN, perhaps a 671 particular application or an operator's control messages. A PCN- 672 node may dedicate capacity to such traffic or schedule it at a 673 higher priority than PCN. In the latter case such traffic needs to 674 contribute to the PCN meters. 676 o There will be traffic less important than PCN, for instance best 677 effort or assured forwarding traffic. It will be scheduled at 678 lower priority than PCN, and use a separate queue or queues. 679 However, a PCN-node should dedicate some capacity to lower 680 priority traffic so that it isn't starved. 682 o There may be other traffic with the same priority as PCN-traffic. 683 For instance, Expedited Forwarding sessions that are originated 684 either without capacity admission or with traffic engineering.
In 685 [I-D.ietf-tsvwg-admitted-realtime-dscp] the two traffic classes 686 are called EF and EF-ADMIT. A PCN-node could either use separate 687 queues, or separate policers and a common queue; the draft 688 provides some guidance when each is better, but for instance the 689 latter is preferred when the two traffic classes are carrying the 690 same type of application with the same jitter requirements. 692 5. Detailed Functional architecture 694 This section is intended to provide a systematic summary of the new 695 functional architecture in the PCN-domain. First it describes 696 functions needed at the three specific types of PCN-node; these are 697 data plane functions and are in addition to their normal router 698 functions. Then it describes further functionality needed for both 699 flow admission control and flow termination; these are signalling and 700 decision-making functions, and there are various possibilities for 701 where the functions are physically located. The section is split 702 into: 704 1. functions needed at PCN-interior-nodes 706 2. functions needed at PCN-ingress-nodes 708 3. functions needed at PCN-egress-nodes 710 4. other functions needed for flow admission control 712 5. other functions needed for flow termination control 713 Note: Probing is covered in Section 7. 715 The section then discusses some other detailed topics: 717 1. addressing 719 2. tunnelling 721 3. fault handling 723 5.1. PCN-interior-node functions 725 Each interface of the PCN-domain is upgraded with the following 726 functionality: 728 o Packet classify - decide whether an incoming packet is a PCN- 729 packet or not. Another PCN WG document will specify encoding, 730 using the DSCP and/or ECN fields. 732 o PCN-meter - measure the 'amount of PCN-traffic'. The measurement 733 is made as an aggregate of all PCN-packets, and not per flow. 
735 o PCN-mark - algorithms determine whether to PCN-mark PCN-packets 736 and what packet encoding is used (as specified in another PCN WG 737 document). 739 The same general approach of metering and PCN-marking is performed 740 for both flow admission control and flow termination; however, the 741 algorithms and encoding may be different. 743 These functions are needed for each interface of the PCN-domain. 744 They are therefore needed on all interfaces of PCN-interior-nodes, 745 and on the interfaces of PCN-boundary-nodes that are internal to the 746 PCN-domain. There may be more than one PCN-meter and marker 747 installed at a given interface, eg one for admission and one for 748 termination. 750 5.2. PCN-ingress-node functions 752 Each ingress interface of the PCN-domain is upgraded with the 753 following functionality: 755 o Packet classify - decide whether an incoming packet is part of a 756 previously admitted microflow, by using a filter spec (eg DSCP, 757 source and destination addresses and port numbers) 759 o Police - police, by dropping or re-marking with a non-PCN DSCP, 760 any packets received with a DSCP demanding PCN transport that do 761 not belong to an admitted flow. Similarly, police packets that 762 are part of a previously admitted microflow, to check that the 763 microflow keeps to the agreed rate or flowspec (eg [RFC1633] and 764 its NSIS equivalent). 766 o PCN-colour - set the DSCP field or DSCP and ECN fields to the 767 appropriate value(s) for a PCN-packet. The draft about PCN- 768 encoding will discuss this further. 770 o PCN-meter - make "measurements of PCN-traffic". Some approaches 771 to flow termination require the PCN-ingress-node to measure the 772 (aggregate) rate of PCN-traffic towards a particular PCN-egress- 773 node.
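NOTE (non-normative illustration): the classify and police functions above might be sketched as follows, assuming a simple 5-tuple filter spec and a token-bucket rate check; all names are hypothetical, and dropping (rather than re-marking to a non-PCN DSCP) is chosen arbitrarily.

```python
# Hypothetical sketch of PCN-ingress-node classification and policing.
# An admitted flow is keyed by a 5-tuple filter spec and carries an
# agreed rate enforced by a token bucket.
import time

class AdmittedFlow:
    def __init__(self, agreed_rate_bps, bucket_bits):
        self.agreed_rate_bps = agreed_rate_bps
        self.bucket_bits = bucket_bits     # token bucket depth, in bits
        self.tokens = bucket_bits
        self.last = time.monotonic()

    def conforms(self, packet_bits):
        """Token-bucket check that the microflow keeps to its agreed rate."""
        now = time.monotonic()
        self.tokens = min(self.bucket_bits,
                          self.tokens + (now - self.last) * self.agreed_rate_bps)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True
        return False

def police(admitted, five_tuple, packet_bits):
    """Forward conforming packets of admitted flows; drop everything else
    (a real policer might instead re-mark to a non-PCN DSCP)."""
    flow = admitted.get(five_tuple)
    if flow is None or not flow.conforms(packet_bits):
        return "drop"
    return "forward"
```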
775 The first two are policing functions, needed to make sure that PCN- 776 packets let into the PCN-domain belong to a flow that's been admitted 777 and to ensure that the flow doesn't exceed its agreed rate. 778 The filter spec will for example come from the flow request message 779 (outside scope of PCN WG, see [I-D.briscoe-tsvwg-cl-architecture] for 780 an example using RSVP). PCN-colouring allows the rest of the PCN- 781 domain to recognise PCN-packets. 783 5.3. PCN-egress-node functions 785 Each egress interface of the PCN-domain is upgraded with the 786 following functionality: 788 o Packet classify - determine which PCN-ingress-node a PCN-packet 789 has come from. 791 o PCN-meter - make measurements of PCN-traffic. Each measurement 792 is made as an aggregate (ie not per flow) of all PCN-packets from 793 a particular PCN-ingress-node. 795 o PCN-colour - for PCN-packets, set the DSCP and ECN fields to the 796 appropriate values for use outside the PCN-domain. 798 Another PCN WG document, about boundary mechanisms, will describe 799 what the "measurements of PCN-traffic" are. This depends on whether 800 the measurement is targeted at admission control or flow termination. 801 It also depends on what encoding and PCN-marking algorithms are 802 specified by the PCN WG. 804 5.4. Admission control functions 806 Specific admission control functions can be performed at a PCN- 807 boundary-node (PCN-ingress-node or PCN-egress-node) or at a 808 centralised node, but not at normal PCN-interior-nodes. The 809 functions are: 811 o Make decision about admission - compare the required "measurements 812 of PCN-traffic" (output of the PCN-egress-node's PCN-meter 813 function) with some reference level, and hence decide whether to 814 admit the potential new PCN-flow. As well as the PCN 815 measurements, the decision takes account of policy and application 816 layer requirements.
818 o Communicate decision about admission - signal the decision to the 819 node making the admission control request (which may be outside 820 the PCN-domain), and to the policer (PCN-ingress-node function). 822 There are various possibilities for how the functionality can be 823 distributed (we assume the operator would configure which is used): 825 o The decision is made at the PCN-egress-node and signalled to the 826 PCN-ingress-node 828 o The decision is made at the PCN-ingress-node, which requires that 829 the PCN-egress-node signals to the PCN-ingress-node the fraction 830 of PCN-traffic that is PCN-marked (or whatever the PCN WG agrees 831 as the required "measurements of PCN-traffic"). 833 o The decision is made at a centralised node, which requires that 834 the PCN-egress-node signals its measurements to the centralised 835 node, and that the centralised node signals the admission control 836 decision to the PCN-ingress-node. It would be 837 possible for the centralised node to be one of the PCN-boundary- 838 nodes, in which case the signalling would sometimes be replaced by 839 a message internal to the node. 841 5.5. Flow termination functions 843 Specific termination control functions can be performed at a PCN- 844 boundary-node (PCN-ingress-node or PCN-egress-node) or at a 845 centralised node, but not at normal PCN-interior-nodes. There are 846 various possibilities for how the functionality can be distributed, 847 similar to those discussed above in the Admission control section; 848 the flow termination decision could be made at the PCN-ingress-node, 849 the PCN-egress-node or at some centralised node. The functions are: 851 o PCN-meter at PCN-egress-node - make "measurements of PCN-traffic" 852 from a particular PCN-ingress-node.
854 o (if required) PCN-meter at PCN-ingress-node - make "measurements 855 of PCN-traffic" being sent towards a particular PCN-egress-node; 856 again, this is done for the ingress-egress-aggregate and not per 857 flow. 859 o (if required) Communicate "measurements of PCN-traffic" to the 860 node that makes the flow termination decision. For example, if 861 the PCN-ingress-node makes the decision then communicate the PCN- 862 egress-node's measurements to it (as in 863 [I-D.briscoe-tsvwg-cl-architecture]). 865 o Make decision about flow termination - use the "measurements of 866 PCN-traffic" to decide which PCN-flow or PCN-flows to terminate. 867 The decision takes account of policy and application layer 868 requirements. 870 o Communicate decision about flow termination - signal the decision 871 to the node that is able to terminate the flow (which may be 872 outside the PCN-domain), and to the policer (PCN-ingress-node 873 function). 875 5.6. Addressing 877 PCN-nodes may need to know the address of other PCN-nodes: 879 o Note: in all cases PCN-interior-nodes don't need to know the 880 address of any other PCN-nodes (except as normal their next hop 881 neighbours, for routing purposes) 883 o in the cases of admission or termination decision by a PCN- 884 boundary-node, the PCN-egress-node needs to know the address of 885 the PCN-ingress-node associated with a flow, at a minimum so that 886 the PCN-ingress-node can be informed to enforce the admission 887 decision (and any flow termination decision) through policing. 888 The addressing information can be gathered from signalling, for 889 example as described for RSVP in [I-D.lefaucheur-rsvp-ecn]. 890 Another alternative is to use a probe packet that includes as 891 payload the address of the PCN-ingress-node. 
Alternatively, if 892 PCN-traffic is always tunnelled across the PCN-domain, then the 893 PCN-ingress-node's address is simply the source address of the 894 outer packet header; then the PCN-ingress-node needs to learn the 895 address of the PCN-egress-node, either by manual configuration or 896 by one of the automated tunnel endpoint discovery mechanisms (such 897 as signalling or probing over the data route, interrogating 898 routing or using a centralised broker). 900 o in the cases of admission or termination decision by a central 901 control node, the PCN-egress-node needs to be configured with the 902 address of the centralised node. In addition, depending on the 903 exact deployment scenario and its signalling, the centralised node 904 may need to know the addresses of the PCN-ingress-node and PCN- 905 egress-node, the PCN-egress-node may need to know the address of 906 the PCN-ingress-node, and the PCN-ingress-node may need to know 907 the address of the centralised node and the PCN-egress-node. 908 NOTE: Consideration of the centralised case is out of scope of the 909 initial PCN WG Charter. 911 5.7. Tunnelling 913 Tunnels may originate and/or terminate within a PCN-domain. It is 914 important that the PCN-marking of any packet can potentially 915 influence PCN's flow admission control and termination - it shouldn't 916 matter whether the packet happens to be tunnelled at the PCN-node 917 that PCN-marks the packet, or indeed whether it's decapsulated or 918 encapsulated by a subsequent PCN-node. This suggests that the 919 "uniform conceptual model" described in [RFC2983] should be re- 920 applied in the PCN context. 
In line with this and the approach of 921 [RFC4303] and [I-D.briscoe-tsvwg-ecn-tunnel], the following rule is 922 applied if encapsulation is done within the PCN-domain: 924 o any PCN-marking is copied into the outer header 926 Similarly, in line with the "uniform conceptual model" of [RFC2983] 927 and the "full-functionality option" of [RFC3168], the following rule 928 is applied if decapsulation is done within the PCN-domain: 930 o if the outer header's marking state is more severe than the inner 931 header's, then it is copied onto the inner header 933 o NB the order of increasing severity is: unmarked; PCN-marking with 934 first encoding (ie associated with the PCN-lower-rate); PCN- 935 marking with second encoding (ie associated with the PCN-upper- 936 rate) 938 An operator may wish to tunnel PCN-traffic from PCN-ingress-nodes to 939 PCN-egress-nodes. The PCN-marks shouldn't be visible outside the 940 PCN-domain, which can be achieved by doing the PCN-colour function 941 (Section 5.3) after all the other (PCN and tunnelling) functions. 942 The potential reasons for doing such tunnelling are: the PCN-egress- 943 node then automatically knows the address of the relevant PCN- 944 ingress-node for a flow; even if ECMP is running, all PCN-packets on 945 a particular ingress-egress-aggregate follow the same path. But it 946 also has drawbacks, for example the additional overhead in terms of 947 bandwidth and processing. 949 Potential issues arise for a "partially PCN-capable tunnel", ie where 950 only one tunnel endpoint is in the PCN domain: 952 1. The tunnel starts outside a PCN-domain and finishes inside it. 953 If the packet arrives at the tunnel ingress with the same 954 encoding as used within the PCN-domain to indicate PCN-marking, 955 then this could lead the PCN-egress-node to falsely measure pre- 956 congestion. 958 2. The tunnel starts inside a PCN-domain and finishes outside it.
959 If the packet arrives at the tunnel ingress already PCN-marked, 960 then it will still have the same encoding when it's decapsulated, 961 which could potentially confuse nodes beyond the tunnel egress. 963 In line with the solution for partially capable DiffServ tunnels in 964 [RFC2983], the following rules are applied: 966 o For case (1), the tunnel egress node clears any PCN-marking on the 967 inner header. This rule is applied before the 'copy on 968 decapsulation' rule above. 970 o For case (2), the tunnel ingress node clears any PCN-marking on 971 the inner header. This rule is applied after the 'copy on 972 encapsulation' rule above. 974 Note that the above implies that one has to know, or figure out, the 975 characteristics of the other end of the tunnel as part of setting it 976 up. 978 5.8. Fault handling 980 If a PCN-interior-node fails (or one of its links), then lower layer 981 protection mechanisms or the regular IP routing protocol will 982 eventually re-route round it. If the new route can carry all the 983 admitted traffic, flows will gracefully continue. If instead this 984 causes early warning of pre-congestion on the new route, then 985 admission control based on pre-congestion notification will ensure 986 new flows will not be admitted until enough existing flows have 987 departed. Re-routing may result in heavy (pre-)congestion, in which 988 case the flow termination mechanism will kick in. 990 If a PCN-boundary-node fails then we would like the regular QoS 991 signalling protocol to take care of things. As an example 992 [I-D.briscoe-tsvwg-cl-architecture] considers what happens if RSVP is 993 the QoS signalling protocol. The details for a specific signalling 994 protocol are out of scope of the PCN WG; however, there is a WG 995 Milestone on generic "Requirements for signalling". 997 6.
Design goals and challenges 999 Prior work on PCN and similar mechanisms has thrown up a number of 1000 considerations about PCN's design goals (things PCN should be good 1001 at) and some issues that have been hard to solve in a fully 1002 satisfactory manner. Taken as a whole, these represent a list of trade- 1003 offs (it's unlikely that they can all be 100% achieved) and can 1004 perhaps serve as evaluation criteria to help an operator (or the IETF) 1005 decide between options. 1007 The following are key design goals for PCN (based on 1008 [I-D.chan-pcn-problem-statement]): 1010 o The PCN-enabled packet forwarding network should be simple, 1011 scalable and robust 1013 o Compatibility with other traffic (ie a proposed solution should 1014 work well when non-PCN traffic is also present in the network) 1016 o Support of different types of real-time traffic (eg should work 1017 well with CBR and VBR voice and video sources treated together) 1019 o Reaction time of the mechanisms should be commensurate with the 1020 desired application-level requirements (eg a termination mechanism 1021 needs to terminate flows before significant QoS issues are 1022 experienced by real-time traffic, and before most users hang up). 1024 o Compatibility with different precedence levels of real-time 1025 applications (e.g. preferential treatment of higher precedence 1026 calls over lower precedence calls, [ITU-MLPP]). 1028 The following are open issues. They are mainly taken from 1029 [I-D.briscoe-tsvwg-cl-architecture] which also describes some 1030 possible solutions. Note that some may be considered unimportant in 1031 general or in specific deployment scenarios or by some operators. 1033 NOTE: Potential solutions are out of scope for this document. 1035 o ECMP (Equal Cost Multi-Path) Routing: The level of pre-congestion 1036 is measured on a specific ingress-egress-aggregate.
However, if 1037 the PCN-domain runs ECMP, then traffic on this ingress-egress- 1038 aggregate may follow several different paths - some of the paths 1039 could be pre-congested whilst others are not. There are three 1040 potential problems: 1042 1. over-admission: a new flow is admitted (because the pre- 1043 congestion level measured by the PCN-egress-node is 1044 sufficiently diluted by unmarked packets from non-congested 1045 paths), but its packets travel 1046 through a pre-congested PCN-node 1048 2. under-admission: a new flow is blocked (because the pre- 1049 congestion level measured by the PCN-egress-node is 1050 sufficiently increased by PCN-marked packets from pre- 1051 congested paths), but its packets 1052 travel along an uncongested path 1054 3. ineffective termination: flows are terminated; however, their 1055 path doesn't pass through the (pre-)congested router(s). 1056 Since flow termination is a 'last resort' that protects the 1057 network should over-admission occur, this problem is probably 1058 more important to solve than the other two. 1060 o ECMP and signalling: It is possible that, in a PCN-domain running 1061 ECMP, the signalling packets (eg RSVP, NSIS) follow a different 1062 path than the data packets - it depends on which fields the ECMP 1063 algorithm uses. This could matter if the signalling packets are 1064 used as probes. 1066 o Tunnelling: There are scenarios where tunnelling makes it hard to 1067 determine the path in the PCN-domain. The problem, its impact and 1068 the potential solutions are similar to those for ECMP. 1070 o Scenarios with only one tunnel endpoint in the PCN domain may make 1071 it harder for the PCN-egress-node to gather from the signalling 1072 messages (eg RSVP, NSIS) the identity of the PCN-ingress-node.
1074 o Bi-Directional Sessions: Many applications have bi-directional 1075 sessions - hence there are two flows that should be admitted (or 1076 terminated) as a pair - for instance a bi-directional voice call 1077 only makes sense if flows in both directions are admitted. 1078 However, PCN's mechanisms concern admission and termination of a 1079 single flow, and coordination of the decision for both flows is a 1080 matter for the signalling protocol and out of scope of PCN. One 1081 possible example would use SIP pre-conditions; there are others. 1083 o Global Coordination: PCN makes its admission decision based on 1084 PCN-markings on a particular ingress-egress-aggregate. Decisions 1085 about flows through a different ingress-egress-aggregate are made 1086 independently. However, one can imagine network topologies and 1087 traffic matrices where, from a global perspective, it would be 1088 better to make a coordinated decision across all the ingress- 1089 egress-aggregates for the whole PCN-domain. For example, to block 1090 (or even terminate) flows on one ingress-egress-aggregate so that 1091 more important flows through a different ingress-egress-aggregate 1092 could be admitted. The problem may well be second order. 1094 o Aggregate Traffic Characteristics: Even when the number of flows 1095 is stable, the traffic level through the PCN-domain will vary 1096 because the sources vary their traffic rates. PCN works best when 1097 there's not too much variability in the total traffic level at a 1098 PCN-node's interface (ie in the aggregate traffic from all 1099 sources). Too much variation means that a node may (at one 1100 moment) not be doing any PCN-marking and then (at another moment) 1101 drop packets because it's overloaded. This makes it hard to tune 1102 the admission control scheme to stop admitting new flows at the 1103 right time. Therefore the problem is more likely with fewer, 1104 burstier flows. 
1106 o Flash crowds and Speed of Reaction: PCN is a measurement-based 1107 mechanism and so there is an inherent delay between packet marking 1108 by PCN-interior-nodes and any admission control reaction at PCN- 1109 boundary-nodes. For example, if a big burst of 1110 admission requests occurs in a very short space of time (eg 1111 prompted by a televote), they could all get admitted before enough 1112 PCN-marks are seen to block new flows. In other words, any 1113 additional load offered within the reaction time of the mechanism 1114 mustn't move the PCN-domain directly from no congestion to 1115 overload. This 'vulnerability period' may have an impact at the 1116 signalling level; for instance, QoS requests should be rate limited 1117 to bound the number of requests able to arrive within the 1118 vulnerability period. 1120 o Silent at start: after a successful admission request the source 1121 may wait some time before sending data (eg waiting for the called 1122 party to answer). Then the risk is that, in some circumstances, 1123 PCN's measurements underestimate what the pre-congestion level 1124 will be when the source does start sending data. 1126 o Compatibility of PCN-encoding with ECN-encoding. This issue will 1127 be considered further in the PCN WG Milestone 'Survey of encoding 1128 choices'. 1130 7. Probing 1132 7.1. Introduction 1134 Probing is an optional mechanism to assist admission control. 1136 PCN's admission control, as described so far, is essentially a 1137 reactive mechanism where the PCN-egress-node monitors the pre- 1138 congestion level for traffic from each PCN-ingress-node; if the level 1139 rises then it blocks new flows on that ingress-egress-aggregate. 1140 However, it's possible that an ingress-egress-aggregate carries no 1141 traffic, and so the PCN-egress-node can't make an admission decision 1142 using the usual method described earlier. 1144 One approach is to be "optimistic" and simply admit the new flow.
1145 However it's possible to envisage a scenario where the traffic levels 1146 on other ingress-egress-aggregates are already so high that they're 1147 blocking new PCN-flows, and admitting a new flow onto this 'empty' 1148 ingress-egress-aggregate adds extra traffic onto the link that's 1149 already pre-congested - which may 'tip the balance' so that PCN's 1150 flow termination mechanism is activated or some packets are dropped. 1151 This risk could be lessened by configuring on each link sufficient 1152 'safety margin' above the PCN-lower-rate. 1154 An alternative approach is to make PCN a more proactive mechanism. 1155 The PCN-ingress-node explicitly determines, before admitting the 1156 prospective new flow, whether the ingress-egress-aggregate can 1157 support it. This can be seen as a "pessimistic" approach, in 1158 contrast to the "optimism" of the approach above. It involves 1159 probing: a PCN-ingress-node generates and sends probe packets in 1160 order to test the pre-congestion level that the flow would 1161 experience. 1163 One possibility is that a probe packet is just a dummy data packet, 1164 generated by the PCN-ingress-node and addressed to the PCN-egress- 1165 node. Another possibility is that a probe packet is a signalling 1166 packet that is anyway travelling from the PCN-ingress-node to the 1167 PCN-egress-node (eg an RSVP PATH message travelling from source to 1168 destination). 1170 7.2. Probing functions 1172 The probing functions are: 1174 o Make decision that probing is needed. As described above, this is 1175 when the ingress-egress-aggregate or the ECMP path carries no PCN- 1176 traffic. An alternative is always to probe, ie probe before 1177 admitting every PCN-flow. 1179 o (if required) Communicate the request that probing is needed - the 1180 PCN-egress-node signals to the PCN-ingress-node that probing is 1181 needed 1183 o (if required) Generate probe traffic - the PCN-ingress-node 1184 generates the probe traffic. 
The appropriate number (or rate) of 1185 probe packets will depend on the PCN-marking algorithm; for 1186 example an excess-rate-marking algorithm generates fewer PCN-marks 1187 than a threshold-marking algorithm, and so will need more probe 1188 packets. 1190 o Forward probe packets - as far as PCN-interior-nodes are 1191 concerned, probe packets must be handled the same as (ordinary 1192 data) PCN-packets, in terms of routing, scheduling and PCN- 1193 marking. 1195 o Consume probe packets - the PCN-egress-node consumes probe packets 1196 to ensure that they don't travel beyond the PCN-domain. 1198 7.3. Discussion of rationale for probing, its downsides and open issues 1200 It is an unresolved question whether probing is really needed, but 1201 three viewpoints have been put forward as to why it is useful. The 1202 first is perhaps the most obvious: there is no PCN-traffic on the 1203 ingress-egress-aggregate. The second assumes that multipath routing 1204 ECMP is running in the PCN-domain. The third viewpoint is that 1205 admission control is always done by probing. We now consider each in 1206 turn. 1208 The first viewpoint assumes the following: 1210 o There is no PCN-traffic on the ingress-egress-aggregate (so a 1211 normal admission decision cannot be made). 1213 o Simply admitting the new flow has a significant risk of leading to 1214 overload: packets dropped or flows terminated. 1216 On the former bullet, [PCN-email-traffic-empty-aggregates] suggests 1217 that, during the future busy hour of a national network with about 1218 100 PCN-boundary-nodes, there are likely to be significant numbers of 1219 aggregates with very few flows under nearly all circumstances. 1221 The latter bullet could occur if a new flow starts on many of the 1222 empty ingress-egress-aggregates and causes overload on a link in the 1223 PCN-domain. 
To be a problem this would probably have to happen in a 1224 short time period (flash crowd) because, after the reaction time of 1225 the system, other (non-empty) ingress-egress-aggregates that pass 1226 through the link will measure pre-congestion and so block new flows, 1227 and also flows naturally end anyway. 1229 The downsides of probing for this viewpoint are: 1231 o Probing adds delay to the admission control process. 1233 o Sufficient probing traffic has to be generated to test the pre- 1234 congestion level of the ingress-egress-aggregate. But the probing 1235 traffic itself may cause pre-congestion, causing other PCN-flows 1236 to be blocked or even terminated - and in the flash crowd scenario 1237 there will be probing on many ingress-egress-aggregates. 1239 The open issues associated with this viewpoint include: 1241 o What rate and pattern of probe packets does the PCN-ingress-node 1242 need to generate, so that there's enough traffic to make the 1243 admission decision? 1245 o What difficulty does the delay (whilst probing is done) cause 1246 applications, eg packets might be dropped? 1248 o Are there other ways of dealing with the flash crowd scenario? 1249 For instance limit the rate at which new flows are admitted; or 1250 perhaps for a PCN-egress-node to block new flows on its empty 1251 ingress-egress-aggregates when its non-empty ones are pre- 1252 congested. 1254 The second viewpoint applies in the case where there is multipath 1255 routing (ECMP) in the PCN-domain. Note that ECMP is often used on 1256 core networks. 
There are two possibilities: 1258 (1) If admission control is based on measurements of the ingress- 1259 egress-aggregate, then the viewpoint that probing is useful assumes: 1261 o there's a significant chance that the traffic is unevenly balanced 1262 across the ECMP paths, and hence there's a significant risk of 1263 admitting a flow that should be blocked (because it follows an 1264 ECMP path that is pre-congested) or blocking a flow that should be 1265 admitted. 1267 o Note: [PCN-email-ECMP] suggests unbalanced traffic is quite 1268 possible, even with quite a large number of flows on a PCN-link 1269 (eg 1000) when Assumption 3 (aggregation) is likely to be 1270 satisfied. 1272 (2) If admission control is based on measurements of pre-congestion 1273 on specific ECMP paths, then the viewpoint that probing is useful 1274 assumes: 1276 o There is no PCN-traffic on the ECMP path on which to base an 1277 admission decision. 1279 o Simply admitting the new flow has a significant risk of leading to 1280 overload. 1282 o The PCN-egress-node can match a packet to an ECMP path. 1284 o Note: This is similar to the first viewpoint and so similarly 1285 could occur in a flash crowd if a new flow starts more-or-less 1286 simultaneously on many of the empty ECMP paths. Because there are 1287 several (sometimes many) ECMP paths between each pair of PCN- 1288 boundary-nodes, it's presumably more likely that an ECMP path is 1289 'empty' than an ingress-egress-aggregate. To constrain the number 1290 of ECMP paths, a few tunnels could be set up between each pair of 1291 PCN-boundary-nodes. Tunnelling also solves the problem in the third 1292 bullet (which is otherwise hard because an ECMP routing decision is 1293 made independently on each node). 1295 The downsides of probing for this viewpoint are: 1297 o Probing adds delay to the admission control process. 1299 o Sufficient probing traffic has to be generated to test the pre- 1300 congestion level of the ECMP path.
But there's the risk that the 1301 probing traffic itself may cause pre-congestion, causing other 1302 PCN-flows to be blocked or even terminated. 1304 o The PCN-egress-node needs to consume the probe packets to ensure 1305 they don't travel beyond the PCN-domain (eg they might confuse the 1306 destination end node). Hence somehow the PCN-egress-node has to 1307 be able to disambiguate a probe packet from a data packet, via the 1308 characteristic setting of particular bit(s) in the packet's header 1309 or body - but these bit(s) mustn't be used by any PCN-interior- 1310 node's ECMP algorithm. In the general case this isn't possible, 1311 but it should be OK for a typical ECMP algorithm which examines: 1312 the source and destination IP addresses and port numbers, the 1313 protocol ID and the DSCP. 1315 The third viewpoint assumes the following: 1317 o Simply admitting the new flow has a significant risk of leading to 1318 overload, because the PCN-domain reaches out towards the end 1319 terminals where link capacity is low. 1321 o Every admission control decision involves probing, using the 1322 signalling set-up message as the probe packet (eg RSVP PATH). 1324 o The PCN-marking behaviour is such that every packet is PCN-marked 1325 if the flow should be blocked, hence only a single probing packet 1326 is needed. 1328 The first point breaks Assumption 3 (aggregation) and hence means 1329 that this viewpoint is out of scope of the initial Charter of the PCN 1330 WG. 1332 8. Operations and Management 1334 This Section considers operations and management issues, under the 1335 FCAPS headings: OAM of Faults, Configuration, Accounting, Performance 1336 and Security. Provisioning is discussed with performance. 1338 8.1. Configuration OAM 1340 This architecture document predates the detailed standards actions of 1341 the PCN WG. 
Here we assume that only interoperable PCN-marking 1342 behaviours will be standardised, otherwise we would have to consider 1343 how to avoid interactions between non-interoperable marking 1344 behaviours. However, more diversity in edge-node behaviours is 1345 expected, in order to interface with diverse industry architectures. 1347 PCN configuration control variables fall into the following 1348 categories: 1350 o system options (enabling or disabling behaviours) 1352 o parameters (setting levels, addresses etc) 1354 All configurable variables will need to sit within an SNMP management 1355 framework [RFC3411], being structured within a defined management 1356 information base (MIB) on each node, and being remotely readable and 1357 settable via a suitably secure management protocol (SNMPv3). 1359 Some configuration options and parameters have to be set once to 1360 'globally' control the whole PCN-domain. Where possible, these are 1361 identified below. This may affect operational complexity and the 1362 chances of interoperability problems between kit from different 1363 vendors. 1365 8.1.1. System options 1367 On PCN-interior-nodes there will be very few system options: 1369 o Whether two PCN-markings (based on the PCN-lower-rate and PCN- 1370 upper-rate) are enabled or only one (see Section 4.3). Typically 1371 all nodes throughout a PCN-domain will be configured the same in 1372 this respect. However, exceptions could be made. For example, if 1373 most PCN-nodes used both markings, but some legacy hardware was 1374 incapable of running two algorithms, an operator might be willing 1375 to configure these legacy nodes solely for PCN-marking based on 1376 the PCN-upper-rate to enable flow termination as a back-stop. It 1377 would be sensible to place such nodes where they could be 1378 provisioned with a greater leeway over expected traffic levels. 
1380 o which marking algorithm to use, if an equipment vendor provides a 1381 choice 1383 PCN-boundary-nodes (ingress and egress) will have more system 1384 options: 1386 o Which of admission and flow termination are enabled. If any PCN- 1387 interior-node is configured to generate a marking, all PCN- 1388 boundary-nodes must be able to handle that marking. Therefore all 1389 PCN-boundary-nodes must be configured the same in this respect. 1391 o Where flow admission and termination decisions are made: at the 1392 PCN-ingress-node, PCN-egress-node or at a centralised node (see 1393 Sections 5.4 and 5.5). Theoretically, this configuration choice 1394 could be negotiated for each pair of PCN-boundary-nodes, but we 1395 cannot imagine why such complexity would be required, except 1396 perhaps in future inter-domain scenarios. 1398 PCN-egress-nodes will have further system options: 1400 o How the mapping should be established between each packet and its 1401 aggregate, eg by MPLS label, by IP packet filterspec; and how to 1402 take account of ECMP. 1404 o If an equipment vendor provides a choice, there may be options to 1405 select which smoothing algorithm to use for measurements. 1407 8.1.2. Parameters 1409 Like any DiffServ domain, every node within a PCN-domain will need to 1410 be configured with the DSCP(s) used to identify PCN-packets. On each 1411 interior link the main configuration parameters are the PCN-lower- 1412 rate and PCN-upper-rate. A larger PCN-lower-rate enables more PCN- 1413 traffic to be admitted on a link, hence improving capacity 1414 utilisation. A PCN-upper-rate set further above the PCN-lower-rate 1415 allows greater increases in traffic (whether due to natural 1416 fluctuations or some unexpected event) before any flows are 1417 terminated, ie minimises the chances of unnecessarily triggering the 1418 termination mechanism. 
For instance an operator may want to design 1419 their network so that it can cope with a failure of any single PCN- 1420 node without terminating any flows. 1422 Setting these rates on first deployment of PCN will be very similar 1423 to the traditional process for sizing an admission controlled 1424 network, depending on: the operator's requirements for minimising 1425 flow blocking (grade of service), the expected PCN traffic load on 1426 each link and its statistical characteristics (the traffic matrix), 1427 contingency for re-routing the PCN traffic matrix in the event of 1428 single or multiple failures and the expected load from other classes 1429 relative to link capacities. But once a domain is up and running, a 1430 PCN design goal is to be able to determine growth in these configured 1431 rates much more simply, by monitoring PCN-marking rates from actual 1432 rather than expected traffic (see Section 8.2 on Performance & 1433 Provisioning). 1435 Operators may also wish to configure a rate greater than the PCN- 1436 upper-rate that is the absolute maximum rate that a link allows for 1437 PCN-traffic. This may simply be the physical link rate, but some 1438 operators may wish to configure a logical limit to prevent starvation 1439 of other traffic classes during any brief period after PCN-traffic 1440 exceeds the PCN-upper-rate but before flow termination brings it back 1441 below this rate. 1443 Specific marking algorithms will also depend on further configuration 1444 parameters. For instance, threshold-marking will require a threshold 1445 queue depth and excess-rate-marking may require a scaling parameter. 1446 It will be preferable for each marking algorithm to have rules to set 1447 defaults for these parameters relative to the reference marking rate, 1448 but then allow operators to change them, for instance if average 1449 traffic characteristics change over time. 
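To make the parameter discussion above concrete, the two marking behaviours and their configuration parameters can be sketched in outline. This is a simplified illustration, not the algorithms the PCN WG will standardise: the class names, the queue model and the token-bucket formulation are assumptions made here for clarity.

```python
class ThresholdMarker:
    """Mark every PCN packet while the queue exceeds a configured
    depth (simplified threshold-marking sketch)."""
    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes   # configurable threshold queue depth

    def should_mark(self, queue_depth_bytes):
        return queue_depth_bytes > self.threshold


class ExcessRateMarker:
    """Mark PCN traffic arriving faster than a configured reference
    rate, using a token bucket (simplified excess-rate-marking sketch).
    The bucket depth plays the role of the scaling parameter
    mentioned in the text."""
    def __init__(self, rate_bytes_per_s, bucket_bytes):
        self.rate = rate_bytes_per_s       # reference marking rate
        self.depth = bucket_bytes          # configurable scaling parameter
        self.tokens = bucket_bytes
        self.last = 0.0

    def should_mark(self, now, packet_bytes):
        # Refill tokens at the reference rate, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return False   # within the reference rate: leave unmarked
        return True        # excess traffic: PCN-mark this packet
```

In this sketch the threshold queue depth and the token-bucket depth are the per-link parameters an operator would tune, relative to the configured PCN-lower-rate or PCN-upper-rate.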
The PCN-egress-node may 1450 allow configuration of the following: 1452 o how it smoothes metering of PCN-markings (eg EWMA parameters) 1454 Whichever node makes admission and flow termination decisions will 1455 contain algorithms for converting PCN-marking levels into admission 1456 or flow termination decisions. These will also require configurable 1457 parameters, for instance: 1459 o Any admission control algorithm will at least require a marking 1460 threshold setting above which it denies admission to new flows; 1462 o flow termination algorithms will probably require a parameter to 1463 delay termination of any flows until it is more certain that an 1464 anomalous event is not transient; 1466 o a parameter to control the trade-off between how quickly excess 1467 flows are terminated and over-termination. 1469 One particular proposal, [I-D.charny-pcn-single-marking] would 1470 require a global parameter to be defined on all PCN-nodes, but only 1471 needs the PCN-lower-rate to be configured on each link. The global 1472 parameter is a scaling factor between admission and termination, for 1473 example the amount by which the PCN-upper-rate is implicitly assumed 1474 to be above the PCN-lower-rate. [I-D.charny-pcn-single-marking] 1475 discusses in full the impact of this particular proposal on the 1476 operation of PCN. 1478 8.2. Performance & Provisioning OAM 1480 Monitoring of performance factors measurable from *outside* the PCN 1481 domain will be no different with PCN than with any other packet-based 1482 flow admission control system, both at the flow level (blocking 1483 probability etc) and the packet level (jitter [RFC3393], [Y.1541], 1484 loss rate [RFC4656], mean opinion score [P.800], etc). The 1485 difference is that PCN is intentionally designed to indicate 1486 *internally* which exact resource(s) are the cause of performance 1487 problems and by how much. 
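The configurable decision parameters listed in Section 8.1.2 above (an EWMA smoothing weight, an admission threshold, and a delay before termination) can be pictured with a small sketch of a decision node. Every name and rule here is an illustrative assumption, not a proposal:

```python
class DecisionNode:
    """Sketch of converting smoothed PCN-marking levels into admission
    and flow termination decisions (illustrative parameter names)."""
    def __init__(self, ewma_weight, admit_threshold, terminate_after_samples):
        self.w = ewma_weight                       # smoothing: 0 < w <= 1
        self.admit_threshold = admit_threshold     # marking fraction cut-off
        self.terminate_after = terminate_after_samples
        self.smoothed = 0.0
        self.high_samples = 0

    def update(self, marked_fraction):
        # EWMA-smooth the measured fraction of PCN-marked traffic.
        self.smoothed = self.w * marked_fraction + (1 - self.w) * self.smoothed

    def admit_new_flow(self):
        # Block new flows once smoothed marking reaches the threshold.
        return self.smoothed < self.admit_threshold

    def should_terminate(self, upper_rate_marking_seen):
        # Delay termination until the anomaly has persisted for several
        # measurement intervals, to avoid reacting to transients.
        self.high_samples = self.high_samples + 1 if upper_rate_marking_seen else 0
        return self.high_samples >= self.terminate_after
```

The terminate_after_samples parameter corresponds to the trade-off noted above between terminating excess flows quickly and over-terminating.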
1489 Even better, PCN indicates which resources will probably cause 1490 problems if they are not upgraded soon. This can be achieved by the 1491 management system monitoring the total amount (in bytes) of PCN- 1492 marking generated by each queue over a period. Given possible long 1493 provisioning lead times, pre-congestion volume is the best metric to 1494 reveal whether sufficient persistent demand has mounted up to warrant 1495 an upgrade, because even before utilisation becomes problematic 1496 the statistical variability of traffic will cause occasional bursts 1497 of pre-congestion. This 'early warning system' decouples the process 1498 of adding customers from the provisioning process. This should cut 1499 the time to add a customer when compared against admission control 1500 provided over native DiffServ [RFC2998], because it saves having to 1501 re-run the capacity planning process before adding each customer. 1503 Alternatively, before triggering an upgrade, the long term pre- 1504 congestion volume on each link can be used to balance traffic load 1505 across the PCN-domain by adjusting the link weights of the routing 1506 system. When an upgrade to a link's configured PCN-rates is 1507 required, it may also be necessary to upgrade the physical capacity 1508 available to other classes. But usually there will be sufficient 1509 physical capacity for the upgrade to go ahead as a simple 1510 configuration change. Alternatively, [Songhurst] has proposed an 1511 adaptive rather than preconfigured system, where the configured PCN- 1512 lower-rate is replaced with a high and low water mark and the marking 1513 algorithm automatically optimises how physical capacity is shared 1514 using the relative loads from PCN and other traffic classes. 1516 All the above processes require just three extra counters associated 1517 with each PCN queue: PCN-markings associated with the PCN-lower-rate 1518 and PCN-upper-rate, and drop.
Every time a PCN packet is marked or 1519 dropped its size in bytes should be added to the appropriate counter. 1520 Then the management system can read the counters at any time and 1521 subtract a previous reading to establish the incremental volume of 1522 each type of (pre-)congestion. Readings should be taken frequently, 1523 so that anomalous events (eg re-routes) can be separated from regular 1524 fluctuating demand if required. 1526 8.3. Accounting OAM 1528 Accounting is only done at trust boundaries so it is out of scope of 1529 the initial Charter of the PCN WG which is confined to intra-domain 1530 issues. Use of PCN internal to a domain makes no difference to the 1531 flow signalling events crossing trust boundaries outside the PCN- 1532 domain, which are typically used for accounting. 1534 8.4. Fault OAM 1536 Fault OAM is about preventing faults, telling the management system 1537 (or manual operator) that the system has recovered (or not) from a 1538 failure, and about maintaining information to aid fault diagnosis. 1540 Admission blocking and particularly flow termination mechanisms 1541 should rarely be needed in practice. It would be unfortunate if they 1542 didn't work after an option had been accidentally disabled. 1543 Therefore it will be necessary to regularly test that the live system 1544 works as intended (devising a meaningful test is left as an exercise 1545 for the operator). 1547 Section 5.9 describes how the PCN architecture has been designed to 1548 ensure admitted flows continue gracefully after recovering 1549 automatically from link or node failures. The need to record and 1550 monitor re-routing events affecting signalling is unchanged by the 1551 addition of PCN to a DiffServ domain. Similarly, re-routing events 1552 within the PCN-domain will be recorded and monitored just as they 1553 would be without PCN. 1555 PCN-marking does make it possible to record 'near-misses'. 
For 1556 instance, at the PCN-egress-node a 'reporting threshold' could be set 1557 to monitor how often the system comes close to triggering flow 1558 blocking without actually doing so. Similarly, bursts of flow 1559 termination marking could be recorded even if they are not 1560 sufficiently sustained to trigger flow termination. Such statistics 1561 could be correlated with per-queue counts of marking volume (Section 1562 8.2) to upgrade resources in danger of causing service degradation, 1563 or to trigger manual tracing of intermittent incipient errors that 1564 would otherwise have gone unnoticed. 1566 Finally, of course, many faults are caused by failings in the 1567 management process ('human error'): a wrongly configured address in a 1568 node, a wrong address given in a signalling protocol, a wrongly 1569 configured parameter in a queueing algorithm, a node set into a 1570 different mode from other nodes, and so on. Generally, a clean 1571 design with few configurable options ensures this class of faults can 1572 be traced more easily and prevented more often. Sound management 1573 practice at run-time also helps. For instance: a management system 1574 should be used that constrains configuration changes within system 1575 rules (eg preventing an option setting inconsistent with other 1576 nodes); configuration options should also be recorded in an offline 1577 database; and regular automatic consistency checks should be run between 1578 live systems and the database. PCN adds nothing specific to this class of 1579 problems. By the time standards are in place, we expect that the PCN 1580 WG will have ruthlessly removed gratuitous configuration choices. 1581 However, at the time of writing, the WG has yet to choose between 1582 multiple competing proposals, so the range of possible options in 1583 Section 8.1 does seem rather wide compared to the original near-zero 1584 configuration intent of the architecture. 1586 8.5.
Security OAM 1588 Security OAM is about using secure operational practices as well as 1589 being able to track security breaches or near-misses at run-time. 1590 PCN adds few specifics to the general good practice required in this 1591 field [RFC4778], other than those below. The correct functioning of 1592 the system should be monitored (Section 8.2) in multiple independent 1593 ways and correlated to detect possible security breaches. Persistent 1594 (pre-)congestion marking should raise an alarm (both on the node 1595 doing the marking and on the PCN-egress-node metering it). 1596 Similarly, persistently poor external QoS metrics such as jitter or 1597 MOS should raise an alarm. The following are examples of symptoms 1598 that may be the result of innocent faults, rather than attacks, but 1599 until diagnosed they should be logged and trigger a security alarm: 1601 o Anomalous patterns of non-conforming incoming signals and packets 1602 rejected at the PCN-ingress-nodes (eg packets already marked PCN- 1603 capable, or traffic persistently starving token bucket policers). 1605 o PCN-capable packets arriving at a PCN-egress-node with no 1606 associated state for mapping them to a valid ingress-egress- 1607 aggregate. 1609 o A PCN-ingress-node receiving feedback signals about the pre- 1610 congestion level on a non-existent aggregate, or that are 1611 inconsistent with other signals (eg unexpected sequence numbers, 1612 inconsistent addressing, conflicting reports of the pre-congestion 1613 level, etc). 1615 o Packets arriving at a PCN-egress-node with 1616 (pre-)congestion markings focused on particular flows, rather than 1617 randomly distributed throughout the aggregate. 1619 9. IANA Considerations 1621 This memo includes no request to IANA. 1623 10.
Security considerations 1625 Security considerations essentially come from the Trust Assumption 1626 (Section 3.1), ie that all PCN-nodes are PCN-enabled and trust each 1627 other for truthful PCN-marking and transport. PCN splits 1628 functionality between PCN-interior-nodes and PCN-boundary-nodes, and 1629 the security considerations are somewhat different for each, mainly 1630 because PCN-boundary-nodes are flow-aware and PCN-interior-nodes are 1631 not. 1633 o because the PCN-boundary-nodes are flow-aware, they are trusted to 1634 use that awareness correctly. The degree of trust required 1635 depends on the kinds of decisions they have to make and the kinds 1636 of information they need to make them. For example when the PCN- 1637 boundary-node needs to know the contents of the sessions for 1638 making the admission and termination decisions (perhaps based on 1639 the MLPP precedence), or when the contents are highly classified, 1640 then the security requirements for the PCN-boundary-nodes involved 1641 will also need to be high. 1643 o the PCN-ingress-nodes police packets to ensure a flow sticks 1644 within its agreed limit, and to ensure that only flows which have 1645 been admitted contribute PCN-traffic into the PCN-domain. The 1646 policer must drop (or perhaps re-mark to a different DSCP) any 1647 PCN-packets received that are outside this remit. This is similar 1648 to the existing IntServ behaviour. Between them the PCN-boundary- 1649 nodes must encircle the PCN-domain, otherwise PCN-packets could 1650 enter the PCN-domain without being subject to admission control, 1651 which would potentially destroy the QoS of existing flows. 1653 o PCN-interior-nodes aren't flow-aware. This prevents some security 1654 attacks where an attacker targets specific flows in the data plane 1655 - for instance for DoS or eavesdropping. 
1657 o PCN-marking by the PCN-interior-nodes along the packet forwarding 1658 path needs to be trusted, because the PCN-boundary-nodes rely on 1659 this information. For instance a rogue PCN-interior-node could 1660 PCN-mark all packets so that no flows were admitted. Another 1661 possibility is that it doesn't PCN-mark any packets, even when 1662 it's pre-congested. More subtly, the rogue PCN-interior-node 1663 could perform these attacks selectively on particular flows, or it 1664 could PCN-mark the correct fraction overall, but carefully choose 1665 which flows it marked. 1667 o the PCN-boundary-nodes should be able to deal with DoS attacks and 1668 state exhaustion attacks based on fast changes in per flow 1669 signalling. 1671 o the signalling between the PCN-boundary-nodes (and possibly a 1672 central control node) must be protected from attacks. For example 1673 the recipient needs to validate that the message is indeed from 1674 the node that claims to have sent it. Possible measures include 1675 digest authentication and protection against replay and man-in- 1676 the-middle attacks. For the specific protocol RSVP, hop-by-hop 1677 authentication is in [RFC2747], and 1678 [I-D.behringer-tsvwg-rsvp-security-groupkeying] may also be 1679 useful; for a generic signalling protocol the PCN WG document on 1680 "Requirements for signalling" will describe the requirements in 1681 more detail. 1683 Operational security advice is given in Section 8.5. 1685 11. Conclusions 1687 The document describes a general architecture for flow admission and 1688 termination based on aggregated pre-congestion information in order 1689 to protect the quality of service of established inelastic flows 1690 within a single DiffServ domain. The main topic is the functional 1691 architecture (first covered at a high level and then at a greater 1692 level of detail). It also mentions other topics like the assumptions 1693 and open issues. 1695 12. 
Acknowledgements 1697 This document is a revised version of [I-D.eardley-pcn-architecture]. 1698 Its authors were: P. Eardley, J. Babiarz, K. Chan, A. Charny, R. 1699 Geib, G. Karagiannis, M. Menth, T. Tsou. They are therefore 1700 contributors to this document. 1702 Thanks to those who've made comments on 1703 [I-D.eardley-pcn-architecture] and on earlier versions of this draft: 1704 Lachlan Andrew, Joe Babiarz, Fred Baker, David Black, Steven Blake, 1705 Bob Briscoe, Ken Carlberg, Anna Charny, Joachim Charzinski, Andras 1706 Csaszar, Lars Eggert, Ruediger Geib, Robert Hancock, Georgios 1707 Karagiannis, Michael Menth, Tom Taylor, Tina Tsou, Delei Yu. Thanks 1708 to Bob Briscoe who extensively revised the Operations and Management 1709 section. 1711 This document is the result of discussions in the PCN WG and 1712 forerunner activity in the TSVWG. A number of previous drafts were 1713 presented to TSVWG: [I-D.chan-pcn-problem-statement], 1714 [I-D.briscoe-tsvwg-cl-architecture], [I-D.briscoe-tsvwg-cl-phb], 1715 [I-D.charny-pcn-single-marking], [I-D.babiarz-pcn-sip-cap], 1716 [I-D.lefaucheur-rsvp-ecn], [I-D.westberg-pcn-load-control]. The 1717 authors of them were: B. Briscoe, P. Eardley, D. Songhurst, F. Le 1718 Faucheur, A. Charny, J. Babiarz, K. Chan, S. Dudley, G. Karagiannis, 1719 A. Bader, L. Westberg, J. Zhang, V. Liatsos, X-G. Liu, A. Bhargava. 1721 13. Comments Solicited 1723 Comments and questions are encouraged and very welcome. They can be 1724 addressed to the IETF PCN working group mailing list. 1726 14. Changes 1728 Changes from -01 to -02: 1730 o S1: Benefits: provisioning bullet extended to stress that PCN does 1731 not use RFC2475-style traffic conditioning. 1733 o S1: Deployment models: mentioned, as variant of PCN-domain 1734 extending to end nodes, that it may extend to a LAN edge switch. 1736 o S3.1: Trust Assumption: added note about not needing PCN-marking 1737 capability if known that an interface cannot become pre-congested.
1739 o S4: now divided into sub-sections 1741 o S4.1: Admission control: added second proposed method for how to 1742 decide to block new flows (PCN-egress-node receives one (or 1743 several) PCN-marked packets). 1745 o S5: Probing sub-section removed. Material now in new S7. 1747 o S5.6: Addressing: clarified how PCN-ingress-node can discover 1748 address of PCN-egress-node 1750 o S5.6: Addressing: centralised node case, added that PCN-ingress- 1751 node may need to know address of PCN-egress-node 1753 o S5.8: Tunnelling: added case of "partially PCN-capable tunnel" and 1754 degraded bullet on this in S6 (Open Issues) 1756 o S7: Probing: new section. Much more comprehensive than old S5.5. 1758 o S8: Operations and Management: substantially revised. 1760 o other minor changes not affecting semantics 1762 Changes from -00 to -01: 1764 In addition to clarifications and nit squashing, the main changes 1765 are: 1767 o S1: Benefits: added one about provisioning (and contrast with 1768 DiffServ SLAs) 1770 o S1: Benefits: clarified that the objective is also to stop PCN- 1771 packets being significantly delayed (previously only mentioned not 1772 dropping packets) 1774 o S1: Deployment models: added one where policing is done at ingress 1775 of access network and not at ingress of PCN-domain (assume trust 1776 between networks) 1778 o S1: Deployment models: corrected MPLS-TE to MPLS 1780 o S2: Terminology: adjusted definition of PCN-domain 1782 o S3.5: Other assumptions: corrected, so that two assumptions (PCN- 1783 nodes not performing ECN and PCN-ingress-node discarding arriving 1784 CE packet) only apply if the PCN WG decides to encode PCN-marking 1785 in the ECN-field. 
1787 o S4 & S5: changed PCN-marking algorithm to marking behaviour 1789 o S4: clarified that PCN-interior-node functionality applies for 1790 each outgoing interface, and added clarification: "The 1791 functionality is also done by PCN-ingress-nodes for their outgoing 1792 interfaces (ie those 'inside' the PCN-domain)." 1794 o S4 (near end): altered to say that a PCN-node "should" dedicate 1795 some capacity to lower priority traffic so that it isn't starved 1796 (was "may") 1798 o S5: clarified to say that PCN functionality is done on an 1799 'interface' (rather than on a 'link') 1801 o S5.2: deleted erroneous mention of service level agreement 1803 o S5.5: Probing: re-written, especially to distinguish probing to 1804 test the ingress-egress-aggregate from probing to test a 1805 particular ECMP path. 1807 o S5.7: Addressing: added mention of probing; also added a note 1808 that, in the case where traffic is always tunnelled across the 1809 PCN-domain, the PCN-ingress-node needs to know the address of the 1810 PCN-egress-node. 1812 o S5.8: Tunnelling: re-written, especially to provide a clearer 1813 description of copying on tunnel entry/exit, by adding explanation 1814 (keeping tunnel encaps/decaps and PCN-marking orthogonal), 1815 deleting one bullet ("if the inner header's marking state is more 1816 severe then it is preserved" - shouldn't happen), and better 1817 referencing of other IETF documents. 1819 o S6: Open issues: stressed that "NOTE: Potential solutions are out 1820 of scope for this document" and edited a couple of sentences that 1821 were close to solution space. 1823 o S6: Open issues: added one about scenarios with only one tunnel 1824 endpoint in the PCN-domain. 1826 o S6: Open issues: ECMP: added under-admission as another potential 1827 risk 1829 o S6: Open issues: added one about "Silent at start" 1831 o S10: Conclusions: a small conclusions section added. 1833 15.
Informative References 1835 [I-D.briscoe-tsvwg-cl-architecture] 1836 Briscoe, B., "An edge-to-edge Deployment Model for Pre- 1837 Congestion Notification: Admission Control over a 1838 DiffServ Region", draft-briscoe-tsvwg-cl-architecture-04 1839 (work in progress), October 2006. 1841 [I-D.briscoe-tsvwg-cl-phb] 1842 Briscoe, B., "Pre-Congestion Notification marking", 1843 draft-briscoe-tsvwg-cl-phb-03 (work in progress), 1844 October 2006. 1846 [I-D.babiarz-pcn-sip-cap] 1847 Babiarz, J., "SIP Controlled Admission and Preemption", 1848 draft-babiarz-pcn-sip-cap-00 (work in progress), 1849 October 2006. 1851 [I-D.lefaucheur-rsvp-ecn] 1852 Faucheur, F., "RSVP Extensions for Admission Control over 1853 Diffserv using Pre-congestion Notification (PCN)", 1854 draft-lefaucheur-rsvp-ecn-01 (work in progress), 1855 June 2006. 1857 [I-D.chan-pcn-problem-statement] 1858 Chan, K., "Pre-Congestion Notification Problem Statement", 1859 draft-chan-pcn-problem-statement-01 (work in progress), 1860 October 2006. 1862 [I-D.ietf-pwe3-congestion-frmwk] 1863 Bryant, S., "Pseudowire Congestion Control Framework", 1864 draft-ietf-pwe3-congestion-frmwk-00 (work in progress), 1865 February 2007. 1867 [I-D.ietf-tsvwg-admitted-realtime-dscp] 1868 "DSCPs for Capacity-Admitted Traffic", November 2006, . 1872 [I-D.briscoe-tsvwg-ecn-tunnel] 1873 "Layered Encapsulation of Congestion Notification", 1874 June 2007, . 1877 [I-D.ietf-tsvwg-ecn-mpls] 1878 "Explicit Congestion Marking in MPLS", October 2007, . 1882 [I-D.charny-pcn-single-marking] 1883 "Pre-Congestion Notification Using Single Marking for 1884 Admission and Termination", November 2007, . 1888 [I-D.eardley-pcn-architecture] 1889 "Pre-Congestion Notification Architecture", June 2007, . 1893 [I-D.westberg-pcn-load-control] 1894 "LC-PCN: The Load Control PCN Solution", November 2007, . 1898 [I-D.behringer-tsvwg-rsvp-security-groupkeying] 1899 "A Framework for RSVP Security Using Dynamic Group 1900 Keying", June 2007, . 
   [I-D.briscoe-re-pcn-border-cheat]
              "Emulating Border Flow Policing using Re-ECN on Bulk
              Data", June 2006.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC3246]  Davie, B., Charny, A., Bennett, J., Benson, K., Le
              Boudec, J., Courtney, W., Davari, S., Firoiu, V., and D.
              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
              Behavior)", RFC 3246, March 2002.

   [RFC4594]  Babiarz, J., Chan, K., and F. Baker, "Configuration
              Guidelines for DiffServ Service Classes", RFC 4594,
              August 2006.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
              Network Element Service", RFC 2211, September 1997.

   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang,
              L., Speer, M., Braden, R., Davie, B., Wroclawski, J.,
              and E. Felstaine, "A Framework for Integrated Services
              Operation over Diffserv Networks", RFC 2998,
              November 2000.

   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S.,
              Vaananen, P., Krishnan, R., Cheval, P., and J. Heinanen,
              "Multi-Protocol Label Switching (MPLS) Support of
              Differentiated Services", RFC 3270, May 2002.

   [RFC1633]  Braden, R., Clark, D., and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, June 1994.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, October 2000.

   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP
              Cryptographic Authentication", RFC 2747, January 2000.

   [RFC3411]  Harrington, D., Presuhn, R., and B. Wijnen, "An
              Architecture for Describing Simple Network Management
              Protocol (SNMP) Management Frameworks", STD 62,
              RFC 3411, December 2002.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay
              Variation Metric for IP Performance Metrics (IPPM)",
              RFC 3393, November 2002.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and
              M. Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC4778]  Kaeo, M., "Operational Security Current Practices in
              Internet Service Provider Environments", RFC 4778,
              January 2007.

   [ITU-MLPP]
              "Multilevel Precedence and Pre-emption Service (MLPP)",
              ITU-T Recommendation I.255.3, 1990.

   [Iyer]     "An approach to alleviate link overload as observed on
              an IP backbone", IEEE INFOCOM, 2003.

   [Shenker]  "Fundamental design issues for the future Internet",
              IEEE Journal on Selected Areas in Communications,
              Vol. 13(7), pp. 1176-1188, 1995.

   [Y.1541]   "Network Performance Objectives for IP-based Services",
              ITU-T Recommendation Y.1541, February 2006.

   [P.800]    "Methods for subjective determination of transmission
              quality", ITU-T Recommendation P.800, August 1996.

   [Songhurst]
              "Guaranteed QoS Synthesis for Admission Control with
              Shared Capacity", BT Technical Report TR-CXR9-2006-001,
              February 2006.

   [PCN-email-ECMP]
              "Email to PCN WG mailing list", November 2007.

   [PCN-email-traffic-empty-aggregates]
              "Email to PCN WG mailing list", October 2007.

Author's Address

   Philip Eardley
   BT
   B54/77, Sirius House, Adastral Park, Martlesham Heath
   Ipswich, Suffolk  IP5 3RE
   United Kingdom

   Email: philip.eardley@bt.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).
   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.
Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).