Internet Draft                 Load Control                  April 2000

Load Control of Real-Time Traffic
draft-westberg-loadcntr-03.txt
Document Revision: 1.3
2000/04/19 12:43:19

A Two-bit Resource Allocation Scheme

April 2000

L. Westberg
Z. R. Turanyi
D. Partain

Ericsson

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   The purpose of this memo is to present a new resource allocation
   scheme for DiffServ (DS) networks, called Load Control. The main
   purpose of Load Control is to provide a simple and scalable solution
   to the resource provisioning problem.

   Load Control addresses two particular issues:
      1. Measurement-based access control, whereby a probe packet is
         sent along the forwarding path in a network to determine
         whether a flow can be admitted based upon the current
         congestion state of the network.
      2. A lightweight reservation of a certain amount of network
         resources.

   Load Control uses two-bit markers in packet headers to carry load
   information from core routers to edge devices. The scheme provides
   the capability of controlling the traffic load in the network without
   requiring signaling or any per-flow processing in the core routers.
   The complexity of Load Control is kept to a minimum to make
   implementation simple.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Background and Motivation

   The amount of traffic carried on the Internet is now greater than the
   traffic on the world's telephony network.
   Still, Internet-based communication services generate less income
   than plain old telephony services. Enabling value-added services over
   the Internet is therefore crucial for service providers. One
   significant class of such value-added services requires real-time
   packet transportation. It can be expected that these real-time
   services will be popular as they replicate or are natural extensions
   of existing communication services like telephony. Exact and reliable
   resource management (e.g., admission control) is essential for
   achieving high utilization in networks with real-time transportation
   capabilities. The problem is difficult mainly due to scalability
   issues.

   With the introduction of differentiated services (DS) [RFC2475], it
   is now possible to provide large scale, real-time services. The basic
   idea of DiffServ is that, rather than classifying packets at each
   router, packets are only classified at the edge devices. The result
   - the required packet treatment - is stored and carried in the packet
   headers, and core routers can carry out appropriate scheduling.

   The current definition of DiffServ, however, does not contain any
   simple, scalable solution to the problem of resource provisioning and
   control. A number of approaches to solving the problem already exist
   [Berson97, Guerin97, Stoica99, Bernet99]. The scheme presented in
   this document does not require any state aggregation and aims at
   extreme simplicity and low cost of implementation along with good
   scaling properties. Load control operates edge-to-edge in a DS
   domain, or between two RSVP-capable routers, where only the edge
   devices keep flow state and do per-flow processing. The main purpose
   of Load Control is to provide a simple and scalable solution to the
   resource provisioning problem.

3. Overview

   Load control is achieved by two actions: measurement-based admission
   control of incoming requests and the dropping of admitted flows in
   case of exceptional events such as link failures. Load Control uses
   two-bit markers in the packet headers to gather information about the
   load level along various paths through the network. The core routers
   are able to mark passing packets to signal the exhaustion of
   resources to the edge devices.

   For admission control, the resource state of core routers is gathered
   by sending a specially marked packet, denoted a "probe" packet, from
   the ingress to the egress edge device. The probe result is then used
   by the ingress to decide flow acceptance or rejection and to set up
   traffic conditioning/policy. If rigid admission control is required,
   soft-state based reservations are also supported. In this case the
   probe packet does both the probing and allocation of resources along
   the path. The latter method is comparable to signaling based schemes
   but does not require processing of signaling messages in the core
   routers.

   Under normal circumstances, admission control is enough to control
   the load in the network. Nevertheless, when exceptional events (such
   as link failures) cause too much traffic to be re-routed over a link,
   the resulting severe congestion may degrade the quality of all of the
   flows on the link. In that case, the best solution might be to keep
   existing flows and suffer the loss of quality. However, for some
   services, it may be desirable to drop some of the previously admitted
   flows to protect the quality of the remaining flows. Thus, when
   severe congestion occurs, the core routers mark the headers of all
   (not only probe) packets to notify the edge devices of the congestion
   condition.
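
   As an illustration only, the marking behavior outlined in this
   overview can be sketched in Python. The names Marker, CoreRouter,
   traverse and admit are hypothetical and not part of this memo. The
   sketch assumes that each core router exposes just two flags (one for
   refusing new flows, one for severe congestion) and that a marked
   packet is never unmarked again.

```python
from dataclasses import dataclass
from enum import Enum


class Marker(Enum):
    """Packet markers used in this sketch (names are assumptions)."""
    ORDINARY = "ordinary"
    PROBE = "probe"
    MARKED = "marked"


@dataclass
class CoreRouter:
    rejecting: bool = False          # resources near exhaustion
    severe_congestion: bool = False  # exceptional event, e.g. link failure

    def forward(self, marker: Marker) -> Marker:
        """Return the marker the packet carries when it leaves this router."""
        if marker is Marker.MARKED:
            return marker                 # once marked, never changed back
        if self.severe_congestion:
            return Marker.MARKED          # all packets are marked
        if self.rejecting and marker is Marker.PROBE:
            return Marker.MARKED          # only probes are marked
        return marker


def traverse(path, marker):
    """Carry a packet from ingress to egress through every router on the path."""
    for router in path:
        marker = router.forward(marker)
    return marker


def admit(path) -> bool:
    """Ingress decision: admit the flow only if the probe arrives unmarked."""
    return traverse(path, Marker.PROBE) is Marker.PROBE
```

   With these assumptions, admit([CoreRouter(), CoreRouter(rejecting=True)])
   yields a rejection, while an ordinary packet that arrives at the
   egress as marked indicates severe congestion somewhere on the path.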

   In the following sections, we assume a DS (DiffServ) domain where
   connection requests arrive at the edges of the domain via RSVP, at
   the request of a Bandwidth Broker, or by other means. The requests
   may be generated directly at the edge by a gateway, which provides
   connection to other types of networks, or in hosts that are connected
   directly to the domain.

4. Operation of Load Control

   The load control scheme has two modes of operation:

   a) 'Simple marking': This refers to a measurement-based admission
      scheme where routers measure the traffic volume and base the
      marking on these results.

   b) 'Unit-based reservations': A "unit" represents a share of
      bandwidth in the network that could be reserved by the edge
      devices. This mode makes it possible to perform resource
      reservations independently of the amount of traffic that is
      actually transmitted.

   Both modes can perform admission control of incoming requests and
   indicate exceptional events.

   In the appendices, we present some analysis of Load Control
   properties, but a more detailed investigation can be found in
   [Tur99].

4.1. Simple Marking

   The idea of simple marking is that core routers measure the traffic,
   and, if they encounter near exhaustion of resources, they mark
   passing probe packets and thereby notify the edge devices of the lack
   of resources.

   The scheme has the following steps of operation:

   1) Resource Probing: Before establishing the flow, the initiating
      edge device sends a probe packet into the network. The probe
      packet passes through the same routers as the actual traffic will
      pass through (at least with a high degree of probability) and is
      exposed to the marking function in each router. The marking
      function performs an OR-operation of the router's own status and
      the incoming probe packet status (a packet once marked must not be
      changed).
      When the packet reaches the egress edge device, its header will
      reflect the aggregated resource status along that path.

   2) Send resource status to ingress: When the egress edge device
      receives the probe packet, it copies the marker from the header to
      the header/payload of a reverse packet and sends it back to the
      initiating party (the ingress edge device). The probe packet may
      be discarded, converted to an ordinary data packet, or
      encapsulated (as mentioned above) and sent to the ingress edge
      device. The packet containing the probing result can also serve as
      a probe packet for the reverse path. This allows the initiating
      party to check for bi-directional resources.

   3) Acceptance/Rejection: The report packet is returned to the
      initiating ingress edge device, which uses the result of the probe
      to admit or block the request by setting up appropriate packet
      filtering, measuring, and marking rules.

   4) Reaction to exceptional events: If a core router detects severe
      congestion on an interface, it starts marking all packets on that
      interface. If the egress edge device receives a marked packet
      which is not a probe packet, this can be interpreted as a sign of
      severe congestion along the path. The fact that the incoming
      marked packet was not sent as a probe packet can be determined
      from the packet content, by multi-field classification or by
      checking the admittance state at the egress edge device. If
      severe congestion occurs, a signaling message can be sent to the
      ingress edge device, which can then take the appropriate action.

   To make the scheme more robust against packet loss, the initiating
   edge device MAY maintain a timer associated with each probe packet.
   If a probe packet is lost, the device simply re-transmits on time-out.
   How often and how many times the probe packet should be
   retransmitted before failure is declared is an implementation issue,
   but these parameters SHOULD be configurable (e.g., via an SNMP MIB).
   Furthermore, whether probes are retransmitted at all SHOULD be
   configurable.

4.2. Unit-based Reservations

   While measurement-based admission control has important advantages
   over non-measurement based algorithms, it has disadvantages as well.
   Unit-based reservations allow the sources to keep their reservations
   irrespective of the volume of the traffic they transmit. Although
   the admission scheme is very similar to the simple marking case, the
   presence of actual reservations is a fundamental difference.

   Each flow can occupy any number of units of resources, and even
   fractions of units by allowing a number of flows to share a common
   resource unit. The unit is not necessarily a simple bandwidth value:
   it may be defined in terms of any resource unit (e.g., effective
   bandwidth) to support statistical multiplexing at packet level (use
   of silence periods). The definition of the unit may vary from network
   to network and is outside the scope of this document. The basic idea
   of unit-based reservation is to allow the edge devices to
   periodically mark some of the data packets to refresh the resource
   reservation. Each refresh packet reserves one unit of resources for
   one refresh period. Reservations are timed out after a refresh period
   and have to be refreshed in a soft state manner. The length of the
   refresh period must be the same throughout the DS domain and SHOULD
   be configurable.

   Core routers estimate the number of reservations by counting the
   number of refresh packets during a refresh interval. If the router
   runs out of units, it goes into blocking state, starts to mark probe
   packets indicating congestion and thereby rejects new flows.
   The probe packets that pass the router unmarked and the refresh
   packets reserve one unit of resources for the following refresh
   period. (Editor note: It is clear that we need to have the capability
   of reserving more than one unit, but it is not yet clear how that
   will be encoded in the packet header. See below.) Thus, after the
   probe packet has passed along the path unmarked, the ingress edge
   device is required to send the first reservation refresh packet
   during the next refresh period.

   If a flow occupies more than one unit, more than one probe packet may
   be sent to allocate the required number of resources (an alternative
   using only one packet should be defined). Similarly, more than one
   refresh packet must be sent for such a flow. By proper definition of
   the unit, a wide range of flows can be described and handled using
   this simple mechanism.

   If a probe packet was forwarded unmarked by a core router, but was
   marked later downstream, that core router will not be notified and
   will incorrectly maintain the reservation. However, as the flow is
   rejected, no refresh packets will arrive, and the reservation will
   time out at the end of the refresh period and will be released.

   Severe congestion is handled in the same way as in 'Simple marking'
   (see Section 4.1).

   If a refresh packet is lost, the downstream routers will
   underestimate the number of reserved units. Refresh and probe packets
   should therefore be protected from losses in the manner described
   above.

   Core routers estimate the number of allocated units by counting the
   number of refresh packets during a refresh period. The accuracy of
   the estimate can be increased by generating refresh packets evenly
   spread in time over the refresh period. This minimizes errors
   resulting from time alignment differences between routers and edge
   devices.

4.3. Multiple Unit Reservation

   In some cases it might be feasible to add functionality for the
   reservation of several units in one single reservation request.
   Semantics similar to those of the two-bit reservation scheme could be
   used to provide such functionality, but this will of course require
   the addition of an integer value denoting the number of units.

   The coding of such a proposal is still under discussion and needs to
   be studied further.

4.4. Codepoints for Flow Types

   In both variants of Load Control, routers making marking decisions
   have very little information about the resource or QoS requirement of
   the flow in question. The DS field of the probe packet can be used to
   indicate the DiffServ class the flow will arrive on and thus the QoS
   requirements. The marking function of core routers can take the
   required PHB into account when deciding on the marking.

   Information on the resource requirements for incoming flows can also
   be expressed using the DS field by dividing real-time traffic into
   classes based on resource requirements and using different codepoints
   for different classes. If the DSCPs denote not only the PHB that the
   flow is to receive, but implicitly also the bandwidth requirements
   for the flow, core routers will be able to mark packets more
   intelligently, resulting in less resource waste and greater
   flexibility.

   In the unit-based case, the major benefit is that the size of the
   unit can be different in different classes, making it possible to
   allocate resources with finer granularity.

5. Objects for Standardization

   A forthcoming standard might only include the encoding of the Load
   Control information into the IP header and some design
   recommendations.

5.1. Packet Types

   We need four types of packets in the algorithm:

   - Ordinary Packet (OP)
   - Probe Packet (PP)
   - Marked Packet (MP)
   - Refresh Packet (RP)

   During transport through the network, a probe packet can be changed
   to a marked packet. This indicates that at least one router does not
   accept the reservation associated with the probe packet.

      ------       Rejection       ------
      | PP |---------------------->| MP |
      ------                       ------

   An ordinary packet can also be changed to a marked packet, meaning
   that some exceptional event caused severe congestion on one link of
   the path the packet took.

      ------   Severe Congestion   ------
      | OP |---------------------->| MP |
      ------                       ------

   In the simple marking scheme, only three packet types are used.
   Refresh packets are treated as ordinary packets, except that these
   packets cannot be changed to marked packets.

5.2. Coding of Packet Types

   We have two alternative solutions for storing Load Control related
   information in the packet headers: using new DS codepoints or using
   the two currently unused bits (intended for ECN) in the DS byte. The
   latter case is only considered in Appendix E.

   In the first alternative (where PHBs are intended to be used together
   with Load Control), two or three new codepoints would have to be
   defined for probe, marked and (optionally) refresh packets. For
   example, in the case of the EF PHB, in addition to the codepoint used
   for the EF packets, EF-probe, EF-marked and EF-refresh packets can
   also be sent. The new codepoints can be drawn from the LU/EXP space.

5.3. Behavior Description

   The behavior of the edge devices depends greatly on the application
   or signaling protocol that uses the load control scheme. Below we
   only describe the few aspects of the edge device behavior that are
   necessary for interworking with the core routers.

5.3.1. Behavior of the Core Routers

   All core routers continuously maintain a state of accepting or
   rejecting more flows. If the state is accepting, the router passes
   all packets unchanged. If the state is rejecting, then the router
   changes the marking of incoming packets from probe to marked.

   If the router is capable of detecting severe congestion, and this
   occurs, then the router forwards both ordinary and probe packets as
   marked. The router MUST NOT change the marking of refresh packets.

   Addition for Unit-based Reservations:

      The router uses the refresh and probe markers in packets to
      maintain its estimation of reserved resources. A refresh packet
      signals previously admitted resource usage, while a probe packet
      signals a new request. When passed unmarked, both types of packets
      reserve one unit for one refresh period.

5.3.2. Behavior of the Edge Devices

   When a new reservation is needed, the ingress edge device should send
   the appropriate number of packets marked as probe.

   If the egress edge device receives a probe packet that is marked,
   this means that the network has insufficient capacity along the path
   between the two edge devices. The egress edge device should take care
   of blocking the flow by notifying the ingress device. If the egress
   device receives a marked packet that was not initially sent as a
   probe packet, it shall inform the ingress device to reject admitted
   flows. Whether a marked packet was originally a probe packet can be
   determined from the packet content, multi-field classification of the
   IP header, or by checking the admittance state at the egress edge
   device.

   Addition for Unit-based Reservations:

      For the unit-based reservation scheme, the ingress edge device
      should generate the required number of refresh packets per refresh
      period and per flow.
      If there are not enough data packets to mark as refresh packets,
      the ingress device must generate dummy packets and mark those as
      refresh packets. The generated refresh packets should be as
      uniformly distributed through the refresh interval as possible to
      minimize the effect of refresh interval timing differences between
      routers.

6. Interworking with RSVP/Intserv

   Load control can also be used in DiffServ regions (backbones) that
   connect RSVP/Intserv regions. This inter-operation is described in
   detail in [Bernet99]. For load control, border routers of the
   DiffServ region must be RSVP-aware in order to detect the arrival of
   new connections.

   RSVP PATH messages can be used as probe packets to gather congestion
   information along the path between the two border routers. When a new
   RSVP path state is installed at the egress border router, the
   collective admission state of the path (collected in the packet of
   the PATH message) is also stored. If a RESV message for the installed
   state arrives within a time period during which the congestion state
   can be considered valid, then the egress border router can perform
   the admission control for the DiffServ network as well. If the first
   RESV message arrives too late, then the egress border router MUST
   solicit a new (dummy) probe packet from the ingress router to
   determine the current congestion state.

   When the egress receives a marked packet that is neither a PATH
   message nor a dummy probe packet, this signals a severe congestion
   state along the path. The identity of the ingress router can easily
   be determined from the path state, but in this case the egress router
   can itself decide to drop certain reservations. The ingress router
   can be notified via ResvTear messages while the receiver end systems
   get ResvErr messages.

   RSVP routers can also be placed inside the domain.
   In this case, probing is performed between RSVP routers instead of
   edge devices. Thus, by adding a simple and cheap extension to
   non-RSVP capable routers, correct admission control is possible on
   non-RSVP capable parts of an end-to-end path.

   Unit-based reservations can also be used to provide resources in a DS
   domain that is used to provide VPN tunnels between customer sites.
   Using a load control scheme, it is fast and easy to modify the size
   of these tunnels. Thus, tunnel size selection can be a very dynamic
   process. Note that tunnels are not necessarily real-time tunnels.
   Packets of any DSCP can travel on them after receiving the
   appropriate PHB. Even best-effort tunnels can be reserved this way.
   Provisioning can be done on a per-DSCP basis or in aggregates as the
   service provider wishes.

7. Security Considerations

   We propose using two-bit markers in packet headers (DS field) to
   reserve resources within a DiffServ domain. This poses similar
   security problems to the use of the DS field to differentiate packets
   in general [RFC2475].

   If the interior of the DS domain fully contains a tunnel, then by
   copying the outer marking into the inner header at de-encapsulation,
   load control can be exercised over the links of the tunnel as well.
   The procedure is similar to the one described in [RFC2481]. As IPSec
   [RFC2402, RFC2406] does not allow the copying of the DS field from
   the outer to the inner header at de-encapsulation, load control
   cannot be exercised over regions where IPSec tunnels are used.

8. Identification of Edge Nodes

   In the absence of RSVP, an alternative method for identification of
   edge nodes will be required. This section needs to be written.

9. Multicast-related Issues

[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
   (ESP)", RFC 2406, November 1998.

[Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
   Speer, M., Braden, R., "Interoperation of RSVP/Intserv and Diffserv
   Networks", Work in Progress, March 1999.

[Stoica99] Stoica, I., et al., "Per Hop Behaviors Based on Dynamic
   Packet States", Work in Progress, February 1999.

[Berson97] Berson, S. and Vincent, R., "Aggregation of Internet
   Integrated Services State", Work in Progress, December 1997.

[Guerin97] Guerin, R., Blake, S. and Herzog, S., "Aggregating RSVP
   based QoS Requests", Work in Progress, November 1997.

[Gross99] Grossglauser, M., Tse, D. N. C., "A Time-Scale Decomposition
   Approach to Measurement-Based Admission Control", Infocom '99.

[Tur99] Turanyi, Z. R., Westberg, L., "Load Control: Lightweight
   Provisioning of Internet Resources", submitted to Networking 2000,
   Paris, May 2000, http://www.ericsson.co.hu/ethzrt/

[IAB-QoS] Huston, G. (Internet Architecture Board), "Next Steps for the
   IP QoS Architecture", Work in Progress, March 2000.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

Appendix A. Admission Precision of Simple Marking

   Simple marking is basically a measurement-based admission control
   scheme, where flows do not say anything about their traffic
   characteristics. In addition, flow departure is not signaled
   explicitly.

   When the network carries several types of flows with different
   bandwidth requirements, the core routers do not know the bandwidth
   requirements of the incoming flows. They simply declare whether they
   will accept more flows or not irrespective of the bandwidth demands
   of the new flow. Thus the marking algorithm in the routers should
   conservatively always expect the largest type of flow that the
   network carries and start rejecting flows when there is not enough
   bandwidth left for one such flow.
   On the positive side, this will result in fair rejection among
   different flow types, but on the negative side, some bandwidth will
   be wasted. However, if the links of our domain can carry at least
   several hundred requests even from the most bandwidth-demanding types
   of flow, then this is not a significant waste.

Appendix B. Effect of Delays on Admission

   When a probe packet is passed unmarked without correcting the
   estimate of the free resources, we in fact admit a flow without
   immediately reserving resources for it. The reservation will be
   implicitly done later by the arriving traffic or refresh packets of
   the flow. During the time between admission and the arrival of the
   traffic of the flow, new requests can be admitted without taking the
   previously admitted flow into account. To illustrate the effects of
   this delay, we took an old and simple Markovian example. Flows are
   identical with an average flow-holding time of 180 seconds and flow
   arrivals and departures follow a Poisson process. Let the link be
   able to carry N calls and let the delay be T. The link starts
   refusing flows when the measured traffic exceeds N-H calls. We can
   say that a space of size H is put aside to cater for the errors
   caused by the delay.

   If the link is properly dimensioned, then the usual blocking ratio
   should not exceed 1%. However, in a mass call situation (such as
   occurs on New Year's Eve, for example) it can be considerably higher.
   In this example, 50% blocking was chosen to demonstrate the extreme
   load case. Thus, the offered traffic is roughly twice the link
   capacity.

   QoS violation occurs if during time T the difference between the
   number of arriving and departing flows is larger than H. Under the
   above assumptions, the chance of QoS violation can be calculated.
   Naturally, the larger H is, the smaller the chance that QoS will be
   violated.
   The required value of H can be determined for a low value of QoS
   violation probability (e.g. 10e-5).

   The following table presents the value of H as a function of link
   size (N), delay length (T) and load (causing 1% or 50% blocking).

         |    1ms    |   10ms    |   100ms   |   500ms   |    1s     |
     N   | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% |
   ------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      50 |   2 |   2 |   2 |   3 |   3 |   4 |   4 |   5 |   5 |   7 |
     100 |   2 |   2 |   3 |   3 |   4 |   4 |   4 |   7 |   6 |   9 |
     500 |   2 |   3 |   3 |   4 |   4 |   7 |   9 |  13 |  12 |  18 |
    1000 |   3 |   3 |   4 |   4 |   5 |   9 |  12 |  18 |  16 |  25 |
    5000 |   3 |   4 |   5 |   7 |  12 |  18 |  24 |  44 |  33 |  69 |
   10000 |   4 |   4 |   7 |   9 |  16 |  25 |  33 |  69 |  47 | 113 |

   The amount of required safety margin is highest for small links,
   since less statistical multiplexing is possible there.

Appendix C. A Simple Algorithm for Core Routers

   In this appendix, we present an algorithm for core routers that use
   unit-based reservations. The algorithm is simple, so it can be easily
   implemented in hardware by simple counters. Its inputs are the
   refresh interval and the number of flows allowed on the link. The
   latter is denoted by 'threshold'. (We assume flows with similar
   characteristics (e.g., voice) and that one flow sends one refresh
   packet per refresh interval.) If the network uses more DSCPs for
   real-time traffic, then a separate copy of the algorithm may be run
   for each DSCP, resulting in per-DSCP admission.

   The algorithm counts the number of refresh and admitted probe packets
   in refresh intervals ('count'). The result of the counting is an
   upper limit on the number of units reserved on the link, as some
   reservations may have gone by the end of the refresh interval. The
   value of this counter is used in the next interval to decide on
   admission ('last'). When a new reservation is admitted, this value is
   increased to take the new reservation into account.
If this value is
well above the admission limit, then we start sending severe
congestion notifications by marking regular packets as well.

   On initialization:
      last = 0
      count = 0

   On arrival of a refresh packet:
      count++

   On arrival of a probe packet:
      if last < threshold then
         // admit: account for the new reservation immediately
         last++
         count++
      else
         Mark Packet
      endif

   On arrival of a regular packet:
      // severe congestion: estimate is 10% above the limit
      if last > threshold*1.1 then
         Mark Packet
      endif

   At the end of the refresh interval:
      last = count
      count = 0

Appendix D. Simulation Results

The purpose of the simulations described in this appendix is to give
some insight into the performance of load control. The simulation
cases are by no means representative, and the scheme may work
differently in other situations. In Appendix D.1, the simple marking
case is demonstrated with a purely measurement-based admission
algorithm by using a single link with both constant bit-rate and
on/off sources. In Appendix D.2, the unit-based reservation method is
shown, using the algorithm in Appendix C.

Severe congestion signaling is not used in any of the examples; only
admission control is used.

We simulated a very simple network of one link. This can be viewed as
the single bottleneck in the domain. The link had a 2 Mbit/s
throughput, 50% of which was designated to carry real-time traffic.
The round trip propagation delay was set to 100 ms. The real-time
flows arrived according to a Poisson process, and the holding time
was exponential with a 90 second mean. The arrival rate of flows was
set to produce approximately 50% blocking. Only real-time traffic was
simulated, so scheduling was simple FIFO.

D.1 Simple Marking

D.1.1 Constant Bit-Rate Sources

In the first case, flows emitted 40 byte long packets every 20 ms,
producing a constant 16 kbit/s load. The 1 Mbit/s capacity assigned
to this traffic can thus carry 62.5 flows.
From the table in Appendix
B, we can see that 4 calls should be reserved in addition to the
62.5. After an initial transient of 5 minutes, we simulated 2.5
hours.

During the 2.5 hour simulation time, utilization was measured over
5-minute intervals. Utilization was also measured in 20 ms slots and
the percentage of slots in which it was above 1.064 Mbit/s (66.5
calls) was counted.

min/avg/max of the utilization was: 881 / 899 / 914 kbit/s
min/avg/max of the violation ratio was: 98.96% / 99.78% / 100%

D.1.2 On/Off Sources

In the second simulation case, on/off sources were used. During an
"off" period, no packets were generated, while in the "on" state the
behavior was the same as in the previous case: 40 byte long packets
20 ms apart. The lengths of the on and off periods were both drawn
from a Pareto distribution with a shape parameter of 1.1 and a mean
of 5 seconds. The average bit-rate of the sources is thus 8 kbit/s.
The flow arrival rate was doubled to produce 50% blocking,
since the link is capable of carrying nearly twice the number of
flows. The same set of measurements was carried out as in the
previous case.

min/avg/max of the utilization was: 808 / 819 / 837 kbit/s
min/avg/max of the violation ratio was: 98.98% / 99.40% / 99.70%

It can be seen that although the measurement-based approach was not
able to prevent the over-use of the real-time resources in this high
load case, it is a viable alternative. In no case did the 20 ms
measurements exceed 1.15 Mbit/s, so the over-use amounts to
temporarily borrowing resources provisioned for the lower priority
traffic.

D.1.3 The Router Algorithm

The measurement-based admission control (MBAC) algorithm used by the
router is presented here only for the completeness of the simulation
description. The marking strategy was the same for both types of
traffic.
The router counts the number of
bytes transmitted in every 20 ms interval and calculates the average
bit rate in these 20 ms slots. Then it smooths these values in time
through an exponentially weighted moving average (EWMA) filter. The
window size of the EWMA was set to 9 seconds, i.e., running a unit
step function through it, the output will be 0.63 after 9 seconds.
The algorithm also calculated the histogram of the difference between
the original slot values and the filtered values. The histogram used
1000 bins covering the range from -1 to +1 Mbit/s. The
99% quantile of the histogram was calculated every 100 seconds. The
router marks all passing packets if the sum of the output of the EWMA
filter and the calculated quantile is greater than 1 Mbit/s. The
router makes no correction to its measurements when a new flow is
admitted.

Thus, the target violation probability was set to 1%, which was in
fact fulfilled in the long run.

On arrival of a new packet, only counters are incremented. Every 20
ms a new value for the EWMA must be calculated, a marking decision
must be made for the next 20 ms and the value of one bin in the
histogram must be increased. Every 100 seconds, the 99% quantile
value must be looked up in the histogram and the histogram must be
reset.

The interested reader can read more about the design rationale of the
above algorithm in [Gross99].

D.2 Unit-Based Reservations

In this section we demonstrate the unit-based reservation scheme. The
routers use the simple algorithm in Appendix C, except that they
never mark regular packets. The simulation setup is otherwise the
same as in the previous section. The traffic inside the flows does
not affect the admission algorithm, so during simulation, sources
send only probe and refresh packets. The definition of the unit is a
peak bit-rate of 16 kbit/s.
The flow number threshold was set to 62 flows,
resulting in close to the same target utilization of 1 Mbit/s as in
Appendix D.1. The length of the refresh period was varied between
100 ms and 10 seconds. The actual number of flows on the link never
exceeded 62 (no violation), so only the utilization values are shown
in kbit/s.

   | interval | min | avg | max |
   +----------+-----+-----+-----+
   |    --    | 968 | 972 | 976 |
   |  100 ms  | 952 | 954 | 959 |
   |  1 sec.  | 941 | 946 | 949 |
   |  2 sec.  | 927 | 933 | 936 |
   |  4 sec.  | 908 | 913 | 920 |
   |  7 sec.  | 861 | 870 | 879 |
   | 10 sec.  | 827 | 837 | 852 |

The first line shows the utilization value for the case when the
source limits itself to 62 flows, i.e., blocking is not done by the
network, but by the source. This emulates the case when the refresh
period is infinitely short or when a stateful approach is used, as in
RSVP. The utilization is not 100% due to the burstiness of the
arrivals.

It can be seen that as the refresh packets become less frequent, more
resources are wasted, as the resources allocated to departing flows
remain allocated until the end of the next refresh period. The result
is not only lower average utilization, but lower maximal utilization
as well. When the refresh period is 10 seconds long, the highest
utilization experienced was 952 kbit/s, which is 3 units below the
limit.

This motivates the use of as short a refresh period as possible.
However, too short a refresh period will increase the effects of clock
differences between edge and core devices (which were not taken into
account during simulation). It also decreases the chance of finding a
packet to mark as refresh if the flow is currently transmitting below
its reserved rate.
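
The waste caused by stale reservations can be illustrated with a
minimal Python sketch of the counter algorithm for core routers. The
class, method and function names are hypothetical. The check for
regular packets assumes, following the prose description, that severe
congestion marking starts once the estimate is 10% above the
threshold; the simulations above did not use it.

```python
class CoreRouterCounter:
    """Unit-based admission with two counters (illustrative sketch)."""

    def __init__(self, threshold, severe_factor=1.1):
        self.threshold = threshold          # flows allowed on the link
        self.severe_factor = severe_factor  # assumed 10% overshoot limit
        self.last = 0    # estimate used for admission decisions
        self.count = 0   # packets counted in the current interval

    def on_refresh(self):
        """A refresh packet keeps one unit reserved."""
        self.count += 1

    def on_probe(self):
        """Return True if the probe is marked, i.e. the flow is refused."""
        if self.last < self.threshold:
            self.last += 1
            self.count += 1
            return False
        return True

    def on_regular(self):
        """Return True if a regular packet is marked (severe congestion)."""
        return self.last > self.threshold * self.severe_factor

    def end_of_interval(self):
        """Roll the counters over at the end of a refresh interval."""
        self.last = self.count
        self.count = 0


def demo():
    """Two flows fill a two-unit link; one of them then leaves silently."""
    router = CoreRouterCounter(threshold=2)
    marked = []
    marked.append(router.on_probe())  # flow A is admitted
    marked.append(router.on_probe())  # flow B is admitted
    router.end_of_interval()
    router.on_refresh()               # only A refreshes; B has departed
    marked.append(router.on_probe())  # flow C is refused: B still counted
    router.end_of_interval()
    router.on_refresh()               # A refreshes again
    marked.append(router.on_probe())  # flow C retries and is admitted
    return marked
```

In this scenario the unit of the departed flow stays allocated for one
more refresh interval, so a new request arriving in that interval is
refused although the link has room for it.
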

Appendix E: Marking using ECN bits

If the ECN bits were to be used for load control marking, the values
would be encoded in the two unused bits as described below, and the
DS field would contain the PHB.

   DS byte      Load Control
   01234567     codepoint (in ECN)
   -----------------------------
   xxxxxx00     Ordinary
   xxxxxx01     Probe
   xxxxxx10     Marked
   xxxxxx11     Refresh

The interpretation of the two unused bits remains
unspecified for other PHBs that do not support Load
Control. This is done so as not to interfere with
possible ECN deployment [RFC2481].

Table of Contents

1 Abstract
2 Background and Motivation
3 Overview
4 Operation of Load Control
4.1 Simple Marking
4.2 Unit-based Reservations
4.3 Multiple Unit reservation
4.4 Codepoints for Flow Types
5 Objects for Standardization
5.1 Packet Types
5.2 Coding of Packet Types
5.3 Behavior Description
5.3.1 Behavior of the Core Routers
5.3.2 Behavior of the Edge Devices
6 Interworking with RSVP/Intserv
7 Security Considerations
8 Identification of Edge Nodes
9 Multicast-related Issues
Appendix A. Admission Precision of Simple Marking
Appendix B. Effect of Delays on Admission
Appendix C. A Simple Algorithm for Core Routers
Appendix D. Simulation Results
D.1 Simple Marking
D.1.1 Constant Bit-Rate Sources
D.1.2 On/Off Sources
D.1.3 The Router Algorithm
D.2 Unit-Based Reservations
Appendix E: Marking using ECN bits

Authors' Addresses

Lars Westberg
Ericsson Research
Kistagangen 26
SE-164 80 Stockholm
Sweden
EMail: Lars.Westberg@era-t.ericsson.se

Zoltan R. Turanyi
Ericsson Telecommunications
Budapest, Laborc u. 1
H-1037
Hungary
EMail: Zoltan.Turanyi@ericsson.com

David Partain
Ericsson Radio Systems AB
P.O. Box 1248
SE-581 12 Linkoping
Sweden
EMail: David.Partain@ericsson.com