PWE3                                                           YJ. Stein
Internet-Draft                                   RAD Data Communications
Intended status: Informational                                  D. Black
Expires: January 16, 2013                                EMC Corporation
                                                              B. Briscoe
                                                                      BT
                                                           July 15, 2012


                      PW Congestion Considerations
                      draft-stein-pwe3-congcons-01

Abstract

   Pseudowires (PWs) have become a common mechanism for tunneling
   traffic, and may be found competing for network resources both with
   other PWs and with non-PW traffic, such as TCP/IP flows.
   It is thus worthwhile specifying under what conditions such
   competition is safe, i.e., the PW traffic does not significantly
   harm other traffic or contribute more than it should to congestion.
   We conclude that PWs transporting responsive traffic behave as
   desired without the need for additional mechanisms.  For inelastic
   PWs (such as TDM PWs) we derive a bound under which such PWs consume
   no more network capacity than a TCP flow.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  PWs Comprising Elastic Flows
   3.  PWs Comprising Inelastic Flows
   4.  Security Considerations
   5.  IANA Considerations
   6.  Informative References
   Appendix A.  Loss Probabilities for TDM PWs
   Authors' Addresses

1.  Introduction

   A pseudowire (PW) is a construct for tunneling a native service over
   a Packet Switched Network (PSN) (see [RFC3985]), such as IPv4, IPv6,
   or MPLS.  The PW packet encapsulates a unit of native service
   information by prepending the headers required for transport in the
   particular PSN (which must include a demultiplexer field to
   distinguish the different PWs) and preferably the 4-byte PWE3
   control word.  PWs have no bandwidth reservation mechanism, meaning
   that when multiple PWs are transported in parallel there is no
   defined means for guaranteeing network resources for any particular
   PW.  This competition for resources may translate to a particular PW
   not being able to deliver the QoS required to emulate the native
   service.  For example, MPLS-TE enables achieving a particular
   desired allocation of resources between multiple LSPs; however, when
   multiple Ethernet PWs are placed in a single MPLS tunnel, there is
   no way to similarly divide resources amongst them (although DiffServ
   QoS prioritization may be available for PWs).  The use of PWs in
   service provider MPLS networks is well understood and will not be
   discussed further here.

   While PWs are most often placed in MPLS tunnels, there are several
   mechanisms that enable transporting PWs over an IP infrastructure.
   These include:

      TDM PWs ([RFC4553][RFC5086][RFC5087]) that define UDP/IP
      encapsulations,

      L2TPv3 PWs,

      MPLS PWs directly over IP according to RFC 4023 [RFC4023],

      MPLS PWs over GRE over IP according to RFC 4023 [RFC4023].

   Whenever PWs are transported over IP, they may compete with
   congestion-responsive flows (e.g., TCP flows).  Hence in order to
   prevent congestion collapse the PWs MUST behave in a fashion that
   does not cause undue damage to the throughput of such congestion-
   responsive flows [RFC2914].

   At first glance one may think that this would require a PW
   transported over IP to be considered as a single flow, on a par with
   a single TCP flow.  Were we to accept this tenet, we would require a
   PW to back off under congestion to consume no more bandwidth than a
   single TCP flow under such conditions (see [RFC5348]).  However,
   since PWs may carry traffic from many users, it makes more sense to
   consider each PW to be equivalent to multiple TCP flows.  We will
   discuss whether PWs consisting of elastic flows need a back-off
   strategy in Section 2.

   TDM PWs ([RFC4553][RFC5086][RFC5087]) represent inelastic constant
   bit-rate (CBR) flows that may require lower or higher throughput than
   that consumed by an otherwise-unconstrained TCP flow under the same
   network conditions.  In any case a TDM PW is not able to respond to
   congestion in a TCP-like manner; on the other hand, the total
   bandwidth it consumes remains constant and does not increase to
   consume additional bandwidth as TCP rates back off.  If the bandwidth
   consumed by a TDM PW is considered detrimental, the only available
   remedy is to completely shut down the PW.  Such a shutdown would
   impact multiple users, and the service restoration time would in
   general be lengthy.  We will discuss when the shutdown of inelastic
   PWs can be avoided in Section 3.

2.  PWs Comprising Elastic Flows

   In this section we consider Ethernet PWs that primarily carry
   congestion-responsive traffic.  We will show that we automatically
   obtain the desired congestion avoidance behavior, and that additional
   mechanisms are not needed.

   Let us assume that an Ethernet PW aggregating several TCP flows is
   flowing alongside several TCP/IP flows.  Each Ethernet PW packet
   carries a single Ethernet frame that carries a single IP packet that
   carries a single TCP segment.  Thus, if congestion is signaled by an
   intermediate router dropping a packet, a single end-user TCP/IP
   packet is dropped, whether or not that packet is encapsulated in the
   PW.

   The result is that the individual TCP flows inside the PW experience
   the same drop probability as the non-PW TCP flows.  Thus the behavior
   of a TCP sender (retransmitting the packet and appropriately reducing
   its sending rate) is the same for flows directly over IP and for
   flows inside the PW.  In other words, individual TCP flows are
   neither rewarded nor penalized for being carried over the PW.  On the
   other hand, the PW does not behave as a single TCP flow; it will
   consume the aggregated bandwidth of its component flows, and will
   back off much less sharply than a single flow would.

   We claim that this is precisely the desired behavior.  Any fairness
   considerations should be applied to the individual TCP flows, and not
   to the aggregate.  Were individual TCP flows rewarded for being
   carried over a PW, this would create an incentive to create PWs for
   no operational reason.  Were individual flows penalized, there would
   be a deterrent that could impede pseudowire deployment.

   There have been proposals to add TCP-friendly mechanisms to PWs, for
   example by carrying PWs over DCCP.
   In light of the above arguments, it is clear that this would force
   the PW to behave as a single flow, rather than N flows, and penalize
   the constituent TCP flows.  In addition, the individual TCP flows
   would still back off, since their end points are oblivious to the
   fact that they are carried over a PW.  This would further degrade
   each flow's throughput as compared to a non-PW-encapsulated flow.
   Thus, such additional mechanisms contradict the behavior previously
   described as desirable.

3.  PWs Comprising Inelastic Flows

   TDM PWs ([RFC4553][RFC5086][RFC5087]) are more problematic than the
   elastic PWs of the previous section.  Being constant bit-rate (CBR),
   they cannot be made responsive to congestion.  On the other hand,
   being CBR, they also do not attempt to capture additional bandwidth
   when TCP flows back off.

   Since a TDM PW continuously consumes a constant amount of bandwidth,
   if the bandwidth occupied by a TDM PW endangers the network as a
   whole, the only recourse is to shut it down, denying service to all
   customers of the TDM native service.  We should mention in passing
   that under certain conditions it may be possible to reduce the
   bandwidth consumption of a TDM PW.  A prevalent case is that of a TDM
   native service that carries voice channels that may not all be
   active.  Using the AAL2 mode of [RFC5087] (perhaps along with
   connection admission control) can enable bandwidth adaptation, at the
   expense of more sophisticated native service processing (NSP).

   In the following we will show that for many cases of interest a TDM
   PW, treated as a single flow, will behave in a reasonable manner
   without any additional mechanisms.  We will focus on structure-
   agnostic TDM PWs [RFC4553], although our analysis can be readily
   applied to structure-aware PWs (see Appendix A).

   There are two network parameters relevant to our discussion, namely
   the one-way delay D and the loss probability p.  The one-way delay of
   a native TDM service consists of the physical time-of-flight plus 125
   microseconds for each TDM switch traversed.  This is very small
   compared to PSN network-crossing latencies.  Many protocols and
   applications running over TDM circuits thus require low delay, and we
   therefore need only consider delays of up to about 32 milliseconds.

   The TDM PW RFCs specify the egress behavior upon experiencing packet
   loss.  Structure-agnostic transport has no alternative to outputting
   an "all-ones" AIS pattern towards the TDM circuit, which, if long
   enough in duration, is recognized by the receiving TDM device as a
   fault indication (see Appendix A).  International standards place
   stringent limits on the number of such faults tolerated.
   Calculations presented in Appendix A show that only loss
   probabilities in the realm of fractions of a percent are relevant for
   structure-agnostic transport.

   Structure-aware transport regenerates frame alignment signals, thus
   hiding AIS indications resulting from infrequent packet loss.
   Furthermore, for TDM circuits carrying voice channels the use of
   packet loss concealment algorithms is possible (such algorithms have
   been previously described for TDM PWs).  However, even structure-
   aware transport ceases to provide a useful service at about 2 percent
   loss probability.

   RFC 5348 on TCP Friendly Rate Control (TFRC) [RFC5348] provides the
   following simplified formula for throughput that is used as the basis
   for TFRC's sending rate control.

                                  S
   X_Bps = ------------------------------------------------
            R ( sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2) )

   where
      X_Bps is the average sending rate in Bytes per second,
      S is the segment (packet payload) size in Bytes,
      R is the round-trip time in seconds,
      p is the loss probability.

   We can use this formula to determine when a TDM PW consumes no more
   bandwidth than a TCP flow between the same endpoints would consume
   under the same conditions.  Replacing the round-trip time R with
   twice the one-way delay D, setting the sending rate to that of the
   TDM service (BW bits per second, i.e., BW/8 Bytes per second), and
   setting the segment size S to the TDM fragment size (denoted TDM)
   plus 4 Bytes to account for the PWE3 control word, we obtain the
   following condition for a TDM PW.

            (TDM + 4)
   D  <  ---------------
           BW f(p) / 4

   where
      D is the one-way delay in seconds,
      TDM is the TDM segment size in Bytes,
      BW is the TDM service bandwidth in bits per second,
      f(p) = sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2).

   One may view this condition as defining a safe operating envelope for
   a TDM PW, since a TDM PW that consumes no more bandwidth than a TCP
   flow would contribute no more to congestion than TCP traffic would.
   Under this condition it should hence be safe to mix the TDM PW with
   congestion-responsive traffic such as TCP, without causing
   significant additional congestion problems.  Were the TDM PW to
   consume significantly more bandwidth than a TCP flow, it could
   contribute disproportionately to congestion, and its mixture with
   congestion-responsive traffic may be inappropriate.

   We derived the condition assuming steady-state conditions, and thus
   two caveats are in order.  First, the condition does not specify how
   to treat a TDM PW that initially satisfies the condition, but is then
   faced with a deteriorating network environment.  In such cases one
   additionally needs to analyze the reaction times of the responsive
   flows to congestion events.
   Second, the derivation assumed that the TDM PW was competing with
   long-lived TCP flows, because under this assumption it was
   straightforward to obtain a quantitative comparison with something
   widely considered to offer a safe response to congestion.  Short-
   lived TCP flows may find themselves disadvantaged as compared to a
   long-lived TDM PW satisfying the condition.  These dynamic cases will
   be considered in future versions of this draft.

   The results are displayed in the accompanying figures (available only
   in the PDF version of this document).  TCP-compatible behavior is
   obtained for the area under the curve appropriate for each TDM
   fragment size.

   [E1 compatibility regions (only in PDF version)]

            Figure 1  TCP Compatibility areas for E1 SAToP

   [E3 compatibility regions (only in PDF version)]

            Figure 2  TCP Compatibility areas for E3 SAToP

   We see in Figure 1 that a TDM PW carrying an E1 native service (2.048
   Mbps) satisfies the condition for all parameters of interest if each
   packet carries at least 512 Bytes of TDM data.  For the SAToP default
   of 256 Bytes, as long as the one-way delay is less than 10
   milliseconds, the loss probability can exceed 0.3 percent.  For
   packets containing 128 or 64 Bytes the constraints are more
   troublesome, but there are still parameter ranges where the TDM PW
   consumes less than a TCP flow under similar conditions.
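
   Since the figures are not reproduced in the text version, the
   condition derived above can be evaluated numerically instead.  The
   following Python fragment is a minimal sketch and not part of the
   original analysis; the names f, max_one_way_delay, and E1_BPS are
   illustrative assumptions.  It computes the largest one-way delay
   allowed by the condition for an E1 service at a loss probability of
   0.3 percent, for the TDM payload sizes discussed above.

```python
import math


def f(p: float) -> float:
    """Loss-dependent term of the simplified TFRC throughput formula."""
    return math.sqrt(2 * p / 3) + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)


def max_one_way_delay(tdm_bytes: int, bw_bps: float, p: float) -> float:
    """Upper bound on one-way delay D in seconds: (TDM + 4) / (BW f(p) / 4).

    tdm_bytes: TDM payload per packet in Bytes (4 Bytes of control word added)
    bw_bps:    TDM service bandwidth in bits per second
    p:         packet loss probability (0 < p < 1)
    """
    return (tdm_bytes + 4) / (bw_bps * f(p) / 4)


E1_BPS = 2.048e6  # E1 native service rate in bits per second

if __name__ == "__main__":
    for tdm in (64, 128, 256, 512):
        d = max_one_way_delay(tdm, E1_BPS, 0.003)
        print(f"TDM = {tdm:3d} Bytes, p = 0.3%: D < {1e3 * d:.1f} ms")
```

   With these inputs the bound for 256 Bytes of TDM data comes out
   slightly above 10 milliseconds, in line with the statement above.
   Because the bound scales linearly with (TDM + 4), larger payloads
   tolerate proportionally larger delays at the same loss probability.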
   Similarly, Figure 2 demonstrates that an E3 native service (34.368
   Mbps) with the SAToP default of 1024 Bytes of TDM per packet
   satisfies the condition for delays up to about 5 milliseconds.

   Note that violating the condition for a short amount of time is not
   sufficient justification for shutting down the TDM PW.  While TCP
   flows react within a round-trip time, PW commissioning and
   decommissioning are time-consuming processes that should only be
   undertaken when it becomes clear that the congestion is not
   transient.  Future versions of this draft will provide guidance as to
   when a TDM PW should be terminated.

4.  Security Considerations

   This document does not introduce any new congestion-specific
   mechanisms and thus does not introduce any new security
   considerations above those present for PWs in general.

5.  IANA Considerations

   This document requires no IANA actions.

6.  Informative References

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, September 2000.

   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
              Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
              MPLS in IP or Generic Routing Encapsulation (GRE)",
              RFC 4023, March 2005.

   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
              Division Multiplexing (TDM) over Packet (SAToP)",
              RFC 4553, June 2006.

   [RFC5086]  Vainshtein, A., Sasson, I., Metz, E., Frost, T., and P.
              Pate, "Structure-Aware Time Division Multiplexed (TDM)
              Circuit Emulation Service over Packet Switched Network
              (CESoPSN)", RFC 5086, December 2007.

   [RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi,
              "Time Division Multiplexing over IP (TDMoIP)", RFC 5087,
              December 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J.
              Widmer, "TCP Friendly Rate Control (TFRC): Protocol
              Specification", RFC 5348, September 2008.

   [G775]     International Telecommunications Union, "Loss of Signal
              (LOS), Alarm Indication Signal (AIS) and Remote Defect
              Indication (RDI) defect detection and clearance criteria
              for PDH signals", ITU Recommendation G.775, October 1998.

   [G826]     International Telecommunications Union, "Error Performance
              Parameters and Objectives for International Constant Bit
              Rate Digital Paths at or above Primary Rate",
              ITU Recommendation G.826, December 2002.

Appendix A.  Loss Probabilities for TDM PWs

   ITU-T Recommendation G.826 [G826] specifies limits on the Errored
   Second Ratio (ESR) and the Severely Errored Second Ratio (SESR).  For
   our purposes, we will simplify the definitions and understand an
   Errored Second (ES) to be a second of time during which a TDM bit
   error occurred or a defect indication was detected.  A Severely
   Errored Second (SES) is an ES during which the Bit Error Rate (BER)
   exceeded one in one thousand (10^-3).  Note that if the error
   condition AIS is detected according to the criteria of ITU-T
   Recommendation G.775 [G775], a SES is considered to have occurred.
   The respective ratios are the fraction of ES or SES to the total
   number of seconds in the measurement interval.

   For both E1 and T1 TDM circuits, G.826 allows an ESR of 4% (0.04) and
   an SESR of 0.2% (0.002).  For E3 and T3 the ESR must be no more than
   7.5% (0.075), while the SESR is unchanged.

   Focusing on E1 circuits, the ESR of 4% translates, assuming the worst
   case of isolated, exactly periodic packet loss, to a packet loss
   event occurring no more often than once every 25 seconds.  However,
   once a packet is lost, another packet lost in the same second doesn't
   change the ESR, although it may contribute to the ES becoming a SES.
   Assuming an integer number of TDM frames per PW packet, the number of
   packets per second is given by

      packets per second = 8000 / (frames per packet),

   where prevalent cases are 1, 2, 4, and 8 frames per packet.  Since
   for these cases there will be 8000, 4000, 2000, and 1000 packets per
   second, respectively, and at most one packet may be lost in every 25
   seconds, the maximum allowed packet loss probability is 0.0005%,
   0.001%, 0.002%, and 0.004%, respectively.

   These extremely low allowed packet loss probabilities are only for
   the worst-case scenario.  In reality, when packet loss is above
   0.001%, it is likely that loss bursts will occur.  If the lost
   packets are sufficiently close together (we ignore the precise
   details here) then the permitted packet loss rate increases by the
   appropriate factor, without G.826 being cognizant of any change.
   Hence the worst-case analysis is expected to be extremely pessimistic
   for real networks.  Next we will go to the opposite extreme and
   assume that all packet loss events are in periodic loss bursts.  In
   order to minimize the ESR we will assume that the burst lasts no more
   than one second, and so we can afford to lose no more than one
   second's worth of packets (i.e., the packets-per-second count) in
   each burst.  As long as such one-second bursts do not exceed four
   percent of the time, we still maintain the allowable ESR.  Hence the
   maximum permissible packet loss rate is 4%.  Of course, this estimate
   is extremely optimistic, and furthermore does not take into
   consideration the SESR criteria.

   As previously explained, a SES is declared whenever AIS is detected.
   There is a major difference between structure-aware and structure-
   agnostic transport in this regard.  When a packet is lost, SAToP
   outputs an "all-ones" pattern to the TDM circuit, which is
   interpreted as AIS according to G.775 [G775].  For E1 circuits, G.775
   specifies that AIS is detected when four consecutive TDM frames have
   no more than 2 alternations.
   This means that if a PW packet or consecutive packets containing at
   least four frames are lost, and four or more frames of "all-ones" are
   output to the TDM circuit, a SES will be declared.  Thus burst packet
   loss, or packets containing a large number of TDM frames, lead SAToP
   to cause high SESR, which is 20 times more restricted than ESR.  On
   the other hand, since structure-aware transport regenerates the
   correct frame alignment pattern, even when the corresponding packet
   has been lost, packet loss will not cause declaration of SES.  This
   is the main reason that SAToP is much more vulnerable to packet loss
   than the structure-aware methods.

   For realistic networks, the maximum allowed packet loss for SAToP
   will be intermediate between the extremely pessimistic estimates and
   the extremely optimistic ones.  In order to numerically gauge the
   situation, we have modeled the network as a four-state Markov model
   (corresponding to a successfully received packet, a packet received
   within a loss burst, a packet lost within a burst, and a packet lost
   when not within a burst).  This model is an extension of the widely
   used Gilbert model.  We set the transition probabilities in order to
   roughly correspond to anecdotal evidence, namely low background
   isolated packet loss, and infrequent bursts wherein most packets are
   lost.  Such simulation shows that up to 0.5% average packet loss may
   occur while the recovered TDM still conforms to the G.826 ESR and
   SESR criteria.

Authors' Addresses

   Yaakov (Jonathan) Stein
   RAD Data Communications
   24 Raoul Wallenberg St., Bldg C
   Tel Aviv  69719
   ISRAEL

   Phone: +972 (0)3 645-5389
   Email: yaakov_s@rad.com


   David L. Black
   EMC Corporation
   176 South St.
   Hopkinton, MA  01748
   USA

   Phone: +1 (508) 293-7953
   Email: david.black@emc.com


   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/