idnits 2.17.1 

draft-stein-pwe3-congcons-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 99: '...collapse the PWs MUST behave in a fash...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 15, 2012) is 4297 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	PWE3                                                           YJ. Stein
3	Internet-Draft                                   RAD Data Communications
4	Intended status: Informational                                  D. Black
5	Expires: January 16, 2013                                EMC Corporation
6	                                                              B. Briscoe
7	                                                                      BT
8	                                                           July 15, 2012

10	                      PW Congestion Considerations
11	                      draft-stein-pwe3-congcons-01

13	Abstract

15	   Pseudowires (PWs) have become a common mechanism for tunneling
16	   traffic, and may be found competing for network resources both with
17	   other PWs and with non-PW traffic, such as TCP/IP flows.  It is thus
18	   worthwhile specifying under what conditions such competition is safe,
19	   i.e., the PW traffic does not significantly harm other traffic or
20	   contribute more than it should to congestion.  We conclude that PWs
21	   transporting responsive traffic behave as desired without the need
22	   for additional mechanisms.  For inelastic PWs (such as TDM PWs) we
23	   derive a bound under which such PWs consume no more network capacity
24	   than a TCP flow.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 16, 2013.

43	Copyright Notice

45	   Copyright (c) 2012 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  PWs Comprising Elastic Flows
62	   3.  PWs Comprising Inelastic Flows
63	   4.  Security Considerations
64	   5.  IANA Considerations
65	   6.  Informative References
66	   Appendix A.  Loss Probabilities for TDM PWs
67	   Authors' Addresses

69	1.  Introduction

71	   A pseudowire (PW) is a construct for tunneling a native service over
72	   a Packet Switched Network (PSN) (see [RFC3985]), such as IPv4, IPv6,
73	   or MPLS.  The PW packet encapsulates a unit of native service
74	   information by prepending the headers required for transport in the
75	   particular PSN (which must include a demultiplexer field to
76	   distinguish the different PWs) and preferably the 4-byte PWE3 control
77	   word.  PWs have no bandwidth reservation mechanism, meaning that when
78	   multiple PWs are transported in parallel there is no defined means
79	   for guaranteeing network resources for any particular PW.  This
80	   competition for resources may translate to a particular PW not being
81	   able to deliver the QoS required to emulate the native service.  For
82	   example, MPLS-TE enables achieving a particular desired allocation of
83	   resources between multiple LSPs; however, when multiple Ethernet PWs
84	   are placed in a single MPLS tunnel, there is no way to similarly
85	   divide resources amongst them (although DiffServ QoS prioritization
86	   may be available for PWs).  The use of PWs in service provider MPLS
87	   networks is well understood and will not be discussed further here.

89	   While PWs are most often placed in MPLS tunnels, there are several
90	   mechanisms that enable transporting PWs over an IP infrastructure.
91	   These include:
92	      TDM PWs ([RFC4553][RFC5086][RFC5087]) that define UDP/IP
93	      encapsulations,
94	      L2TPv3 PWs,
95	      MPLS PWs directly over IP according to RFC 4023 [RFC4023],
96	      MPLS PWs over GRE over IP according to RFC 4023 [RFC4023].
97	   Whenever PWs are transported over IP, they may compete with
98	   congestion-responsive flows (e.g., TCP flows).  Hence in order to
99	   prevent congestion collapse the PWs MUST behave in a fashion that
100	   does not cause undue damage to the throughput of such congestion-
101	   responsive flows [RFC2914].

103	   At first glance one may think that this would require a PW
104	   transported over IP to be considered as a single flow, on a par with
105	   a single TCP flow.  Were we to accept this tenet, we would require a
106	   PW to back off under congestion to consume no more bandwidth than a
107	   single TCP flow under such conditions (see [RFC5348]).  However,
108	   since PWs may carry traffic from many users, it makes more sense to
109	   consider each PW to be equivalent to multiple TCP flows.  We will
110	   discuss whether PWs consisting of elastic flows need a back-off
111	   strategy in Section 2.

113	   TDM PWs ([RFC4553][RFC5086][RFC5087]) represent inelastic constant
114	   bit-rate (CBR) flows that may require lower or higher throughput than
115	   that consumed by an otherwise-unconstrained TCP flow under the same
116	   network conditions.  In any case a TDM PW is not able to respond to
117	   congestion in a TCP-like manner; on the other hand, the total
118	   bandwidth it consumes remains constant and does not increase to
119	   consume additional bandwidth as TCP rates back off.  If the bandwidth
120	   consumed by a TDM PW is considered detrimental, the only available
121	   remedy is to completely shut down the PW.  Such a shutdown would
122	   impact multiple users, and the service restoration time would in
123	   general be lengthy.  We will discuss when the shutdown of inelastic
124	   PWs can be avoided in Section 3.

126	2.  PWs Comprising Elastic Flows

128	   In this section we consider Ethernet PWs that primarily carry
129	   congestion-responsive traffic.  We will show that we automatically
130	   obtain the desired congestion avoidance behavior, and that additional
131	   mechanisms are not needed.

133	   Let us assume that an Ethernet PW aggregating several TCP flows is
134	   flowing alongside several TCP/IP flows.  Each Ethernet PW packet
135	   carries a single Ethernet frame that carries a single IP packet that
136	   carries a single TCP segment.  Thus, if congestion is signaled by an
137	   intermediate router dropping a packet, a single end-user TCP/IP
138	   packet is dropped, whether or not that packet is encapsulated in the
139	   PW.

141	   The result is that the individual TCP flows inside the PW experience
142	   the same drop probability as the non-PW TCP flows.  Thus the behavior
143	   of a TCP sender (retransmitting the packet and appropriately reducing
144	   its sending rate) is the same for flows directly over IP and for
145	   flows inside the PW.  In other words, individual TCP flows are
146	   neither rewarded nor penalized for being carried over the PW.  On the
147	   other hand, the PW does not behave as a single TCP flow; it will
148	   consume the aggregated bandwidth of its component flows, and will
149	   back off much less sharply than a single flow would.

151	   We claim that this is precisely the desired behavior.  Any fairness
152	   considerations should be applied to the individual TCP flows, and not
153	   to the aggregate.  Were individual TCP flows rewarded for being
154	   carried over a PW, this would create an incentive to create PWs for
155	   no operational reason.  Were individual flows penalized, there would
156	   be a deterrence that could impede pseudowire deployment.

158	   There have been proposals to add additional TCP-friendly mechanisms
159	   to PWs, for example by carrying PWs over DCCP.  In light of the above
160	   arguments, it is clear that this would force the PW to behave as a
161	   single flow, rather than N flows, and penalize the constituent TCP
162	   flows.  In addition, the individual TCP flows would still back off
163	   due to their end points being oblivious to the fact that they are
164	   carried over a PW.  This will further degrade the flow's throughput
165	   as compared to a non-PW-encapsulated flow.  Thus, such additional
166	   mechanisms contradict the behavior previously described as desirable.
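
   The fairness argument of this section can be illustrated with a small
   sketch.  It assumes an idealized bottleneck whose capacity is divided
   equally among all contending flows; this equal-share model, the
   function name, and the sample numbers are assumptions made purely for
   illustration and are not taken from this document.

```python
def per_flow_shares(capacity_bps, native_flows, pw_flows, pw_as_single_flow):
    """Return (share of a native TCP flow, share of a TCP flow inside the PW).

    Idealized model (an assumption): the bottleneck capacity is split
    equally among all contenders.
    """
    if pw_as_single_flow:
        # The whole PW is forced to compete as one flow, so its share is
        # then subdivided among the flows it carries.
        contenders = native_flows + 1
        pw_share = capacity_bps / contenders
        return capacity_bps / contenders, pw_share / pw_flows
    # Each flow inside the PW competes on equal terms with native flows.
    share = capacity_bps / (native_flows + pw_flows)
    return share, share


if __name__ == "__main__":
    # Hypothetical scenario: 100 Mbps bottleneck, 10 native flows, 10 PW flows.
    print(per_flow_shares(100e6, 10, 10, pw_as_single_flow=False))
    print(per_flow_shares(100e6, 10, 10, pw_as_single_flow=True))
```

   In the first case a flow inside the PW receives the same share as a
   native flow; in the second case each flow inside the PW receives only
   a fraction of what a native flow receives, which is the penalty
   described above.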

168	3.  PWs Comprising Inelastic Flows

170	   TDM PWs ([RFC4553][RFC5086][RFC5087]) are more problematic than the
171	   elastic PWs of the previous section.  Being constant bit-rate (CBR),
172	   they cannot be made responsive to congestion.  On the other hand,
173	   being CBR, they also do not attempt to capture additional bandwidth
174	   when TCP flows back off.

176	   Since a TDM PW continuously consumes a constant amount of bandwidth,
177	   if the bandwidth occupied by a TDM PW endangers the network as a
178	   whole, the only recourse is to shut it down, denying service to all
179	   customers of the TDM native service.  We should mention in passing
180	   that under certain conditions it may be possible to reduce the
181	   bandwidth consumption of a TDM PW.  A prevalent case is that of a TDM
182	   native service that carries voice channels that may not all be
183	   active.  Using the AAL2 mode of [RFC5087] (perhaps along with
184	   connection admission control) can enable bandwidth adaptation, at the
185	   expense of more sophisticated native service processing (NSP).

187	   In the following we will show that for many cases of interest a TDM
188	   PW, treated as a single flow, will behave in a reasonable manner
189	   without any additional mechanisms.  We will focus on structure-
190	   agnostic TDM PWs [RFC4553] although our analysis can be readily
191	   applied to structure-aware PWs (see Appendix A).

193	   There are two network parameters relevant to our discussion, namely
194	   the one-way delay D and the loss probability p.  The one-way delay of
195	   a native TDM service consists of the physical time-of-flight plus 125
196	   microseconds for each TDM switch traversed.  This is very small as
197	   compared to PSN network-crossing latencies.  Many protocols and
198	   applications running over TDM circuits thus require low delay, so we
199	   need only consider delays of up to about 32 milliseconds.

201	   The TDM PW RFCs specify the egress behavior upon experiencing packet
202	   loss.  Structure-agnostic transport has no alternative to outputting
203	   an "all-ones" AIS pattern towards the TDM circuit, which if long
204	   enough in duration is recognized by the receiving TDM device as a
205	   fault indication (see Appendix A).  International standards place
206	   stringent limits on the number of such faults tolerated.
207	   Calculations presented in the appendix show that only loss
208	   probabilities in the realm of fractions of a percent are relevant for
209	   structure-agnostic transport (see Appendix A).

211	   Structure-aware transport regenerates frame alignment signals, thus
212	   hiding AIS indications resulting from infrequent packet loss.
213	   Furthermore, for TDM circuits carrying voice channels the use of
214	   packet loss concealment algorithms is possible (such algorithms have
215	   been previously described for TDM PWs).  However, even structure-
216	   aware transport ceases to provide a useful service at about 2 percent
217	   loss probability.

219	   RFC 5348 on TCP Friendly Rate Control (TFRC) [RFC5348] provides the
220	   following simplified formula for throughput that is used as the basis
221	   for TFRC's sending rate control:

223	                                    S
224	       X_Bps = ------------------------------------------------
225	                 R  ( sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2) )

227	   where
228	      X_Bps is average sending rate in Bytes per second,
229	      S is the segment (packet payload) size in Bytes,
230	      R is the round-trip time in seconds,
231	      p is the loss probability.
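
   The formula above can be evaluated directly.  The following is a
   minimal sketch; the function name and the sample values are
   assumptions chosen for illustration.  The coefficient 12 in the
   second term corresponds to the RFC 5348 recommendations of one packet
   acknowledged per ACK and a retransmission timeout of four round-trip
   times.

```python
import math


def tfrc_throughput_Bps(s_bytes, rtt_s, p):
    """Average sending rate in Bytes per second from the simplified formula.

    s_bytes: segment (packet payload) size in Bytes
    rtt_s:   round-trip time in seconds
    p:       loss probability, 0 < p <= 1
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("loss probability must be in (0, 1]")
    denominator = rtt_s * (
        math.sqrt(2 * p / 3)
        + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)
    )
    return s_bytes / denominator


if __name__ == "__main__":
    # Hypothetical example: 1460-Byte segments, 100 ms round trip.
    for p in (0.001, 0.01, 0.05):
        rate = tfrc_throughput_Bps(1460, 0.1, p)
        print(f"p = {p:<6} X = {rate * 8 / 1e6:.3f} Mbps")
```

   The rate falls as either the round-trip time or the loss probability
   grows, which is why the condition derived next bounds the delay for a
   given loss probability.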

233	   We can use this formula to determine when a TDM PW consumes no more
234	   bandwidth than a TCP flow between the same endpoints would consume
235	   under the same conditions.  Replacing the round-trip delay with twice
236	   the one-way delay D, setting the bandwidth to that of the TDM service
237	   BW, and the segment size to the TDM fragment size TDM plus 4 Bytes to
238	   account for the PWE3 control word, we obtain the following condition
239	   for a TDM PW.

241	              (TDM + 4)
242	       D < ---------------
243	             BW f(p) / 4

245	   where
246	      D is the one-way delay,
247	      TDM is the TDM segment size in Bytes,
248	      BW is TDM service bandwidth in bits per second,
249	      f(p) = sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2).
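
   For readers who wish to check the factor of 4, the steps leading to
   the condition can be written out as follows.  This is a sketch of the
   algebra only; it assumes that the TDM PW rate BW/8 (in Bytes per
   second, since BW is given in bits per second) is required to be less
   than the TFRC rate X_Bps with S = TDM + 4 and R = 2D.

```latex
\begin{align*}
\frac{BW}{8} &< X_{Bps} = \frac{S}{R \, f(p)},
  \qquad S = TDM + 4, \quad R = 2D \\[4pt]
\frac{BW}{8} &< \frac{TDM + 4}{2 D \, f(p)} \\[4pt]
D &< \frac{8 \, (TDM + 4)}{2 \, BW \, f(p)}
   = \frac{TDM + 4}{BW \, f(p) / 4}
\end{align*}
```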

251	   One may view this condition as defining a safe operating envelope for
252	   a TDM PW, since a TDM PW that consumes no more bandwidth than a TCP
253	   flow would contribute no more to congestion than TCP traffic would.
254	   Under this condition it should hence be safe to mix the TDM PW with
255	   congestion-responsive traffic such as TCP, without causing
256	   significant additional congestion problems.  Were the TDM PW to
257	   consume significantly more bandwidth than a TCP flow, it could
258	   contribute disproportionately to congestion, and its mixture with
259	   congestion-responsive traffic may be inappropriate.

261	   We derived the condition assuming steady-state conditions, and thus
262	   two caveats are in order.  First, the condition does not specify how
263	   to treat a TDM PW that initially satisfies the condition, but is then
264	   faced with a deteriorating network environment.  In such cases one
265	   additionally needs to analyze the reaction times of the responsive
266	   flows to congestion events.  Second, the derivation assumed that the
267	   TDM PW was competing with long-lived TCP flows, because under this
268	   assumption it was straightforward to obtain a quantitative comparison
269	   with something widely considered to offer a safe response to
270	   congestion.  Short-lived TCP flows may find themselves disadvantaged
271	   as compared to a long-lived TDM PW satisfying the condition.  These
272	   dynamic cases will be considered in future versions of this draft.

274	   The results are displayed in the accompanying figures (available only
275	   in the PDF version of this document).  TCP compatible behavior is
276	   obtained for the area under curves appropriate for each TDM fragment
277	   size.

279	   [Figure: E1 compatibility regions (only in PDF version)]

297	   Figure 1 TCP Compatibility areas for E1 SAToP

298	   [Figure: E3 compatibility regions (only in PDF version)]

316	   Figure 2 TCP Compatibility areas for E3 SAToP

317	   We see in Figure 1 that a TDM PW carrying an E1 native service (2.048
318	   Mbps) satisfies the condition for all parameters of interest if each
319	   packet carries at least S=512 Bytes of TDM data.  For the SAToP
320	   default of 256 Bytes, as long as the one-way delay is less than 10
321	   milliseconds, the loss probability can exceed 0.3 percent.  For
322	   packets containing 128 or 64 Bytes the constraints are more
323	   troublesome, but there are still parameter ranges where the TDM PW
324	   consumes less than a TCP flow under similar conditions.  Similarly,
325	   Figure 2 demonstrates that an E3 native service (34.368 Mbps) with
326	   the SAToP default of 1024 Bytes of TDM per packet satisfies the
327	   condition for delays up to about 5 milliseconds.
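
   Since the figures are not available in the text version, the boundary
   of the compatibility region can be recomputed from the condition of
   this section.  The sketch below is an illustration under that
   condition only, not a reproduction of the authors' figure data; the
   function names and the bisection approach are assumptions.  It
   returns the largest one-way delay for a given loss probability, and
   the largest loss probability for a given one-way delay.

```python
import math


def f(p):
    """Loss-dependent factor f(p) from the condition for a TDM PW."""
    return math.sqrt(2 * p / 3) + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)


def max_one_way_delay(tdm_bytes, bw_bps, p):
    """Upper bound on the one-way delay D in seconds (the bound is strict)."""
    return 4.0 * (tdm_bytes + 4) / (bw_bps * f(p))


def max_loss_probability(tdm_bytes, bw_bps, delay_s, tol=1e-12):
    """Largest p for which the condition holds at the given one-way delay.

    f(p) increases with p, so simple bisection on [0, 1] is sufficient.
    """
    target = 4.0 * (tdm_bytes + 4) / (bw_bps * delay_s)  # need f(p) < target
    lo, hi = 0.0, 1.0
    if f(hi) < target:
        return 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    E1_BPS, E3_BPS = 2.048e6, 34.368e6
    for size in (64, 128, 256, 512):
        p = max_loss_probability(size, E1_BPS, 0.010)
        print(f"E1, {size:4d} Bytes, D = 10 ms: p up to {100 * p:.3f}%")
    p = max_loss_probability(1024, E3_BPS, 0.005)
    print(f"E3, 1024 Bytes, D = 5 ms: p up to {100 * p:.3f}%")
```

   For the E1 SAToP default of 256 Bytes at a loss probability of 0.3
   percent, the delay bound evaluates to roughly 11 milliseconds, which
   is consistent with the statement above.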

329	   Note that violating the condition for a short amount of time is not
330	   sufficient justification for shutting down the TDM PW.  While TCP
331	   flows react within a round trip time, PW commissioning and
332	   decommissioning are time consuming processes that should only be
333	   undertaken when it becomes clear that the congestion is not
334	   transient.  Future versions of this draft will provide guidance as to
335	   when a TDM PW should be terminated.

337	4.  Security Considerations

339	   This document does not introduce any new congestion-specific
340	   mechanisms and thus does not introduce any new security
341	   considerations above those present for PWs in general.

343	5.  IANA Considerations

345	   This document requires no IANA actions.

347	6.  Informative References

349	   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
350	              RFC 2914, September 2000.

352	   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
353	              Edge (PWE3) Architecture", RFC 3985, March 2005.

355	   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
356	              MPLS in IP or Generic Routing Encapsulation (GRE)",
357	              RFC 4023, March 2005.

359	   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
360	              Division Multiplexing (TDM) over Packet (SAToP)",
361	              RFC 4553, June 2006.

363	   [RFC5086]  Vainshtein, A., Sasson, I., Metz, E., Frost, T., and P.
364	              Pate, "Structure-Aware Time Division Multiplexed (TDM)
365	              Circuit Emulation Service over Packet Switched Network
366	              (CESoPSN)", RFC 5086, December 2007.

368	   [RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi,
369	              "Time Division Multiplexing over IP (TDMoIP)", RFC 5087,
370	              December 2007.

372	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
373	              Friendly Rate Control (TFRC): Protocol Specification",
374	              RFC 5348, September 2008.

376	   [G775]     International Telecommunication Union, "Loss of Signal
377	              (LOS), Alarm Indication Signal (AIS) and Remote Defect
378	              Indication (RDI) defect detection and clearance criteria
379	              for PDH signals", ITU Recommendation G.775, October 1998.

381	   [G826]     International Telecommunication Union, "Error Performance
382	              Parameters and Objectives for International Constant Bit
383	              Rate Digital Paths at or above Primary Rate",
384	              ITU Recommendation G.826, December 2002.

386	Appendix A.  Loss Probabilities for TDM PWs

388	   ITU-T Recommendation G.826 [G826] specifies limits on the Errored
389	   Second Ratio (ESR) and the Severely Errored Second Ratio (SESR).  For
390	   our purposes, we will simplify the definitions and understand an
391	   Errored Second (ES) to be a second of time during which a TDM bit
392	   error occurred or a defect indication was detected.  A Severely
393	   Errored Second (SES) is an ES during which the Bit Error Rate
394	   (BER) exceeded one in one thousand (10^-3).  Note that if the error
395	   condition AIS was detected according to the criteria of ITU-T
396	   Recommendation G.775 [G775], a SES was considered to have occurred.
397	   The respective ratios are the fraction of ES or SES to the total
398	   number of seconds in the measurement interval.

400	   For both E1 and T1 TDM circuits, G.826 allows ESR of 4% (0.04), and
401	   SESR of 0.2% (0.002).  For E3 and T3 the ESR must be no more than
402	   7.5% (0.075), while the SESR is unchanged.

404	   Focusing on E1 circuits, the ESR of 4% translates, assuming the worst
405	   case of isolated exactly periodic packet loss, to a packet loss event
406	   no more than once every 25 seconds.  However, once a packet is lost,
407	   another packet lost in the same second doesn't change the ESR,
408	   although it may contribute to the ES becoming a SES.  Assuming an
409	   integer number of TDM frames per PW packet, the number of packets per
410	   second is given by packets per second = 8000 / (frames per packet),
411	   where prevalent cases are 1, 2, 4 and 8 frames per packet.  Since for
412	   these cases there will be 8000, 4000, 2000, and 1000 packets per
413	   second, respectively, the maximum allowed packet loss probability is
414	   0.0005%, 0.001%, 0.002%, and 0.004% respectively.
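
   The arithmetic behind these worst-case figures can be sketched as
   follows.  The function name and its default arguments are assumptions
   for illustration; the 8000 frames per second and the 4% ESR limit are
   the values used above.

```python
def worst_case_max_loss(frames_per_packet, esr_limit=0.04, frames_per_second=8000):
    """Maximum packet loss probability for isolated, exactly periodic loss.

    Each lost packet makes one second errored, so at most `esr_limit`
    packets may be lost per second on average, out of the packets sent
    in that second.
    """
    packets_per_second = frames_per_second / frames_per_packet
    return esr_limit / packets_per_second


if __name__ == "__main__":
    for frames in (1, 2, 4, 8):
        p = worst_case_max_loss(frames)
        print(f"{frames} frame(s) per packet: {100 * p:.4f}%")
```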

416	   These extremely low allowed packet loss probabilities are only for
417	   the worst case scenario.  In reality, when packet loss is above
418	   0.001%, it is likely that loss bursts will occur.  If the lost
419	   packets are sufficiently close together (we ignore the precise
420	   details here) then the permitted packet loss rate increases by the
421	   appropriate factor, without G.826 being cognizant of any change.
422	   Hence the worst-case analysis is expected to be extremely pessimistic
423	   for real networks.  Next we will go to the opposite extreme and
424	   assume that all packet loss events are in periodic loss bursts.  In
425	   order to minimize the ESR we will assume that the burst lasts no more
426	   than one second, and so we can afford to lose no more than one
427	   second's worth of packets in each burst.  As long as such one-second
428	   bursts do not exceed four percent of the time, we still maintain the
429	   allowable ESR.  Hence the maximum permissible packet loss rate is 4%.
430	   Of course, this estimate is extremely optimistic, and furthermore
431	   does not take into consideration the SESR criteria.

433	   As previously explained, a SES is declared whenever AIS is detected.

435	   There is a major difference between structure-aware and structure-
436	   agnostic transport in this regard.  When a packet is lost, SAToP
437	   outputs an "all-ones" pattern to the TDM circuit, which is
438	   interpreted as AIS according to G.775 [G775].  For E1 circuits, G.775
439	   specifies that AIS is detected when four consecutive TDM frames
440	   have no more than 2 alternations.  This means that if a PW packet or
441	   consecutive packets containing at least four frames are lost, and
442	   four or more frames of "all-ones" are output to the TDM circuit, a
443	   SES will be declared.  Thus burst packet loss, or packets containing
444	   a large number of TDM frames, lead SAToP to cause high SESR, whose
445	   limit is 20 times stricter than for ESR.  On the other hand, since
446	   structure-aware transport regenerates the correct frame alignment
447	   pattern, even when the corresponding packet has been lost, packet
448	   loss will not cause declaration of SES.  This is the main reason that
449	   SAToP is much more vulnerable to packet loss than the structure-aware
450	   methods.

452	   For realistic networks, the maximum allowed packet loss for SAToP
453	   will be intermediate between the extremely pessimistic estimates and
454	   the extremely optimistic ones.  In order to numerically gauge the
455	   situation, we have modeled the network as a four-state Markov model
456	   (corresponding to a successfully received packet, a packet received
457	   within a loss burst, a packet lost within a burst, and a packet lost
458	   when not within a burst).  This model is an extension of the widely
459	   used Gilbert model.  We set the transition probabilities in order to
460	   roughly correspond to anecdotal evidence, namely low background
461	   isolated packet loss, and infrequent bursts wherein most packets are
462	   lost.  Such simulation shows that up to 0.5% average packet loss may
463	   occur and the recovered TDM still conforms to the G.826 ESR and SESR
464	   criteria.
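
   A simulation of this kind might be sketched as follows.  This is not
   the authors' model: the transition structure (a burst/non-burst
   regime with a separate loss probability in each, giving the four
   packet states listed above), every parameter value, and the function
   names are assumptions made for illustration.  Only AIS-induced SES
   for SAToP is modeled, using the four-frame criterion described above;
   BER-based SES is ignored.

```python
import random


def average_loss(p_bg, p_bl, p_enter, p_exit):
    """Long-run packet loss probability of the two-regime chain."""
    frac_burst = p_enter / (p_enter + p_exit)
    return (1 - frac_burst) * p_bg + frac_burst * p_bl


def simulate(seconds, pps, frames_per_packet, p_bg, p_bl, p_enter, p_exit, seed=1):
    """Return (ESR, SESR, measured loss rate) for a simulated SAToP PW.

    p_bg:    loss probability outside a burst (isolated loss)
    p_bl:    loss probability inside a burst
    p_enter: per-packet probability of entering a burst
    p_exit:  per-packet probability of leaving a burst
    """
    rng = random.Random(seed)
    in_burst = False
    run = 0  # consecutive lost packets, carried across second boundaries
    es = ses = lost_total = 0
    for _ in range(seconds):
        errored = severe = False
        for _ in range(pps):
            if in_burst:
                if rng.random() < p_exit:
                    in_burst = False
            elif rng.random() < p_enter:
                in_burst = True
            if rng.random() < (p_bl if in_burst else p_bg):
                lost_total += 1
                run += 1
                errored = True
                # Four or more consecutive all-ones frames are taken as AIS.
                if run * frames_per_packet >= 4:
                    severe = True
            else:
                run = 0
        es += errored
        ses += severe
    return es / seconds, ses / seconds, lost_total / (seconds * pps)


if __name__ == "__main__":
    # Hypothetical parameters: rare bursts in which most packets are lost.
    params = dict(p_bg=1e-5, p_bl=0.8, p_enter=1e-5, p_exit=1e-2)
    print("expected loss:", average_loss(**params))
    print("ESR, SESR, loss:", simulate(600, 1000, 8, **params))
```

   Varying the assumed parameters and comparing the resulting ESR and
   SESR against the G.826 limits gives a feel for how strongly the
   tolerable average loss depends on the burst structure.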

466	Authors' Addresses

468	   Yaakov (Jonathan) Stein
469	   RAD Data Communications
470	   24 Raoul Wallenberg St., Bldg C
471	   Tel Aviv  69719
472	   ISRAEL

474	   Phone: +972 (0)3 645-5389
475	   Email: yaakov_s@rad.com

476	   David L. Black
477	   EMC Corporation
478	   176 South St.
479	   Hopkinton, MA  01748
480	   USA

482	   Phone: +1 (508) 293-7953
483	   Email: david.black@emc.com

485	   Bob Briscoe
486	   BT
487	   B54/77, Adastral Park
488	   Martlesham Heath
489	   Ipswich  IP5 3RE
490	   UK

492	   Phone: +44 1473 645196
493	   Email: bob.briscoe@bt.com
494	   URI:   http://bobbriscoe.net/