idnits 2.17.1 

draft-bryant-filsfils-fat-pw-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 2, 2009) is 5534 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4447 (ref. '6') (Obsoleted by RFC 8077)

  ** Downref: Normative reference to an Informational RFC: RFC 4378 (ref. '8')

  ** Obsolete normative reference: RFC 4379 (ref. '9') (Obsoleted by RFC 8029)

  == Outdated reference: A later version (-02) exists of
     draft-kompella-mpls-entropy-label-00


     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	PWE3                                                      S. Bryant, Ed.
3	Internet-Draft                                               C. Filsfils
4	Intended status: Standards Track                           Cisco Systems
5	Expires: September 3, 2009                                      U. Drafz
6	                                                        Deutsche Telekom
7	                                                             V. Kompella
8	                                                                J. Regan
9	                                                          Alcatel-Lucent
10	                                                               S. Amante
11	                                                  Level 3 Communications
12	                                                           March 2, 2009

14	                Flow Aware Transport of MPLS Pseudowires
15	                    draft-bryant-filsfils-fat-pw-03

17	Status of this Memo

19	   This Internet-Draft is submitted to IETF in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF), its areas, and its working groups.  Note that
24	   other groups may also distribute working documents as Internet-
25	   Drafts.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   The list of current Internet-Drafts can be accessed at
33	   http://www.ietf.org/ietf/1id-abstracts.txt.

35	   The list of Internet-Draft Shadow Directories can be accessed at
36	   http://www.ietf.org/shadow.html.

38	   This Internet-Draft will expire on September 3, 2009.

40	Copyright Notice

42	   Copyright (c) 2009 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents in effect on the date of
47	   publication of this document (http://trustee.ietf.org/license-info).
48	   Please review these documents carefully, as they describe your rights
49	   and restrictions with respect to this document.

51	Abstract

53	   Where the payload carried over a pseudowire carries a number of
54	   identifiable flows it can in some circumstances be desirable to carry
55	   those flows over the equal cost multiple paths (ECMPs) that exist in
56	   the packet switched network.  Most forwarding engines are able to
57	   hash based on label stacks and use this to balance flows over ECMPs.
58	   This draft describes a method of identifying the flows, or flow
59	   groups, to the label switched routers by including an additional
60	   label in the label stack.

62	Requirements Language

64	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
65	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
66	   document are to be interpreted as described in RFC2119 [1].

68	Table of Contents

70	   1.  Introduction
71	     1.1.  ECMP in Label Switched Routers
72	     1.2.  Flow Label
73	   2.  Native Service Processing Function
74	   3.  Pseudowire Forwarder
75	     3.1.  Encapsulation
76	   4.  Signaling the Presence of the Flow Label
77	     4.1.  Structure of Flow Label TLV
78	   5.  OAM
79	   6.  Applicability
80	     6.1.  ECMP
81	     6.2.  Link Aggregation Groups
82	     6.3.  The Single Large Flow Case
83	   7.  Applicability to MPLS
84	   8.  Security Considerations
85	   9.  IANA Considerations
86	   10. Congestion Considerations
87	   11. Acknowledgements
88	   12. References
89	     12.1. Normative References
90	     12.2. Informative References
91	   Authors' Addresses

93	1.  Introduction

95	   A pseudowire [11] is normally transported over one single network
96	   path, even if multiple Equal Cost Multiple Paths (ECMP) exist between
97	   the ingress and egress PEs [2] [3].  This is required to preserve the
98	   characteristics of the emulated service (e.g. to avoid misordering
99	   SAToP pseudowires [4]).  The use of a single path to preserve order
100	   remains the default mode of operation of a pseudowire (PW).  The new
101	   capability proposed in this document is an OPTIONAL mode which may be
102	   used when the use of ECMP paths is known to be beneficial (and
103	   not harmful) to the operation of the PW.

105	   Some pseudowires are used to transport large volumes of IP traffic
106	   between routers at two locations.  One example of this is the use of
107	   an Ethernet pseudowire to create a virtual direct link between a pair
108	   of routers.  Such pseudowires may carry from hundreds of Mbps to
109	   Gbps of traffic.  Such pseudowires do not require strict ordering to
110	   be preserved between packets of the pseudowire.  They only require
111	   ordering to be preserved within the context of each individual
112	   transported IP flow.  Some operators have requested the ability to
113	   explicitly configure such a pseudowire to leverage the availability
114	   of multiple ECMP paths.  This allows for better capacity planning as
115	   the statistical multiplexing of a larger number of smaller flows is
116	   more efficient than with a smaller set of larger flows.  Although
117	   Ethernet is used as an example above, the mechanisms described in
118	   this draft are general mechanisms that may be applied to any
119	   pseudowire type in which there are identifiable flows, and in which
120	   there is no requirement to preserve the order between those
121	   flows.

123	   Typically, forwarding hardware can deduce that an IP payload is being
124	   directly carried by an MPLS label stack, and is capable of looking at
125	   some fields in packets to construct hash buckets for conversations or
126	   flows.  However, an intermediate node has no information on the type
127	   of pseudowire being carried in the packet.  This limits the forwarder at
128	   the intermediate node to only being able to make an ECMP choice based
129	   on a hash of the label stack.  In the case of a pseudowire emulating
130	   a high bandwidth trunk, the granularity obtained by hashing the
131	   default label stack is inadequate for satisfactory load-balancing.
132	   The ingress node, however, is in the special position of being able
133	   to look at the un-encapsulated packet and spread flows amongst the
134	   available ECMP paths, or even Loop-Free Alternates [12].  This
135	   draft proposes a method to introduce granularity on the hashing of
136	   traffic running over pseudowires by introducing an additional label,
137	   chosen by the ingress node, and placed at the bottom of the label
138	   stack.

140	   In addition to providing an indication of the flow structure for use
141	   in ECMP forwarding decisions, the mechanism described in the document
142	   may also be used to select flows for distribution over an 802.3ad
143	   link aggregation group that has been used in an MPLS network.

145	1.1.  ECMP in Label Switched Routers

147	   Label switched routers commonly hash the label stack or some elements
148	   of the label stack as a method of discriminating between flows, in
149	   order to distribute those flows over the available equal cost
150	   multiple paths that exist in the network.  Since the label at the
151	   bottom of stack is usually the label most closely associated with the
152	   flow, this normally provides the greatest entropy, and hence is
153	   usually included in the hash.  This draft describes a method of
154	   adding an additional label at the bottom of stack in order to
155	   facilitate the load balancing of the flows within a pseudowire over
156	   the available ECMPs.  A similar design for general MPLS use has also
157	   been proposed [13]; however, that is outside the scope of this draft.
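
   The effect of the extra bottom-of-stack label on an LSR's path choice
   can be illustrated with a minimal sketch.  The function name
   ecmp_next_hop, the path names, the label values, and the use of CRC32
   are all assumptions made for illustration; real LSRs use
   implementation-specific hash functions.

```python
import zlib

def ecmp_next_hop(label_stack, paths):
    """Pick one path by hashing the label values in the stack.

    CRC32 is an illustrative stand-in, not an actual LSR algorithm.
    """
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return paths[zlib.crc32(key) % len(paths)]

paths = ["path-A", "path-B", "path-C", "path-D"]   # hypothetical ECMP set
tunnel_label, pw_label = 1000, 2000                # hypothetical labels

# Without a flow label, every packet of the pseudowire hashes identically.
no_fl = {ecmp_next_hop([tunnel_label, pw_label], paths) for _ in range(100)}

# With a flow label at the bottom of stack, different flows can hash
# to different paths.
with_fl = {ecmp_next_hop([tunnel_label, pw_label, fl], paths)
           for fl in range(16, 1040)}
```

   In this sketch no_fl contains a single path, while with_fl contains
   more than one, which is the additional entropy the flow label is
   intended to provide.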

159	   An alternative method of load balancing by creating a number of
160	   pseudowires and distributing the flows amongst them was considered,
161	   but was rejected because:

163	   o  It did not introduce as much entropy as the load balance label
164	      method.

166	   o  It required additional pseudowires to be set up and maintained.

168	1.2.  Flow Label

170	   An additional label is interposed between the pseudowire label and
171	   the control word, or if the control word is not present, between the
172	   pseudowire label and the pseudowire payload.  This additional label
173	   is called the Flow label.  Indivisible flows within the pseudowire
174	   MUST be mapped to the same Flow label by the ingress PE.  The flow
175	   label stimulates the correct ECMP load balancing behaviour in the
176	   PSN.  On receipt of the pseudowire packet at the egress PE (which
177	   knows this additional label is present) the flow label is discarded
178	   without processing.

180	   Note that the flow label MUST NOT be an MPLS reserved label (values
181	   in the range 0..15) [5], but is otherwise unconstrained by the
182	   protocol.

184	   Considerations of the TTL value are described in the Security section
185	   of this document.  In the case of a pseudowire there are no lower
186	   restrictions on the label value since the TTL is never the top label.
187	   The designers of the generalized solution [13].

189	2.  Native Service Processing Function

191	   The Native Service Processing (NSP) function is a component of a PE
192	   that has knowledge of the structure of the emulated service and is
193	   able to take action on the service outside the scope of the
194	   pseudowire.  In this case it is required that the NSP in the ingress
195	   PE identify flows, or groups of flows within the service, and
196	   indicate the flow (group) identity of each packet as it is passed to
197	   the pseudowire forwarder.  Since this is an NSP function, by
198	   definition, the method used to identify a flow is outside the scope
199	   of the pseudowire design.  Similarly, since the NSP is internal to
200	   the PE, the method of flow indication to the pseudowire forwarder is
201	   outside the scope of this document.

203	3.  Pseudowire Forwarder

205	   The pseudowire forwarder must be provided with a method of mapping
206	   flows to load balanced paths.

208	   The forwarder must generate a label for the flow or group of flows.
209	   How the load balance label values are determined is outside the scope
210	   of this document, however the load balance label allocated to a flow
211	   MUST NOT be an MPLS reserved label and SHOULD remain constant for the
212	   life of the flow.  It is recommended that the method chosen to
213	   generate the load balancing labels introduces a high degree of
214	   entropy in their values, to maximise the entropy presented to the
215	   ECMP path selection mechanism in the LSRs in the PSN, and hence
216	   distribute the flows as evenly as possible over the available PSN
217	   ECMP paths.  The forwarder at the ingress PE prepends the pseudowire
218	   control word (if applicable), and then pushes the flow label,
219	   followed by the pseudowire label.

221	   The forwarder at the egress PE uses the pseudowire label to identify
222	   the pseudowire.  From the context associated with the pseudowire
223	   label, the egress PE can determine whether a flow label is present.
224	   If a flow label is present, the label is discarded.

226	   All other pseudowire forwarding operations are unmodified by the
227	   inclusion of the flow label.

229	3.1.  Encapsulation

231	   The PWE3 Protocol Stack Reference Model, modified to include the flow
232	   label, is shown in Figure 1 below.
233	      +-------------+                                +-------------+
234	      |  Emulated   |                                |  Emulated   |
235	      |  Ethernet   |                                |  Ethernet   |
236	      | (including  |         Emulated Service       | (including  |
237	      |  VLAN)      |<==============================>|  VLAN)      |
238	      |  Services   |                                |  Services   |
239	      +-------------+                                +-------------+
240	      |    Flow     |                                |    Flow     |
241	      +-------------+            Pseudowire          +-------------+
242	      |Demultiplexer|<==============================>|Demultiplexer|
243	      +-------------+                                +-------------+
244	      |    PSN      |            PSN Tunnel          |    PSN      |
245	      |   MPLS      |<==============================>|   MPLS      |
246	      +-------------+                                +-------------+
247	      |  Physical   |                                |  Physical   |
248	      +-----+-------+                                +-----+-------+

250	               Figure 1: PWE3 Protocol Stack Reference Model

252	   The encapsulation of a pseudowire with a flow label is shown in
253	   Figure 2 below.

255	    +-------------------------------+
256	    |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
257	    +-------------------------------+
258	    |      PW label                 |  4 octets
259	    +-------------------------------+
260	    |      Flow label               |  4 octets
261	    +-------------------------------+
262	    |   Optional Control Word       |  4 octets
263	    +-------------------------------+
264	    |            Payload            |
265	    |                               |
266	    |                               |  n octets
267	    |                               |
268	    +-------------------------------+

270	      Figure 2: Encapsulation of a pseudowire with a pseudowire load
271	                              balancing label
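
   The layout in Figure 2 can be sketched in code as follows.  This is a
   minimal illustration, not a forwarder implementation.  The label
   stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom
   of stack flag, 8-bit TTL) follows [5]; the function names and default
   values are assumptions, and the flow label TTL of 1 reflects the
   option discussed in the Security Considerations section.

```python
import struct

def encode_lse(label, tc=0, bottom=False, ttl=255):
    """Encode one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def encapsulate(payload, tunnel_labels, pw_label,
                flow_label=None, control_word=None):
    """Build tunnel label(s), PW label, optional flow label, optional CW."""
    if flow_label is not None and flow_label < 16:
        raise ValueError("flow label must not be a reserved label (0..15)")
    stack = b"".join(encode_lse(label) for label in tunnel_labels)
    # The PW label is bottom of stack only when no flow label follows it.
    stack += encode_lse(pw_label, bottom=flow_label is None)
    if flow_label is not None:
        stack += encode_lse(flow_label, bottom=True, ttl=1)
    return stack + (control_word or b"") + payload

def decapsulate_at_egress(packet, flow_label_expected, has_control_word):
    """Egress view after tunnel labels are popped: starts at the PW label."""
    offset = 4                      # skip the PW label
    if flow_label_expected:
        offset += 4                 # flow label is discarded unprocessed
    if has_control_word:
        offset += 4
    return packet[offset:]
```

   For example, encapsulate(b"data", [1000], 2000, flow_label=0x12345,
   control_word=bytes(4)) yields a 20-octet packet: three 4-octet label
   stack entries, the 4-octet control word, and the 4-octet payload.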

273	4.  Signaling the Presence of the Flow Label

275	   When using the signalling procedures in [6], there is a Pseudowire
276	   Interface Parameter Sub-TLV type used to synchronize the flow label
277	   states between the ingress and egress PEs.

279	   The absence of a flow label (FL) TLV from either party indicates that
280	   the PE concerned is unable to recognize this TLV, and the sender of
281	   the FL TLV MUST send a new label mapping without the FL TLV.  This
282	   preserves backwards compatibility with existing PEs that do not
283	   understand the FL TLV or that cannot, or do not wish to, process the
284	   flow label.

286	   A PE that wishes to use a flow label sends an FL TLV with the F bit
287	   set.  A PE that can correctly process a flow label and is willing to
288	   receive one, but does not wish to send a flow label, sends an FL TLV
289	   with the F bit clear.  A PE that sends an FL TLV with the F bit set
290	   and receives an FL TLV with or without the F bit set MUST include the
291	   flow label between the pseudowire label and the control word (or, if
292	   the control word is not present, between the pseudowire label and the
293	   pseudowire payload).

295	   If PWE3 signalling [6] is not in use for a pseudowire, then whether
296	   the flow label is used MUST be identically provisioned in both PEs at
297	   the pseudowire endpoints.  If there is no provisioning support for
298	   this option, the default behaviour is not to include the flow label.

300	   Note that what is signalled is the desire to include the flow label
301	   in the label stack.  The value of the label is a local matter for the
302	   ingress PE, and the label value itself is not signalled.
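
   The signalling rules in this section can be summarised by a small
   decision function.  This is a sketch of one reading of the text
   above; the function name and the use of None to represent "no FL TLV
   was sent" are assumptions made for illustration.

```python
from typing import Optional

def push_flow_label(local_f: Optional[bool], remote_f: Optional[bool]) -> bool:
    """Decide whether this PE includes a flow label in packets it sends.

    Each argument is None if that PE sent no FL TLV, otherwise the value
    of the F bit in the FL TLV that it sent.
    """
    if local_f is None or remote_f is None:
        # One side did not send the FL TLV: fall back to no flow label.
        return False
    # We sent F=1 and the peer sent an FL TLV (F set or clear), so the
    # peer can process a flow label and we must include one.
    return local_f
```

   Under this reading, a PE that sent F=1 pushes a flow label whenever
   the peer sent an FL TLV, regardless of the peer's own F bit, and a PE
   that sent F=0 never pushes one.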

304	4.1.  Structure of Flow Label TLV

306	   The structure of the flow label TLV is shown in Figure 3.

308	    0                   1                   2                   3
309	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
310	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
311	   | FL            |    Length     |F|       must be zero          |
312	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

314	                         Figure 3: Flow Label TLV

316	   Where:

318	   o  FL is the flow label TLV identifier assigned by IANA.

320	   o  Length is the length of the TLV in octets and is 4.

322	   o  When F=1 a flow label will be pushed.  When F=0 a flow label will
323	      not be pushed.
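
   As an illustration of the layout in Figure 3, the TLV can be packed
   and parsed as shown below.  The type value 0xFF is a placeholder
   chosen for this sketch only, since the real identifier is to be
   assigned by IANA; the function names are likewise hypothetical.

```python
import struct

FL_TLV_TYPE = 0xFF  # placeholder only: the real value is assigned by IANA

def pack_fl_tlv(f_bit: bool) -> bytes:
    """Pack type (8 bits), length (8 bits), F (1 bit), 15 zero bits."""
    flags = 0x8000 if f_bit else 0x0000
    return struct.pack("!BBH", FL_TLV_TYPE, 4, flags)

def unpack_fl_tlv(data: bytes) -> bool:
    """Return the F bit of a 4-octet flow label TLV."""
    tlv_type, length, flags = struct.unpack("!BBH", data)
    if tlv_type != FL_TLV_TYPE or length != 4:
        raise ValueError("not a flow label TLV")
    return bool(flags & 0x8000)
```

   The F bit occupies the most significant bit of the third octet, so
   pack_fl_tlv(True) ends with the octets 0x80 0x00.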

325	5.  OAM

327	   The following OAM considerations apply to this method of load
328	   balancing.

330	   Where the OAM is only to be used to perform a basic test that the
331	   pseudowires have been configured at the PEs, VCCV [7] messages may be
332	   sent using any load balance pseudowire path, i.e. using any value for
333	   the flow label.

335	   Where it is required to verify that a pseudowire is fully functional
336	   for all flows, VCCV [7] connection verification messages MUST be sent
337	   over each ECMP path to the pseudowire egress PE.  This problem is
338	   difficult to solve and scales poorly.  We believe that this problem
339	   is addressed by the following two methods:

341	   1.  If a failure occurs within the PSN, this failure will normally be
342	       detected by the PSN's IGP (link/node failure, link or BFD or IGP
343	       hello detection), and the IGP convergence will naturally modify
344	       the ECMP set of network paths between the Ingress and Egress
345	       PEs.  Hence the PW is only impacted during the normal IGP
346	       convergence time.

348	   2.  If the failure is related to the individual corruption of an LFIB
349	       entry in a router, then only the network path using that specific
350	       entry is impacted.  If the PW is load balanced over multiple
351	       network paths, then this failure can only be detected if, by
352	       chance, the transported OAM flow is mapped onto the impacted
353	       network path, or all paths are tested.  This type of error may be
354	       better solved by other means such as LSP self test
355	       [14].

357	   To troubleshoot the MPLS PSN, including multiple paths, the
358	   techniques described in [8] and [9] can be used.

360	   Where the pseudowire OAM is carried out of band (VCCV Type 2) it is
361	   necessary to insert an "MPLS Router Alert Label" in the label stack.
362	   The resultant label stack is as follows:

364	    +-------------------------------+
365	    |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
366	    +-------------------------------+
367	    |      Router Alert label       |  4 octets
368	    +-------------------------------+
369	    |      PW label                 |  4 octets
370	    +-------------------------------+
371	    |      Flow label               |  4 octets
372	    +-------------------------------+
373	    |   Optional Control Word       |  4 octets
374	    +-------------------------------+
375	    |            Payload            |
376	    |                               |
377	    |                               |  n octets
378	    |                               |
379	    +-------------------------------+

381	                    Figure 4: Use of Router Alert Label

383	6.  Applicability

385	   A node within the PSN is not able to perform deep-packet-inspection
386	   (DPI) of the PW as the PW technology is not self-describing: the
387	   structure of the PW payload is only known to the ingress and egress
388	   PE devices.  The method proposed in this document provides a
389	   statistical mitigation of the problem of load balance in those cases
390	   where a PE is able to discern flows embedded in the traffic received
391	   on the attachment circuit.

393	   The methods described in this document are transparent to the PSN and
394	   as such do not require any new capability from the PSN.

396	   The requirement to load-balance over multiple PSN paths occurs when
397	   the ratio between the PW access speed and the PSN's core link
398	   bandwidth is large (e.g. >= 10%).  ATM and FR are unlikely to meet
399	   this condition.  Ethernet does, and this is the reason why this
400	   document focuses on Ethernet.  Applications for other high-access-
401	   bandwidth PWs (e.g. Fibre Channel) may be defined in the future.

403	   This design applies to MPLS pseudowires where it is meaningful to
404	   deconstruct the packets presented to the ingress PE into flows.  The
405	   mechanism described in this document promotes the distribution of
406	   flows within the pseudowire over different network paths.  This in
407	   turn means that whilst packets within a flow are delivered in order
408	   (subject to normal IP delivery perturbations due to topology
409	   variation), order is not maintained amongst packets of different
410	   flows.  It is not proposed to associate a different sequence number
411	   with each flow.  If sequence number support is required this
412	   mechanism is not applicable.

414	   Where it is known that the traffic carried by the Ethernet pseudowire
415	   is IP, the methods of identifying the flows are well known and can be
416	   applied.  Such methods typically include hashing on the source and
417	   destination addresses, the protocol ID and higher-layer flow-
418	   dependent fields such as TCP/UDP ports, L2TPv3 Session IDs, etc.
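
   One possible way for an ingress PE to derive a flow label from these
   fields is sketched below.  How label values are chosen is outside the
   scope of this document, so the function name, the choice of SHA-256,
   and the key format are assumptions for illustration only.  The sketch
   does respect the two constraints stated earlier: all packets of a flow
   map to the same label, and reserved labels 0..15 are never produced.

```python
import hashlib

def flow_label_for(src_ip, dst_ip, protocol, src_port=0, dst_port=0):
    """Map an IP flow's identifying fields to a 20-bit flow label."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    # Map into 16 .. 2**20 - 1 so reserved labels 0..15 cannot occur.
    return 16 + value % (2**20 - 16)

# The same flow always yields the same label.
label_a = flow_label_for("192.0.2.1", "198.51.100.7", 6, 49152, 443)
label_b = flow_label_for("192.0.2.1", "198.51.100.7", 6, 49152, 443)
```

   A cryptographic hash is more than is needed here; it is used only
   because it gives well-spread values with no extra dependencies.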

420	   Where it is known that the traffic carried by the Ethernet pseudowire
421	   is non-IP, techniques used for link bundling between Ethernet
422	   switches may be reused.  In this case, however, the latency
423	   distribution would be larger than is found in the link bundle case.
424	   The acceptability of the increased latency is for further study.  Of
425	   particular importance, the Ethernet control frames SHOULD always be
426	   mapped to the same PSN path to ensure in-order delivery.

428	6.1.  ECMP

430	   ECMP in packet switched networks is statistical in nature.  The
431	   mapping of flows to a particular path does not take into account the
432	   bandwidth of the flow being mapped or the current bandwidth usage of
433	   the members of the ECMP set.  This simplification works well when the
434	   distribution of flows is evenly spread over the ECMP set and there
435	   are a large number of flows that have low bandwidth relative to the
436	   paths.  A random allocation of a flow to a path provides a good
437	   approximation to an even spread provided polarization effects are
438	   avoided.  The method proposed in this document has the same
439	   statistical properties as an IP PSN.
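
   The statistical behaviour described above can be explored with a
   small experiment.  This is only a sketch: the hash-based path choice,
   the function names, and the flow rates are assumptions, and the
   figures carry no units.

```python
import hashlib

def path_for(flow_id: int, n_paths: int) -> int:
    """Choose a path from a hash of the flow identity, ignoring bandwidth."""
    digest = hashlib.sha256(str(flow_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

def load_per_path(flow_rates, n_paths):
    """Sum the rate of every flow onto the path that its hash selects."""
    load = [0.0] * n_paths
    for flow_id, rate in enumerate(flow_rates):
        load[path_for(flow_id, n_paths)] += rate
    return load

# Same total offered load, different flow structure.
many_small = load_per_path([1.0] * 1000, 4)   # 1000 flows of rate 1
few_large = load_per_path([250.0] * 4, 4)     # 4 flows of rate 250
```

   With many small flows the per-path load is expected to stay close to
   the mean of 250.  With four large flows the outcome depends on where
   each flow happens to land, and any collision places at least 500 on
   one path while another carries nothing.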

441	   ECMP is a load-sharing mechanism that is based on sharing the load
442	   over a number of layer 3 paths through the PSN.  Often however
443	   multiple links exist between a pair of LSRs that are considered by
444	   the IGP to be a single link.  These are known as link bundles.  The
445	   mechanism described in this document can also be used to distribute
446	   the flows within a pseudowire over the members of the link bundle by
447	   using the flow label value to identify candidate flows.  How that
448	   mapping takes place is outside the scope of this specification.
449	   Similar considerations apply to link aggregation groups.

451	   In the ECMP case and the link bundling case the NSP may attempt to
452	   take bandwidth into consideration when allocating groups of flows to
453	   a common path.  That is permitted, but it must be borne in mind that
454	   the semantics of a label stack entry (LSE) as defined by [5] cannot
455	   be modified, the value of the flow label cannot be modified at any
456	   point on the LSP, and the interpretation of bit patterns in or values
457	   of the flow label by an LSR are undefined.

459	   A different type of load balancing is the desire to carry a
460	   pseudowire over a set of PSN links in which the bandwidth of members
461	   of the link set is less than the bandwidth of the pseudowire.  This
462	   problem is addressed in [15].  Such a mechanism can be considered
463	   complementary to this mechanism.

465	6.2.  Link Aggregation Groups

467	   Link Aggregation (LAG) is used to bond together several physical
468	   circuits between two adjacent nodes so they appear to higher-layer
469	   protocols as a single, higher bandwidth "virtual" pipe.  These may
470	   co-exist in various parts of a given network.  An advantage of LAGs
471	   is that they reduce the number of routing and signaling protocol
472	   adjacencies between devices, reducing control plane processing
473	   overhead.  As with ECMP, a key problem related to LAG is that, due to
474	   inefficiencies in LAG load-distribution algorithms, a particular
475	   component link may experience congestion.  The mechanism proposed
476	   here may be able to assist in producing a more uniform flow
477	   distribution.

479	   The same considerations requiring a flow to go over a single member
480	   of an ECMP path set apply to a member of a LAG.

482	6.3.  The Single Large Flow Case

484	   Clearly the operator should make sure that the service offered using
485	   PW technology and the method described in this document does not
486	   exceed the maximum planned link capacity unless it can be guaranteed
487	   that it conforms to the Internet traffic profile of a very large
488	   number of small flows.

490	   If the payload on a PW is made of a single inner flow (e.g. an
491	   encrypted connection between two routers), or the flow identifiers
492	   are too deeply buried in the packet, then the functionality described
493	   in this document does not give any benefits, though neither does it
494	   cause harm relative to the existing situation.  The most common case
495	   where a single flow dominates the traffic on a PW is when it is used
496	   to transport enterprise traffic.  Enterprise traffic may well consist
497	   of large single TCP flows, or encrypted flows that cannot be
498	   handled by the methods described in this document.

500	   An operator has five options under these circumstances:

502	   1.  The operator can do nothing and the system will work as it does
503	       without the flow label.

505	   2.  The operator can make the customer aware that the service
506	       offering has a restriction on flow bandwidth and police flows to
507	       that restriction.  This would allow customers offering multiple
508	       flows to use a larger fraction of their access bandwidth, whilst
509	       preventing a single flow from consuming a fraction of internal
510	       link bandwidth that the operator considered excessive.

512	   3.  The operator could configure the ingress PE to assign a constant
513	       flow label to all high bandwidth flows so that only one path was
514	       affected by these flows.

516	   4.  The operator could configure the ingress PE to assign a random
517	       flow label to all high bandwidth flows so as to minimise the
518	       disruption to the network at the cost of out of order traffic to
519	       the user.

521	   5.  The operator could configure the ingress to assign a label of
522	       special significance to all high bandwidth flows so that some
523	       other action (not specified in this document) could be taken on
524	       the flow.

526	   The issues described above are mitigated by the following two
527	   factors:

529	   o  Firstly, the customer of a high-bandwidth PW service has an
530	      incentive to get the best transport service because an inefficient
531	      use of the PSN leads to jitter and eventually to loss to the PW's
532	      payload.

534	   o  Secondly, the customer is usually able to tailor their
535	      applications to generate many flows in the PSN.  A well-known
536	      example is massive data transport between servers which use many
537	      parallel TCP sessions.  This same technique can be used by any
538	      transport protocol: multiple UDP ports, multiple L2TPv3 Session
539	      IDs, or multiple GRE keys may be used to decompose a large flow into
540	      smaller components.  This approach may be applied to IPsec where
541	      multiple SPIs may be allocated to the same security association.

543	7.  Applicability to MPLS

545	   A further application of this technique would be to create a basis
546	   for hash diversity without having to peek below the label stack for
547	   IP traffic carried over LDP LSPs.  Work on the generalization of this
548	   to MPLS has been described in draft-kompella-mpls-entropy-label [13].
549	   This can be regarded as a complementary but distinct approach:
550	   although similar considerations may apply to the identification of
551	   flows and the allocation of flow label values, the flow labels are
552	   imposed by different network components and the associated
553	   signalling mechanisms are different.

555	8.  Security Considerations

557	   The pseudowire generic security considerations described in [11] and
558	   the security considerations applicable to a specific pseudowire type
559	   (for example, in the case of an Ethernet pseudowire [10]) apply.

561	   The ingress PE SHOULD take steps to ensure that the load-balance
562	   label is not used as a covert channel.

564	   It is useful to give consideration to the choice of TTL value in the
565	   flow label LSE.  Since the flow label is at the bottom of stack and,
566	   even when PHP is employed, is preceded by the PW label on arrival at
567	   the egress PE, the flow label TTL MAY be set to a value of 1.  This
568	   will prevent the packet being inadvertently forwarded based on the
569	   value of the flow label.  Note that this may be a departure from
570	   considerations that apply to the general MPLS case.
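
   For illustration, the sketch below packs a label stack entry using
   the layout defined in [5] (20-bit label, 3-bit traffic class, 1-bit
   bottom-of-stack flag, 8-bit TTL) and builds a PW label followed by a
   flow label whose TTL is 1.  The function names, the label values and
   the PW label TTL of 255 are assumptions chosen for the example, not
   values taken from this document.

```python
import struct


def encode_lse(label: int, tc: int = 0,
               bottom_of_stack: bool = False, ttl: int = 255) -> bytes:
    """Pack one MPLS label stack entry into 4 octets (RFC 3032 layout)."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("traffic class must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def decode_lse(data: bytes):
    """Unpack 4 octets into (label, tc, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data)
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF


# Hypothetical label values for the example.
PW_LABEL = 3000
FLOW_LABEL = 1000

# The PW label is followed by the flow label, which is bottom of stack
# and carries a TTL of 1.
stack = (encode_lse(PW_LABEL, bottom_of_stack=False, ttl=255)
         + encode_lse(FLOW_LABEL, bottom_of_stack=True, ttl=1))
```

   Decoding the last four octets of stack yields the flow label with
   the bottom-of-stack flag set and a TTL of 1.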

572	9.  IANA Considerations

574	   IANA is requested to allocate the next available value from the IETF
575	   Consensus range in the Pseudowire Interface Parameters Sub-TLV type
576	   Registry as a Flow Label indicator.

578	   Parameter  Length  Description

580	   TBD        4       Load Balancing Label

582	10.  Congestion Considerations

584	   The congestion considerations applicable to pseudowires as described
585	   in [11] and any additional congestion considerations developed at the
586	   time of publication apply to this design.

588	   The ability to explicitly configure a PW to leverage the availability
589	   of multiple ECMP paths is beneficial to capacity planning as, all
590	   other parameters being constant, the statistical multiplexing of a
591	   larger number of smaller flows is more efficient than with a smaller
592	   number of larger flows.
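
   The statistical multiplexing argument can be illustrated with a
   simple model, which is an assumption made for this example rather
   than a traffic model defined by this document.  A fixed total peak
   rate is divided equally among n independent on/off flows, each
   active with the same probability.  The mean aggregate load does not
   depend on n, but its standard deviation falls as 1/sqrt(n), so less
   headroom is needed for the same offered load.  The function and
   parameter names are hypothetical.

```python
import math


def aggregate_load_stats(total_peak: float, n_flows: int, p_active: float):
    """Return (mean, standard deviation) of the aggregate load.

    Assumed model: n_flows independent on/off flows, each with peak
    rate total_peak / n_flows and active with probability p_active.
    """
    per_flow_peak = total_peak / n_flows
    mean = n_flows * per_flow_peak * p_active
    variance = n_flows * per_flow_peak ** 2 * p_active * (1 - p_active)
    return mean, math.sqrt(variance)


# Same total peak rate carried as 4 large flows or as 64 small flows.
mean_few, std_few = aggregate_load_stats(10.0, 4, 0.3)
mean_many, std_many = aggregate_load_stats(10.0, 64, 0.3)
```

   In this model both cases have the same mean load, while the standard
   deviation with 64 flows is one quarter of that with 4 flows.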

594	   Note that if the classification into flows is only performed on IP
595	   packets the behaviour of those flows in the face of congestion will
596	   be as already defined by the IETF for packets of that type and no
597	   additional congestion processing is required.

599	   Where non-IP flows are classified, pseudowire congestion avoidance
600	   must be applied to each non-IP load balance group.

602	11.  Acknowledgements

604	   The authors wish to thank Joerg Kuechemann, Wilfried Maas, Luca
605	   Martini, Mark Townsley, Kireeti Kompella and Lucy Yong for valuable
606	   comments on this document.

608	12.  References

610	12.1.  Normative References

612	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
613	         Levels", BCP 14, RFC 2119, March 1997.

615	   [2]   Bryant, S., Swallow, G., Martini, L., and D. McPherson,
616	         "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use
617	         over an MPLS PSN", RFC 4385, February 2006.

619	   [3]   Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal Cost
620	         Multipath Treatment in MPLS Networks", BCP 128, RFC 4928,
621	         June 2007.

623	   [4]   Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time Division
624	         Multiplexing (TDM) over Packet (SAToP)", RFC 4553, June 2006.

626	   [5]   Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci,
627	         D., Li, T., and A. Conta, "MPLS Label Stack Encoding",
628	         RFC 3032, January 2001.

630	   [6]   Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron,
631	         "Pseudowire Setup and Maintenance Using the Label Distribution
632	         Protocol (LDP)", RFC 4447, April 2006.

634	   [7]   Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
635	         Connectivity Verification (VCCV): A Control Channel for
636	         Pseudowires", RFC 5085, December 2007.

638	   [8]   Allan, D. and T. Nadeau, "A Framework for Multi-Protocol Label
639	         Switching (MPLS) Operations and Management (OAM)", RFC 4378,
640	         February 2006.

642	   [9]   Kompella, K. and G. Swallow, "Detecting Multi-Protocol Label
643	         Switched (MPLS) Data Plane Failures", RFC 4379, February 2006.

645	   [10]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
646	         "Encapsulation Methods for Transport of Ethernet over MPLS
647	         Networks", RFC 4448, April 2006.

649	12.2.  Informative References

651	   [11]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-Edge
652	         (PWE3) Architecture", RFC 3985, March 2005.

654	   [12]  Zinin, A., Torvi, R., Choudhury, G., Martin, C., Imhoff, B.,
655	         and D. Fedyk, "Basic Specification for IP Fast-Reroute: Loop-
656	         free Alternates", draft-ietf-rtgwg-ipfrr-spec-base-12 (work in
657	         progress), March 2008.

659	   [13]  Kompella, K. and S. Amante, "The Use of Entropy Labels in MPLS
660	         Forwarding", draft-kompella-mpls-entropy-label-00 (work in
661	         progress), July 2008.

663	   [14]  Swallow, G., "Label Switching Router Self-Test",
664	         draft-ietf-mpls-lsr-self-test-07 (work in progress), May 2007.

666	   [15]  Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
667	         draft-stein-pwe3-pwbonding-01 (work in progress),
668	         November 2008.

670	Authors' Addresses

672	   Stewart Bryant (editor)
673	   Cisco Systems
674	   250 Longwater Ave
675	   Reading  RG2 6GB
676	   United Kingdom

678	   Phone: +44-208-824-8828
679	   Email: stbryant@cisco.com

681	   Clarence Filsfils
682	   Cisco Systems
683	   Brussels
684	   Belgium

686	   Email: cfilsfil@cisco.com

687	   Ulrich Drafz
688	   Deutsche Telekom
689	   Muenster
690	   Germany

694	   Email: Ulrich.Drafz@t-com.net

697	   Vach Kompella
698	   Alcatel-Lucent

702	   Email: vach.kompella@alcatel-lucent.com

705	   Joe Regan
706	   Alcatel-Lucent

710	   Email: joe.regan@alcatel-lucent.com

713	   Shane Amante
714	   Level 3 Communications

718	   Email: shane@castlepoint.net