RTGWG                                                  C. Villamizar, Ed.
Internet-Draft                                       Infinera Corporation
Intended status: Informational                            D. McDysan, Ed.
Expires: January 9, 2011                                          S. Ning
                                                                 A. Malis
                                                                  Verizon
                                                                  L. Yong
                                                               Huawei USA
                                                             July 8, 2010

             Requirements for MPLS Over a Composite Link
                  draft-ietf-rtgwg-cl-requirement-01

Abstract

   There is often a need to provide large aggregates of bandwidth that
   are best provided using parallel links between routers or MPLS LSRs.
   In core networks there is often no alternative, since the aggregate
   capacities of core networks today far exceed the capacity of a
   single physical link or single packet processing element.
   Furthermore, links may be composed of network elements operating
   across multiple layers.

   The presence of parallel links, potentially comprised of multiple
   layers, has resulted in additional requirements.  Certain services
   may benefit from being restricted to a subset of a composite link's
   component links or to a specific component link, where component
   link characteristics, such as latency, differ.  Certain services
   require that an LSP be treated as atomic and avoid reordering.
   Other services will continue to require only that reordering not
   occur within a microflow, as is current practice.

   Current practice related to multipath is described briefly in an
   appendix.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.
   All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Assumptions
   3.  Definitions
   4.  Network Operator Functional Requirements
     4.1.  Availability, Stability and Transient Response
     4.2.  Component Links Provided by Lower Layer Networks
     4.3.  Parallel Component Links with Different Characteristics
   5.  Derived Requirements
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
     9.3.  Appendix References
   Appendix A.  More Details on Existing Network Operator Practices
                and Protocol Usage
   Appendix B.  Existing Multipath Standards and Techniques
     B.1.  Common Multipath Load Splitting Techniques
     B.2.  Simple and Adaptive Load Balancing Multipath
     B.3.  Traffic Split over Parallel Links
     B.4.  Traffic Split over Multiple Paths
   Appendix C.  ITU-T G.800 Composite Link Definitions and Terminology
   Authors' Addresses

1.  Introduction

   The purpose of this document is to describe why network operators
   require certain functions in order to solve certain business
   problems (Section 2).  The intent is to first describe why things
   need to be done in terms of functional requirements that are as
   independent as possible of protocol specifications (Section 4).
   For certain functional requirements this document describes a set
   of derived protocol requirements (Section 5).  Three appendices
   provide supporting details: a summary of existing and prior
   operator approaches (Appendix A), a summary of implementation
   techniques and relevant protocol standards (Appendix B), and a
   summary of the G.800 terminology used to define the concept of a
   composite link (Appendix C).

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Assumptions

   The services supported include L3VPN, L2VPN (VPWS and VPLS),
   Internet traffic encapsulated by at least one MPLS label, and
   dynamically signaled MPLS-TP LSPs and pseudowires.  The MPLS LSPs
   supporting these services may be pt-pt, pt-mpt, or mpt-mpt.

   The locations in a network where these requirements apply are a
   Label Edge Router (LER) or a Label Switch Router (LSR) as defined
   in RFC 3031 [RFC3031].
   The IP DSCP cannot be used for flow identification since L3VPN
   requires Diffserv transparency (see Section 5.5.2 of RFC 4031
   [RFC4031]), and in general network operators do not rely on the
   DSCP of Internet packets.

3.  Definitions

   Composite Link:
      Section 6.9.2 of ITU-T G.800 [ITU-T.G.800] defines composite
      link as summarized in Appendix C.  The following definitions map
      the ITU-T G.800 terminology into the IETF terminology used in
      this document.

   Multiple parallel links:  The case where multiple parallel
      component links exist between one LER/LSR and another LER/LSR.

   Multi-layer Component Link:  A component link that is formed by
      other network elements at other layers.

   Component Link:  A physical link (e.g., Lambda, Ethernet PHY,
      SONET/SDH, OTN, etc.) with packet transport capability, or a
      logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.).

   Flow:  A sequence of packets that must be transferred on one
      component link.

   Flow identification:  The label stack and other information that
      uniquely identifies a flow.  Other information in flow
      identification may include an IP header, PW control word,
      Ethernet MAC address, etc.  Note that an LSP may contain one or
      more flows, or an LSP may be equivalent to a flow.  Flow
      identification is used to locally select a component link, or a
      path through the network toward the destination.

4.  Network Operator Functional Requirements

   The functional requirements in this section are grouped in
   subsections starting with the highest priority.

4.1.  Availability, Stability and Transient Response

   Limiting the period of unavailability in response to failures or
   transient events is extremely important, as is maintaining
   stability.  The transient period between some service disrupting
   event and the convergence of the routing and/or signaling protocols
   MUST occur within a time frame specified by SLA objectives.
   The timeframes range from rapid restoration, on the order of 100 ms
   or less (e.g., for VPWS), to several minutes (e.g., for L3VPN), and
   may differ among the set of customers within a single service.

   FR#1  The solution SHALL provide a means to summarize routing
         advertisements regarding the characteristics of a composite
         link such that routing protocol convergence occurs within the
         timeframe needed to meet the SLA objective.

   FR#2  The solution SHALL provide a means for aggregating signaling
         such that, in response to a failure in the worst case cross
         section of the network, MPLS LSPs are restored within the
         timeframe needed to meet the SLA objective.

   FR#3  The solution SHALL provide a means to select a path for a
         flow across a network that contains a number of paths
         comprised of pairs of nodes connected by composite links in
         such a way as to automatically distribute the load over the
         network nodes connected by composite links while meeting all
         of the other mandatory requirements stated above.  The
         solution SHOULD work in a manner similar to that when the
         characteristics of the individual component links are
         advertised.

   FR#4  If extensions to existing protocols are specified and/or new
         protocols are defined, then the solution SHOULD provide a
         means for a network operator to migrate an existing
         deployment in a minimally disruptive manner.

   FR#5  Any automatic LSP routing and/or load balancing solutions
         MUST NOT oscillate such that performance observed by users
         changes such that an SLA is violated.  Since oscillation may
         cause reordering, there MUST be means to control the
         frequency of changing the component link over which a flow is
         placed.

   FR#6  Management and diagnostic protocols MUST be able to operate
         over composite links.

4.2.  Component Links Provided by Lower Layer Networks

   Case 3 as defined in [ITU-T.G.800] involves a component link
   supporting an MPLS layer network over another lower layer network
   (e.g., a circuit switched network or another MPLS network, such as
   MPLS-TP).  The lower layer network may change the latency (and/or
   other performance parameters) seen by the MPLS layer network.
   Network operators have SLAs, some components of which are based on
   performance parameters.  Currently, there is no protocol for the
   lower layer network to inform the higher layer network of a change
   in a performance parameter.  Communication of the latency
   performance parameter is a very important requirement.
   Communication of other performance parameters (e.g., delay
   variation) is desirable.

   FR#7  In order to support network SLAs and provide acceptable user
         experience, the solution SHALL specify a protocol means to
         allow a lower layer server network to communicate latency to
         the higher layer client network.

   FR#8  The precision of latency reporting SHOULD be at least 10% of
         the one way latency for latency of 1 ms or more.

   FR#9  The solution SHALL provide a means to limit the latency on a
         per LSP basis between nodes within a network to meet an SLA
         target when the path between these nodes contains one or more
         pairs of nodes connected via a composite link.

         The SLAs differ across the services, and some services have
         different SLAs for different QoS classes; for example, one
         QoS class may have a much larger latency bound than another.
         Overload can occur, which would violate an SLA parameter
         (e.g., loss), and some remedy is needed to handle this case
         for a composite link.

   FR#10 If the total demand offered by traffic flows exceeds the
         capacity of the composite link, the solution SHOULD define a
         means to cause the LSPs for some traffic flows to move to
         some other point in the network that is not congested.
   These "preempted LSPs" may not be restored if there is no
   uncongested path in the network.

4.3.  Parallel Component Links with Different Characteristics

   Corresponding to Case 1 of [ITU-T.G.800], as one means to provide
   high availability, network operators deploy a topology in the MPLS
   network using lower layer networks that have a certain degree of
   diversity at the lower layer(s).  Many techniques have been
   developed to balance the distribution of flows across component
   links that connect the same pair of nodes (see Appendix B.3).  When
   the path for a flow can be chosen from a set of candidate nodes
   connected via composite links, other techniques have been developed
   (see Appendix B.4).

   FR#11 The solution SHALL measure traffic on a labeled traffic flow
         and dynamically select the component link on which to place
         this flow in order to balance the load so that no component
         link in the composite link between a pair of nodes is
         overloaded.

   FR#12 When a traffic flow is moved from one component link to
         another in the same composite link between a set of nodes (or
         sites), it MUST be done in a minimally disruptive manner.

         When a flow is moved from a current link to a target link
         with different latency, reordering can occur if the target
         link latency is less than that of the current link, or
         clumping can occur if the target link latency is greater than
         that of the current link.  Some flows (e.g., timing
         distribution, PW circuit emulation) are quite sensitive to
         these effects, which may be specified in an SLA or may need
         to be controlled to meet a user experience objective (e.g.,
         jitter buffer under/overrun).

   FR#13 The solution SHALL provide a means to identify flows whose
         rearrangement frequency needs to be bounded by a configured
         value.
   FR#14 The solution SHALL provide a means that communicates whether
         the flows within an LSP can be split across multiple
         component links.  The solution SHOULD provide a means to
         indicate the flow identification field(s) that can be used
         along the flow path to perform this function.

   FR#15 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with the minimum latency
         value.

   FR#16 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with a maximum acceptable
         latency value as specified by protocol.

   FR#17 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with a maximum acceptable
         delay variation value as specified by protocol.

   FR#18 The solution SHALL provide a local means by which a node
         automatically distributes flows across the component links in
         the composite link that connects to the other node such that
         SLA objectives are met.

   FR#19 The solution SHALL provide a means to distribute flows from a
         single LSP across multiple component links to handle at least
         the case where the traffic carried in an LSP exceeds that of
         any component link in the composite link.

5.  Derived Requirements

   This section takes the next step and derives high-level
   requirements on protocol specification from the functional
   requirements.

   DR#1  The solution SHOULD attempt to extend existing protocols
         wherever possible, developing a new protocol only if this
         adds a significant set of capabilities.

         The vast majority of network operators have provisioned L3VPN
         services over LDP.  Many have deployed L2VPN services over
         LDP as well.  TE extensions to IGP and RSVP-TE are viewed as
         being overly complex by some operators.
   DR#2  A solution SHOULD extend LDP capabilities to meet the
         functional requirements (without using TE methods, as decided
         in [RFC3468]).

   DR#3  Coexistence of LDP and RSVP-TE signaled LSPs MUST be
         supported on a composite link.  Other functional requirements
         should be supported as independently of the signaling
         protocol as possible.

   DR#4  When the nodes connected via a composite link are in the same
         MPLS network topology, the solution MAY define extensions to
         the IGP.

   DR#5  When the nodes connected via a composite link are in
         different MPLS network topologies, the solution SHALL NOT
         rely on extensions to the IGP.

   DR#6  When a worst case failure scenario occurs, the resulting
         number of links advertised in the IGP causes IGP convergence
         to occur, causing a period of unavailability as perceived by
         users.  The convergence time of the solution MUST meet the
         SLA objective for the duration of unavailability.

   DR#7  The solution SHALL summarize the characteristics of the
         component links in a composite link IGP advertisement that
         results in convergence time better than that of advertising
         the individual component links.  This summary SHALL be
         designed so that it represents the range of capabilities of
         the individual component links such that the functional
         requirements are met, and also minimizes the frequency of
         advertisement updates which may cause IGP convergence to
         occur.  Examples of advertisement update triggering events to
         be considered include: LSP establishment/release, changes in
         component link characteristics (e.g., latency, up/down
         state), and/or bandwidth utilization.

   DR#8  When a worst case failure scenario occurs, the resulting
         number of links advertised in the IGP causes IGP convergence
         to occur, causing a period of unavailability as perceived by
         users.
         The convergence time of the solution MUST meet the SLA
         objective for the duration of unavailability.

   DR#9  When a worst case failure scenario occurs, the number of
         RSVP-TE LSPs to be resignaled will cause a period of
         unavailability as perceived by users.  The resignaling time
         of the solution MUST meet the SLA objective for the duration
         of unavailability.  The resignaling time of the solution MUST
         NOT increase significantly as compared with current methods.

6.  Acknowledgements

   Frederic Jounay of France Telecom and Yuji Kamite of NTT
   Communications Corporation co-authored a version of this document.

   A rewrite of this document occurred after the IETF 77 meeting.
   Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John
   Scudder and Alex Zinin, and others provided valuable guidance prior
   to and at the IETF 77 RTGWG meeting.

   Tony Li and John Drake have made numerous valuable comments on the
   RTGWG mailing list that are reflected in versions following the
   IETF 77 meeting.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document specifies a set of requirements.  The requirements
   themselves do not pose a security threat.  If these requirements
   are met using MPLS signaling as commonly practiced today, with
   authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP,
   then the requirement to provide additional information in this
   communication presents additional information that could
   conceivably be gathered in a man-in-the-middle confidentiality
   breach.  Such an attack would require a capability to monitor this
   signaling either through a provider breach or through access to
   provider physical transmission infrastructure.  A provider breach
   already poses a threat of numerous types of attacks which are of
   far more serious consequence.
   Encryption of the signaling can prevent, or render more difficult,
   any confidentiality breach that otherwise might occur by means of
   access to provider physical transmission infrastructure.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.

   [RFC2702]  Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., and
              J. McManus, "Requirements for Traffic Engineering Over
              MPLS", RFC 2702, September 1999.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
              "Multiprotocol Label Switching Architecture", RFC 3031,
              January 2001.

   [RFC3468]  Andersson, L. and G. Swallow, "The Multiprotocol Label
              Switching (MPLS) Working Group decision on MPLS
              signaling protocols", RFC 3468, February 2003.

   [RFC3809]  Nagarajan, A., "Generic Requirements for Provider
              Provisioned Virtual Private Networks (PPVPN)", RFC 3809,
              June 2004.

   [RFC4031]  Carugi, M. and D. McDysan, "Service Requirements for
              Layer 3 Provider Provisioned Virtual Private Networks
              (PPVPNs)", RFC 4031, April 2005.

   [RFC4665]  Augustyn, W. and Y. Serbest, "Service Requirements for
              Layer 2 Provider-Provisioned Virtual Private Networks",
              RFC 4665, September 2006.

   [RFC5254]  Bitar, N., Bocci, M., and L. Martini, "Requirements for
              Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)",
              RFC 5254, October 2008.

9.3.  Appendix References

   [I-D.ietf-pwe3-fat-pw]
              Bryant, S., Filsfils, C., Drafz, U., Kompella, V.,
              Regan, J., and S. Amante, "Flow Aware Transport of
              Pseudowires over an MPLS PSN",
              draft-ietf-pwe3-fat-pw-03 (work in progress),
              January 2010.
   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008, IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2008.

   [ITU-T.Y.1541]
              ITU-T, "Network performance objectives for IP-based
              services", 2006.

   [RFC1717]  Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
              PPP Multilink Protocol (MP)", RFC 1717, November 1994.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2615]  Malis, A. and W. Simpson, "PPP over SONET/SDH",
              RFC 2615, June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast
              and Multicast Next-Hop Selection", RFC 2991,
              November 2000.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201,
              October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word
              for Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding
              Equal Cost Multipath Treatment in MPLS Networks",
              BCP 128, RFC 4928, June 2007.

Appendix A.  More Details on Existing Network Operator Practices and
             Protocol Usage

   Network operators have SLAs for services that are comprised of
   numerical values for performance measures, principally
   availability, latency, and delay variation.  See [ITU-T.Y.1541] and
   Section 4.9 of RFC 3809 [RFC3809] for examples of the form of such
   SLAs.
   Note that the numerical values of Y.1541 span multiple networks and
   may be looser than network operator SLAs.  Applications and
   acceptable user experience have a relationship to these performance
   parameters.

   Consider latency as an example.  In some cases, minimizing latency
   relates directly to the best customer experience (e.g., in TCP,
   closer is faster).  In other cases, user experience is relatively
   insensitive to latency up to a specific limit, at which point user
   perception of quality degrades significantly (e.g., interactive
   human voice and multimedia conferencing).  A number of SLAs have a
   bound on point-to-point latency, and as long as this bound is met,
   the SLA is met -- decreasing the latency is not necessary.  In some
   SLAs, if the specified latency is not met, the user considers the
   service as unavailable.  An unprotected LSP can be manually
   provisioned on a set of links to meet this type of SLA, but this
   lowers availability since an alternate route that meets the latency
   SLA cannot be determined.

   Historically, when an IP/MPLS network was operated over a lower
   layer circuit switched network (e.g., SONET rings), a change in
   latency caused by the lower layer network (e.g., due to a
   maintenance action or failure) was not known to the MPLS network.
   This resulted in latency affecting end user experience, sometimes
   violating SLAs or resulting in user complaints.

   A response to this problem was to provision IP/MPLS networks over
   unprotected circuits and set the metric and/or TE-metric
   proportional to latency.  This resulted in traffic being directed
   over the least latency path, even if this was not needed to meet an
   SLA or to meet user experience objectives.  This results in reduced
   flexibility and increased cost for network operators.
   Using lower layer networks to provide restoration and grooming is
   expected to be more efficient, but the inability to communicate
   performance parameters, in particular latency, from the lower layer
   network to the higher layer network is an important problem to be
   solved before this can be done.

   Latency SLAs for pt-pt services are often tied closely to
   geographic locations, while latency for multipoint services may be
   based upon a worst case within a region.

   The presence of only three Traffic Class (TC) bits (previously
   known as EXP bits) in the MPLS shim header is limiting when a
   network operator needs to support QoS classes for multiple services
   (e.g., L2VPN VPWS, VPLS, L3VPN, and Internet), each of which has a
   set of QoS classes that need to be supported.  In some cases one
   bit is used to indicate conformance to some ingress traffic
   classification, leaving only two bits for indicating the service
   QoS classes.  The approach that has been taken is to aggregate
   these QoS classes into similar sets on LER-LSR and LSR-LSR links.

   Labeled LSPs and the use of link layer encapsulation have been
   standardized in order to provide a means to meet these needs.

   The IP DSCP cannot be used for flow identification since L3VPN
   requires Diffserv transparency (see Section 5.5.2 of RFC 4031
   [RFC4031]), and in general network operators do not rely on the
   DSCP of Internet packets.

   Pushing a label onto Internet packets when they are carried along
   with L2/L3VPN packets on the same link or lower layer network
   provides a means to distinguish the QoS classes for these packets.

   Operating an MPLS-TE network involves a different paradigm from
   operating an IGP metric-based LDP signaled MPLS network.  The
   mpt-pt LDP signaled MPLS LSPs are established automatically, and
   balancing across parallel links occurs if the IGP metrics are set
   "equally" (with equality a locally definable relation).
   Traffic is typically comprised of a few large flows (some very
   large) and many small flows.  In some cases, separate LSPs are
   established for very large flows.  This can occur even if the IP
   header information is inspected by a router, for example for an
   IPsec tunnel that carries a large amount of traffic.  An important
   example of large flows is that of an L2/L3 VPN customer who has an
   access line bandwidth comparable to a client-client composite link
   bandwidth -- there could be flows that are on the order of the
   access line bandwidth.

Appendix B.  Existing Multipath Standards and Techniques

   Today the requirement to handle large aggregations of traffic, much
   larger than a single component link, can be handled by a number of
   techniques which we will collectively call multipath.  Multipath
   applied to parallel links between the same set of nodes includes
   Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201],
   or other aggregation techniques, some of which may be vendor
   specific.  Multipath applied to diverse paths rather than parallel
   links includes Equal Cost MultiPath (ECMP) as applied to OSPF,
   ISIS, or even BGP, and equal cost LSPs, as described in
   Appendix B.4.  The various multipath techniques have strengths and
   weaknesses.

   The term composite link is more general than terms such as link
   aggregate, which is generally considered to be specific to
   Ethernet, and its use here is consistent with the broad definition
   in [ITU-T.G.800].  The term multipath excludes inverse multiplexing
   and refers to techniques which only solve the problem of large
   aggregations of traffic, without addressing the other requirements
   outlined in this document.

B.1.  Common Multipath Load Splitting Techniques

   Identical load balancing techniques are used for multipath both
   over parallel links and over diverse paths.
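   As a concrete illustration of this common splitting step, the
   sketch below hashes a flow identification onto a set of members,
   which could equally be parallel component links or diverse paths.
   The CRC-32 hash, member names, and label values are illustrative
   assumptions, not taken from any standard or implementation.

```python
import zlib

def select_member(label_stack, members, ip_pair=None):
    """Map a flow identification to one member (component link or
    path).  Packets with the same flow identification always hash to
    the same member, so packet order within a flow is preserved."""
    key = b"".join(label.to_bytes(4, "big") for label in label_stack)
    if ip_pair is not None:
        # Optionally look beyond the label stack at the IP payload.
        src, dst = ip_pair
        key += src.encode() + dst.encode()
    return members[zlib.crc32(key) % len(members)]

links = ["member-1", "member-2", "member-3"]  # hypothetical members
a = select_member([16001, 299792], links, ("10.0.0.1", "10.0.0.2"))
b = select_member([16001, 299792], links, ("10.0.0.1", "10.0.0.2"))
assert a == b and a in links  # deterministic per-flow selection
```

   Because the selection depends only on the flow identification, the
   same function applies whether the members are component links of
   one composite link or diverse equal cost paths.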
615 Large aggregates of IP traffic do not provide explicit signaling to 616 indicate the expected traffic loads. Large aggregates of MPLS 617 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 618 are signaled using RSVP-TE extensions do provide explicit signaling 619 which includes the expected traffic load for the aggregate. LSP 620 which are signaled using LDP do not provide an expected traffic load. 622 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 623 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 624 payload, there is no signaling associated with these inner LSP. 625 Therefore even when using RSVP-TE signaling there may be insufficient 626 information provided by signaling to adequately distribute load 627 across a composite link. 629 Generally a set of label stack entries that is unique across the 630 ordered set of label numbers can safely be assumed to contain a group 631 of flows. The reordering of traffic can therefore be considered to 632 be acceptable unless reordering occurs within traffic containing a 633 common unique set of label stack entries. Existing load splitting 634 techniques take advantage of this property in addition to looking 635 beyond the bottom of the label stack and determining if the payload 636 is IPv4 or IPv6 to load balance traffic accordingly. 638 MPLS-TP OAM violates the assumption that it is safe to reorder 639 traffic within an LSP. If MPLS-TP OAM is to be accommodated, then 640 existing multipath techniques must be modified. Such modifications 641 are outside the scope of this document. 643 For example, a large aggregate of IP traffic may be subdivided into a 644 large number of groups of flows using a hash on the IP source and 645 destination addresses. This is as described in [RFC2475] and 646 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 647 can be performed on the set of labels in the label stack.
These 648 techniques are both examples of means to subdivide traffic into 649 groups of flows for the purpose of load balancing traffic across 650 aggregated link capacity. The means of identifying a flow should not 651 be confused with the definition of a flow. 653 Discussion of whether a hash based approach provides a sufficiently 654 even load balance using any particular hashing algorithm or method of 655 distributing traffic across a set of component links is outside of 656 the scope of this document. 658 The current load balancing techniques are referenced in [RFC4385] and 659 [RFC4928]. The use of three hash based approaches is described in 660 [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is 661 described in [I-D.ietf-pwe3-fat-pw]. The use of hash based 662 approaches is mentioned as an example of an existing set of 663 techniques to distribute traffic over a set of component links. 664 Other techniques are not precluded. 666 B.2. Simple and Adaptive Load Balancing Multipath 668 Simple multipath generally relies on the mathematical probability 669 that given a very large number of small microflows, these microflows 670 will tend to be distributed evenly across a hash space. A common 671 simple multipath implementation assumes that all members (component 672 links) are of equal capacity and performs a modulo operation on 673 the hashed value. An alternate simple multipath technique uses a 674 table, generally with a power of two size, and distributes the table 675 entries proportionally among members according to the capacity of 676 each member. 678 Simple load balancing works well if there are a very large number of 679 small microflows (i.e., microflow rate is much less than component 680 link capacity). However, the case where there are even a few large 681 microflows is not handled well by simple load balancing.
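The two simple multipath variants described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: real routers use cheap hardware hash functions rather than SHA-256, and the field names and table size here are arbitrary choices for the example.

```python
import hashlib

def flow_hash(src, dst, label_stack=()):
    """Hash flow identification fields (e.g., IP source/destination
    addresses, or the MPLS label stack) into a stable integer.
    SHA-256 is used here only for illustration."""
    key = repr((src, dst, tuple(label_stack))).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def modulo_member(h, n_members):
    """Simple multipath: modulo over equal-capacity members."""
    return h % n_members

def build_table(capacities, size=256):
    """Alternate technique: a power-of-two-sized table whose entries
    are distributed among members in proportion to capacity,
    accommodating members of unequal capacity."""
    total = sum(capacities)
    table = []
    for member, cap in enumerate(capacities):
        table.extend([member] * round(size * cap / total))
    # Pad or trim rounding error to exactly `size` entries.
    return (table + [len(capacities) - 1] * size)[:size]

def table_member(h, table):
    """Select a member by indexing the table with the flow hash."""
    return table[h % len(table)]
```

For example, `build_table([10, 10, 40])` assigns roughly four times as many table entries to the third member as to each of the first two, so a member with four times the capacity receives roughly four times the flow groups.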
683 An adaptive multipath technique is one where the traffic bound to 684 each member (component link) is measured and the load split is 685 adjusted accordingly. As long as the adjustment is done within a 686 single network element, then no protocol extensions are required and 687 there are no interoperability issues. 689 Note that if the load balancing algorithm and/or its parameters are 690 adjusted, then packets in some flows may be delivered out of 691 sequence. 693 B.3. Traffic Split over Parallel Links 695 The load splitting techniques defined in Appendix B.1 and Appendix B.2 696 are both used in splitting traffic over parallel links between the 697 same pair of nodes. The best known technique, though far from being 698 the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same 699 technique had been applied much earlier using OSPF or ISIS Equal Cost 700 MultiPath (ECMP) over parallel links between the same nodes. 701 Multilink PPP [RFC1717] uses a technique that provides inverse 702 multiplexing; however, a number of vendors had provided proprietary 703 extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet 704 Link Aggregation but are no longer used. 706 Link bundling [RFC4201] provides yet another means of handling 707 parallel LSP. RFC4201 explicitly allows a special value of all ones 708 to indicate a split across all members of the bundle. 710 B.4. Traffic Split over Multiple Paths 712 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 713 traffic split over multiple paths that may traverse intermediate 714 nodes. ECMP is often incorrectly equated to only this case, and 715 multipath over multiple diverse paths is often incorrectly equated to 716 ECMP. 718 Many implementations are able to create more than one LSP between a 719 pair of nodes, where these LSP are routed diversely to better make 720 use of available capacity. The load on these LSP can be distributed 721 proportionally to the reserved bandwidth of the LSP.
These multiple 722 LSP may be advertised as a single PSC FA and any LSP making use of 723 the FA may be split over these multiple LSP. 725 Link bundling [RFC4201] component links may themselves be LSP. When 726 this technique is used, any LSP which specifies the link bundle may 727 be split across the multiple paths of the LSP that comprise the 728 bundle. 730 Appendix C. ITU-T G.800 Composite Link Definitions and Terminology 732 Composite Link: 733 Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite link 734 in terms of three cases, of which the following two are relevant 735 (the one describing inverse (TDM) multiplexing does not apply). 736 Note that these case definitions are taken verbatim from section 737 6.9, "Layer Relationships". 739 Case 1: "Multiple parallel links between the same subnetworks 740 can be bundled together into a single composite link. Each 741 component of the composite link is independent in the sense 742 that each component link is supported by a separate server 743 layer trail. The composite link conveys communication 744 information using different server layer trails thus the 745 sequence of symbols crossing this link may not be preserved. 746 This is illustrated in Figure 14." 748 Case 3: "A link can also be constructed by a concatenation of 749 component links and configured channel forwarding 750 relationships. The forwarding relationships must have a 1:1 751 correspondence to the link connections that will be provided 752 by the client link. In this case, it is not possible to 753 fully infer the status of the link by observing the server 754 layer trails visible at the ends of the link. This is 755 illustrated in Figure 16." 757 Subnetwork: A set of one or more nodes (i.e., LER or LSR) and links. 758 As a special case it can represent a site comprised of multiple 759 nodes. 761 Forwarding Relationship: Configured forwarding between ports on a 762 subnetwork. 
It may be connectionless (e.g., IP, not considered 763 in this draft), or connection oriented (e.g., MPLS signaled or 764 configured). 766 Component Link: A topolological relationship between subnetworks 767 (i.e., a connection between nodes), which may be a wavelength, 768 circuit, virtual circuit or an MPLS LSP. 770 Authors' Addresses 772 Curtis Villamizar (editor) 773 Infinera Corporation 774 169 W. Java Drive 775 Sunnyvale, CA 94089 777 Email: cvillamizar@infinera.com 779 Dave McDysan (editor) 780 Verizon 781 22001 Loudoun County PKWY 782 Ashburn, VA 20147 784 Email: dave.mcdysan@verizon.com 786 So Ning 787 Verizon 788 2400 N. Glenville Ave. 789 Richardson, TX 75082 791 Phone: +1 972-729-7905 792 Email: ning.so@verizonbusiness.com 793 Andrew Malis 794 Verizon 795 117 West St. 796 Waltham, MA 02451 798 Phone: +1 781-466-2362 799 Email: andrew.g.malis@verizon.com 801 Lucy Yong 802 Huawei USA 803 1700 Alma Dr. Suite 500 804 Plano, TX 75075 806 Phone: +1 469-229-5387 807 Email: lucyyong@huawei.com