2 RTGWG C. Villamizar, Ed. 3 Internet-Draft Infinera Corporation 4 Intended status: Informational D. McDysan, Ed. 5 Expires: July 14, 2011 S. Ning 6 A. Malis 7 Verizon 8 L. Yong 9 Huawei USA 10 January 10, 2011 12 Requirements for MPLS Over a Composite Link 13 draft-ietf-rtgwg-cl-requirement-03 15 Abstract 17 There is often a need to provide large aggregates of bandwidth that 18 are best provided using parallel links between routers or MPLS LSRs. 19 In core networks there is often no alternative, since the aggregate 20 capacities of core networks today far exceed the capacity of a single 21 physical link or single packet processing element. 23 The presence of parallel links, with each link potentially comprised 24 of multiple layers, has resulted in additional requirements. Certain 25 services may benefit from being restricted to a subset of the 26 component links or a specific component link, where component link 27 characteristics, such as latency, differ. Certain services require 28 that an LSP be treated as atomic and avoid reordering. Other 29 services will continue to require only that reordering not occur 30 within a microflow, as is current practice. 32 Current practice related to multipath is described briefly in an 33 appendix. 35 Status of this Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at http://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time.
It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on July 14, 2011. 51 Copyright Notice 53 Copyright (c) 2011 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents 58 (http://trustee.ietf.org/license-info) in effect on the date of 59 publication of this document. Please review these documents 60 carefully, as they describe your rights and restrictions with respect 61 to this document. Code Components extracted from this document must 62 include Simplified BSD License text as described in Section 4.e of 63 the Trust Legal Provisions and are provided without warranty as 64 described in the Simplified BSD License. 66 Table of Contents 68 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 69 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 70 2. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 4 71 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 72 4. Network Operator Functional Requirements . . . . . . . . . . . 5 73 4.1. Availability, Stability and Transient Response . . . . . . 5 74 4.2. Component Links Provided by Lower Layer Networks . . . . . 6 75 4.3. Parallel Component Links with Different Characteristics . 7 76 5. Derived Requirements . . . . . . . . . . . . . . . . . . . . . 9 77 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 78 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 79 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 80 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 81 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 82 9.2. Informative References . . . . . . . . . . . . . . . . . . 11 83 9.3. Appendix References . . . . . . . . . . . . . . . . . . 
. 12 84 Appendix A. More Details on Existing Network Operator 85 Practices and Protocol Usage . . . . . . . . . . . . 13 86 Appendix B. Existing Multipath Standards and Techniques . . . . . 15 87 B.1. Common Multipath Load Splitting Techniques . . . . . . . . 16 88 B.2. Simple and Adaptive Load Balancing Multipath . . . . . . . 17 89 B.3. Traffic Split over Parallel Links . . . . . . . . . . . . 17 90 B.4. Traffic Split over Multiple Paths . . . . . . . . . . . . 18 91 Appendix C. ITU-T G.800 Composite Link Definitions and 92 Terminology . . . . . . . . . . . . . . . . . . . . . 18 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 95 1. Introduction 97 The purpose of this document is to describe why network operators 98 require certain functions in order to solve certain business problems 99 (Section 2). The intent is to first describe why things need to be 100 done in terms of functional requirements that are as independent as 101 possible of protocol specifications (Section 4). For certain 102 functional requirements this document describes a set of derived 103 protocol requirements (Section 5). Three appendices provide 104 supporting details as a summary of existing/prior operator approaches 105 (Appendix A), a summary of implementation techniques and relevant 106 protocol standards (Appendix B), and a summary of G.800 terminology 107 used to define a composite link (Appendix C). 109 1.1. Requirements Language 111 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 112 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 113 document are to be interpreted as described in RFC 2119 [RFC2119]. 115 2.
Assumptions 117 The services supported include L3VPN (RFC 4364 [RFC4364], RFC 4797 118 [RFC4797]), L2VPN (RFC 4664 [RFC4664]; VPWS, VPLS (RFC 4761 [RFC4761], 119 RFC 4762 [RFC4762]), and VPMS, as described in the VPMS Framework 120 [I-D.ietf-l2vpn-vpms-frmwk-requirements]), Internet traffic 121 encapsulated by at least one MPLS label, and dynamically signaled 122 MPLS or MPLS-TP LSPs and pseudowires. The MPLS LSPs supporting these 123 services may be pt-pt, pt-mpt, or mpt-mpt. 125 The locations in a network where these requirements apply are a Label 126 Edge Router (LER) or a Label Switch Router (LSR) as defined in RFC 127 3031 [RFC3031]. 129 The IP DSCP cannot be used for flow identification since L3VPN 130 requires Diffserv transparency (see Section 5.5.2 of RFC 4031 [RFC4031]), and in 131 general network operators do not rely on the DSCP of Internet 132 packets. 134 3. Definitions 136 ITU-T G.800 Based Composite and Component Link Definitions: 137 Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite and 138 component links as summarized in Appendix C. The following 139 definitions for composite and component links are derived from 140 and intended to be consistent with the cited ITU-T G.800 141 terminology. 143 Composite Link: A composite link is a logical link composed of a 144 set of parallel point-to-point component links, where all 145 links in the set share the same endpoints. A composite link 146 may itself be a component of another composite link, but only 147 a strict hierarchy of links is allowed. 149 Component Link: A point-to-point physical or logical link that 150 preserves ordering in the steady state. A component link may 151 have transient out of order events, but such events must not 152 exceed the network's specific NPO. Examples of a physical 153 link are: Lambda, Ethernet PHY, and OTN. Examples of a 154 logical link are: MPLS LSP, Ethernet VLAN, and MPLS-TP LSP. 156 Flow: A sequence of packets that must be transferred in order on one 157 component link.
159 Flow identification: The label stack and other information that 160 uniquely identifies a flow. Other information in flow 161 identification may include an IP header, PW control word, 162 Ethernet MAC address, etc. Note that an LSP may contain one or 163 more Flows, or an LSP may be equivalent to a Flow. Flow 164 identification is used to locally select a component link, or a 165 path through the network toward the destination. 167 Network Performance Objective (NPO): Numerical values for 168 performance measures, principally availability, latency, and 169 delay variation. See Appendix A for more details. 171 4. Network Operator Functional Requirements 173 The Functional Requirements in this section are grouped in 174 subsections starting with the highest priority. 176 4.1. Availability, Stability and Transient Response 178 Limiting the period of unavailability in response to failures or 179 transient events, while maintaining stability, is extremely 180 important. The transient period between some service disrupting 181 event and the convergence of the routing and/or signaling protocols 182 MUST occur within a time frame specified by NPO values. Appendix A 183 provides references and a summary of service types requiring a range 184 of restoration times. 186 FR#1 The solution SHALL provide a means to summarize some routing 187 advertisements regarding the characteristics of a composite 188 link such that the routing protocol converges within the 189 timeframe needed to meet the network performance objective. A 190 composite link can be announced in conjunction with detailed 191 parameters about its component links, such as bandwidth and 192 latency. The composite link SHALL behave as a single IGP 193 adjacency. 195 FR#2 The solution SHALL ensure that all possible restoration 196 operations happen within the timeframe needed to meet the NPO. 197 The solution may need to specify a means for aggregating 198 signaling to meet this requirement.
200 FR#3 The solution SHALL provide a mechanism to select a path for a 201 flow across a network that contains a number of paths comprised 202 of pairs of nodes connected by composite links in such a way as 203 to automatically distribute the load over the network nodes 204 connected by composite links while meeting all of the other 205 mandatory requirements stated above. The solution SHOULD work 206 in a manner similar to that of current networks without any 207 composite link protocol enhancements when the characteristics 208 of the individual component links are advertised. 210 FR#4 If extensions to existing protocols are specified and/or new 211 protocols are defined, then the solution SHOULD provide a means 212 for a network operator to migrate an existing deployment in a 213 minimally disruptive manner. 215 FR#5 Any automatic LSP routing and/or load balancing solutions MUST 216 NOT oscillate such that the performance observed by users 217 degrades to the point that an NPO is violated. Since 218 oscillation may cause reordering, there MUST be means to 219 control the frequency of changing the component link over which 220 a flow is placed. 221 FR#6 Management and diagnostic protocols MUST be able to operate 222 over composite links. 224 4.2. Component Links Provided by Lower Layer Networks 226 Case 3 as defined in [ITU-T.G.800] involves a component link 227 supporting an MPLS layer network over another lower layer network 228 (e.g., circuit switched or another MPLS network (e.g., MPLS-TP)). 229 The lower layer network may change the latency (and/or other 230 performance parameters) seen by the MPLS layer network. Network 231 Operators have NPOs of which some components are based on performance 232 parameters. Currently, there is no protocol for the lower layer 233 network to inform the higher layer network of a change in a 234 performance parameter. Communication of the latency performance 235 parameter is a very important requirement.
Communication of other 236 performance parameters (e.g., delay variation) is desirable. 238 FR#7 In order to support network NPOs and provide acceptable user 239 experience, the solution SHALL specify a protocol means to 240 allow a lower layer server network to communicate latency to 241 the higher layer client network. 243 FR#8 The precision of latency reporting SHOULD be at least 10% of 244 the one-way latency for latencies of 1 ms or more. 246 FR#9 The solution SHALL provide a means to limit the latency on a 247 per-LSP basis between nodes within a network to meet an NPO 248 target when the path between these nodes contains one or more 249 pairs of nodes connected via a composite link. 251 The NPOs differ across the services, and some services have 252 different NPOs for different QoS classes; for example, one QoS 253 class may have a much larger latency bound than another. 254 Overload can occur that would violate an NPO parameter (e.g., 255 loss), and some remedy to handle this case for a composite link 256 is required. 258 FR#10 If the total demand offered by traffic flows exceeds the 259 capacity of the composite link, the solution SHOULD define a 260 means to cause the LSPs for some traffic flows to move to some 261 other point in the network that is not congested. These 262 "preempted LSPs" may not be restored if there is no 263 uncongested path in the network. 265 4.3. Parallel Component Links with Different Characteristics 267 Corresponding to Case 1 of [ITU-T.G.800], as one means to provide 268 high availability, network operators deploy a topology in the MPLS 269 network using lower layer networks that have a certain degree of 270 diversity at the lower layer(s). Many techniques have been developed 271 to balance the distribution of flows across component links that 272 connect the same pair of nodes (See Appendix B.3).
When the path for 273 a flow can be chosen from a set of candidate nodes connected via 274 composite links, other techniques have been developed (See 275 Appendix B.4). 277 FR#11 The solution SHALL measure traffic on a labeled traffic flow 278 and dynamically select the component link on which to place 279 this flow in order to balance the load so that no component 280 link in the composite link between a pair of nodes is 281 overloaded. 283 FR#12 When a traffic flow is moved from one component link to 284 another in the same composite link between a set of nodes (or 285 sites), it MUST be done in a minimally disruptive manner. 287 When a flow is moved from a current link to a target link with 288 different latency, reordering can occur if the target link 289 latency is less than that of the current link, or clumping can 290 occur if the target link latency is greater than that of the 291 current link. Some flows (e.g., timing distribution, PW circuit 292 emulation) are quite sensitive to these effects, which may be 293 specified in an NPO or are needed to meet a user experience 294 objective (e.g., jitter buffer under/overrun). 296 FR#13 The solution SHALL provide a means to identify flows whose 297 rearrangement frequency needs to be bounded by a configured 298 value. 300 FR#14 The solution SHALL provide a means that communicates whether 301 the flows within an LSP can be split across multiple component 302 links. The solution SHOULD provide a means to indicate the 303 flow identification field(s) that can be used along the flow 304 path to perform this function. 306 FR#15 The solution SHALL provide a means to indicate that a traffic 307 flow shall select a component link with the minimum latency 308 value. 310 FR#16 The solution SHALL provide a means to indicate that a traffic 311 flow shall select a component link with a maximum acceptable 312 latency value as specified by protocol.
314 FR#17 The solution SHALL provide a means to indicate that a traffic 315 flow shall select a component link with a maximum acceptable 316 delay variation value as specified by protocol. 318 FR#18 The solution SHALL provide a means local to a node that 319 automatically distributes flows across the component links in 320 the composite link such that NPOs are met. 322 FR#19 The solution SHALL provide a means to distribute flows from a 323 single LSP across multiple component links to handle at least 324 the case where the traffic carried in an LSP exceeds that of 325 any component link in the composite link. As defined in 326 section 3, a flow is a sequence of packets that must be 327 transferred on one component link. 329 FR#20 The solution SHOULD support the use case where a composite 330 link itself is a component link for a higher order composite 331 link. For example, a composite link comprised of MPLS-TP bi- 332 directional tunnels viewed as logical links could then be used 333 as a component link in yet another composite link that 334 connects MPLS routers. 336 5. Derived Requirements 338 This section takes the next step and derives high-level requirements 339 on protocol specification from the functional requirements. 341 DR#1 The solution SHOULD attempt to extend existing protocols 342 wherever possible, developing a new protocol only if this adds 343 a significant set of capabilities. 345 The vast majority of network operators have provisioned L3VPN 346 services over LDP. Many have deployed L2VPN services over LDP 347 as well. TE extensions to IGP and RSVP-TE are viewed as being 348 overly complex by some operators. 350 DR#2 A solution SHOULD extend LDP capabilities to meet functional 351 requirements (without using TE methods as decided in 352 [RFC3468]). 354 DR#3 Coexistence of LDP and RSVP-TE signaled LSPs MUST be supported 355 on a composite link. 
Other functional requirements should be 356 supported as independently of signaling protocol as possible. 358 DR#4 When the nodes connected via a composite link are in the same 359 MPLS network topology, the solution MAY define extensions to 360 the IGP. 362 DR#5 When the nodes connected via a composite link are in 363 different MPLS network topologies, the solution SHALL NOT rely 364 on extensions to the IGP. 366 DR#6 The solution SHOULD support composite link IGP advertisement 367 that results in convergence time better than that of 368 advertising the individual component links. The solution SHALL 369 be designed so that it represents the range of capabilities of 370 the individual component links such that functional 371 requirements are met, and also minimizes the frequency of 372 advertisement updates which may cause IGP convergence to occur. 374 Examples of advertisement update triggering events to be 375 considered include: LSP establishment/release, changes in 376 component link characteristics (e.g., latency, up/down state), 377 and/or bandwidth utilization. 379 DR#7 When a worst case failure scenario occurs, the number of 380 RSVP-TE LSPs to be resignaled will cause a period of 381 unavailability as perceived by users. The resignaling time of 382 the solution MUST meet the NPO for the duration of 383 unavailability. The resignaling time of the solution MUST NOT 384 increase significantly as compared with current methods. 386 6. Acknowledgements 388 Frederic Jounay of France Telecom and Yuji Kamite of NTT 389 Communications Corporation co-authored a version of this document. 391 A rewrite of this document occurred after the IETF77 meeting. 392 Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John Scudder 393 and Alex Zinin, and others provided valuable guidance prior to and at 394 the IETF77 RTGWG meeting.
396 Tony Li and John Drake have made numerous valuable comments on the 397 RTGWG mailing list that are reflected in versions following the 398 IETF77 meeting. 400 7. IANA Considerations 402 This memo includes no request to IANA. 404 8. Security Considerations 406 This document specifies a set of requirements. The requirements 407 themselves do not pose a security threat. If these requirements are 408 met using MPLS signaling as commonly practiced today with 409 authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP, 410 then the requirement to provide additional information in this 411 communication presents additional information that could conceivably 412 be gathered in a man-in-the-middle confidentiality breach. Such an 413 attack would require a capability to monitor this signaling either 414 through a provider breach or access to provider physical transmission 415 infrastructure. A provider breach already poses a threat of numerous 416 types of attacks which are of far more serious consequence. Encryption 417 of the signaling can prevent or render more difficult any 418 confidentiality breach that otherwise might occur by means of access 419 to provider physical transmission infrastructure. 421 9. References 423 9.1. Normative References 425 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 426 Requirement Levels", BCP 14, RFC 2119, March 1997. 428 9.2. Informative References 430 [I-D.ietf-l2vpn-vpms-frmwk-requirements] 431 Kamite, Y., JOUNAY, F., Niven-Jenkins, B., Brungard, D., 432 and L. Jin, "Framework and Requirements for Virtual 433 Private Multicast Service (VPMS)", 434 draft-ietf-l2vpn-vpms-frmwk-requirements-03 (work in 435 progress), July 2010. 437 [ITU-T.G.800] 438 ITU-T, "Unified functional architecture of transport 439 networks", 2007, . 442 [RFC2702] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., and J. 443 McManus, "Requirements for Traffic Engineering Over MPLS", 444 RFC 2702, September 1999.
446 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 447 Label Switching Architecture", RFC 3031, January 2001. 449 [RFC3468] Andersson, L. and G. Swallow, "The Multiprotocol Label 450 Switching (MPLS) Working Group decision on MPLS signaling 451 protocols", RFC 3468, February 2003. 453 [RFC3809] Nagarajan, A., "Generic Requirements for Provider 454 Provisioned Virtual Private Networks (PPVPN)", RFC 3809, 455 June 2004. 457 [RFC4031] Carugi, M. and D. McDysan, "Service Requirements for Layer 458 3 Provider Provisioned Virtual Private Networks (PPVPNs)", 459 RFC 4031, April 2005. 461 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 462 Networks (VPNs)", RFC 4364, February 2006. 464 [RFC4664] Andersson, L. and E. Rosen, "Framework for Layer 2 Virtual 465 Private Networks (L2VPNs)", RFC 4664, September 2006. 467 [RFC4665] Augustyn, W. and Y. Serbest, "Service Requirements for 468 Layer 2 Provider-Provisioned Virtual Private Networks", 469 RFC 4665, September 2006. 471 [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service 472 (VPLS) Using BGP for Auto-Discovery and Signaling", 473 RFC 4761, January 2007. 475 [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service 476 (VPLS) Using Label Distribution Protocol (LDP) Signaling", 477 RFC 4762, January 2007. 479 [RFC4797] Rekhter, Y., Bonica, R., and E. Rosen, "Use of Provider 480 Edge to Provider Edge (PE-PE) Generic Routing 481 Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private 482 Networks", RFC 4797, January 2007. 484 [RFC5254] Bitar, N., Bocci, M., and L. Martini, "Requirements for 485 Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)", 486 RFC 5254, October 2008. 488 9.3. Appendix References 490 [I-D.ietf-pwe3-fat-pw] 491 Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, 492 J., and S. Amante, "Flow Aware Transport of Pseudowires 493 over an MPLS PSN", draft-ietf-pwe3-fat-pw-03 (work in 494 progress), January 2010. 
496 [IEEE-802.1AX] 497 IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE 498 Standard for Local and Metropolitan Area Networks - Link 499 Aggregation", 2006, . 502 [ITU-T.Y.1540] 503 ITU-T, "Internet protocol data communication service - IP 504 packet transfer and availability performance parameters", 505 2007, . 507 [ITU-T.Y.1541] 508 ITU-T, "Network performance objectives for IP-based 509 services", 2006, . 511 [RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The 512 PPP Multilink Protocol (MP)", RFC 1717, November 1994. 514 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 515 and W. Weiss, "An Architecture for Differentiated 516 Services", RFC 2475, December 1998. 518 [RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615, 519 June 1999. 521 [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and 522 Multicast Next-Hop Selection", RFC 2991, November 2000. 524 [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path 525 Algorithm", RFC 2992, November 2000. 527 [RFC3260] Grossman, D., "New Terminology and Clarifications for 528 Diffserv", RFC 3260, April 2002. 530 [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling 531 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005. 533 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 534 Internet Protocol", RFC 4301, December 2005. 536 [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, 537 "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for 538 Use over an MPLS PSN", RFC 4385, February 2006. 540 [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal 541 Cost Multipath Treatment in MPLS Networks", BCP 128, 542 RFC 4928, June 2007. 544 Appendix A. 
More Details on Existing Network Operator Practices and 545 Protocol Usage 547 Often, network operators have a contractual Service Level Agreement 548 (SLA) with customers for services that are comprised of numerical 549 values for performance measures, principally availability, latency, 550 and delay variation. Additionally, network operators may have a Service 551 Level Specification (SLS) that is for internal use by the operator. 552 See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of RFC 3809 [RFC3809] 553 for examples of the form of such SLA and SLS specifications. In this 554 document we use the term Network Performance Objective (NPO) as 555 defined in section 5 of [ITU-T.Y.1541] since the SLA and SLS measures 556 have network operator and service specific implications. Note that 557 the numerical NPO values of Y.1540 and Y.1541 span multiple networks 558 and may be looser than network operator SLA or SLS objectives. 559 Applications and acceptable user experience have an important 560 relationship to these performance parameters. 562 Consider latency as an example. In some cases, minimizing latency 563 relates directly to the best customer experience (e.g., in TCP closer 564 is faster). In other cases, user experience is relatively insensitive 565 to latency, up to a specific limit at which point user perception of 566 quality degrades significantly (e.g., interactive human voice and 567 multimedia conferencing). A number of NPOs have a bound on point-to- 568 point latency, and as long as this bound is met, the NPO is met -- 569 decreasing the latency is not necessary. In some NPOs, if the 570 specified latency is not met, the user considers the service as 571 unavailable. An unprotected LSP can be manually provisioned on a set 572 of links to meet this type of NPO, but this lowers availability since an 573 alternate route that meets the latency NPO cannot be determined.
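The latency-bound behavior described above maps onto the component link selection of FR#15 (minimum latency) and FR#16 (maximum acceptable latency). The following is a minimal sketch in Python; the link names and latency values are illustrative assumptions for the example, not taken from this document.

```python
def links_meeting_bound(component_links, max_latency_ms):
    # FR#16: any component link whose one-way latency meets the NPO
    # bound is acceptable; once the bound is met, lower latency is
    # not required.
    return [name for name, latency in component_links.items()
            if latency <= max_latency_ms]

def min_latency_link(component_links):
    # FR#15: select the component link with the minimum latency value.
    return min(component_links, key=component_links.get)

# Illustrative composite link with three component links (latency in ms).
links = {"lambda-1": 12.0, "otn-2": 18.5, "mpls-tp-3": 9.7}
```

A flow marked latency-sensitive would be pinned to `min_latency_link(links)`, while a flow with, say, a 15 ms bound could be placed on any member of `links_meeting_bound(links, 15.0)`.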
575 Historically, when an IP/MPLS network was operated over a lower layer 576 circuit switched network (e.g., SONET rings), a change in latency 577 caused by the lower layer network (e.g., due to a maintenance action 578 or failure) was not known to the MPLS network. This resulted in 579 latency affecting end user experience, sometimes violating NPOs or 580 resulting in user complaints. 582 A response to this problem was to provision IP/MPLS networks over 583 unprotected circuits and set the metric and/or TE-metric proportional 584 to latency. This resulted in traffic being directed over the least 585 latency path, even if this was not needed to meet an NPO or meet user 586 experience objectives. This results in reduced flexibility and 587 increased cost for network operators. Using lower layer networks to 588 provide restoration and grooming is expected to be more efficient, 589 but the inability to communicate performance parameters, in 590 particular latency, from the lower layer network to the higher layer 591 network is an important problem to be solved before this can be done. 593 Latency NPOs for pt-pt services are often tied closely to geographic 594 locations, while latency for multipoint services may be based upon a 595 worst case within a region. 597 Section 7 of [ITU-T.Y.1540] defines availability for an IP service in 598 terms of loss exceeding a threshold for a period on the order of 5 599 minutes. However, the timeframes for restoration (i.e., as 600 implemented by pre-determined protection, convergence of routing 601 protocols and/or signaling) for services range from on the order of 602 100 ms or less (e.g., for VPWS to emulate classical SDH/SONET 603 protection switching), to several minutes (e.g., to allow BGP to 604 reconverge for L3VPN) and may differ among the set of customers 605 within a single service.
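Setting the IGP metric (or TE-metric) proportional to latency, as described above, reduces least-latency routing to an ordinary shortest-path computation. A minimal sketch in Python using standard Dijkstra; the topology and latency figures are illustrative assumptions, not from this document.

```python
import heapq

def least_latency_path(graph, src, dst):
    # Dijkstra's algorithm where each link metric is its one-way
    # latency; the minimum-metric path is then the least-latency path.
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        latency, node, path = heapq.heappop(queue)
        if node == dst:
            return latency, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_latency in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    queue,
                    (latency + link_latency, neighbor, path + [neighbor]))
    return None

# Illustrative topology: metric == one-way latency in ms.
topology = {"A": {"B": 5.0, "C": 2.0}, "B": {"D": 4.0}, "C": {"D": 9.0}}
# A-B-D totals 9 ms and is preferred over A-C-D at 11 ms.
```

This illustrates the operational trade-off noted above: the metric setting always steers traffic to the least-latency path, even when a longer path would still satisfy the NPO bound.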
607 The presence of only three Traffic Class (TC) bits (previously known 608 as EXP bits) in the MPLS shim header is limiting when a network 609 operator needs to support QoS classes for multiple services (e.g., 610 L2VPN VPWS, VPLS, L3VPN and Internet), each of which has a set of QoS 611 classes that need to be supported. In some cases one bit is used to 612 indicate conformance to some ingress traffic classification, leaving 613 only two bits for indicating the service QoS classes. The approach 614 that has been taken is to aggregate these QoS classes into similar 615 sets on LER-LSR and LSR-LSR links. 617 Labeled LSPs and the use of link layer encapsulation have been 618 standardized in order to provide a means to meet these needs. 620 The IP DSCP cannot be used for flow identification since RFC 4301 621 Section 5.5 [RFC4301] requires Diffserv transparency, and in general 622 network operators do not rely on the DSCP of Internet packets. 624 Pushing a label onto Internet packets when they are carried along 625 with L2/L3VPN packets on the same link or lower layer network 626 provides a means to distinguish between the QoS classes for these 627 packets. 629 Operating an MPLS-TE network involves a different paradigm from 630 operating an IGP metric-based LDP signaled MPLS network. The mpt-pt 631 LDP signaled MPLS LSPs occur automatically, and balancing across 632 parallel links occurs if the IGP metrics are set "equally" (with 633 equality a locally definable relation). 635 Traffic is typically comprised of a few large (some very large) flows 636 and many small flows. In some cases, separate LSPs are established 637 for very large flows. This can occur even if the IP header 638 information is inspected by a router, for example an IPsec tunnel 639 that carries a large amount of traffic.
An important example of 640 large flows is that of a L2/L3 VPN customer who has an access line 641 bandwidth comparable to a client-client composite link bandwidth -- 642 there could be flows that are on the order of the access line 643 bandwidth. 645 Appendix B. Existing Multipath Standards and Techniques 647 Today the requirement to handle large aggregations of traffic, much 648 larger than a single component link, can be handled by a number of 649 techniques which we will collectively call multipath. Multipath 650 applied to parallel links between the same set of nodes includes 651 Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], or 652 other aggregation techniques, some of which may be vendor specific. 653 Multipath applied to diverse paths rather than parallel links 654 includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or 655 even BGP, and equal cost LSP, as described in Appendix B.4. Various 656 multipath techniques have strengths and weaknesses. 658 The term composite link is more general than terms such as link 659 aggregate, which is generally considered to be specific to Ethernet, 660 and its use here is consistent with the broad definition in 661 [ITU-T.G.800]. The term multipath excludes inverse multiplexing and 662 refers to techniques which only solve the problem of large 663 aggregations of traffic, without addressing the other requirements 664 outlined in this document. 666 B.1. Common Multipath Load Splitting Techniques 668 Identical load balancing techniques are used for multipath both over 669 parallel links and over diverse paths. 671 Large aggregates of IP traffic do not provide explicit signaling to 672 indicate the expected traffic loads. Large aggregates of MPLS 673 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 674 are signaled using RSVP-TE extensions do provide explicit signaling 675 which includes the expected traffic load for the aggregate.
LSP 676 which are signaled using LDP do not provide an expected traffic load. 678 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 679 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 680 payload, there is no signaling associated with these inner LSP. 681 Therefore even when using RSVP-TE signaling there may be insufficient 682 information provided by signaling to adequately distribute load 683 across a composite link. 685 Generally a set of label stack entries that is unique across the 686 ordered set of label numbers can safely be assumed to contain a group 687 of flows. The reordering of traffic can therefore be considered to 688 be acceptable unless reordering occurs within traffic containing a 689 common unique set of label stack entries. Existing load splitting 690 techniques take advantage of this property, in addition to looking 691 beyond the bottom of the label stack and determining if the payload 692 is IPv4 or IPv6, to load balance traffic accordingly. 694 MPLS-TP OAM violates the assumption that it is safe to reorder 695 traffic within an LSP. If MPLS-TP OAM is to be accommodated, then 696 existing multipath techniques must be modified. Such modifications 697 are outside the scope of this document. 699 For example, a large aggregate of IP traffic may be subdivided into a 700 large number of groups of flows using a hash on the IP source and 701 destination addresses. This is as described in [RFC2475] and 702 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 703 can be performed on the set of labels in the label stack. These 704 techniques are both examples of means to subdivide traffic into 705 groups of flows for the purpose of load balancing traffic across 706 aggregated link capacity. The means of identifying a flow should not 707 be confused with the definition of a flow.
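The hash-based subdivision described above can be sketched as follows (a minimal illustration, not any particular vendor's algorithm; the function name and use of SHA-256 are choices made here for the example). All packets of a flow carry the same source/destination addresses, so they hash to the same component link and arrive in order:

```python
import hashlib

def flow_to_component_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Map a group of flows, identified by IP source and destination
    addresses, to one of num_links component links via a stable hash.
    Every packet of a given flow selects the same link, so ordering
    within the flow is preserved."""
    key = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 32 bits of the digest as the hash value.
    h = int.from_bytes(digest[:4], "big")
    return h % num_links
```

For MPLS traffic the same idea applies with the ordered set of label stack entries (and, where the payload is identified as IPv4 or IPv6, the inner IP addresses) as the hash key.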
709 Discussion of whether a hash based approach provides a sufficiently 710 even load balance using any particular hashing algorithm or method of 711 distributing traffic across a set of component links is outside of 712 the scope of this document. 714 The current load balancing techniques are referenced in [RFC4385] and 715 [RFC4928]. The use of hash based approaches is described in 716 [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is 717 described in [I-D.ietf-pwe3-fat-pw]. The use of hash based 718 approaches is mentioned as an example of an existing set of 719 techniques to distribute traffic over a set of component links. 720 Other techniques are not precluded. 722 B.2. Simple and Adaptive Load Balancing Multipath 724 Simple multipath generally relies on the mathematical probability 725 that given a very large number of small microflows, these microflows 726 will tend to be distributed evenly across a hash space. A common 727 simple multipath implementation assumes that all members (component 728 links) are of equal capacity and performs a modulo operation across 729 the hashed value. An alternate simple multipath technique uses a 730 table, generally with a power of two size, and distributes the table 731 entries proportionally among members according to the capacity of 732 each member. 734 Simple load balancing works well if there are a very large number of 735 small microflows (i.e., the microflow rate is much less than component 736 link capacity). However, the case where there are even a few large 737 microflows is not handled well by simple load balancing. 739 An adaptive multipath technique is one where the traffic bound to 740 each member (component link) is measured and the load split is 741 adjusted accordingly. As long as the adjustment is done within a 742 single network element, then no protocol extensions are required and 743 there are no interoperability issues.
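The table-based simple multipath technique described above can be sketched as follows (an illustrative sketch only; function names and the largest-remainder allocation used to apportion slots are choices made for this example, not taken from any implementation):

```python
def build_split_table(capacities, table_bits=8):
    """Build a load-split table of power-of-two size whose entries are
    distributed among members in proportion to each member's capacity."""
    size = 1 << table_bits
    total = sum(capacities)
    shares = [c * size / total for c in capacities]
    counts = [int(s) for s in shares]
    # Hand any leftover slots to the members with the largest remainders.
    order = sorted(range(len(capacities)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[: size - sum(counts)]:
        counts[i] += 1
    table = []
    for member, n in enumerate(counts):
        table.extend([member] * n)
    return table

def pick_member(table, flow_hash):
    """Index the table with the low-order bits of the flow hash
    (table size is a power of two, so a mask suffices)."""
    return table[flow_hash & (len(table) - 1)]
```

The equal-capacity case reduces to the modulo technique mentioned first; an adaptive variant would periodically re-derive the table entries from measured per-member load rather than from static capacities, at the cost of possible reordering when entries move between members.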
745 Note that if the load balancing algorithm and/or its parameters are 746 adjusted, then packets in some flows may be delivered out of 747 sequence. 749 B.3. Traffic Split over Parallel Links 751 The load splitting techniques defined in Appendix B.1 and Appendix B.2 752 are both used in splitting traffic over parallel links between the 753 same pair of nodes. The best known technique, though far from being 754 the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same 755 technique had been applied much earlier using OSPF or ISIS Equal Cost 756 MultiPath (ECMP) over parallel links between the same nodes. 758 Multilink PPP [RFC1717] uses a technique that provides inverse 759 multiplexing; however, a number of vendors had provided proprietary 760 extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet 761 Link Aggregation but are no longer used. 763 Link bundling [RFC4201] provides yet another means of handling 764 parallel LSP. RFC4201 explicitly allows a special value of all ones 765 to indicate a split across all members of the bundle. 767 B.4. Traffic Split over Multiple Paths 769 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 770 traffic split over multiple paths that may traverse intermediate 771 nodes. ECMP is often incorrectly equated to only this case, and 772 multipath over multiple diverse paths is often incorrectly equated to 773 ECMP. 775 Many implementations are able to create more than one LSP between a 776 pair of nodes, where these LSP are routed diversely to better make 777 use of available capacity. The load on these LSP can be distributed 778 proportionally to the reserved bandwidth of the LSP. These multiple 779 LSP may be advertised as a single PSC FA and any LSP making use of 780 the FA may be split over these multiple LSP. 782 Link bundling [RFC4201] component links may themselves be LSP.
When 783 this technique is used, any LSP which specifies the link bundle may 784 be split across the multiple paths of the LSPs that comprise the 785 bundle. 787 Appendix C. ITU-T G.800 Composite Link Definitions and Terminology 789 Composite Link: 790 Section 6.9.2 of ITU-T G.800 [ITU-T.G.800] defines composite link 791 in terms of three cases, of which the following two are relevant 792 (the one describing inverse (TDM) multiplexing does not apply). 793 Note that these case definitions are taken verbatim from section 794 6.9, "Layer Relationships". 796 Case 1: "Multiple parallel links between the same subnetworks 797 can be bundled together into a single composite link. Each 798 component of the composite link is independent in the sense 799 that each component link is supported by a separate server 800 layer trail. The composite link conveys communication 801 information using different server layer trails thus the 802 sequence of symbols crossing this link may not be preserved. 803 This is illustrated in Figure 14." 805 Case 3: "A link can also be constructed by a concatenation of 806 component links and configured channel forwarding 807 relationships. The forwarding relationships must have a 1:1 808 correspondence to the link connections that will be provided 809 by the client link. In this case, it is not possible to 810 fully infer the status of the link by observing the server 811 layer trails visible at the ends of the link. This is 812 illustrated in Figure 16." 814 Subnetwork: A set of one or more nodes (i.e., LER or LSR) and links. 815 As a special case it can represent a site comprised of multiple 816 nodes. 818 Forwarding Relationship: Configured forwarding between ports on a 819 subnetwork. It may be connectionless (e.g., IP, not considered 820 in this draft), or connection oriented (e.g., MPLS signaled or 821 configured).
823 Component Link: A topological relationship between subnetworks 824 (i.e., a connection between nodes), which may be a wavelength, 825 circuit, virtual circuit or an MPLS LSP. 827 Authors' Addresses 829 Curtis Villamizar (editor) 830 Infinera Corporation 831 169 W. Java Drive 832 Sunnyvale, CA 94089 834 Email: cvillamizar@infinera.com 836 Dave McDysan (editor) 837 Verizon 838 22001 Loudoun County PKWY 839 Ashburn, VA 20147 841 Email: dave.mcdysan@verizon.com 842 So Ning 843 Verizon 844 2400 N. Glenville Ave. 845 Richardson, TX 75082 847 Phone: +1 972-729-7905 848 Email: ning.so@verizonbusiness.com 850 Andrew Malis 851 Verizon 852 117 West St. 853 Waltham, MA 02451 855 Phone: +1 781-466-2362 856 Email: andrew.g.malis@verizon.com 858 Lucy Yong 859 Huawei USA 860 1700 Alma Dr. Suite 500 861 Plano, TX 75075 863 Phone: +1 469-229-5387 864 Email: lucyyong@huawei.com