idnits 2.17.1 

draft-amante-oam-ng-requirements-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 20.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 926.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 937.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 944.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 950.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 5 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  == There are 12 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 18, 2008) is 5906 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC792' is mentioned on line 221, but not defined

  == Missing Reference: 'ICMP' is mentioned on line 663, but not defined

  == Missing Reference: 'RFC 792' is mentioned on line 886, but not defined

  == Unused Reference: 'BFD-BASE' is defined on line 864, but no explicit
     reference was found in the text

  == Unused Reference: 'LLDP' is defined on line 868, but no explicit
     reference was found in the text

  == Unused Reference: 'LMP' is defined on line 870, but no explicit
     reference was found in the text

  == Unused Reference: 'RSVP-DIAG' is defined on line 876, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-03) exists of
     draft-ietf-mpls-remote-lsp-ping-01


     Summary: 2 errors (**), 0 flaws (~~), 10 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          S. Amante
3	Internet-Draft                               Level 3 Communications, LLC
4	Intended status: Informational                                  A. Atlas
5	Expires: August 21, 2008                                              BT
6	                                                                A. Lange
7	                                                          Alcatel-Lucent
8	                                                            D. McPherson
9	                                                    Arbor Networks, Inc.
10	                                                       February 18, 2008

12	        Operations and Maintenance Next Generation Requirements
13	                  draft-amante-oam-ng-requirements-01

15	Status of this Memo

17	   By submitting this Internet-Draft, each author represents that any
18	   applicable patent or other IPR claims of which he or she is aware
19	   have been or will be disclosed, and any of which he or she becomes
20	   aware will be disclosed, in accordance with Section 6 of BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF), its areas, and its working groups.  Note that
24	   other groups may also distribute working documents as Internet-
25	   Drafts.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   The list of current Internet-Drafts can be accessed at
33	   http://www.ietf.org/ietf/1id-abstracts.txt.

35	   The list of Internet-Draft Shadow Directories can be accessed at
36	   http://www.ietf.org/shadow.html.

38	   This Internet-Draft will expire on August 21, 2008.

40	Copyright Notice

42	   Copyright (C) The IETF Trust (2008).

44	Abstract

46	   Current IP and MPLS OAM techniques need to be extended to permit
47	   operators to effectively diagnose load-balancing issues.
48	   Specifically, new ad-hoc OAM techniques are needed to diagnose
49	   various link-bundling techniques, such as IP/MPLS Equal Cost Multi-
50	   Path (ECMP) and Link Aggregation Groups (LAG).  In addition, these
51	   OAM tools should also be extended to permit performance monitoring
52	   over longer time durations.  This document defines requirements for
53	   the next generation of OAM solutions.

55	Requirements Language

57	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
58	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
59	   document are to be interpreted as described in RFC 2119 [RFC2119].

61	Table of Contents

63	   1.  Introduction
64	     1.1.  Contributors
65	   2.  Background
66	   3.  Use Cases
67	     3.1.  Types of Exercise Mechanisms
68	     3.2.  Scenario 1: Traceroute through Routed Hops
69	     3.3.  Scenario 2: Traceroute through One Switched Hop
70	     3.4.  Scenario 3: Traceroute through Two, or More, Switched
71	           Hops
72	     3.5.  ECMP
73	     3.6.  Proxy Traceroute/Ping Functionality
74	   4.  Performance Monitoring
75	     4.1.  Proactive Network Monitoring and Verification
76	       4.1.1.  Proactive Periodic Network Monitoring and
77	               Verification
78	       4.1.2.  Proactive Perpetual Network Monitoring and
79	               Verification
80	     4.2.  Network Performance Monitoring
81	   5.  Other Requirements
82	     5.1.  Intra-AS Requirements
83	     5.2.  Inter-AS Requirements
84	     5.3.  MTU considerations
85	     5.4.  Extensibility
86	     5.5.  Path Capabilities
87	     5.6.  Per Hop Behavior Modification
88	   6.  IANA Considerations
89	   7.  Security Considerations
90	   8.  Acknowledgements
91	   9.  References
92	     9.1.  Informative References
93	     9.2.  Normative References
94	     9.3.  References
95	   Authors' Addresses
96	   Intellectual Property and Copyright Statements

98	1.  Introduction

100	   Current networks make extensive use of multiple network paths to
101	   create larger virtual links between network elements, in particular
102	   when a single physical-layer link has exceeded its carrying capacity
103	   and no larger bandwidth physical layer technologies exist.  Operators
104	   use various link bundling techniques, such as Link Aggregation Groups
105	   (LAGs) and IP and MPLS Equal Cost Multi-Path (ECMP), to augment the
106	   capacity between network elements when physical link-layer capacity
107	   is exhausted.  Existing troubleshooting tools, based on 'legacy' ping
108	   and traceroute, are insufficient to effectively examine the
109	   underlying component-links that traffic will use.

111	   In addition, as more of the world's traffic converges around IP and
112	   MPLS based networks, service providers need to extract temporally
113	   aware traffic performance information.

115	   This draft is NOT intended to address transport MPLS capabilities.
116	   Transport-oriented requirements would be complementary to the
117	   requirements presented here.

119	1.1.  Contributors

121	   The following made vital contributions to this document:

123	   Rajeev Manur, Force10 Networks, Inc.

125	2.  Background

127	   The use of Link Aggregation Groups (LAGs), Equal Cost Multi-Path
128	   (ECMP) or a combination of ECMP over LAGs is a common technique used
129	   to bond multiple parallel circuits or paths together to achieve the
130	   appearance of a larger aggregate link between two nodes.  The
131	   advantage of these techniques, in particular LAGs, is a reduced
132	   number of routing and signaling protocol adjacencies between devices,
133	   reducing control plane processing overhead.  A disadvantage of these
134	   techniques is an inability to determine the individual component-link
135	   used for traffic forwarding inside a LAG or ECMP path, specifically
136	   for a given microflow, between two devices using traditional
137	   traceroute or ping utilities.

139	   A key problem related to LAG or ECMP paths is, due to inefficiencies
140	   in LAG or ECMP load-distribution algorithms, a particular component-
141	   link may experience congestion or a soft-failure, which would go
142	   unnoticed by NMS systems and, likely, IP/MPLS Control Plane
143	   protocols.  The end result is performance degradation of a subset of
144	   end-user microflows that use the affected component-links between two
145	   adjacent devices.

147	   Operators need the following capabilities.  First, and the most
148	   immediate need, is a capability to determine the set of component-
149	   links used by individual network elements through which traceroute or
150	   ping messages are traversing.  Second, a capability to specify an
151	   end-user's microflow, e.g.: a 5-tuple "flow" in the case of IP
152	   traffic, that will be used by intermediate devices to calculate the
153	   component-link or ECMP path used for that flow to allow periodic or
154	   perpetual performance monitoring.  Ultimately, these capabilities are
155	   necessary to both determine and exercise the actual path that is/was
156	   used by an end-user's particular application through the network.

158	3.  Use Cases

160	3.1.  Types of Exercise Mechanisms

162	   This memo classifies two types of ping and traceroute requests that
163	   are needed in modern networks where many inter-node links consist of
164	   LAG, ECMP or ECMP over LAG paths.  First, a "traditional" or "legacy"
165	   traceroute and ping request where intermediate devices only
166	   understand how to use outer IP header information as the input to a
167	   LAG or ECMP hashing algorithm.  This type of mechanism has limited
168	   utility insomuch as existing devices, interior to a Service
169	   Provider's network, only understand how to process limited
170	   information in traceroute or ping requests.  Note that when operators
171	   originate traceroute and/or ping sessions from within their network,
172	   requests are sourced from devices, often routers, whose interfaces
173	   reside within their network.

175	   On the other hand, a "next-generation" traceroute and ping request
176	   where intermediate devices understand new information likely
177	   contained in the payload of the traceroute and ping request, which
178	   can then be fed as input to the LAG or ECMP hashing algorithm.  This
179	   would allow operators to, for example, specify the exact "tuple" used
180	   by customer traffic in order to properly exercise the LAG or ECMP
181	   paths used by a particular customer 'flow' through the network.
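
   As a non-normative illustration of the distinction above, the
   following Python sketch hashes a 5-tuple to choose one component-link
   of a LAG.  The hash function (SHA-256), the field encoding, and the
   names FiveTuple and select_component_link are assumptions made for
   this example only; real devices use proprietary load-distribution
   algorithms.  The addresses are documentation-range examples.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """Hypothetical representation of an IP flow's 5-tuple."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_component_link(flow: FiveTuple, links: Sequence[str]) -> str:
    """Pick a component-link by hashing the flow (illustrative only)."""
    if not links:
        raise ValueError("the bundle has no component-links")
    # Serialize the tuple fields into a stable byte string.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 32 bits of the digest to a link index.
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]


# Component-links of a three-member LAG, named as in the figures.
lag_1 = ["A1-A2", "B1-B2", "C1-C2"]
customer_flow = FiveTuple("192.0.2.10", "198.51.100.20", 6, 49152, 443)
chosen = select_component_link(customer_flow, lag_1)
```

   A "next-generation" probe carrying the customer's tuple would let
   each hop feed that tuple, rather than the probe's own outer header,
   into a function of this kind.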

183	3.2.  Scenario 1: Traceroute through Routed Hops
184	      I1: 10.1.1.1/30                I3: 10.5.1.1/30
185	   +------+                       +------+                      +------+
186	   |      |-- A1 ----------- A2 --|      |-- D1 ---------- D2 --|      |
187	   |  R1  |-- B1 -- LAG-1 -- B2 --|  R2  |         LAG-2        |  R3  |
188	   |      |-- C1 ----------- C2 --|      |-- E1 ---------- E2 --|      |
189	   +------+                       +------+                      +------+
190	                   10.1.1.2/30: I2                    10.5.1.2/30: I4

192	   Note on figures: Figures 1 through 3 represent a piece of a network
193	   for illustrative purposes.  In a real network, other nodes will be
194	   present.

196	                 Figure 1: Traceroute through Routed Hops

198	   In the above example, the links A1-A2, B1-B2 and C1-C2 are grouped
199	   into a single LAG, called LAG-1, between nodes R1 and R2.
200	   Furthermore, D1-D2 and E1-E2 are grouped into a single LAG, called
201	   LAG-2, between nodes R2 and R3.  I1 represents the IPv4 address
202	   10.1.1.1/30 assigned to the LAG-1 interface on R1.  I2 represents the
203	   IPv4 address 10.1.1.2/30 assigned to the LAG-1 interface on R2.  I3
204	   and I4 are the IP interfaces assigned to R2 and R3, respectively, on
205	   LAG-2.  R1 and R2 will maintain a single set of routing and signaling
206	   protocol adjacencies (e.g.: IS-IS, RSVP and/or LDP) over LAG-1,
207	   while R2 and R3 will maintain a single set of routing and signaling
208	   protocol adjacencies over LAG-2.  Assuming the individual component
209	   link sizes between R1, R2 and R3 are 10 Gbps, the end result is that
210	   R1 and R2 believe they have a single 30 Gbps connection between them
211	   and R2 and R3 believe they have a 20 Gbps connection between them.

213	   When performing a traceroute from R1 through R2 to R3, each router
214	   independently and automatically determines, through a proprietary LAG
215	   or ECMP load-distribution algorithm, the outgoing component-link
216	   inside a LAG or ECMP path to send out traceroute UDP probe packets.
217	   Unfortunately, the details of the specific component-links are not
218	   exposed to a user interface, which would allow operators to determine
219	   the exact physical path used by traceroute.  Furthermore, those
220	   details also cannot be used as input to a 'ping' utility (using ICMP
221	   echo-request and echo-reply messages [RFC792]) to test longer term
222	   performance of a specific physical path through the network.  The end
223	   result is a network operator may believe that a given path between
224	   devices is behaving properly when, in fact, end-user traffic is
225	   traversing a different set of component-links and experiencing
226	   congestion or other link-layer forwarding problems.

228	3.3.  Scenario 2: Traceroute through One Switched Hop
229	       I1: 10.1.1.1/30
230	   +------+                       +-------+                       +------+
231	   |      |-- A1 ----------- A2 --|       |-- C1 ----------- C2 --|      |
232	   |  R1  |         LAG-1         |  SW1  |-- D1 -- LAG-2 -- D2 --|  R2  |
233	   |      |-- B1 ----------- B2 --|       |-- E1 ----------- E2 --|      |
234	   +------+                       +-------+                       +------+
235	                                                        10.1.1.2/30: I2

237	               Figure 2: Traceroute through One Switched Hop

239	   In this scenario, links A1-A2 and B1-B2 are grouped into a single 20
240	   Gbps LAG, called LAG-1, between nodes R1 and SW1.  Furthermore, links
241	   C1-C2, D1-D2 and E1-E2 are also joined together into a single 30 Gbps
242	   LAG, called LAG-2, between nodes SW1 and R2.  I1 represents the IPv4
243	   address 10.1.1.1/30 assigned to the LAG-1 interface on R1.  I2
244	   represents the IPv4 address 10.1.1.2/30 assigned to the LAG-2
245	   interface on R2.  As in Scenario 1, R1 and R2 will maintain a single
246	   set of IP/MPLS routing and signaling protocol adjacencies over the
247	   LAGs through SW1.

249	   As in Scenario 1, each device along the path R1 to SW1 to R2, (or
250	   vice-versa), automatically and independently determines the outgoing
251	   component-link inside a LAG or ECMP "bundle" to send out traceroute
252	   UDP probe packets.  Unfortunately, in this scenario if only the
253	   incoming component-link interface ID is displayed to an end-user or
254	   network operator, that will not reveal the entire physical path
255	   traversed from R1 through SW1 to R2.  This scenario highlights the
256	   need to also show both the outgoing component-link interface ID on R1
257	   and the incoming component-link interface ID on R2.  With both of
258	   those pieces of information, and a priori knowledge that there is
259	   only one Layer-2 switch between R1 and R2, an operator can rely on a
260	   "legacy" traceroute implementation to determine the actual component-
261	   links that were used in a traceroute request.

263	   If the operator does not have a priori knowledge that there is a
264	   Layer-2 switch between R1 and R2, it would be useful for R1 and R2 to
265	   include relevant Layer-2 information, learned from a Link-Layer
266	   Discovery Protocol, on both R1 and R2 in the traceroute reply.  In
267	   this example, R1 would reply with its own outgoing component-link
268	   name, SW1's hostname and SW1's incoming component-link name.
269	   Furthermore, when R2 sends a traceroute reply it would respond with
270	   its own incoming component-link name, SW1's hostname and SW1's
271	   outgoing component-link name.  This would immediately point out to an
272	   operator the presence of one, or more, Layer-2 switches in the middle
273	   of a Layer-3 path.  Ultimately, without specific component-link
274	   'neighbor' information, such as from a Link-Layer Discovery Protocol,
275	   it will be difficult to rapidly determine the presence or absence of
276	   Layer-2 switches in the interior of a Layer-3 path.
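
   The reply contents described above can be sketched as a small data
   structure.  This is a hedged illustration, not a proposed encoding:
   the HopReply fields and the layer2_devices_between helper are
   hypothetical names, and the link names follow Figure 2 with D1-D2
   chosen arbitrarily as the link SW1 selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopReply:
    """Hypothetical traceroute reply enriched with neighbor information."""
    responder: str                    # Layer-3 node sending the reply
    local_link: str                   # responder's own component-link name
    direction: str                    # "outgoing" or "incoming"
    neighbor_hostname: Optional[str]  # learned via link-layer discovery
    neighbor_link: Optional[str]      # neighbor's component-link name


def layer2_devices_between(first: HopReply, second: HopReply) -> List[str]:
    """Return neighbors that are not one of the two Layer-3 responders."""
    layer3_nodes = {first.responder, second.responder}
    found: List[str] = []
    for reply in (first, second):
        name = reply.neighbor_hostname
        if name is not None and name not in layer3_nodes and name not in found:
            found.append(name)
    return found


# R1 reports its outgoing link plus SW1's hostname and incoming link.
r1_reply = HopReply("R1", "A1", "outgoing", "SW1", "A2")
# R2 reports its incoming link plus SW1's hostname and outgoing link.
r2_reply = HopReply("R2", "D2", "incoming", "SW1", "D1")
switches = layer2_devices_between(r1_reply, r2_reply)
```

   With only one switch in the path, both replies name the same
   neighbor, so the two reported links on SW1 describe the whole
   Layer-2 segment.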

278	   It's also important to point out in this particular scenario that, at
279	   best, SW1 only understands how to parse information in the outer IP
280	   header of a legacy traceroute UDP probe, or other data packets, for
281	   input into its LAG hash algorithm, which ultimately determines the
282	   outgoing component-link it will use to send packets to R2.  It would
283	   be highly desirable that SW1 was able to intercept and act upon data
284	   fields contained in "next-generation" traceroute and/or ping probe
285	   packets, so that operators could specify the actual 5-tuple "flow" to
286	   be input into SW1's LAG hash algorithm in order to exercise a
287	   specific component-link on SW1 outbound toward R2.  If this approach
288	   is not used it would likely prevent operators from periodically or
289	   continuously exercising a specific set of component-links through a
290	   given edge-to-edge path on the network, such as through a proactive
291	   network monitoring system, as discussed in Section 4.1 of this
292	   document.

294	3.4.  Scenario 3: Traceroute through Two, or More, Switched Hops

296	         I1: 10.1.1.1/30
297	     +----+             +-----+             +-----+               +----+
298	     |    |-A1-------A2-|     |-C1-------C2-|     |-E1---------E2-|    |
299	     | R1 |    LAG-1    | SW1 |    LAG-2    | SW2 |-F1- LAG-3 -F2-| R2 |
300	     |    |-B1-------B2-|     |-D1-------D2-|     |-G1---------G2-|    |
301	     +----+             +-----+             +-----+               +----+
302	                                                     10.1.1.2/30: I2

304	         Figure 3: Traceroute through Two, or More, Switched Hops

306	   In this case, two Layer-2 switches are inserted in the path between
307	   Layer-3 nodes R1 and R2.  LAG-1 and LAG-2 are each grouped together
308	   into their own 20 Gbps LAG.  Furthermore, LAG-3, between nodes SW2
309	   and R2, is joined together as a single 30 Gbps LAG.  Finally, I1
310	   represents the IPv4 address 10.1.1.1/30 assigned to the LAG-1
311	   interface on R1; in addition, I2 denotes the IPv4 address 10.1.1.2/30
312	   assigned to the LAG-3 interface on R2.

314	   This scenario is common in Enterprise or data center environments
315	   where R1 may be a router or server, SW1 a top-of-rack distribution
316	   switch, SW2 an aggregation switch and, finally, R2, which is a
317	   Layer-3 router typically providing WAN connectivity.

319	   This particular case further highlights the need to automatically
320	   learn the presence of Layer-2 switches and, ideally, allow one to
321	   automatically exercise their LAG hash algorithms to fully qualify the
322	   exact set of component-links taken between two Layer-3 devices.  In
323	   order to learn the presence of Layer-2 switches, it will be necessary
324	   for traceroute replies to also include relevant Layer-2 information,
325	   such as the next-hop device's hostname and incoming component-link
326	   name, from a Link-Layer Discovery Protocol.  In the case of "legacy"
327	   traceroute, R1 would reply with its outgoing component-link name,
328	   plus two pieces of information learned from a Link-Layer Discovery
329	   Protocol: SW1's hostname and SW1's incoming component-link name.
330	   Furthermore, when the next traceroute UDP probe is sent to R2, it
331	   will reply with its incoming component-link name, SW2's hostname and
332	   SW2's outgoing component-link name.  Unfortunately, this only yields
333	   a partial solution, because it would not reveal the actual component-
334	   link used between SW1 and SW2, nor the presence of a third Layer-2
335	   switch between SW1 and SW2.  In this instance, an operator would want
336	   to use Layer-2 OAM tools in an attempt to identify and diagnose the
337	   particular component-link that is used between SW1 and SW2.
338	   Unfortunately, Layer-2 OAM tools do not have the ability to identify
339	   or troubleshoot component-links in an 802.3ad LAG.  In addition, it
340	   is time consuming for operators to stop using Layer-2.5 (such as LSP-
341	   Ping or LSP-Trace) or Layer-3 ping/traceroute tools, log in to R1 and
342	   R2 and use Layer-2 OAM tools to resume diagnosing the problem.
343	   Furthermore, the lack of an integrated toolset prevents
344	   operators from using an NMS to continuously monitor component-links
345	   on paths that go over one or more Layer-2 switches.

347	   Instead, what is needed by operators is integrated Layer-2 and
348	   Layer-3 ping/traceroute tools, which allow for rapid and accurate
349	   diagnosis and troubleshooting of LAG/ECMP problems.  Ultimately, if
350	   Layer-2 switches can intercept and act upon "next-generation"
351	   traceroute and ping requests, that would enable operators to specify
352	   the actual 5-tuple "flow" to be input into each Layer-2 switch's LAG
353	   hash algorithm.  This would allow operators to periodically or
354	   continuously exercise a specific set of component-links over all
355	   Layer-2 and Layer-3 devices, all at the same time, along a complete
356	   edge-to-edge path through the network, as discussed in Section 4.1 of
357	   this document.

359	   It should be noted that the above presumes intermediate Layer-2
360	   switches are capable of intercepting and acting upon NG-OAM probe-
361	   requests, which may not be true initially in all environments.
362	   Therefore, this document requires all NG-OAM solutions to document
363	   how they will determine if intermediate Layer-2 switches are NG-OAM
364	   capable and communicate that back to the initiator of an NG-OAM
365	   request, in order that operators can tell if the complete path was
366	   properly exercised.
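
   This document does not specify how capability is discovered or
   reported.  Purely as an illustration of what the initiator could do
   with such information, the sketch below assumes each hop reports a
   boolean capability flag; HopStatus and unexercised_hops are
   hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HopStatus:
    """Hypothetical per-hop record returned to the initiator."""
    node: str
    ng_oam_capable: bool  # assumed flag; the reporting method is unspecified


def unexercised_hops(path: List[HopStatus]) -> List[str]:
    """List the nodes that could not act on the NG-OAM request."""
    return [hop.node for hop in path if not hop.ng_oam_capable]


# Path from Figure 3, assuming SW2 is not NG-OAM capable.
path = [
    HopStatus("R1", True),
    HopStatus("SW1", True),
    HopStatus("SW2", False),
    HopStatus("R2", True),
]
missing = unexercised_hops(path)
fully_exercised = not missing
```

   An operator would treat any result where the list is non-empty as a
   partially exercised path.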

368	3.5.  ECMP

370	   TBD

372	3.6.  Proxy Traceroute/Ping Functionality

374	   To enable more rapid troubleshooting and diagnosis of problems
375	   related to LAG, ECMP and/or asymmetric paths in a large-scale
376	   network, it is useful to use "proxy" routers/hosts within a network
377	   that can initiate a traceroute or ping on behalf of a Network
378	   Monitoring System (NMS), such as via [PROXY-LSP-PING].  This is
379	   particularly valuable in the following scenarios:

381	   o  When troubleshooting problems related to asymmetric paths, it is
382	      useful to perform a traceroute and/or ping from a source to the
383	      destination as well as from the destination back to the source.

385	   o  Some IP/MPLS routers use 'input interface' as input into the LAG
386	      and/or ECMP hashing algorithm; therefore, quickly exercising the
387	      associated direction of a particular flow through the network is
388	      required.

390	   o  When narrowing a problem down to a specific sequence of links
391	      within the network, it is useful to rapidly focus additional
392	      testing on suspicious segments, which are a subset of an overall
393	      edge-to-edge path.

395	   o  Periodic monitoring of a large-scale network composed of a
396	      multitude of LAG and/or ECMP paths.  In order to divide up the
397	      periodic testing of a large set of component-links and paths while
398	      simultaneously providing timely results, it is useful to
399	      distribute testing out to the IP/MPLS routers in the network on or
400	      near the paths to be tested.  (See Section 4.1 for more details).

402	   In this scenario, there are three types of devices:

404	   Initiator: The node which creates a proxy traceroute/ping request
405	   with: 1) a "5-tuple" to be used as input to a LAG and/or ECMP hashing
406	   algorithm; 2) the IP address of the Proxy IP/MPLS router that will
407	   initiate the ping/traceroute on behalf of the Initiator; and, 3) the
408	   IP address of the destination IP/MPLS router/host that will terminate
409	   this ping/traceroute request.

411	   Proxy IP/MPLS Router: The node which receives a proxy traceroute/ping
412	   request from an Initiator.  Once it has interpreted the proxy
413	   request, it initiates a proxy ping/traceroute request from itself
414	   toward the destination IP/MPLS router specified in the proxy ping/
415	   traceroute request.

417	   Proxy Request Terminator: The node(s) which terminate a proxy
418	   traceroute/ping request received from the Proxy IP/MPLS Router.  In
419	   the case of a proxy traceroute, intermediate nodes along the path to
420	   the final destination of proxy traceroute are considered
421	   "Intermediate Proxy Request Terminators".
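
   The three items an Initiator supplies can be sketched as a simple
   record.  This is an assumption-laden illustration and not the
   [PROXY-LSP-PING] message format: the ProxyRequest class, its field
   names, and the "kind" values are hypothetical, and the addresses are
   documentation-range examples.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProxyRequest:
    """Hypothetical proxy traceroute/ping request built by an Initiator."""
    # 1) 5-tuple: source IP, destination IP, protocol, source port,
    #    destination port, used as LAG/ECMP hash input.
    flow: Tuple[str, str, int, int, int]
    # 2) Address of the Proxy IP/MPLS Router that will originate the probes.
    proxy_address: str
    # 3) Address of the node that terminates the request.
    destination_address: str
    kind: str = "traceroute"

    def __post_init__(self) -> None:
        if self.kind not in ("ping", "traceroute"):
            raise ValueError(f"unknown request kind: {self.kind}")
        addresses = (self.flow[0], self.flow[1],
                     self.proxy_address, self.destination_address)
        for address in addresses:
            # ip_address() raises ValueError for a malformed address.
            ipaddress.ip_address(address)


request = ProxyRequest(
    flow=("192.0.2.10", "198.51.100.20", 6, 49152, 443),
    proxy_address="203.0.113.1",
    destination_address="203.0.113.2",
)
```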

423	   An NG-OAM solution MUST support Proxy Traceroute/Ping Functionality.
424	   An NG-OAM solution MUST support replies from the Proxy Request
425	   Terminator (or Intermediate Proxy Request Terminators) being sent
426	   back to the Proxy IP/MPLS Router, before they are relayed back to the
427	   Initiator.  The advantage of this approach is that replies should
428	   follow a symmetrical path back to the Initiator, which is useful if
429	   the NMS is behind a stateful firewall.  On the other hand, an NG-OAM
430	   solution MAY support replies from the Proxy Request Terminator (or,
431	   Intermediate Proxy Request Terminators) directly back to the
432	   Initiator.  The advantage of this scheme is that it does not rely on
433	   the Proxy IP/MPLS Router to cache or relay/reformat Proxy Reply
434	   Information, before replying back to the Initiator.  This may be
435	   useful in situations where it's desirable to reduce the load on the
436	   Proxy IP/MPLS Router.

438	4.  Performance Monitoring

440	4.1.  Proactive Network Monitoring and Verification

442	   There are two forms of Proactive Network Monitoring and Verification
443	   (PNMV): Perpetual and Periodic.  In a Perpetual PNMV case, the nodes
444	   performing monitoring send OAM messages at a specific interval, and
445	   record the results on a perpetual basis.  In the Periodic case, the
446	   messages are sent only on demand of an external system, such as an
447	   NMS, or an operator's command.  These forms can be implementation
448	   cases of the same solution.

450	   Today's solutions, such as ping, traceroute, and simulated user
451	   traffic between management nodes, can address the case when there is
452	   a single path between two endpoints.  However, in large national and
453	   international networks, there will exist several routed hops for
454	   certain paths through the network.  Furthermore, between each pair of
455	   IP/MPLS routers there will exist LAGs and/or ECMP paths.
456	   Unfortunately, at present, Network Monitoring Systems (NMS) are
457	   unable to exercise the set of component-links through specific paths
458	   on the network.  Such a capability would allow the NMS to identify a
459	   soft-failure on one or more component-links and notify a Network
460	   Operations Center (NOC).  The NOC could then proactively
461	   respond to the problem by, for example, quickly taking the affected
462	   component-link(s) out-of-service or, alternatively, administratively
463	   disabling the link bundle or ECMP path and allowing traffic to switch
464	   to another in-service path.

466	   The challenge with monitoring a large set of LAG and/or ECMP paths in
467	   a network will be to find the right balance between monitoring all
468	   component-links in the network, minimizing the resource utilization
469	   (e.g.: CPU, memory, network I/O) on the NMS system(s) while
470	   simultaneously having a timely detection interval to allow for
471	   proactive notification of problems to the NOC.  Therefore, a solution
472	   must be devised that allows an NMS to transmit multiple independent,
473	   concurrent LAG and/or ECMP path test queries into various points in
474	   the network.  Within the network, Proxy IP/MPLS Routers will carry
475	   out the test queries and report back the test results to the NMS.
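
   One simple way an NMS might divide test queries among Proxy IP/MPLS
   Routers is sketched below.  Round-robin assignment is an assumption
   chosen for brevity; it ignores which proxy is on or near each path,
   which a real deployment would likely take into account.  The path
   labels and proxy names are hypothetical.

```python
from typing import Dict, List


def assign_tests(paths: List[str], proxies: List[str]) -> Dict[str, List[str]]:
    """Spread path tests across proxies in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    assignment: Dict[str, List[str]] = {proxy: [] for proxy in proxies}
    for index, path in enumerate(paths):
        # Each proxy receives every len(proxies)-th test.
        assignment[proxies[index % len(proxies)]].append(path)
    return assignment


paths = ["path-1", "path-2", "path-3", "path-4", "path-5"]
plan = assign_tests(paths, ["proxy-1", "proxy-2"])
```

   Each proxy can then run its share concurrently and report results
   back to the NMS, keeping the per-device load bounded.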

477	   An NG-OAM solution SHOULD support the ability to do Proactive
478	   Perpetual Network Monitoring and Verification, again through the use
479	   of Proxy Traceroute/Ping Functionality described in Section 3.6.  It
480	   should be noted that Perpetual PNMV may be more resource intensive on
481	   devices, which is why that requirement is relaxed compared to
482	   Periodic PNMV.

484	4.1.1.  Proactive Periodic Network Monitoring and Verification

486	   Periodic network monitoring is often done in response to a suspected
487	   network event, or done as a sampled case of Perpetual network
488	   monitoring when Perpetual network monitoring cannot be scaled to the
489	   necessary level.  Probes sent Periodically are often sent with a
490	   shorter inter-message interval, and often request more information
491	   than a test that runs on a Perpetual basis.

493	   In order to perform periodic monitoring, the Initiator MUST send the
494	   Proxy IP/MPLS Router the number and interval of the probe requests.
495	   For example, the Initiator may send the Proxy IP/MPLS Router a
496	   request to run 300 consecutive probes at an interval of 500 msec
497	   between probes.
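   The example above can be sketched as a small data structure.  This
   is a minimal illustration only: the class name, field names, and
   helper methods are hypothetical and are not defined by this
   document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicProbeRequest:
    """Hypothetical parameters an Initiator hands to a Proxy IP/MPLS Router."""
    probe_count: int   # number of probes to send
    interval_ms: int   # gap between consecutive probes, in milliseconds

    def send_offsets_ms(self):
        """Offsets (ms from the first probe) at which each probe is sent."""
        return [i * self.interval_ms for i in range(self.probe_count)]

    def span_ms(self):
        """Time from the first probe to the last probe."""
        if self.probe_count == 0:
            return 0
        return (self.probe_count - 1) * self.interval_ms


# The request from the text: 300 consecutive probes, 500 msec apart.
request = PeriodicProbeRequest(probe_count=300, interval_ms=500)
```

   With these values the last probe leaves 149.5 seconds after the
   first, which gives a rough sense of how long the Proxy is committed
   to a single request.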

499	4.1.2.  Proactive Perpetual Network Monitoring and Verification

501	   Perpetual network monitoring is done consistently among a subset of
502	   end points in the total network.  The subset, such as sample PoP
503	   router to sample PoP router, is selected to strike a balance between
504	   a good view of network performance and an unmaintainable set of
505	   messages.

507	   In order to perform perpetual monitoring, the selected monitoring and
508	   monitored nodes must run the test, such as NG-Ping, at a set interval
509	   and collect and store the resulting statistics.

511	   Network Performance Monitoring, as described in section 3.7, is a
512	   good example of the case where Perpetual PNMV is required.

514	   An NG-OAM solution MUST offer the ability to change monitoring timing
515	   intervals.  Values as low as 3.3 ms have been suggested, but are
516	   optional.  Values down to 100 ms SHOULD be supported.

518	4.2.  Network Performance Monitoring

520	   Network Performance Monitoring (PM, or NPM) is the art and science of
521	   recording temporally aware network performance characteristics.  A
522	   use case for the resulting statistics is for SLA verification, in
523	   addition to proactive maintenance.

525	   Relevant PM characteristics are typically loss, latency and jitter.
526	   A PM solution MUST index these characteristics to time intervals.
527	   Knowing that 100 packets were lost, but not knowing when, is not
528	   particularly actionable.  The limits of existing tools and
529	   information often result in a NOC "clearing counters", then running
530	   a "fast ping" for an arbitrary length of time and hoping that the
531	   error occurs again.  Keeping all results of a Perpetual PNMV test is
532	   one possible solution; however, this volume of information can be
533	   difficult to store or to sort through when a network event is
534	   occurring.  An NG-OAM solution SHOULD provide an easy-to-read,
535	   temporally-aware statistic that allows an operator to easily assess
536	   the magnitude of the problem.

538	   An example of this sort of statistic from the world of SONET/SDH
539	   transport is the errored second, and severely errored second.
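   As a rough illustration of how an errored-second style statistic
   could be derived from probe results, the sketch below buckets probe
   samples into one-second intervals.  The function name, the input
   format, and the 0.3 loss-ratio threshold used for "severely
   errored" are assumptions made for this example; they are not taken
   from this document or from the SONET/SDH definitions.

```python
from collections import defaultdict


def errored_seconds(probes, ses_loss_ratio=0.3):
    """Classify one-second buckets from (send_time_s, was_lost) samples.

    Returns (errored, severely_errored) as sorted lists of second indexes.
    A second is "errored" if at least one probe sent in it was lost, and
    "severely errored" if its loss ratio reaches ses_loss_ratio (an
    illustrative threshold, not a standardized one).
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for send_time_s, was_lost in probes:
        second = int(send_time_s)
        sent[second] += 1
        if was_lost:
            lost[second] += 1

    errored = sorted(s for s in sent if lost[s] > 0)
    severe = sorted(s for s in errored if lost[s] / sent[s] >= ses_loss_ratio)
    return errored, severe


# Ten probes per second for three seconds: second 0 is clean, second 1
# loses one probe, and second 2 loses five probes.
samples = [(t / 10, False) for t in range(10)]
samples += [(1 + t / 10, t == 0) for t in range(10)]
samples += [(2 + t / 10, t < 5) for t in range(10)]
es, ses = errored_seconds(samples)
```

   Here seconds 1 and 2 are errored and only second 2 is severely
   errored, so an operator sees both when the loss happened and how
   bad it was without retaining every individual probe result.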

541	   The level of granularity of PM statistics gathering SHOULD be
542	   configurable.

544	5.  Other Requirements

546	5.1.  Intra-AS Requirements

548	   The NG-OAM solution SHOULD use the same mechanism to address both the
549	   Inter-AS (Section 5.2) requirements.  An
550	   operator MUST be able to run a traceroute from one domain and through
551	   another.  The amount of information this traceroute provides may
552	   differ depending on where the probe is originated, and what sort of
553	   authorization it possesses to access information in other domains.

555	   Intra-AS requirements are applicable within an Autonomous System
556	   (AS), where all IP/MPLS devices are expected to be under a single
557	   administrative authority.  Because devices are under a single
558	   administrative authority, copious diagnostic information can be
559	   returned to the Initiator of a ping/traceroute request.  Ultimately,
560	   however, an NG-OAM solution MUST ensure that extensive Intra-AS
561	   diagnostic information is not leaked across the boundaries of the
562	   Autonomous System, since it would provide valuable network
563	   intelligence information.  In addition, it is desirable if
564	   lightweight authentication and/or encryption techniques can be used
565	   to secure both probe requests and replies, in order to limit the
566	   effects of resource exhaustion on network elements that are
567	   processing probe request/replies.

569	   The following is a brief summary of the minimal set of information
570	   that an NG-OAM solution is expected to address.  NG-OAM solutions MAY
571	   capture additional information through, for example, experimental or
572	   vendor-specific objects specified in the NG-OAM probe-request.

574	   NG-OAM Probe Requests and Probe Replies MUST contain a "Query ID",
575	   generated by the Probe Initiator, that can be used to associate Probe
576	   Replies with Probe Requests.
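   One way an Initiator might use the Query ID is sketched below.  The
   class and method names are hypothetical, and the sequential counter
   is an assumption chosen for readability; this document does not say
   how Query IDs are generated.

```python
import itertools


class ProbeTracker:
    """Tracks outstanding Probe Requests by Query ID (illustrative only)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # assumed ID allocation scheme
        self._pending = {}                  # query_id -> request description

    def new_request(self, description):
        """Record a new Probe Request and return its Query ID."""
        query_id = next(self._next_id)
        self._pending[query_id] = description
        return query_id

    def match_reply(self, query_id):
        """Return the matching request, or None for an unknown or
        already-answered Query ID."""
        return self._pending.pop(query_id, None)

    def outstanding(self):
        """Query IDs that have not yet been answered."""
        return sorted(self._pending)


tracker = ProbeTracker()
first = tracker.new_request("NG-traceroute to 192.0.2.1")
second = tracker.new_request("NG-ping to 198.51.100.7")
matched = tracker.match_reply(second)
```

   Replies may arrive out of order; a reply whose Query ID is not
   pending (for example, a duplicate) is simply not matched.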

578	   Next-Gen Traceroute

580	   o  MUST work for IP and MPLS

582	   o  MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
583	      Request

585	   o  MUST be able to specify whether the IPv4 packet is a first-
586	      fragment, or subsequent fragment, in order that intermediate
587	      devices can adjust their LAG/ECMP calculation appropriately.

589	   o  MUST be able to specify the MPLS label stack used to identify a
590	      "flow" across an MPLS-only portion of the network in a Probe
591	      Request.

593	   o  MUST be able to specify the Layer-2 (e.g.: Ethernet) header used
594	      in a Probe Request.

596	   o  MUST be able to specify a combination of label stack and IP
597	      5-tuple, if both are used in the ECMP/LAG hash algorithm.

599	   o  MUST capture the following information in a Probe Reply:

601	      *  The specific components of the Layer-2 (e.g.: Ethernet) header,
602	         MPLS label stack and/or IP 5-tuple that were used in the ECMP/
603	         LAG hash algorithm at this hop

605	      *  Incoming Interface Name

607	      *  Outgoing Interface Name
608	      *  Number of component-links in a bundle

610	      *  Size (Bandwidth) of individual component-links in a bundle

612	      *  Percent bandwidth utilization on interface(s)

614	      *  Remote Link-Layer neighbor name and interface name

616	   o  SHOULD be able, on request of the source, to provide recent
617	      performance history of the incoming or outgoing link(s)

619	   Next-Gen Ping

621	   o  MUST work for IP and MPLS

623	   o  MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
624	      Request

626	   o  MUST be able to specify the MPLS label stack used to identify a
627	      "flow" across an MPLS-only portion of the network in a Probe
628	      Request.

630	   o  MUST be able to specify the Layer-2 (e.g.: Ethernet) header used
631	      in a Probe Request.

633	   o  MUST follow the regular data-plane path for forwarding within a
634	      network element

636	   o  MUST be able to test all links/paths concurrently, or serially,
637	      between two network elements when operators do not know a
638	      customer's "flow" information, which can be used as input to a LAG
639	      and/or ECMP hash calculation.

641	   Proxy Traceroute

643	   o  All of the requirements mentioned above for "Next-Gen Traceroute",
644	      plus:

646	   o  The Initiator MUST be able to specify the number of Probe
647	      Requests.

649	   o  The Initiator MAY also specify the interval between Probe
650	      Requests, which the Proxy IP/MPLS Router is responsible for
651	      carrying out on the Initiator's behalf.

653	   Proxy Ping
654	   o  All of the requirements mentioned above for "Next-Gen Ping", plus:

656	   o  The Initiator MUST be able to specify the number of Probe Requests
657	      and interval between Probe Requests, which the Proxy IP/MPLS
658	      Router is responsible for carrying out on the Initiator's behalf.

660	   Next-Gen OAM Traceroute/Ping Probe Replies MUST capture error
661	   conditions that were encountered during an unsuccessful Probe
662	   Request.  Those replies are expected to capture not only those
663	   conditions defined by classic [ICMP] (e.g.: Destination Unreachable
664	   Type), but also new error conditions specific to NG-OAM solutions.
665	   In order to seamlessly accommodate future error conditions, NG-OAM
666	   solutions MUST use a TLV format for specifying error conditions in
667	   Probe Replies.
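   The value of a TLV format is that a receiver can step over error
   conditions it does not yet understand.  The sketch below shows one
   possible encoding.  The 16-bit type and 16-bit length fields, the
   function names, and the code points are all assumptions made for
   illustration; this document does not define a wire format.

```python
import struct

HEADER = struct.Struct("!HH")  # assumed layout: 16-bit type, 16-bit length


def encode_tlv(tlv_type, value):
    """Encode one TLV: type, length of value, then the value bytes."""
    return HEADER.pack(tlv_type, len(value)) + value


def decode_tlvs(data):
    """Parse consecutive TLVs; unknown types are returned, not rejected."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if len(data) - offset < length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs


# Hypothetical error-condition code points, for illustration only.
ERR_DEST_UNREACHABLE = 1
ERR_FUTURE_CONDITION = 900

payload = (encode_tlv(ERR_DEST_UNREACHABLE, b"\x01")
           + encode_tlv(ERR_FUTURE_CONDITION, b"new-detail"))
decoded = decode_tlvs(payload)
```

   Because each TLV carries its own length, a parser written before
   ERR_FUTURE_CONDITION existed can still skip over it and continue
   with the remainder of the Probe Reply.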

669	   Intra-AS probe requests (and probe replies) MUST be easily
670	   identifiable in the data plane, in order that routers acting on NG-
671	   traceroute or NG-ping requests (or replies) can rapidly drop them in
672	   order to avoid resource exhaustion.  NG-traceroute and NG-ping
673	   solutions MUST provide configurable methods to rate-limit the number
674	   of Intra-AS request (or reply) packets to prevent resource
675	   exhaustion.
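   A configurable rate-limit of this kind is commonly built as a token
   bucket.  The sketch below is one such limiter, offered as an
   assumption about how an implementation might meet the requirement
   rather than as a mandated mechanism; the class name and parameters
   are hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter; the caller passes in the current time so the
    behavior is deterministic and easy to test."""

    def __init__(self, rate_per_s, burst):
        self.rate_per_s = rate_per_s  # configurable sustained packet rate
        self.burst = burst            # configurable bucket depth
        self.tokens = float(burst)
        self.last_s = 0.0

    def allow(self, now_s):
        """Return True to process a probe packet, False to drop it."""
        elapsed = max(0.0, now_s - self.last_s)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.last_s = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = TokenBucket(rate_per_s=10, burst=5)
burst_result = [limiter.allow(0.0) for _ in range(8)]  # 8 packets at once
later_result = limiter.allow(0.5)                      # one packet 0.5 s later
```

   Of the eight packets arriving together, the first five are accepted
   and the remaining three are dropped; half a second later the bucket
   has refilled and probes are accepted again.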

677	5.2.  Inter-AS Requirements

679	   Inter-AS requirements are applicable across administrative domains,
680	   such as the Internet or, perhaps, several MPLS service providers
681	   delivering a single MPLS VPN solution.  Because devices are not under
682	   a single administrative authority, only a limited amount of
683	   diagnostic information can be returned to the Initiator of a ping/
684	   traceroute request.  This information is primarily useful in the
685	   context of helping the responsible party pinpoint the specific
686	   location of a problem.  For example, Customer A may be experiencing
687	   packet loss in Service Provider A's network for its Internet service.
688	   The link between Customer A and Service Provider A consists of an
689	   ECMP path between SP A's ASBR and Customer A's ASBR.  Customer A can
690	   perform an NG-traceroute through this ECMP path and provide the
691	   output of NG-traceroute to SP A's NOC in order to more rapidly
692	   identify the particular component-link that is causing the problem.
693	   Other examples where this is useful are: over Internet (IPv4 or IPv6)
694	   peering/transit links and within DataCenters from servers through to
695	   the DataCenter provider's ASBR attached to several SPs, where MPLS
696	   is not used.

698	   Inter-AS probe requests (and probe replies) MUST be easily
699	   identifiable in the data plane, in order that routers acting on NG-
700	   traceroute or NG-ping requests (or replies) can rapidly drop them in
701	   order to avoid resource exhaustion.  NG-traceroute and NG-ping
702	   solutions MUST provide configurable methods to rate-limit the number
703	   of Inter-AS request (or reply) packets to prevent resource
704	   exhaustion.

706	   Next-Gen Traceroute

708	   o  MUST work for IP and MPLS

710	   o  MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
711	      Request

713	   o  MUST be able to specify the MPLS label stack used to identify a
714	      "flow" across an MPLS-only portion of the network in a Probe
715	      Request.

717	   o  MUST be able to specify the Layer-2 (e.g.: Ethernet) header used
718	      in a Probe Request.

720	   o  MUST be able to specify a combination of label stack and IP
721	      5-tuple, if both are used in the ECMP/LAG hash algorithm.

723	   o  MUST capture the following information in a Probe Reply:

725	      *  Incoming Interface Name

727	      *  Outgoing Interface Name

729	   Next-Gen Ping

731	   o  MUST work for IP and MPLS

733	   o  MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
734	      Request

736	   o  MUST be able to specify the MPLS label stack used to identify a
737	      "flow" across an MPLS-only portion of the network in a Probe
738	      Request.

740	   o  MUST be able to specify the Layer-2 (e.g.: Ethernet) header used
741	      in a Probe Request.

743	   Proxy Ping/Traceroute requirements are not applicable to Inter-AS
744	   scenarios, since the risk of resource starvation is too large.

746	5.3.  MTU considerations

748	   Traceroute probes need to be kept to minimal size.  Traceroute reply
749	   PDUs should be kept to 1500 Bytes in size in order to avoid the need
750	   for IP fragmentation.  It is a safe assumption that operators have a
751	   minimum of 1500 Bytes for IP MTU, and often significantly larger.

753	   Optionally, path MTU discovery may be used to determine a minimum
754	   MTU.  The MTU values MUST be configurable by the operator to adjust
755	   to unanticipated conditions.  A Traceroute reply MAY span
756	   multiple packets.
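   A reply that exceeds the configured MTU could be divided as in the
   sketch below.  The function name and the 28-byte header overhead
   (an IPv4 header without options plus an 8-byte ICMP-style header)
   are assumptions for illustration; a real solution would also need
   to number the pieces so that the Initiator can reassemble them.

```python
def split_reply(payload, mtu=1500, header_overhead=28):
    """Split reply bytes into chunks that each fit in one packet.

    mtu is the operator-configurable IP MTU.  header_overhead is an
    illustrative value: 20 bytes of IPv4 header plus an assumed 8-byte
    probe header.
    """
    room = mtu - header_overhead
    if room <= 0:
        raise ValueError("MTU too small for headers")
    return [payload[i:i + room] for i in range(0, len(payload), room)]


# A 4000-byte reply does not fit in one 1500-byte packet.
chunks = split_reply(b"x" * 4000)
sizes = [len(chunk) for chunk in chunks]
```

   With the default values, each packet carries at most 1472 bytes of
   reply data, so the 4000-byte reply is sent as three packets and no
   IP fragmentation is needed.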

758	5.4.  Extensibility

760	   It would be useful to allow for the "next-generation" traceroute and
761	   ping protocols to contain TLVs, in order that they may be easily
762	   extended in the future to account for additional capabilities, which
763	   may be developed at a later point in time.

765	5.5.  Path Capabilities

767	   In order to be certain that NG-ping or NG-traceroute will be able to
768	   properly exercise component-links in a LAG and/or ECMP path through
769	   the network, it is necessary to determine if all devices along a
770	   specific path are capable of supporting the requisite protocols and
771	   replying with appropriate results back to the originator of the NG-
772	   ping or NG-traceroute request.  There are potentially two methods
773	   that can be employed to determine these capabilities: 1) path
774	   discovery; or, 2) encoding special/reserved codepoints into the
775	   packet header of NG-OAM request/reply packets.  With the first
776	   method, the originating host/router could use a path discovery
777	   function to determine the capabilities and properties of intermediate
778	   and/or terminating devices prior to actually using NG-ping or NG-
779	   traceroute to test the data path.  Once the originating host/router
780	   has learned the characteristics of intermediate and/or terminating
781	   devices, it could then originate an NG-ping/traceroute request using
782	   that information to exercise the actual data path.

784	   The second method would likely encode specific values in the packet
785	   header of NG-OAM request/reply packets (for example, via new ICMP
786	   type/codes or MPLS label values).  In this approach, the originating
787	   host/router can simply launch an NG-ping/traceroute request, allowing
788	   each intermediate and/or terminating device to independently
789	   determine if it is capable of supporting the NG-OAM request and,
790	   concurrently, exercising the component-links appropriate to the LAG
791	   and/or ECMP path.

793	   Although the latter approach has the potential disadvantage that it
794	   may be more difficult to support on some existing hardware, this
795	   document recognizes that it is the superior approach of the two
796	   choices.  If one depends on, for example, NG-traceroute to "discover"
797	   characteristics of a path before allowing one to ping, it creates a
798	   circular dependency.  Specifically, in the case where one is doing
799	   perpetual pings and the underlying path changes for legitimate
800	   reasons, the NG-OAM would have to discover the change to the path,
801	   trigger a new NG-traceroute and then resume perpetual pings along the
802	   new path.  Note that a change to the existing path could consist of
803	   any of the following: 1) a component-link in a LAG goes down, yet the
804	   LAG itself remains operational (e.g.: a 10x LAG goes to a 9x LAG),
805	   ultimately changing the result of the LAG hashing algorithm; or, 2)
806	   the entire LAG and/or ECMP path goes down and data packets are routed
807	   along an alternate path.  Ultimately, if each NG-OAM packet is a
808	   self-contained, autonomous OAM unit, then each intermediate and/or
809	   terminating device will act on it appropriately.
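   The first kind of path change can be illustrated with a toy model
   of LAG member selection.  The sketch assumes a CRC-32 hash of the
   5-tuple taken modulo the number of active members; real devices use
   vendor-specific hash functions, so this shows the effect only and
   not any particular implementation.

```python
import zlib


def pick_member(flow, members):
    """Pick a component-link by hashing the flow 5-tuple (assumed scheme:
    CRC-32 modulo the number of active members)."""
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]


links_10 = [f"member-{i}" for i in range(10)]
links_9 = links_10[:-1]  # member-9 has failed; the LAG itself stays up

# (src, dst, protocol, src_port, dst_port), using documentation addresses.
flows = [("192.0.2.1", "198.51.100.7", 6, 49152 + n, 443) for n in range(200)]

# Flows that were NOT on the failed link but still changed component-link.
moved = [f for f in flows
         if pick_member(f, links_10) != "member-9"
         and pick_member(f, links_10) != pick_member(f, links_9)]
```

   Under this model, flows that never used the failed component-link
   can still be rehashed onto a different one, which is why a path
   learned before the change cannot be assumed valid afterwards.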

811	   Therefore, this document specifies that an NG-OAM solution MUST
812	   support the second method, autonomous OAM units, outlined above.  NG-
813	   OAM solutions MAY support the first method, to provide short-term
814	   NG-OAM coverage with existing hardware.

816	5.6.  Per Hop Behavior Modification

818	   Modification of per-hop behavior in order to support NG-OAM is
819	   acceptable, but not required of NG-OAM solutions.  This allows
820	   solutions where intermediate routers have to look at something new to
821	   determine if they are looking at an OAM packet, or to determine if
822	   they are the target or Proxy of an NG-OAM request.

824	6.  IANA Considerations

826	   This document makes no request of IANA.

828	   Note to RFC Editor: this section may be removed on publication as an
829	   RFC.

831	7.  Security Considerations

833	   Devices MUST rate-limit the amount of traceroute and/or ping traffic
834	   they process to avoid DoS attacks.  Those rate-limits MUST be
835	   configurable to suit the appropriate environment in which they are
836	   deployed.  An attacker must not be allowed to force an inordinate
837	   amount of traceroute and/or ping traffic down a single physical
838	   component-link causing congestion.  Therefore, devices MUST rate-
839	   limit the amount of "external" traceroute and/or ping traffic through
840	   any specific component-link or set of component-links.  Note that
841	   implementations SHOULD provide exceptions that allow a network
842	   operator's Intra-Domain traceroute and/or ping traffic, particularly
843	   for performance monitoring, to get through without interference by
844	   rate-limiters.

846	   A lightweight authentication method SHOULD be provided by an NG-OAM
847	   solution.  This mechanism can be used to defend against DoS or
848	   insertion attacks from other systems spoofing NG-OAM information.
849	   This can also be used in a reply message to defend against an "SLA
850	   Violation" attack where a malicious system could make it appear as if
851	   an operator's network has violated the SLA, when, in fact, it has
852	   not.

854	8.  Acknowledgements

856	   The authors would like to thank Nitin Bahadur, Ping Pan, Nasser El-
857	   Aawar, and Dimitri Papadimitriou for their reviews and thoughtful
858	   feedback.

860	9.  References

862	9.1.  Informative References

864	   [BFD-BASE]
865	              "draft-ietf-bfd-base-07.txt - Bidirectional Forwarding
866	              Detection", January 2008.

868	   [LLDP]     "IEEE Standard - 802.1AB-2005", May 2005.

870	   [LMP]      "RFC 4204 - Link Management Protocol", October 2005.

872	   [PROXY-LSP-PING]
873	              George Swallow and Vanson Lim, "Proxy LSP Ping,
874	              draft-ietf-mpls-remote-lsp-ping-01.txt", November 2007.

876	   [RSVP-DIAG]
877	              "RFC 2745 - RSVP Diagnostic Messages", January 2000.

879	9.2.  Normative References

881	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
882	              Requirement Levels", BCP 14, RFC 2119, March 1997.

884	9.3.  References

886	   [ICMP]     "RFC 792 - Internet Control Message Protocol",
887	              September 1981.

888	Authors' Addresses

890	   Shane Amante
891	   Level 3 Communications, LLC
892	   1025 Eldorado Blvd
893	   Broomfield, CO  80021

895	   Email: shane.amante@level3.com

897	   Alia Atlas
898	   BT

900	   Email: alia.atlas@bt.com

902	   Andrew Lange
903	   Alcatel-Lucent

905	   Email: andrew.lange@alcatel-lucent.com

907	   Danny McPherson
908	   Arbor Networks, Inc.

910	   Email: danny@arbor.net

912	Full Copyright Statement

914	   Copyright (C) The IETF Trust (2008).

916	   This document is subject to the rights, licenses and restrictions
917	   contained in BCP 78, and except as set forth therein, the authors
918	   retain all their rights.

920	   This document and the information contained herein are provided on an
921	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
922	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
923	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
924	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
925	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
926	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

928	Intellectual Property

930	   The IETF takes no position regarding the validity or scope of any
931	   Intellectual Property Rights or other rights that might be claimed to
932	   pertain to the implementation or use of the technology described in
933	   this document or the extent to which any license under such rights
934	   might or might not be available; nor does it represent that it has
935	   made any independent effort to identify any such rights.  Information
936	   on the procedures with respect to rights in RFC documents can be
937	   found in BCP 78 and BCP 79.

939	   Copies of IPR disclosures made to the IETF Secretariat and any
940	   assurances of licenses to be made available, or the result of an
941	   attempt made to obtain a general license or permission for the use of
942	   such proprietary rights by implementers or users of this
943	   specification can be obtained from the IETF on-line IPR repository at
944	   http://www.ietf.org/ipr.

946	   The IETF invites any interested party to bring to its attention any
947	   copyrights, patents or patent applications, or other proprietary
948	   rights that may cover technology that may be required to implement
949	   this standard.  Please address the information to the IETF at
950	   ietf-ipr@ietf.org.

952	Acknowledgment

954	   Funding for the RFC Editor function is provided by the IETF
955	   Administrative Support Activity (IASA).