Network Working Group                                          S. Amante
Internet-Draft                               Level 3 Communications, LLC
Intended status: Informational                                  A. Atlas
Expires: August 21, 2008                                              BT
                                                                A. Lange
                                                          Alcatel-Lucent
                                                            D. McPherson
                                                    Arbor Networks, Inc.
                                                       February 18, 2008

         Operations and Maintenance Next Generation Requirements
                   draft-amante-oam-ng-requirements-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 21, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   Current IP and MPLS OAM techniques need to be extended to permit
   operators to effectively diagnose load-balancing issues.
   Specifically, new ad-hoc OAM techniques are needed to diagnose
   various link-bundling techniques, such as IP/MPLS Equal Cost
   Multi-Path (ECMP) and Link Aggregation Groups (LAG). In addition,
   these OAM tools should also be extended to permit performance
   monitoring over longer time durations. This document defines
   requirements for the next generation of OAM solutions.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
     1.1. Contributors
   2. Background
   3. Use Cases
     3.1. Types of Exercise Mechanisms
     3.2. Scenario 1: Traceroute through Routed Hops
     3.3. Scenario 2: Traceroute through One Switched Hop
     3.4. Scenario 3: Traceroute through Two, or More, Switched Hops
     3.5. ECMP
     3.6. Proxy Traceroute/Ping Functionality
   4. Performance Monitoring
     4.1. Proactive Network Monitoring and Verification
       4.1.1. Proactive Periodic Network Monitoring and Verification
       4.1.2. Proactive Perpetual Network Monitoring and Verification
     4.2. Network Performance Monitoring
   5. Other Requirements
     5.1. Intra-AS Requirements
     5.2. Inter-AS Requirements
     5.3. MTU considerations
     5.4. Extensibility
     5.5. Path Capabilities
     5.6. Per Hop Behavior Modification
   6. IANA Considerations
   7. Security Considerations
   8. Acknowledgements
   9. References
     9.1. Informative References
     9.2. Normative References
     9.3. References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1. Introduction

   Current networks make extensive use of multiple network paths to
   create larger virtual links between network elements, in particular
   when a single physical-layer link has exceeded its carrying capacity
   and no larger bandwidth physical layer technologies exist. Operators
   use various link bundling techniques, such as Link Aggregation Groups
   (LAGs) and IP and MPLS Equal Cost Multi-Path (ECMP), to augment the
   capacity between network elements when physical link-layer capacity
   is exhausted. Existing troubleshooting tools, based on 'legacy' ping
   and traceroute, are insufficient to effectively examine the
   underlying component-links that traffic will use.

   In addition, as more of the world's traffic converges around IP and
   MPLS based networks, service providers need to extract temporally
   aware traffic performance information.

   This draft is NOT intended to address transport MPLS capabilities.
   Transport-oriented requirements would be complementary to the
   requirements presented here.

1.1. Contributors

   The following made vital contributions to this document:

   Rajeev Manur, Force10 Networks, Inc.

2. Background

   The use of Link Aggregation Groups (LAGs), Equal Cost Multi-Path
   (ECMP) or a combination of ECMP over LAGs is a common technique used
   to bond multiple parallel circuits or paths together to achieve the
   appearance of a larger aggregate link between two nodes.
   The advantage of these techniques, in particular LAGs, is a reduced
   number of routing and signaling protocol adjacencies between devices,
   reducing control plane processing overhead. A disadvantage of these
   techniques is an inability to determine the individual component-link
   used for traffic forwarding inside a LAG or ECMP path, specifically
   for a given microflow, between two devices using traditional
   traceroute or ping utilities.

   A key problem related to LAG or ECMP paths is that, due to
   inefficiencies in LAG or ECMP load-distribution algorithms, a
   particular component-link may experience congestion or a soft-failure
   that goes unnoticed by NMS systems and, likely, by IP/MPLS Control
   Plane protocols. The end result is performance degradation of a
   subset of end-user microflows that use the affected component-links
   between two adjacent devices.

   Operators need the following. First, and most immediately, a
   capability to determine the set of component-links used by the
   individual network elements that traceroute or ping messages
   traverse. Second, a capability to specify an end-user's microflow,
   e.g., a 5-tuple "flow" in the case of IP traffic, that intermediate
   devices will use to calculate the component-link or ECMP path for
   that flow, allowing periodic or perpetual performance monitoring.
   Ultimately, these capabilities are necessary to both determine and
   exercise the actual path that is/was used by an end-user's particular
   application through the network.

3. Use Cases

3.1. Types of Exercise Mechanisms

   This memo classifies two types of ping and traceroute requests that
   are needed in modern networks where many inter-node links consist of
   LAG, ECMP or ECMP over LAG paths.
   The first is a "traditional" or "legacy" traceroute and ping request,
   where intermediate devices only understand how to use outer IP header
   information as the input to a LAG or ECMP hashing algorithm. This
   type of mechanism has limited utility insomuch as existing devices,
   interior to a Service Provider's network, only understand how to
   process limited information in traceroute or ping requests. Note that
   when operators originate traceroute and/or ping sessions from within
   their network, requests are sourced from devices, often routers,
   whose interfaces reside within their network.

   The second is a "next-generation" traceroute and ping request, where
   intermediate devices understand new information, likely contained in
   the payload of the traceroute and ping request, which can then be fed
   as input to the LAG or ECMP hashing algorithm. This would allow
   operators to, for example, specify the exact "tuple" used by customer
   traffic in order to properly exercise the LAG or ECMP paths used by a
   particular customer 'flow' through the network.

3.2. Scenario 1: Traceroute through Routed Hops

           I1: 10.1.1.1/30                I3: 10.5.1.1/30
   +------+                       +------+                      +------+
   |      |-- A1 ----------- A2 --|      |-- D1 ---------- D2 --|      |
   |  R1  |-- B1 -- LAG-1 -- B2 --|  R2  |        LAG-2         |  R3  |
   |      |-- C1 ----------- C2 --|      |-- E1 ---------- E2 --|      |
   +------+                       +------+                      +------+
                   10.1.1.2/30: I2               10.5.1.2/30: I4

   Note on figures: Figures 1 through 3 represent a piece of a network
   for illustrative purposes. In a real network, other nodes will be
   present.

               Figure 1: Traceroute through Routed Hops

   In the above example, the links A1-A2, B1-B2 and C1-C2 are grouped
   into a single LAG, called LAG-1, between nodes R1 and R2.
   Furthermore, D1-D2 and E1-E2 are grouped into a single LAG, called
   LAG-2, between nodes R2 and R3. I1 represents the IPv4 address
   10.1.1.1/30 assigned to the LAG-1 interface on R1.
   I2 represents the IPv4 address 10.1.1.2/30 assigned to the LAG-1
   interface on R2. I3 and I4 are the IP interfaces assigned to R2 and
   R3, respectively, on LAG-2. R1 and R2 will maintain a single set of
   routing and signaling protocol (e.g., IS-IS, RSVP and/or LDP)
   adjacencies over LAG-1, while R2 and R3 will maintain a single set of
   routing and signaling protocol adjacencies over LAG-2. Assuming the
   individual component link sizes between R1, R2 and R3 are 10 Gbps,
   the end result is that R1 and R2 believe they have a single 30 Gbps
   connection between them and R2 and R3 believe they have a 20 Gbps
   connection between them.

   When performing a traceroute from R1 through R2 to R3, each router
   independently and automatically determines, through a proprietary LAG
   or ECMP load-distribution algorithm, the outgoing component-link
   inside a LAG or ECMP path to send out traceroute UDP probe packets.
   Unfortunately, the details of the specific component-links are not
   exposed through a user interface, so operators cannot determine the
   exact physical path used by traceroute. Furthermore, those details
   cannot be used as input to a 'ping' utility (using ICMP echo-request
   and echo-reply messages [RFC792]) to test longer term performance of
   a specific physical path through the network. The end result is that
   a network operator may believe that a given path between devices is
   behaving properly when, in fact, end-user traffic is traversing a
   different set of component-links and experiencing congestion or other
   link-layer forwarding problems.

3.3. Scenario 2: Traceroute through One Switched Hop

           I1: 10.1.1.1/30
   +------+                     +-------+                       +------+
   |      |-- A1 --------- A2 --|       |-- C1 ----------- C2 --|      |
   |  R1  |        LAG-1        |  SW1  |-- D1 -- LAG-2 -- D2 --|  R2  |
   |      |-- B1 --------- B2 --|       |-- E1 ----------- E2 --|      |
   +------+                     +-------+                       +------+
                                                 10.1.1.2/30: I2

             Figure 2: Traceroute through One Switched Hop

   In this scenario, links A1-A2 and B1-B2 are grouped into a single 20
   Gbps LAG, called LAG-1, between nodes R1 and SW1. Furthermore, links
   C1-C2, D1-D2 and E1-E2 are also joined together into a single 30 Gbps
   LAG, called LAG-2, between nodes SW1 and R2. I1 represents the IPv4
   address 10.1.1.1/30 assigned to the LAG-1 interface on R1. I2
   represents the IPv4 address 10.1.1.2/30 assigned to the LAG-2
   interface on R2. As in Scenario 1, R1 and R2 will maintain a single
   set of IP/MPLS routing and signaling protocol adjacencies over the
   LAGs through SW1.

   As in Scenario 1, each device along the path R1 to SW1 to R2 (or
   vice-versa) automatically and independently determines the outgoing
   component-link inside a LAG or ECMP "bundle" to send out traceroute
   UDP probe packets. Unfortunately, in this scenario, if only the
   incoming component-link interface ID is displayed to an end-user or
   network operator, that will not reveal the entire physical path
   traversed from R1 through SW1 to R2. This scenario highlights the
   need to also show both the outgoing component-link interface ID on R1
   and the incoming component-link interface ID on R2. With both of
   those pieces of information, and a priori knowledge that there is
   only one Layer-2 switch between R1 and R2, an operator can rely on a
   "legacy" traceroute implementation to determine the actual
   component-links that were used in a traceroute request.
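
   The independent per-device selection described above can be sketched
   as follows. This is only an illustration under stated assumptions:
   the CRC-32 hash, the per-device seed and the pick_link() helper are
   hypothetical stand-ins, since real devices use proprietary
   load-distribution algorithms. The link names follow Figure 2.

```python
import zlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    """Outer-header fields a legacy device might feed into its LAG hash."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_link(flow: FiveTuple, links: List[str], seed: int) -> str:
    """Hypothetical LAG hash: CRC-32 over seed + 5-tuple, modulo link count.

    The seed models the fact that each device hashes independently, so the
    choice made on R1 says nothing about the choice made on SW1.
    """
    key = "|".join(str(field) for field in (seed, *flow))
    return links[zlib.crc32(key.encode()) % len(links)]


# Component-links from Figure 2.
LAG1 = ["A1-A2", "B1-B2"]            # R1 -> SW1
LAG2 = ["C1-C2", "D1-D2", "E1-E2"]   # SW1 -> R2

# A legacy traceroute UDP probe (protocol 17); the port values are arbitrary.
probe = FiveTuple("10.1.1.1", "10.1.1.2", 17, 33434, 33435)

# Each device makes its own choice; neither choice is reported to the operator.
path = (pick_link(probe, LAG1, seed=1), pick_link(probe, LAG2, seed=2))
print(path)
```

   Because neither choice is visible in a legacy traceroute reply, the
   operator sees only the Layer-3 hops, not the pair of component-links
   actually selected.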

   If the operator does not have a priori knowledge that there is a
   Layer-2 switch between R1 and R2, it would be useful for R1 and R2 to
   include relevant Layer-2 information, learned from a Link-Layer
   Discovery Protocol running on both R1 and R2, in the traceroute
   reply. In this example, R1 would reply with its own outgoing
   component-link name, SW1's hostname and SW1's incoming component-link
   name. Furthermore, when R2 sends a traceroute reply it would respond
   with its own incoming component-link name, SW1's hostname and SW1's
   outgoing component-link name. This would immediately point out to an
   operator the presence of one, or more, Layer-2 switches in the middle
   of a Layer-3 path. Ultimately, without specific component-link
   'neighbor' information, such as from a Link-Layer Discovery Protocol,
   it will be difficult to rapidly determine the presence or absence of
   Layer-2 switches in the interior of a Layer-3 path.

   It's also important to point out in this particular scenario that, at
   best, SW1 only understands how to parse information in the outer IP
   header of a legacy traceroute UDP probe, or other data packets, for
   input into its LAG hash algorithm, which ultimately determines the
   outgoing component-link it will use to send packets to R2. It would
   be highly desirable for SW1 to be able to intercept and act upon data
   fields contained in "next-generation" traceroute and/or ping probe
   packets, so that operators could specify the actual 5-tuple "flow" to
   be input into SW1's LAG hash algorithm in order to exercise a
   specific component-link on SW1 outbound toward R2. Without this
   capability, operators would likely be unable to periodically or
   continuously exercise a specific set of component-links through a
   given edge-to-edge path on the network, such as through a proactive
   network monitoring system, as discussed in Section 4.1 of this
   document.

3.4. Scenario 3: Traceroute through Two, or More, Switched Hops

         I1: 10.1.1.1/30
   +----+             +-----+             +-----+               +----+
   |    |-A1-------A2-|     |-C1-------C2-|     |-E1---------E2-|    |
   | R1 |    LAG-1    | SW1 |    LAG-2    | SW2 |-F1- LAG-3 -F2-| R2 |
   |    |-B1-------B2-|     |-D1-------D2-|     |-G1---------G2-|    |
   +----+             +-----+             +-----+               +----+
                                                 10.1.1.2/30: I2

         Figure 3: Traceroute through Two, or More Switched Hops

   In this case, two Layer-2 switches are inserted in the path between
   Layer-3 nodes R1 and R2. LAG-1 and LAG-2 are each grouped together
   into their own 20 Gbps LAG. Furthermore, LAG-3, between nodes SW2
   and R2, is joined together as a single 30 Gbps LAG. Finally, I1
   represents the IPv4 address 10.1.1.1/30 assigned to the LAG-1
   interface on R1; in addition, I2 denotes the IPv4 address 10.1.1.2/30
   assigned to the LAG-3 interface on R2.

   This scenario is common in enterprise or data center environments
   where R1 may be a router or server, SW1 a top-of-rack distribution
   switch, SW2 an aggregation switch and R2 a Layer-3 router typically
   providing WAN connectivity.

   This particular case further highlights the need to automatically
   learn the presence of Layer-2 switches and, ideally, allow one to
   automatically exercise their LAG hash algorithms to fully qualify the
   exact set of component-links taken between two Layer-3 devices. In
   order to learn the presence of Layer-2 switches, it will be necessary
   for traceroute replies to also include relevant Layer-2 information,
   such as the next-hop device's hostname and incoming component-link
   name, from a Link-Layer Discovery Protocol. In the case of "legacy"
   traceroute, R1 would reply with its outgoing component-link name,
   plus two pieces of information learned from a Link-Layer Discovery
   Protocol: SW1's hostname and SW1's incoming component-link name.
   Furthermore, when the next traceroute UDP probe is sent to R2, it
   will reply with its incoming component-link name, SW2's hostname and
   SW2's outgoing component-link name. Unfortunately, this only yields
   a partial solution, because it would not reveal the actual
   component-link used between SW1 and SW2, nor the presence of a third
   Layer-2 switch between SW1 and SW2. In this instance, an operator
   would want to use Layer-2 OAM tools in an attempt to identify and
   diagnose the particular component-link that is used between SW1 and
   SW2. Unfortunately, Layer-2 OAM tools do not have the ability to
   identify or troubleshoot component-links in an 802.3ad LAG. In
   addition, it is time consuming for operators to stop using Layer-2.5
   (such as LSP-Ping or LSP-Trace) or Layer-3 ping/traceroute tools, log
   in to R1 and R2 and use Layer-2 OAM tools to resume diagnosing the
   problem. Furthermore, the lack of an integrated toolset prevents
   operators from using an NMS to continuously monitor component-links
   on paths that go over one or more Layer-2 switches.

   Instead, what operators need is integrated Layer-2 and Layer-3
   ping/traceroute tools, which allow for rapid and accurate diagnosis
   and troubleshooting of LAG/ECMP problems. Ultimately, if Layer-2
   switches can intercept and act upon "next-generation" traceroute and
   ping requests, that would enable operators to specify the actual
   5-tuple "flow" to be input into each Layer-2 switch's LAG hash
   algorithm. This would allow operators to periodically or
   continuously exercise a specific set of component-links over all
   Layer-2 and Layer-3 devices, all at the same time, along a complete
   edge-to-edge path through the network, as discussed in Section 4.1 of
   this document.
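
   The difference between a legacy probe and a "next-generation" probe
   on the path of Figure 3 can be sketched as follows. This is a minimal
   illustration, not a proposed encoding: the target_flow field, the
   CRC-32 hash and the helper names are assumptions, and the sketch
   presumes that every device on the path is NG-OAM capable and hashes
   on the same fields for probes and for data traffic.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (source IP, destination IP, protocol, source port, destination port)
Flow = Tuple[str, str, int, int, int]


@dataclass
class Probe:
    outer: Flow                         # the probe packet's own outer header
    target_flow: Optional[Flow] = None  # hypothetical NG-OAM field; None = legacy


# Devices from Figure 3 and the component-links of their outgoing LAG.
PATH = [
    ("R1", ["A1-A2", "B1-B2"]),
    ("SW1", ["C1-C2", "D1-D2"]),
    ("SW2", ["E1-E2", "F1-F2", "G1-G2"]),
]


def hash_link(device: str, flow: Flow, links: List[str]) -> str:
    """Hypothetical per-device LAG hash (CRC-32 keyed by device name)."""
    key = "|".join([device, *map(str, flow)])
    return links[zlib.crc32(key.encode()) % len(links)]


def links_for_data(flow: Flow) -> List[str]:
    """Component-links a customer data flow would take, hop by hop."""
    return [hash_link(dev, flow, links) for dev, links in PATH]


def links_for_probe(probe: Probe) -> List[str]:
    """An NG-capable device hashes the embedded flow; otherwise the outer header."""
    flow = probe.target_flow if probe.target_flow is not None else probe.outer
    return [hash_link(dev, flow, links) for dev, links in PATH]


customer: Flow = ("192.0.2.10", "198.51.100.20", 6, 49152, 443)
probe_header: Flow = ("10.1.1.1", "10.1.1.2", 17, 33434, 33435)

legacy = Probe(outer=probe_header)
next_gen = Probe(outer=probe_header, target_flow=customer)

print("customer flow:", links_for_data(customer))
print("legacy probe: ", links_for_probe(legacy))
print("NG probe:     ", links_for_probe(next_gen))
```

   In this model the next-generation probe always exercises the same
   component-links as the customer flow, whereas the legacy probe
   matches only by coincidence.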

   It should be noted that the above presumes intermediate Layer-2
   switches are capable of intercepting and acting upon NG-OAM
   probe-requests, which may not be true initially in all environments.
   Therefore, this document requires all NG-OAM solutions to document
   how they will determine if intermediate Layer-2 switches are NG-OAM
   capable and communicate that back to the initiator of an NG-OAM
   request, in order that operators can tell if the complete path was
   properly exercised.

3.5. ECMP

   TBD

3.6. Proxy Traceroute/Ping Functionality

   To enable more rapid troubleshooting and diagnosis of problems
   related to LAG, ECMP and/or asymmetric paths in a large-scale
   network, it is useful to use "proxy" routers/hosts within a network
   that can initiate a traceroute or ping on behalf of a Network
   Monitoring System (NMS), such as via [PROXY-LSP-PING]. This is
   particularly valuable in the following scenarios:

   o  When troubleshooting problems related to asymmetric paths, it is
      useful to perform a traceroute and/or ping from a source to the
      destination as well as from the destination back to the source.

   o  Some IP/MPLS routers use 'input interface' as input into the LAG
      and/or ECMP hashing algorithm; therefore, quickly exercising the
      associated direction of a particular flow through the network is
      required.

   o  When narrowing a problem down to a specific sequence of links
      within the network, it is useful to rapidly focus additional
      testing on suspicious segments, which are a subset of an overall
      edge-to-edge path.

   o  Periodic monitoring of a large-scale network composed of a
      multitude of LAG and/or ECMP paths. In order to divide up the
      periodic testing of a large set of component-links and paths while
      simultaneously providing timely results, it is useful to
      distribute testing out to the IP/MPLS routers in the network on or
      near the paths to be tested.
      (See Section 4.1 for more details.)

   In this scenario, there are three types of devices:

   Initiator: The node which creates a proxy traceroute/ping request
   with: 1) a "5-tuple" to be used as input to a LAG and/or ECMP hashing
   algorithm; 2) the IP address of the Proxy IP/MPLS router that will
   initiate the ping/traceroute on behalf of the Initiator; and, 3) the
   IP address of the destination IP/MPLS router/host that will terminate
   this ping/traceroute request.

   Proxy IP/MPLS Router: The node which receives a proxy traceroute/ping
   request from an Initiator. Once it has interpreted the proxy
   request, it initiates a proxy ping/traceroute request from itself
   toward the destination IP/MPLS router specified in the proxy
   ping/traceroute request.

   Proxy Request Terminator: The node(s) which terminate a proxy
   traceroute/ping request received from the Proxy IP/MPLS Router. In
   the case of a proxy traceroute, intermediate nodes along the path to
   the final destination of the proxy traceroute are considered
   "Intermediate Proxy Request Terminators".

   An NG-OAM solution MUST support Proxy Traceroute/Ping Functionality.
   An NG-OAM solution MUST support replies from the Proxy Request
   Terminator (or Intermediate Proxy Request Terminators) being sent
   back to the Proxy IP/MPLS Router, before they are relayed back to the
   Initiator. The advantage of this approach is that replies should
   follow a symmetrical path back to the Initiator, which is useful if
   the NMS is behind a stateful firewall. On the other hand, an NG-OAM
   solution MAY support replies from the Proxy Request Terminator (or
   Intermediate Proxy Request Terminators) directly back to the
   Initiator. The advantage of this scheme is that it does not rely on
   the Proxy IP/MPLS Router to cache or relay/reformat Proxy Reply
   Information, before replying back to the Initiator.
   This may be useful in situations where it's desirable to reduce the
   load on the Proxy IP/MPLS Router.

4. Performance Monitoring

4.1. Proactive Network Monitoring and Verification

   There are two forms of Proactive Network Monitoring and Verification
   (PNMV): Perpetual and Periodic. In a Perpetual PNMV case, the nodes
   performing monitoring send OAM messages at a specific interval, and
   record the results on a perpetual basis. In the Periodic case, the
   messages are sent only on demand of an external system, such as an
   NMS, or an operator's command. These forms can be implementation
   cases of the same solution.

   Today's solutions, such as ping, traceroute, and simulated user
   traffic between management nodes, can address the case when there is
   a single path between two endpoints. However, in large national and
   international networks, there will exist several routed hops for
   certain paths through the network. Furthermore, between each pair of
   IP/MPLS routers there will exist LAGs and/or ECMP paths.
   Unfortunately, at present, Network Monitoring Systems (NMS) are
   unable to exercise the set of component-links through specific paths
   on the network. Such a capability would allow the NMS to identify a
   soft-failure on one or more component-links on the network and notify
   a Network Operations Center (NOC) of it. The NOC could then
   proactively respond to the problem by, for example, quickly taking
   the affected component-link(s) out-of-service or, alternatively,
   administratively disabling the link bundle or ECMP path and allowing
   traffic to switch to another in-service path.

   The challenge with monitoring a large set of LAG and/or ECMP paths in
   a network will be to find the right balance among monitoring all
   component-links in the network, minimizing resource utilization
   (e.g., CPU, memory, network I/O) on the NMS system(s), and keeping
   the detection interval short enough to allow for proactive
   notification of problems to the NOC. Therefore, a solution must be
   devised that allows an NMS to transmit multiple independent,
   concurrent LAG and/or ECMP path test queries into various points in
   the network. Within the network, Proxy IP/MPLS Routers will carry
   out the test queries and report back the test results to the NMS.

   An NG-OAM solution SHOULD support the ability to do Proactive
   Perpetual Network Monitoring and Verification, again through the use
   of Proxy Traceroute/Ping Functionality described in Section 3.6. It
   should be noted that Perpetual PNMV may be more resource intensive on
   devices, which is why that requirement is relaxed compared to
   Periodic PNMV.

4.1.1. Proactive Periodic Network Monitoring and Verification

   Periodic network monitoring is often done in response to a suspected
   network event, or done as a sampled case of Perpetual network
   monitoring when Perpetual network monitoring cannot be scaled to the
   necessary level. Probes sent Periodically are often sent with a
   shorter inter-message interval, and often request more information
   than a test that runs on a Perpetual basis.

   In order to perform periodic monitoring, the Initiator MUST send the
   Proxy IP/MPLS Router the number and interval of the probe requests.
   For example, the Initiator may send the Proxy IP/MPLS Router a
   request to run 300 consecutive probes at an interval of 500 msec
   between probes.

4.1.2. Proactive Perpetual Network Monitoring and Verification

   Perpetual network monitoring is done consistently among a subset of
   end points in the total network. The subset, such as sample PoP
   router to sample PoP router, is selected to strike a balance between
   a good view of network performance and an unmaintainable set of
   messages.

   In order to perform perpetual monitoring, the selected monitoring and
   monitored nodes must run the test, such as NG-Ping, at a set interval
   and collect and store the resulting statistics.

   Network Performance Monitoring, as described in Section 4.2, is a
   good example of the case where Perpetual PNMV is required.

   An NG-OAM solution MUST offer the ability to change monitoring timing
   intervals. Values as low as 3.3 ms have been suggested, but are
   optional. Values down to 100 ms SHOULD be supported.

4.2. Network Performance Monitoring

   Network Performance Monitoring (PM, or NPM) is the art and science of
   recording temporally aware network performance characteristics. A
   use case for the resulting statistics is SLA verification, in
   addition to proactive maintenance.

   Relevant PM characteristics are typically loss, latency and jitter.
   A PM solution MUST index these characteristics to time intervals.
   Knowing that 100 packets were lost, but not knowing when, is not
   particularly actionable. The limits of existing tools and
   information often result in a NOC "clearing counters", then running a
   "fast ping" for an arbitrary length of time and hoping that the error
   occurs again. Keeping all results of a Perpetual PNMV test is one
   possible solution; however, this volume of information can be
   difficult to store or to sort through when a network event is
   occurring. An NG-OAM solution SHOULD provide easy-to-read,
   temporally-aware statistics that allow an operator to easily assess
   the magnitude of the problem.
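
   One way to index loss to time intervals, as required above, is to
   bucket probe results by the interval in which each probe was sent.
   The sketch below is illustrative only; the loss_by_interval() helper,
   the (send time, replied) record layout and the sample data are
   assumptions, not part of any NG-OAM solution.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def loss_by_interval(
    results: Iterable[Tuple[float, bool]], interval_s: float = 1.0
) -> Dict[int, int]:
    """Count lost probes per time interval.

    Each result is (send time in seconds, whether a reply arrived).
    Only intervals with at least one lost probe appear in the output.
    """
    lost: Counter = Counter()
    for sent_at, replied in results:
        if not replied:
            lost[int(sent_at // interval_s)] += 1
    return dict(lost)


# Ten probes sent 500 ms apart; the probes sent at 2.0 s and 2.5 s got no reply.
results = [(i * 0.5, i not in (4, 5)) for i in range(10)]

per_second = loss_by_interval(results)   # lost probes, indexed by second
errored_intervals = len(per_second)      # number of seconds with any loss
print(per_second, errored_intervals)
```

   A summary of this kind tells the operator when the loss occurred and
   over how many intervals, without requiring every individual probe
   result to be retained.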

   An example of this sort of statistic from the world of SONET/SDH
   transport is the errored second and the severely errored second.

   The level of granularity of PM statistics gathering SHOULD be
   configurable.

5. Other Requirements

5.1. Intra-AS Requirements

   The NG-OAM solution SHOULD use the same mechanism to address both the
   Intra-AS (this section) and Inter-AS (Section 5.2) requirements. An
   operator MUST be able to run a traceroute from one domain and through
   another. The amount of information this traceroute provides may
   differ depending on where the probe is originated, and what sort of
   authorization it possesses to access information in other domains.

   Intra-AS requirements are applicable within an Autonomous System
   (AS), where all IP/MPLS devices are expected to be under a single
   administrative authority. Because devices are under a single
   administrative authority, copious diagnostic information can be
   returned to the Initiator of a ping/traceroute request. Ultimately,
   however, an NG-OAM solution MUST ensure that extensive Intra-AS
   diagnostic information is not leaked across the boundaries of the
   Autonomous System, since it would provide valuable network
   intelligence information. In addition, it is desirable if
   lightweight authentication and/or encryption techniques can be used
   to secure both probe requests and replies, in order to limit the
   effects of resource exhaustion on network elements that are
   processing probe requests/replies.

   The following is a brief summary of the minimal set of information
   that an NG-OAM solution is expected to address. NG-OAM solutions MAY
   capture additional information through, for example, experimental or
   vendor-specific objects specified in the NG-OAM probe-request.
574 NG-OAM Probe Requests and Probe Replies MUST contain a "Query ID", 575 generated by the Probe Initiator, that can be used to associate Probe 576 Replies with Probe Requests. 578 Next-Gen Traceroute 580 o MUST work for IP and MPLS 582 o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe 583 Request 585 o MUST be able to specify whether the IPv4 packet is a first 586 fragment or a subsequent fragment, in order that intermediate 587 devices can adjust their LAG/ECMP calculation appropriately. 589 o MUST be able to specify the MPLS label stack used to identify a 590 "flow" across an MPLS-only portion of the network in a Probe 591 Request. 593 o MUST be able to specify the Layer-2 (e.g., Ethernet) header used 594 in a Probe Request. 596 o MUST be able to specify a combination of label stack and IP 597 5-tuple, if both are used in the ECMP/LAG hash algorithm. 599 o MUST capture the following information in a Probe Reply: 601 * The specific components of the Layer-2 (e.g., Ethernet) header, 602 MPLS label stack and/or IP 5-tuple that were used in the ECMP/ 603 LAG hash algorithm at this hop 605 * Incoming Interface Name 607 * Outgoing Interface Name 608 * Number of component-links in a bundle 610 * Size (Bandwidth) of individual component-links in a bundle 612 * Percent bandwidth utilization on interface(s) 614 * Remote Link-Layer neighbor name and interface name 616 o SHOULD be able, on request of the source, to provide recent 617 performance history of the incoming or outgoing link(s) 619 Next-Gen Ping 621 o MUST work for IP and MPLS 623 o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe 624 Request 626 o MUST be able to specify the MPLS label stack used to identify a 627 "flow" across an MPLS-only portion of the network in a Probe 628 Request. 630 o MUST be able to specify the Layer-2 (e.g., Ethernet) header used 631 in a Probe Request.
633 o MUST follow the regular data-plane path for forwarding within a 634 network element 636 o MUST be able to test all links/paths concurrently, or serially, 637 between two network elements when operators do not know a 638 customer's "flow" information, which can be used as input to a LAG 639 and/or ECMP hash calculation. 641 Proxy Traceroute 643 o All of the requirements mentioned above for "Next-Gen Traceroute", 644 plus: 646 o The Initiator MUST be able to specify the number of Probe 647 Requests. 649 o The Initiator MAY also specify the interval between Probe 650 Requests, which the Proxy IP/MPLS Router is responsible for 651 carrying out on the Initiator's behalf. 653 Proxy Ping 654 o All of the requirements mentioned above for "Next-Gen Ping", plus: 656 o The Initiator MUST be able to specify the number of Probe Requests 657 and interval between Probe Requests, which the Proxy IP/MPLS 658 Router is responsible for carrying out on the Initiator's behalf. 660 Next-Gen OAM Traceroute/Ping Probe Replies MUST capture error 661 conditions that were encountered during an unsuccessful Probe 662 Request. Those replies are expected to capture not only those 663 conditions defined by classic [ICMP], (e.g: Destination Unreachable 664 Type), but also new error conditions specific to NG-OAM solutions. 665 In order to seamlessly accommodate future error conditions, NG-OAM 666 solutions MUST use a TLV format for specifying error conditions in 667 Probe Replies. 669 Intra-AS probe requests (and probe replies) MUST be easily 670 identifiable in the data plane, in order that routers acting on NG- 671 traceroute or NG-ping requests (or replies) can rapidly drop them in 672 order to avoid resource exhaustion. NG-traceroute and NG-ping 673 solutions MUST provide configurable methods to rate-limit the number 674 of Intra-AS request (or reply) packets to prevent resource 675 exhaustion. 677 5.2. 
Inter-AS Requirements 679 Inter-AS requirements are applicable across administrative domains, 680 such as the Internet or, perhaps, several MPLS service providers 681 delivering a single MPLS VPN solution. Because devices are not under 682 a single administrative authority, the amount of diagnostic 683 information returned to the Initiator of a ping/ 684 traceroute request must be limited. This information is primarily useful in the 685 context of helping the responsible party pinpoint the specific 686 location of a problem. For example, Customer A may be experiencing 687 packet loss in Service Provider A's network for its Internet service. 688 The link between Customer A and Service Provider A consists of an ECMP 689 path between SP A's ASBR and Customer A's ASBR. Customer A can 690 perform a NG-traceroute through this ECMP path and provide the output 691 of NG-traceroute to SP A's NOC in order to more rapidly identify the 692 particular component-link that is causing the problem. Other 693 examples where this is useful are: over Internet (IPv4 or IPv6) 694 peering/transit links and within DataCenters from servers through to 695 the DataCenter provider's ASBR attached to several SPs, where MPLS 696 is not used. 698 Inter-AS probe requests (and probe replies) MUST be easily 699 identifiable in the data plane, in order that routers acting on NG- 700 traceroute or NG-ping requests (or replies) can rapidly drop them in 701 order to avoid resource exhaustion. NG-traceroute and NG-ping 702 solutions MUST provide configurable methods to rate-limit the number 703 of Inter-AS request (or reply) packets to prevent resource 704 exhaustion. 706 Next-Gen Traceroute 708 o MUST work for IP and MPLS 710 o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe 711 Request 713 o MUST be able to specify the MPLS label stack used to identify a 714 "flow" across an MPLS-only portion of the network in a Probe 715 Request.
717 o MUST be able to specify the Layer-2 (e.g., Ethernet) header used 718 in a Probe Request. 720 o MUST be able to specify a combination of label stack and IP 721 5-tuple, if both are used in the ECMP/LAG hash algorithm. 723 o MUST capture the following information in a Probe Reply: 725 * Incoming Interface Name 727 * Outgoing Interface Name 729 Next-Gen Ping 731 o MUST work for IP and MPLS 733 o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe 734 Request 736 o MUST be able to specify the MPLS label stack used to identify a 737 "flow" across an MPLS-only portion of the network in a Probe 738 Request. 740 o MUST be able to specify the Layer-2 (e.g., Ethernet) header used 741 in a Probe Request. 743 Proxy Ping/Traceroute requirements are not applicable to Inter-AS 744 scenarios, since the risk of resource starvation is too large. 746 5.3. MTU Considerations 748 Traceroute probes need to be kept to minimal size. Traceroute reply 749 PDUs should be kept to at most 1500 bytes in size in order to avoid the need 750 for IP fragmentation. It is a safe assumption that operators have a 751 minimum of 1500 bytes for IP MTU, and often significantly larger. 753 Optionally, path MTU discovery may be used to determine a minimum 754 MTU. The MTU values MUST be configurable by the operator to adjust 755 to unanticipated conditions. A Traceroute reply MAY span 756 multiple packets. 758 5.4. Extensibility 760 It would be useful to allow for the "next-generation" traceroute and 761 ping protocols to contain TLVs, in order that they may be easily 762 extended in the future to account for additional capabilities, which 763 may be developed at a later point in time. 765 5.5.
Path Capabilities 767 In order to be certain that NG-ping or NG-traceroute will be able to 768 properly exercise component-links in a LAG and/or ECMP path through 769 the network, it is necessary to determine if all devices along a 770 specific path are capable of supporting the requisite protocols and 771 replying with appropriate results back to the originator of the NG- 772 ping or NG-traceroute request. There are potentially two methods 773 that can be employed to determine these capabilities: 1) path 774 discovery; or, 2) encoding special/reserved codepoints into the 775 packet header of NG-OAM request/reply packets. With the first 776 method, the originating host/router could use a path discovery 777 function to determine the capabilities and properties of intermediate 778 and/or terminating devices prior to actually using NG-ping or NG- 779 traceroute to test the data path. Once the originating host/router 780 has learned the characteristics of intermediate and/or terminating 781 devices, it could then originate a NG-ping/traceroute request using 782 that information to exercise the actual data path. 784 The second method is likely to encode the NG OAM packets with 785 specific values in the packet header of NG-OAM request/reply packets, 786 (for example, via new ICMP type/codes or MPLS label values). In this 787 approach, the originating host/router can simply launch a NG-ping/ 788 traceroute request allowing each intermediate and/or terminating 789 device to independently determine if it's capable of supporting the 790 NG-OAM request and, concurrently, exercising the component-links 791 appropriate to the LAG and/or ECMP path. 793 Although the latter approach has the potential disadvantage that it 794 may be more difficult to support on some existing hardware, this 795 document recognizes that it is the superior approach of the two 796 choices. 
If one depends on, for example, NG-traceroute to "discover" 797 characteristics of a path before allowing one to ping, it creates a 798 circular dependency. Specifically, in the case where one is doing 799 perpetual pings and the underlying path changes for legitimate 800 reasons, the NG-OAM solution would have to discover the change to the path, 801 trigger a new NG-traceroute and then resume perpetual pings along the 802 new path. Note that a change to the existing path could consist of 803 any of the following: 1) a component-link in a LAG goes down, yet 804 the LAG itself remains operational (e.g., a 10x LAG goes to a 9x 805 LAG), ultimately changing the result of the LAG hashing algorithm; or, 2) 806 the entire LAG and/or ECMP path goes down and data packets are routed 807 along an alternate path. Ultimately, if each NG-OAM packet is a 808 self-contained, autonomous OAM unit, then each intermediate and/or 809 terminating device will act on it appropriately. 811 Therefore, this document specifies that a NG-OAM solution MUST 812 support the second method, autonomous OAM units, outlined above. NG- 813 OAM solutions MAY support the first method, to provide short-term NG 814 OAM coverage with existing hardware. 816 5.6. Per Hop Behavior Modification 818 Modification of per-hop behavior in order to support NG-OAM is 819 acceptable, but not required of NG-OAM solutions. This allows 820 solutions where intermediate routers have to look at something new to 821 determine if they are looking at an OAM packet, or to determine if 822 they are the target or Proxy of a NG-OAM request. 824 6. IANA Considerations 826 This document makes no request of IANA. 828 Note to RFC Editor: this section may be removed on publication as an 829 RFC. 831 7. Security Considerations 833 Devices MUST rate-limit the amount of traceroute and/or ping traffic 834 they process to avoid DoS attacks. Those rate-limits MUST be 835 configurable to suit the appropriate environment in which they are 836 deployed.
An attacker must not be allowed to force an inordinate 837 amount of traceroute and/or ping traffic down a single physical 838 component-link, causing congestion. Therefore, devices MUST rate- 839 limit the amount of "external" traceroute and/or ping traffic through 840 any specific component-link or set of component-links. Note that 841 implementations SHOULD provide exceptions that allow a network 842 operator's Intra-Domain traceroute and/or ping traffic, particularly 843 for performance monitoring, to get through without interference by 844 rate-limiters. 846 A lightweight authentication method SHOULD be provided by an NG-OAM 847 solution. This mechanism can be used to defend against DoS or 848 insertion attacks from other systems spoofing NG-OAM information. 849 This can also be used in a reply message to defend against a "SLA 850 Violation" attack where a malicious system could make it appear as if 851 an operator's network has violated the SLA when, in fact, it has 852 not. 854 8. Acknowledgements 856 The authors would like to thank Nitin Bahadur, Ping Pan, Nasser El- 857 Aawar, and Dimitri Papadimitriou for their reviews and thoughtful 858 feedback. 860 9. References 862 9.1. Informative References 864 [BFD-BASE] 865 "draft-ietf-bfd-base-07.txt - Bidirectional Forwarding 866 Detection", January 2008. 868 [LLDP] "IEEE Standard - 802.1AB-2005", May 2005. 870 [LMP] "RFC 4204 - Link Management Protocol", October 2005. 872 [PROXY-LSP-PING] 873 George Swallow and Vanson Lim, "Proxy LSP Ping, 874 draft-ietf-mpls-remote-lsp-ping-01.txt", November 2007. 876 [RSVP-DIAG] 877 "RFC 2745 - RSVP Diagnostic Messages", January 2000. 879 9.2. Normative References 881 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 882 Requirement Levels", BCP 14, RFC 2119, March 1997. 884 9.3. References 886 [RFC 792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.
888 Authors' Addresses 890 Shane Amante 891 Level 3 Communications, LLC 892 1025 Eldorado Blvd 893 Broomfield, CO 80021 895 Email: shane.amante@level3.com 897 Alia Atlas 898 BT 900 Email: alia.atlas@bt.com 902 Andrew Lange 903 Alcatel-Lucent 905 Email: andrew.lange@alcatel-lucent.com 907 Danny McPherson 908 Arbor Networks, Inc. 910 Email: danny@arbot.net 912 Full Copyright Statement 914 Copyright (C) The IETF Trust (2008). 916 This document is subject to the rights, licenses and restrictions 917 contained in BCP 78, and except as set forth therein, the authors 918 retain all their rights. 920 This document and the information contained herein are provided on an 921 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 922 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 923 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 924 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 925 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 926 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 928 Intellectual Property 930 The IETF takes no position regarding the validity or scope of any 931 Intellectual Property Rights or other rights that might be claimed to 932 pertain to the implementation or use of the technology described in 933 this document or the extent to which any license under such rights 934 might or might not be available; nor does it represent that it has 935 made any independent effort to identify any such rights. Information 936 on the procedures with respect to rights in RFC documents can be 937 found in BCP 78 and BCP 79. 
939 Copies of IPR disclosures made to the IETF Secretariat and any 940 assurances of licenses to be made available, or the result of an 941 attempt made to obtain a general license or permission for the use of 942 such proprietary rights by implementers or users of this 943 specification can be obtained from the IETF on-line IPR repository at 944 http://www.ietf.org/ipr. 946 The IETF invites any interested party to bring to its attention any 947 copyrights, patents or patent applications, or other proprietary 948 rights that may cover technology that may be required to implement 949 this standard. Please address the information to the IETF at 950 ietf-ipr@ietf.org. 952 Acknowledgment 954 Funding for the RFC Editor function is provided by the IETF 955 Administrative Support Activity (IASA).