idnits 2.17.1

draft-janapath-opsawg-flowoam-req-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 14 instances of lines with control characters in the document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (May 8, 2012) is 4364 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2119' is mentioned on line 116, but not defined

  == Unused Reference: 'RFC 792' is defined on line 378, but no explicit
     reference was found in the text

  == Unused Reference: '1' is defined on line 386, but no explicit reference
     was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 5556

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE SPB'

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2    OPSAWG Working Group                                Janardhanan Pathangi
3    INTERNET-DRAFT                                Balaji Venkat Venkataswami
4    Intended Status: Proposed Standard                                  DELL
5    Expires: November 2012                                    Richard Groves
6                                                                   MICROSOFT
7                                                                 Peter Hoose
8                                                                    FACEBOOK
9                                                                 May 8, 2012

11             Requirements for OAM tools that enable flow Analysis
12                     draft-janapath-opsawg-flowoam-req-00

14   Abstract

16      This document specifies Operations and Management (OAM) requirements
17      for tools that improve on traditional OAM tools such as Ping and
18      Traceroute. These requirements arise because troubleshooting,
19      performance planning, and network planning need more detail than
20      Ping and Traceroute provide. The requirements have been gathered
21      from network operators, especially data center operators, whose
22      networks have slightly different characteristics from regular
23      campus/carrier networks.

25   Status of this Memo

27      This Internet-Draft is submitted to IETF in full conformance with the
28      provisions of BCP 78 and BCP 79.

30      Internet-Drafts are working documents of the Internet Engineering
31      Task Force (IETF), its areas, and its working groups. Note that
32      other groups may also distribute working documents as
33      Internet-Drafts.

35      Internet-Drafts are draft documents valid for a maximum of six months
36      and may be updated, replaced, or obsoleted by other documents at any
37      time. It is inappropriate to use Internet-Drafts as reference
38      material or to cite them other than as "work in progress."

40      The list of current Internet-Drafts can be accessed at
41      http://www.ietf.org/1id-abstracts.html

43      The list of Internet-Draft Shadow Directories can be accessed at
44      http://www.ietf.org/shadow.html

46   Copyright and License Notice
47      Copyright (c) 2012 IETF Trust and the persons identified as the
48      document authors. All rights reserved.

50      This document is subject to BCP 78 and the IETF Trust's Legal
51      Provisions Relating to IETF Documents
52      (http://trustee.ietf.org/license-info) in effect on the date of
53      publication of this document. Please review these documents
54      carefully, as they describe your rights and restrictions with respect
55      to this document. Code Components extracted from this document must
56      include Simplified BSD License text as described in Section 4.e of
57      the Trust Legal Provisions and are provided without warranty as
58      described in the Simplified BSD License.

60   Table of Contents

62      1  Introduction
63        1.1  Terminology
64        1.2  Acronyms
65      2. Requirements
66        2.1  Flow tracing
67        2.2  Fate sharing and actual flow interference
68          2.2.1  Side effects of requirements 2.1 and 2.2
69        2.3  Capability to send the response to a monitoring station
70        2.4  Terminating the trace on a transit device
71        2.5  Flow monitoring
72          2.5.1  Parameters for monitoring
73            2.5.1.1  Time-based and Monitor-till-stop monitoring
74        2.6  Loop Detection
75        2.7  Additional Information
76          2.7.1  Link statistics
77          2.7.2  Packet drops and their reasons
78        2.8  Additional enhancements
79          2.8.1  Fat-Tree traversal.
80          2.8.2  Hash Algorithm Parameters
81        2.9  Future Requirements
82      3  Security Considerations
83        3.1  Securing Requests and Responses
84        3.2  Information hiding
85        3.3  Rate limiting obviating attack vectors
86      4  IANA Considerations
87        4.1  Acknowledgements
88      5  References
89        5.1  Normative References
90        5.2  Informative References
91      Authors' Addresses

93   1 Introduction

95      Network operators have traditionally managed IP networks with classic
96      OAM tools like Ping and Traceroute. Operators typically use Ping to
97      perform end-to-end connectivity checks, and Traceroute to trace the
98      hop-by-hop path to a given destination. Traceroute is also used to
99      isolate the point of failure along the path to a given destination.
100     While these tools are useful for basic connectivity checks, they are
101     unable to provide sufficient information about the performance
102     aspects of a path (e.g., utilization levels).

104     In current networks, especially data center networks, there are a
105     large number of redundant paths. Existing OAM tools are unable to
106     identify flow-specific problems and do not provide sufficient
107     information on the various paths, including their performance
108     characteristics. What is needed is a set of tools that will perform
109     the OAM functions based on header fields of actual user traffic.

111  1.1 Terminology

113     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
114     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
115     document are to be interpreted as described in RFC 2119 [RFC2119].

117  1.2 Acronyms

119     OAM    Operations and Management
120     ECMP   Equal Cost Multi-Path
121     LAG    Link Aggregation Group
122     TRILL  Transparent Interconnection of Lots of Links
123     SPB    Shortest Path Bridging
124     L2     Layer 2
125     IP     Internet Protocol
126     TCP    Transmission Control Protocol
127     UDP    User Datagram Protocol
128     IDS    Intrusion Detection System
129     IPS    Intrusion Prevention System
130     ACL    Access Control List
131     ARP    Address Resolution Protocol
132     ICMP   Internet Control Message Protocol

134  2. Requirements

136     The main requirement is to be able to trace the exact path that a
137     particular flow will take through the network and to obtain all
138     relevant information about the links along that path, giving the
139     network operator enough information to troubleshoot a network
140     failure or quickly obtain performance data about that path.

142     An increasing number of networks use multi-path configurations to
143     improve load balancing and redundancy. These multiple paths could
144     be in the form of ECMP paths offered by the routing
145     protocols, but also Layer 2 ECMP paths (e.g., via TRILL [RFC 5556] or
146     SPB [IEEE SPB]) and LAGs between adjacent routers.

148  2.1 Flow tracing

150     Ideally, the OAM trace packet should undergo the same processing at
151     each node as would the actual application flow that is being traced.
152     The forwarding process in switches and routers typically uses
153     fields from the L2, IP, and transport (TCP/UDP) headers to determine
154     which member of an ECMP group or LAG to forward the packet on. The
155     OAM packets should therefore carry the same flow entropy, so that
156     they undergo the same processing as a typical data packet would.

158     Current tools like Ping and Traceroute do not carry the application
159     flow information, and hence the path that those packets follow
160     through the network could differ from that of the packets of the
161     specific flow that is of interest.
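
The effect described in Section 2.1 can be illustrated with a small sketch. It is a model under stated assumptions, not a description of any device: the FlowKey fields, the select_member function, and the use of SHA-256 are hypothetical stand-ins, since real switches and routers use their own hash functions and field selections. The sketch only shows that a probe carrying the same header fields as the application flow maps to the same ECMP or LAG member.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Header fields assumed to feed the hash (hypothetical selection)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_member(flow: FlowKey, members: Sequence[str]) -> str:
    """Pick one ECMP/LAG member from a hash over the flow's header fields.

    SHA-256 is a stand-in; real devices use their own hash functions.
    """
    material = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(material).digest()
    index = int.from_bytes(digest[:4], "big") % len(members)
    return members[index]


members = ["member-0", "member-1", "member-2", "member-3"]
app_flow = FlowKey("192.0.2.10", "198.51.100.20", 6, 49152, 443)

# An OAM probe that copies the application's header fields carries the
# same entropy, so a hop hashing these fields picks the same member.
oam_probe = FlowKey(*app_flow)
same_path = select_member(oam_probe, members) == select_member(app_flow, members)
```

A classic Ping to the same destination carries no TCP or UDP ports, so its key would differ from app_flow and it may be hashed onto a different member, which is the gap this requirement addresses.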

163  2.2 Fate sharing and actual flow interference

165     OAM probes, while sharing fate with the actual flow, should not affect
166     the real application in progress at the time of troubleshooting. The
167     OAM request originating from the sender should not interfere with the
168     actual application at the target host. Likewise, the OAM response
169     should not go back to the real application at the originator of the
170     OAM query.

172  2.2.1 Side effects of requirements 2.1 and 2.2

174     To ensure that OAM packets share the same fate as the
175     application's packets, and yet do not get delivered to the
176     application, it would be necessary to have an indication in the
177     packet to distinguish OAM packets from regular application flow
178     packets. The inclusion of such an indication in the packet should
179     still result in the formation of a legitimate packet, and should not
180     trigger security-based drops or alarms at intermediate firewalls and
181     IDS/IPS appliances, due to, say, an incorrect checksum or invalid
182     fragment headers, that regular data packets would not normally
183     experience.

185     It would be useful for the operator to control which class of service
186     is used by an OAM packet. For example, when measuring one-way or
187     round-trip delays, it would be useful to send the OAM packet in the
188     same class of service as regular data.

190  2.3 Capability to send the response to a monitoring station

192     When tracing the flow from node A to node B, it should be possible to
193     direct all the response packets to a third node C, which could be a
194     management station.

196  2.4 Terminating the trace on a transit device

198     The tool should have the capability to terminate the trace at a
199     specific hop, identified by an IP address, or after a specified
200     number of hops. This enables segmented tracing, in which only
201     portions of the path are traced.
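
The behavior asked for in Sections 2.3 and 2.4 can be sketched as follows. The TraceRequest structure and its field names (respond_to, terminate_at, max_hops) are hypothetical and chosen only for illustration; this document does not define a packet format. The sketch shows a trace that stops at a given hop address or hop count, and responses that go to a third node when one is named.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceRequest:
    """Hypothetical trace request; field names are illustrative only."""
    originator: str
    respond_to: Optional[str] = None    # third node C (Section 2.3)
    terminate_at: Optional[str] = None  # stop at this hop address (Section 2.4)
    max_hops: Optional[int] = None      # or stop after this many hops


def response_destination(req: TraceRequest) -> str:
    """Responses go to the monitoring station if one is named."""
    return req.respond_to or req.originator


def should_terminate(req: TraceRequest, hop_count: int, hop_addr: str) -> bool:
    """True when this hop is the requested end of the traced segment."""
    if req.terminate_at is not None and hop_addr == req.terminate_at:
        return True
    return req.max_hops is not None and hop_count >= req.max_hops


def traced_segment(req: TraceRequest, path: List[str]) -> List[str]:
    """Walk the path hop by hop and stop where the request says to."""
    segment = []
    for hop_count, hop_addr in enumerate(path, start=1):
        segment.append(hop_addr)
        if should_terminate(req, hop_count, hop_addr):
            break
    return segment


path = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "10.0.3.1"]
by_address = TraceRequest("10.0.0.100", terminate_at="10.0.2.1")
by_limit = TraceRequest("10.0.0.100", respond_to="10.9.9.9", max_hops=2)
```

With these assumptions, by_address traces the first three hops and by_limit traces the first two, with by_limit's responses directed to 10.9.9.9 rather than to the originator.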

203  2.5 Flow monitoring

205     It should be possible to initiate flow monitoring on one or all of
206     the intermediate devices, with the following capabilities.

208  2.5.1 Parameters for monitoring

210     The tool should provide an extensible mechanism by which the
211     monitoring station can ask for monitoring of certain parameters for
212     the flow, such as input rate and packet drops, at a given network
213     node. It should also be possible to request packet samples so that an
214     external monitoring tool can calculate statistics on the flow or
        interfaces.

216     The requested device may honor the monitoring request based on its
217     policy, authentication of the requester, and the resources available
218     on the device. It should be able to indicate in the response whether
219     monitoring was activated and, if so, which parts.

221     The period for which this monitoring is activated could be...

223  2.5.1.1 Time-based and Monitor-till-stop monitoring

225     For time-based monitoring, the OAM packet carries a time period and
226     a sampling frequency, and the requested devices send samples at the
227     specified frequency for the specified time period. This could be
228     overridden by local policy. For Monitor-till-stop monitoring, the
229     OAM packet initiates monitoring at a specific sampling rate, and
230     monitoring continues until a new request turns it off.
231     Local policy can also override this behavior and restrict
232     monitoring to a locally defined maximum period.

234  2.6 Loop Detection

236     The tool should be capable of detecting that OAM packets are being
237     looped. If this happens, the operation should be aborted. Appropriate
238     heuristics may be considered while implementing this feature.
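
One possible heuristic for the loop detection of Section 2.6 is sketched below. It assumes the tool can see the identifier of each node that reports on the probe; the follow_until_loop function is hypothetical, and this document does not prescribe any particular heuristic. The sketch aborts as soon as a node reports a second time.

```python
from typing import Iterable, List, Tuple


def follow_until_loop(hops: Iterable[str]) -> Tuple[List[str], bool]:
    """Return the hops seen before any repeat, and whether a loop was found."""
    seen = set()
    path: List[str] = []
    for node in hops:
        if node in seen:
            # The probe has come back to a node it already visited: abort.
            return path, True
        seen.add(node)
        path.append(node)
    return path, False


looped = follow_until_loop(["A", "B", "C", "B", "C"])
clean = follow_until_loop(["A", "B", "C"])
```

A repeated node identifier is a simple signal but not the only one; a hop-count ceiling is another heuristic an implementation might combine with it.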

240  2.7 Additional Information

242     Apart from reporting the incoming and outgoing interfaces, it would
243     be useful for the tool to report on the following:

245  2.7.1 Link statistics

247     Useful information needs to be collected to enable
248     operators to perform more detailed problem analysis or network
249     optimization. The operator may need to know the utilization of the
250     links along the path in addition to the fan-out information. This
251     information could, for example, be used by servers to select source
252     ephemeral ports in such a way as to avoid over-utilized links. Also,
253     knowing about disparities among LAG members, with some links
254     over-utilized and others under-utilized, could help the operator
255     tweak some of the available parameters or available hash functions
256     for better load distribution.

258  2.7.2 Packet drops and their reasons

260     Packets may get dropped for a variety of reasons, and the OAM
261     mechanism should be able to indicate the actual reason for a drop. The
262     response OAM packet should carry an appropriate error code for each
263     of the various reasons a packet may have been dropped.

265  2.8 Additional enhancements

267     Data center networks and applications have specialized needs. To
268     meet these needs, the new tools should provide certain additional
269     information for data centers, such as the following.

271  2.8.1 Fat-Tree traversal.

273     The tracing of a fat-tree (i.e., all paths) from the source to the
274     destination is a very important requirement for modern-day
275     administrators running, say, a data center. This could be done within
276     an administrative boundary and not beyond it.

278  2.8.2 Hash Algorithm Parameters

280     When choosing from a set of ECMP links or LAG members, it is common
281     for a hash function to be computed over select header fields. This
282     hash algorithm determines which ECMP or LAG member
283     is chosen to forward the packet on. It would be useful to know which
284     fields play a part in the computation of this choice. The actual hash
285     function may be internal to the device and need not be returned, since
286     it may be proprietary to the vendor, but the header fields accounted
287     for in the hash function would provide enough information for the
288     system / network administrator to vary these parameters in order to
289     figure out a specific path through which the traffic for a flow or a
290     set of flows can be engineered.

292  2.9 Future Requirements

294     The requirements specified in this document relate to tools for
295     troubleshooting IP layer connectivity with respect to IP nodes in
296     the path from source to destination. This draft deals with tracing
297     the IP nodes such as transit devices and the end target destination.
298     A future version of this set of requirements would look at tracing
299     the intermediate network between IP nodes.

301  3 Security Considerations

303     This section discusses threats to which this new set of tools might
304     be vulnerable and discusses means by which those threats might be
305     mitigated.

307     The following are some of the security requirements that need to be
308     adhered to under this framework.

310  3.1 Securing Requests and Responses

312     Tools developed under this framework should require mechanisms to
313     secure the requests and responses. The security provided for these
314     requests and responses should ensure integrity of these packets and
315     ensure confidentiality if necessary. An administrator should be able
316     to select the information that will be sent in insecure messages,
317     should such secure mechanisms not be available. For example, the set
318     of information exchanged in that case could be limited to the
319     information obtainable via traditional Ping and Traceroute.
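
As an illustration of the integrity requirement in Section 3.1, the sketch below attaches a keyed digest to a request and checks it on receipt. This document does not name a mechanism; HMAC-SHA-256 with a pre-shared key is an assumption made here purely for illustration, and the key and request contents are hypothetical. The sketch covers integrity only, not confidentiality.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the OAM message (assumed mechanism)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means the message was altered."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"hypothetical-shared-key"
request = b"trace flow 192.0.2.10:49152 -> 198.51.100.20:443"
tag = sign(key, request)

accepted = verify(key, request, tag)
tampered = verify(key, request + b" extra", tag)
```

A receiver that cannot verify a tag could fall back to returning only the information obtainable via traditional Ping and Traceroute, in line with the administrator-selectable behavior described above.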

321  3.2 Information hiding

323     There is a concern that tools developed to satisfy the requirements
324     in this document might allow an external user to probe the detailed
325     path that a flow takes through a network. To address this, the network
326     operator could associate multiple security levels with the different
327     types of information that may be included in the response to a
328     discovery packet coming from a legitimate tool. For example, only the
329     "Next Hop Router" may be marked as publicly accessible information
330     whereas everything else may be marked as private information. On
331     receiving a flow discovery request packet originating outside the
332     local network, only the publicly accessible information should be
333     included in the response to the originator. However, if the request
334     was originated by a legitimate, known source, the device could include
335     all of the requested information in the response.

337     The Result and Additional Information types specified in Section
338     2.7 provide detailed information about the processing of the request
339     packet and may possibly leak information about the locally configured
340     policies. The amount of information to be included in these sets of
341     data should also depend on whether the request originated from a
342     legitimate source. The network operator may choose to silently drop
343     the Flow Discovery Request packet without providing any indication of
344     the reason for doing so if the request originated externally.

346  3.3 Rate limiting obviating attack vectors

348     Today, most network operators throttle conventional OAM traffic (Ping
349     and Traceroute, and other ICMP messages) that is serviced by the
350     device to protect against Denial-of-Service attacks. Such mechanisms
351     should be employed for OAM packets under this framework for the same
352     reason.

354  4 IANA Considerations

356     This document does not need any consideration from IANA.

358     It is likely that tools under this framework may require new IANA
359     assigned protocol ports that signify the specific OAM protocol that
360     is to be implemented to satisfy such requirements. Tools developed to
361     satisfy these requirements will require such IANA assignments as the
        needs arise.

363  4.1 Acknowledgements

365     The authors would like to thank Ron Bonica for his thorough review
366     and critique of the traceflow proposal [2]. We also would like to
367     thank Melinda Shore for her direction and review of the traceflow
368     proposal which gave rise to this document.

370     The requirements presented in this document were a result of the
371     traceflow proposal submitted to the IETF by A. Viswanathan, S.
372     Krishnamurthy, R. Manur and V. Zinjuvadia.

374  5 References

376  5.1 Normative References

378     [RFC 792] "Internet Control Message Protocol", September 1981.

380     [RFC 5556] "Transparent Interconnection of Lots of Links", May 2009.

382     [IEEE SPB] "IEEE Shortest Path Bridging", IEEE 802.1aq

384  5.2 Informative References

386     [1] Zinjuvadia et al., draft-zinjuvadia-traceflow-02.txt, Internet-
387         Draft, August 2008.

389     [2] Shane Amante et al., draft-amante-oam-ng-requirements-01.txt,
390         Internet-Draft, February 2008.

392  Authors' Addresses

394     Janardhanan Pathangi,
395     Dell-Force10,
396     Olympia Technology Park,
397     Fortius block, 7th & 8th Floor,
398     Plot No. 1, SIDCO Industrial Estate,
399     Guindy, Chennai - 600032.
400     Tamil Nadu, India.
401     Tel: +91 (0) 44 4220 8400
402     Fax: +91 (0) 44 2836 2446

404     Email: Pathangi_janardhanan@dell.com

406     Balaji Venkat Venkataswami,
407     Dell-Force10,
408     Olympia Technology Park,
409     Fortius block, 7th & 8th Floor,
410     Plot No. 1, SIDCO Industrial Estate,
411     Guindy, Chennai - 600032.
412     Tamil Nadu, India.
413     Tel: +91 (0) 44 4220 8400
414     Fax: +91 (0) 44 2836 2446

416     Email: BALAJI_VENKAT_VENKAT@dell.com
417     Richard Groves,
418     Microsoft Corporation,
419     One Microsoft Way,
420     Redmond, WA 98052

422     Email: rgroves@microsoft.com

424     Peter Hoose,
425     Facebook,
426     1600, Willow Rd.,
427     Menlo Park, CA 94025

429     Email: phoose@fb.com