OPSAWG Working Group Janardhanan Pathangi INTERNET-DRAFT Balaji Venkat Venkataswami Intended Status: Proposed Standard DELL Expires: November 2012 Richard Groves MICROSOFT Peter Hoose FACEBOOK May 8, 2012 Requirements for OAM tools that enable flow Analysis draft-janapath-opsawg-flowoam-req-00 Abstract This document specifies Operations and Management (OAM) requirements that improve on the traditional OAM tools like Ping and Traceroute. These requirements have arisen from the fact that more details than given by Ping and Traceroute are required while troubleshooting or doing performance and network planning. These requirements have been gathered from network operators especially from data centers where the networks have slightly different characteristics compared to regular campus/carrier networks. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Copyright and License Notice Janardhanan Pathangi Expires November 2012 [Page 1] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Flow tracing . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Fate sharing and actual flow interference . . . . . . . . . 5 2.2.1 Side effects of requirements 2.1 and 2.2 . . . . . . . . 5 2.3 Capability to send the response to a monitoring station . . 6 2.4 Terminating the trace on a transit device . . . . . . . . . 6 2.5 Flow monitoring . . . . . . . . . . . . . . . . . . . . . . 6 2.5.1 Parameters for monitoring . . . . . . . . . . . . . . . 6 2.5.1.1 Time-based and Monitor-till-stop monitoring . . . . 6 2.6 Loop Detection . . . . . . . . . . . . . . . . . . . . . . . 7 2.7 Additional Information . . . . . . . . . . . . . . . . . . 7 2.7.1 Link statistics . . . . . . . . . . . . . . . . . . . . 7 2.7.2 Packet drops and their reasons . . . . . . . . . . . . . 7 2.8 Additional enhancements . . . . . . . . . . . . . . . . . . 7 2.8.1 Fat-Tree traversal. . . . . . . . . . . . . . . . . . . 7 2.8.2 Hash Algorithm Parameters . . . . . . . . . . . . . . . 8 2.9 Future Requirements . . . . . . . . . . . . . . . . . . . . 8 3 Security Considerations . . . . . . . . . . . . . . . . . . . . 8 3.1 Securing Requests and Responses . . . . . . . . . . . . . . 8 3.2 Information hiding . . . . . . . . . . . . . . . . . . . . . 8 3.3 Rate limiting obviating attack vectors . . . . . . . . . . . 9 4 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 9 4.1 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 9 5 References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.1 Normative References . . . . . . . . . . . . . . . . . . . 10 5.2 Informative References . . . . . . . . . . . . . . . . . . 10 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Janardhanan Pathangi Expires November 2012 [Page 2] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 Janardhanan Pathangi Expires November 2012 [Page 3] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 1 Introduction Network operators have traditionally managed IP networks with classic OAM tools like Ping and Traceroute. Operators typically use Ping to perform end-2-end connectivity checks, and Traceroute to trace hop- by-hop path to a given destination. Traceroute is also used to isolate the point of failure along the path to a given destination. Also, while these are useful for basic connectivity checks, they are unable to provide sufficient information about the performance aspects of a path (e.g. utilization levels). In current networks especially data center networks, there are a large number of redundant paths and existing OAM tools are unable to identify flow specific problems and also do not provide sufficient information on the various paths which includes performance characteristics. What is needed is a set of tools that will perform the OAM functions based on header fields of actual user traffic. 1.1 Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1.2 Acronyms OAM Operations and Management ECMP Equal Cost Multi-Path LAG Link Aggregation Group TRILL Transparent Connection of Lots of Links SPB Shortest Path Bridging L2 Layer 2 IP Internet Protocol TCP Transmission Control Protocol UDP User Datagram Protocol IDS Intrusion Detection System IPS Intrusion Prevention System ACL Access Control List ARP Address Resolution Protocol ICMP Internet Control Message Protocol Janardhanan Pathangi Expires November 2012 [Page 4] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 2. Requirements The main requirement is being able to trace the exact path that a particular flow will take through the network and obtain all relevant information about the links along that path which provides the network operator with enough information to troubleshoot a network failure or quickly obtain performance data about that path. Increasing number of networks are using multi-path configurations to improve load-balancing and redundancy in their networks. These multi- paths could be in the form of ECMP paths offered by the routing protocols, but also Layer 2 ECMP paths (e.g. via TRILL [RFC 5556] or SPB [IEEE SPB]) and LAGs between adjacent routers. 2.1 Flow tracing Ideally, the OAM trace packet should undergo the same processing at each node as would the actual application flow that is being traced. The forwarding process in switches and routers typically includes fields from the L2, IP, and transport (TCP/UDP) headers, to determine which member of an ECMP or LAG to forward the packet on. The OAM packets should also contain the flow entropy allowing for the same processing that a typical data packet would go through. Current tools like ping and Traceroute do not carry the application flow information, and hence the path that those packets follow through the network could differ from the packets of the specific flow that is of interest. 2.2 Fate sharing and actual flow interference OAM probes while sharing fate with the actual flow, should not affect the real application in progress at the time of troubleshooting. The OAM request originating from the sender should not interfere with the actual application at the target host Likewise, the OAM response should not go back to the real application at the originator of the OAM query. 2.2.1 Side effects of requirements 2.1 and 2.2 To ensure that OAM packets share the same fate as that of the Application's packets, and yet do not get delivered to the application, it would be necessary to have an indication in the packet to distinguish OAM packets from regular application flow packets. The inclusion of such an indication in the packet should still result in the formation of a legitimate packet, and should not trigger security based drops or alarms at intermediate firewalls and IDS/IPS appliances, due to, say, an incorrect checksum or invalid Janardhanan Pathangi Expires November 2012 [Page 5] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 fragment headers, that regular data packets would not normally experience. It would be useful for the operator to control which class of service is used by an OAM packet. For example, when measuring one way or round trip delays, it would be useful to send it in the same class of service as regular data. 2.3 Capability to send the response to a monitoring station When tracing the flow from node A to node B, it should be possible to direct all the response packets to a third node C, which could be a management station. 2.4 Terminating the trace on a transit device The tool should have the capability to terminate the trace at a specific hop specified by an IP address or by specifying a limit on the number of hops. This helps in segmented tracing, where portions of the path can be traced. 2.5 Flow monitoring It should be possible to initiate flow monitoring on one or all of the intermediate devices, and should have the following capabilities. 2.5.1 Parameters for monitoring The tool should provide an extensible mechanism by which the monitoring station can ask for monitoring of certain parameters for the flow like input rate, packet drops, etc at a given network node. It should also be possible to request for packet samples for external monitoring tool to calculate statistics on the flow or interfaces The requested device may honor the monitoring request based on its policy, authentication of the requester and also the available resources on the device. It should be able to indicate back in the response if and what parts of the monitoring are activated. The period for which this monitoring is activated could be... 2.5.1.1 Time-based and Monitor-till-stop monitoring The OAM packet carries a time period and frequency of sampling, and the requested devices send the samples at the specified frequency for the specified time period. This could also be overridden by local policy. In case of the Monitor-till-stop monitoring the OAM packet will initiate the monitoring at a specific sampling rate. The Janardhanan Pathangi Expires November 2012 [Page 6] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 monitoring will continue till there is a new request for turning off the monitoring. A local policy can also override this behavior and restrict this to a maximum period that is locally defined. 2.6 Loop Detection The tool should be capable of detecting that OAM packets are being looped. If this happens the operation should be aborted. Appropriate heuristics may be considered while implementing this feature. 2.7 Additional Information Apart from reporting the incoming and outgoing interfaces, it would be useful for the tool to report on the following 2.7.1 Link statistics There is a necessity to collect useful information to enable operators to perform more detailed problem analysis or network optimization. The operator may need to know the utilization of the links along the path in addition to the fan-out information. This information could be for example be used by servers to selec source ephemeral ports in such a way as to avoid over-utilized links. Also disparities in LAG members with respect to over-utilization of some links and under-utilization of others could help the operator to tweak some of the available parameters or available hash functions for better load distribution. 2.7.2 Packet drops and their reasons Packets may get dropped due to a variety of reasons, and the OAM mechanism should be able to indicate the actual reasons for drop. The response OAM packet should indicate the error code appropriately for various reasons why a packet may have been dropped. 2.8 Additional enhancements Data center networks and applications have specialized needs. To accomplish this the new tools provide certain additional information for the data centers such as the following. 2.8.1 Fat-Tree traversal. The tracing of a fat-tree (i.e. all paths) from the source to the destination is a very important requirement from modern day administrators running say a data-center. This could be done within an administrative boundary and not beyond it. Janardhanan Pathangi Expires November 2012 [Page 7] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 2.8.2 Hash Algorithm Parameters When choosing from a set of ECMP links or LAG members it is common for a hash function to be performed on select header fields. This hash algorithm is important with respect to which ECMP or LAG member is chosen to forward the packet on. It would be useful to know which fields play a part in the computation of this choice. The actual hash function may be internal to the device and need not be returned since it may be proprietary to the vendor but the header fields accounted for in the hash function would provide enough information for the system / network administrator to vary these parameters in order to figure out a specific path through which the traffic for a flow or a set of flows can be engineered. 2.9 Future Requirements The requirements specified in this document relate to tools for trouble shooting IP layer connectivity with respect to IP nodes in the path from source to destination. This draft deals with tracing the IP nodes such as transit devices and the end target destination. A future version of this set of requirements would look at tracing the intermediate network between IP nodes. 3 Security Considerations This section discusses threats to which these new set of tools might be vulnerable and discusses means by which those threats might be mitigated. The following are some of the security requirements that need to be adhered to under this framework. 3.1 Securing Requests and Responses Tool developed under this framework should require mechanisms to secure the requests and responses. The security provided for these requests and responses should ensure integrity of these packets and ensure confidentiality if necessary. An administrator should be able to select the information that will be sent in insecure messages, should such secure mechanisms not be available. For example, the set of information exchanged in that case could be limited to the information obtainable via traditional Ping and Traceroute. 3.2 Information hiding There is a concern that tools developed to satisfy the requirements in this document might allow an external user to probe the detailed path that a flow takes through a network. To address this the network Janardhanan Pathangi Expires November 2012 [Page 8] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 operator could associate multiple security levels with the different types of information that may be included in the response to a discovery packet coming from a legitimate tool. For example only the "Next Hop Router" may be marked as publicly accessible information whereas everything else may be marked as private information. On receiving a flow discovery request packet originating outside the local network, only the publicly accessible information should be included in the response to the originator. However if the request was originated by a legitimate, known source the device could include all of the requested information in the response. The Result and Additional Information types specified in the section 2.7 provide detailed information about the processing of the request packet and may possibly leak information about the locally configured policies. The amount of information to be included in these sets of data should also depend on whether the request was originated from a legitimate source. The network operator may choose to silently drop the Flow Discovery Request packet without providing any indication of the reason for doing so if the request was originated externally. 3.3 Rate limiting obviating attack vectors Today most network operators throttle conventional OAM traffic (ping and traceroute, and other ICMP messages) that is serviced by the device to protect against Denial-of-Service attacks. Such mechanisms should be employed for OAM packets under this framework for the same reason. 4 IANA Considerations This document does not need any consideration from IANA. It is likely that tools under this framework may require new IANA assigned protocol ports that signify the specific OAM protocol that is to be implemented to satisfy such requirements. Tools developed to satisfy will require such IANA assignments as the needs arise. 4.1 Acknowledgements The authors would like to thank Ron Bonica for his thorough review and critique of the traceflow proposal [2]. We also would like to thank Melinda Shore for her direction and review of the traceflow proposal which gave rise to this document. The requirements presented in this document were a result of the traceflow proposal submitted to the IETF by A. Viswanathan, S. Krishnamurthy, R. Manur and V. Zinjuvadia. Janardhanan Pathangi Expires November 2012 [Page 9] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 5 References 5.1 Normative References [RFC 792] "Internet Control Message Protocol", 2005. [RFC 5556] "Transparent Interconnection of Lots of Links", May 2009. [IEEE SPB] "IEEE Shortest path Bridging", IEEE 802.1aq 5.2 Informative References [1] Zinjuvadia et.al, draft-zinjuvadia-traceflow-02.txt, Internet- Draft, August 2008 [2] Shane Amante et.al, draft-amante-oam-ng-requirements-01.txt, Internet-draft, February 2008. Authors' Addresses Janardhanan Pathangi, Dell-Force10, Olympia Technology Park, Fortius block, 7th & 8th Floor, Plot No. 1, SIDCO Industrial Estate, Guindy, Chennai - 600032. TamilNadu, India. Tel: +91 (0) 44 4220 8400 Fax: +91 (0) 44 2836 2446 Email: Pathangi_janardhanan@dell.com Balaji Venkat Venkataswami, Dell-Force10, Olympia Technology Park, Fortius block, 7th & 8th Floor, Plot No. 1, SIDCO Industrial Estate, Guindy, Chennai - 600032. TamilNadu, India. Tel: +91 (0) 44 4220 8400 Fax: +91 (0) 44 2836 2446 Email: BALAJI_VENKAT_VENKAT@dell.com Janardhanan Pathangi Expires November 2012 [Page 10] INTERNET DRAFT OAM Reqts. tools that enable flow Analysis May 2012 Richard Groves, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 Email: rgroves@microsoft.com Peter Hoose, Facebook, 1600, Willow Rd., Menlo Park, CA 94025 Email: phoose@fb.com Janardhanan Pathangi Expires November 2012 [Page 11]