Intarea working group                             Janardhanan Narasimhan
Internet Draft                                Balaji Venkat Venkataswami
Intended Status: Proposed Standard                          Dell-Force10
Expires: 23 July 2012                                        Rich Groves
                                                               Microsoft
                                                             Peter Hoose
                                                                Facebook
                                                         23 January 2012


                               Traceflow
                draft-janapath-intarea-traceflow-00.txt

Abstract

   This document describes a new OAM protocol, TraceFlow, that captures
   information pertaining to a traffic flow along the path that the flow
   takes through the network. TraceFlow is aware of ECMP and link
   aggregation, and captures information about the constituent members
   through which the traffic flow passes. TraceFlow gathers information
   that is relevant to the flow, such as the Layer 3 address of the
   outgoing interface, the next hop to which packets of the flow are
   forwarded, and the effect of network policies such as access control
   lists on the flow. This draft requires the TraceFlow protocol to be
   processed by Layer 3 devices only. Layer 2 devices and MPLS
   LERs/LSRs along the way forward TraceFlow packets without processing
   them, in a pass-through mode. IP tunnels such as IP-in-IP and
   IP-in-GRE are likewise expected to carry TraceFlow packets through
   in pass-through mode. To achieve its purpose, TraceFlow uses a
   specific UDP destination port, to be assigned by IANA.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1  Introduction
     1.1  Terminology
   2.  Motivation
     2.1.  Evolution of IP networks
   3.  Packet Formats
     3.1.  Flow Discovery Request/Response Packet Format
     3.2.  Flow Discovery Request TLVs
       3.2.1.  Flow Descriptor TLV
       3.2.2.  Originator Address TLV
       3.2.3.  Information Request bitmap TLV
       3.2.4.  Termination TLV
     3.3.  Flow Discovery Response TLVs
       3.3.1.  Information Response TLV
         3.3.1.1  Utilization Anomaly TLV
       3.3.2.  Result TLV
       3.3.3.  Additional Informational Code TLV
     3.4.  TLVs common to Flow Discovery Request and Response
       3.4.1.  Encapsulated Packet TLV
       3.4.2.  Encapsulated Packet Mask TLV
       3.4.3.  Record Route TLV
   4.  Protocol Operation
       4.0.1  Assessing why redundant responses come through
     4.1.  Using Hardware to gather details for the response packet
     4.2  Interaction with MPLS based transit devices
     4.3  Applicability to Layer 2 devices
     4.4  Applicability to platforms that have trouble determining
          incoming Interface
     4.5  Applicability to Network Address Translators
   5.  Application Scenarios
     5.1.  Troubleshooting network failures
     5.2.  Network flow planning
       5.2.1  Programmatic migration to mitigate LAG link polarization
   6.  Security Considerations
   7.  Hardware pre-requisites for implementing Traceflow
     7.1  filter to trap packets with UDP destination port
     7.2  Packet injection mode directly to egress port
     7.3  Packet injection mode through hardware engine but not to
          output port
     7.4  Hardware rate limiter support (preventing DOS attacks)
     7.5  RPF check support in hardware (security consideration)
     7.6  Regular Security ACLs in the boundary of the network
     7.7  Implementing the LAG / ECMP using software state
     7.8  Implementation considerations
       7.7.1  Using ingress port as part of the LAG/ECMP hashing
              function
   8.  IANA Considerations
   9.  Contributors
   APPENDIX A:
     A.1.  Encapsulation Format Choices
       A.1.1.  Carrying a separate Flow Descriptor TLV inside the Flow
       A.1.2.  Using the traffic flow's parameter values in the
               external header
     A.2.  Layer 4 Protocol Choices and Router Alert option
       A.2.1.  UDP Encapsulation
       A.2.2.  ICMP Encapsulation
     A.3.  Legacy Devices (Not supporting TraceFlow)
     A.4.  TTL Scoping
     A.5.  Additional Information in the Flow Discovery Response
     A.6.  Choices for supporting remote TraceFlow requests
       A.6.1.  Terminating the request at the Proxy device and
               re-originate it
       A.6.2.  Source-Routing the request through the Proxy device
     A.7.  Applicability to Multicast
     A.8.  Applicability to Layer 2 networks
     A.9.  Applicability to IPv6
     A.10.  Applicability to MPLS
     A.11.  Flow Discovery and Response packet fragmentation
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Addresses

1  Introduction

   The TraceFlow protocol allows a user to determine the path taken by
   a flow through a network. It provides the capability to collect, at
   each hop of the network, relevant information about how the flow is
   forwarded. This information can include the individual member of a
   link-aggregation group (LAG) or ECMP group that carries the flow.

   There is a need for a mechanism that allows a user to determine the
   path that a flow takes through a network [3]. Current solutions
   (such as traceroute) do not provide details about the exact physical
   or logical interface through which the flow passes when LAG and/or
   ECMP are employed or policy-based routing is in effect.

   Such information from intermediate hops can be useful to network
   operators when troubleshooting network failures.

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Motivation

   Network operators have traditionally managed IP networks with classic
   OAM tools like Ping and Traceroute [2]. Operators typically use Ping
   to perform end-to-end connectivity checks, and Traceroute to trace
   the hop-by-hop path to a given destination. Traceroute is also used
   to isolate the point of failure along the path to a given
   destination. These tools have performed very well for the IP
   networks they were designed for.

2.1.  Evolution of IP networks

   Over time, networks have morphed into more complex, heterogeneous
   entities. Layer 2 switches and MPLS LSRs are often intermixed with
   IP routers.
   MPLS ping and MPLS traceroute, also known as LSP ping and LSP
   traceroute, identify the intermediate hops through which they travel
   using methods such as the Router Alert label; the relevant RFCs
   specify these methods for MPLS troubleshooting. This document does
   not intend to interfere with the MPLS OAM methods. TraceFlow is
   intended exclusively for pure Layer 3 troubleshooting and will not
   troubleshoot Layer 2 device failures or MPLS transit node failures.
   Plain IP-in-IP tunneled forwarding is likewise out of scope for this
   document.

   An increasing number of networks use multipath configurations to
   improve load balancing and redundancy. These multipaths can take the
   form of end-to-end ECMP paths or of LAGs between directly connected
   hops. Existing tools such as Ping and Traceroute, which follow the
   destination-IP-address-based routing model, may not follow the path
   taken by actual traffic in multipath and/or policy-based routing
   scenarios, because the forwarding of actual traffic in such
   scenarios is based on a set of packet header fields. Clearly, the
   OAM tools have not kept up with the requirements of evolving
   networks. Hence there is a need to extend the OAM tools so that
   operators can perform new OAM functions:

   1.  Perform Ping or Traceroute based on a set of link-layer and/or
       TCP/IP header fields of actual user traffic. This feature will
       be very useful for troubleshooting network problems and for
       planning and provisioning network resources.

   2.  Trace end-to-end paths comprising a mix of Layer 2 hops and IP
       and MPLS routers along the way. Layer 2 hops and MPLS hops are
       traversed in pass-through mode.

   3.  Collect more intelligent and useful information to enable
       operators to perform more detailed problem analysis.
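   To illustrate why destination-based probes can miss the path of the
   actual traffic, the sketch below models a device that picks an ECMP
   next hop or LAG member by hashing five header fields. The field set,
   the use of SHA-256 as the hash, and the member names are
   illustrative assumptions; real devices use vendor-specific hash
   functions over configurable fields.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Header fields a hypothetical device hashes on (a 5-tuple)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_member(key: FlowKey, members: Sequence[str]) -> str:
    """Pick a multipath member from a hash of the header fields.

    SHA-256 is only a stand-in for a device's (unspecified) hash.
    """
    digest = hashlib.sha256("|".join(map(str, key)).encode()).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]


members = ["lag-member-1", "lag-member-2", "lag-member-3", "lag-member-4"]

# The application flow whose path the operator actually cares about.
app_flow = FlowKey("192.0.2.10", "198.51.100.20", 6, 49152, 443)

# Classic UDP traceroute probes toward the same destination: each probe
# uses a different destination port, so each is hashed independently.
probes = [FlowKey("192.0.2.10", "198.51.100.20", 17, 54321, 33434 + i)
          for i in range(3)]

print("application flow ->", select_member(app_flow, members))
for probe in probes:
    print("probe port", probe.dst_port, "->", select_member(probe, members))
```

   Because the probe tuples differ from the application flow's tuple,
   nothing guarantees that any probe lands on the member the flow uses.
   Carrying the flow's own headers, as the Flow Descriptor TLV does,
   lets each hop hash exactly what the real traffic would present.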
   This document proposes a new OAM protocol, TraceFlow, that attempts
   to bridge the gap between today's fast-evolving networks and the
   traditional OAM tools. The following section (Section 3) describes
   the packet formats used by TraceFlow, so that subsequent sections
   need no forward references. First-time readers may prefer to skip
   Section 3 and read the Protocol Operation section (Section 4)
   first. Application scenarios are discussed in Section 5 and security
   considerations in Section 6.

3.  Packet Formats

3.1.  Flow Discovery Request/Response Packet Format

   Flow Discovery Request and Response packets follow the general
   format shown below. The TLVs included in each message type may
   differ.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |   Hopcount    |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |   Reserved    |           Query ID            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       16-byte opaque System Identifier of the Requestor       |
   //                                                             //
   |          Unique identifier of the requesting system           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       16-byte opaque System Identifier of the Responder       |
   //                                                             //
   |          Unique identifier of the responding system           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            TLVs...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Flow Discovery Request packet SHOULD be sent with the DF bit set
   in the external IP header.

   Version: The version number of the protocol. This document defines
   protocol version 1.

   Hopcount: Allows keeping track of the number of transit nodes that
   processed the Flow Discovery Request packet.
   This field is decremented at each device that processes the Flow
   Discovery Request packet. It also helps determine whether there were
   any legacy devices not supporting the TraceFlow protocol along the
   way.

   Length: Length of the packet, including the header. The length of
   the payload can therefore be determined by simply subtracting the
   header length from this Length field.

   Type:  1  Direct Flow Discovery Request - Ping mode
          2  Direct Flow Discovery Request - Traceroute mode
          3  Indirect Flow Discovery Request - Ping mode
          4  Indirect Flow Discovery Request - Traceroute mode
          5  Response for the Flow Discovery Request

   Reserved: This field should be set to zero on transmission and
   ignored on receipt. Its use may be defined in a later version of the
   protocol.

   Query ID: A unique identifier generated by the originator that
   allows it to correlate the responses from the transit nodes with the
   Flow Discovery Request packet it generated.

   System Identifier (Requestor and Responder): An opaque 16-byte field
   that is unique per node in the network. It is up to the
   administrators to define what this identifier means within their
   network, as long as they ensure that it is unique across all nodes
   in that network. The Requestor fills in its own System Identifier in
   the request packet. The Responder fills in both the Requestor field
   (copied from the received packet) and the Responder field (its own
   System Identifier). Thus the Flow Discovery Request packet carries
   the Requestor System Identifier, and the Response packet carries
   both the Requestor and the Responder System Identifiers.

   The TLVs are divided into three categories:

   1.  TLVs that can appear in the Flow Discovery Request packet

   2.  TLVs that can appear in the Flow Discovery Response packet

   3.  TLVs that can appear in both the Flow Discovery Request and
       Response packets

   TLVs that are not understood by an implementation of an earlier
   version of the protocol are ignored by that implementation. Such
   TLVs SHOULD be treated as opaque and passed along to the next
   transit device on the path; that is, they are transitive for
   versions of the protocol that do not understand them.

3.2.  Flow Discovery Request TLVs

3.2.1.  Flow Descriptor TLV

   This TLV is included in the Flow Discovery Request packet and
   identifies the traffic flow that the originator device wants to
   probe. This is a mandatory TLV.

   The definition of a traffic flow varies from one network to another.
   Most traffic flows in today's networks can be uniquely identified
   using fields from the data packet's headers. The TraceFlow protocol
   requires the first 256 bytes of the traffic flow's data packet to be
   encoded in this Flow Descriptor TLV. For version 1 and subsequent
   versions, these 256 bytes SHOULD include the Layer 2 headers as
   well, so that when TraceFlow is extended to support Layer 2 devices,
   the same 256 bytes can be used to discover intermediate Layer 2
   devices.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Value...                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: The type of the TLV. In this case, the value is 1, meaning
   Flow Descriptor TLV.

   Code: The Code identifies the sub-type of the TLV. In this case, this
   field is not defined. It SHOULD be set to 0.
   Length: The length of the TLV.

   Value: The value encoded in this TLV, depending on the Type and Code
   specified.

   Padding: This might be necessary to ensure that the packet ends on a
   word boundary.

   Refer to Section 3.4.1 (Encapsulated Packet TLV), which describes
   how a data packet can be used to specify the traffic flow.

3.2.2.  Originator Address TLV

   This TLV carries the address of the originator of the Flow Discovery
   Request packet. The responses from the intermediate devices
   processing the request are sent to this address. This is an optional
   TLV, to be included only when an Indirect Flow Discovery Request is
   originated.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Value...                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  2  Originator Address

   Code:  1  IPv4 Address
          2  IPv6 Address

3.2.3.  Information Request bitmap TLV

   This TLV is used by the originator device to specify the information
   requested for the flow identified by the Flow Descriptor TLV in the
   Flow Discovery Request packet. This is an optional TLV. In the
   absence of this TLV, the transit and end devices processing the Flow
   Discovery Request packet respond with the default set of
   information.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Flags...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  3  Information Request

   Code:  1  Incoming Interface related
          2  Outgoing Interface related

      Flags:

         Bit 0: IP Address
         Bit 1: SNMP ifName
         Bit 2: SNMP ifIndex and ifType
         Bit 3: LAG details
         Bit 4: ECMP details. To be specified only for the outgoing
                interface.
         Bit 5: Hash algorithm. To be specified only for the outgoing
                interface.

      Note that the hash algorithm mask TLVs can be included in the
      response packet, but the actual hash algorithm need not be
      specified in the response packet.

   Code:  3  Global information

      Flags:

         Bit 0: Next Hop Router Address

3.2.4.  Termination TLV

   This TLV includes a list of addresses. If a device notices that it
   owns any of the addresses listed in this TLV, it MUST NOT forward the
   Flow Discovery Request packet any further and MUST respond to the
   originator with a Flow Discovery Response packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                                                             //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Address-type:  0x1: IPv4 Address
                  0x2: IPv6 Address

   Address: The address where the request MUST be terminated.

3.3.  Flow Discovery Response TLVs

3.3.1.  Information Response TLV

   This TLV is used by the devices processing the Flow Discovery Request
   packet to provide the information requested by the originator device.
   This is a mandatory TLV; it is included in the response sent to the
   device that originated the Flow Discovery Request packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |   Value...    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  5  Information Response

   Code:  1  Incoming Interface related
          2  Outgoing Interface related

   Sub-Code:

      0: IP Address
      1: SNMP ifName
      2: SNMP ifIndex and ifType
      3: LAG details
      4: ECMP details. To be specified only for the outgoing interface.
      5: Hash algorithm. To be specified only for the outgoing
         interface.

   The LAG and ECMP details need further description. The following is
   the frame format used when the originator device has requested LAG
   or ECMP related details.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |        No. of members         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Component Link Information...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                                                             //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Component Link Information...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   No. of members: The number of members in the LAG or ECMP segment
   being described.

   Component Link Information: Individual component links are encoded
   in this field. The "No. of members" field indicates how many
   component links are listed.
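   As a rough illustration of the LAG/ECMP details layout, the sketch
   below packs and unpacks the fixed header and counts the component
   links. It assumes field widths of 8 bits (Type), 8 bits (Code) and
   16 bits each for Sub-Code, Length and No. of members, in network
   byte order, and assumes that Length covers the whole TLV including
   its header; the text above does not pin these down. Component Link
   Information entries (whose layout is given next) are treated as
   opaque, pre-encoded byte strings. The function names are
   hypothetical.

```python
import struct

# Type/Code/Sub-Code values from the Information Response TLV above.
TYPE_INFORMATION_RESPONSE = 5
CODE_INCOMING_INTERFACE = 1
CODE_OUTGOING_INTERFACE = 2
SUBCODE_LAG_DETAILS = 3
SUBCODE_ECMP_DETAILS = 4

# Assumed widths: Type (8), Code (8), Sub-Code (16), Length (16),
# No. of members (16), all big-endian ("!" = network byte order).
DETAILS_HEADER = struct.Struct("!BBHHH")


def encode_member_details(code: int, sub_code: int, members: list[bytes]) -> bytes:
    """Build a LAG/ECMP details TLV from opaque component-link entries."""
    if sub_code not in (SUBCODE_LAG_DETAILS, SUBCODE_ECMP_DETAILS):
        raise ValueError("sub-code must be LAG (3) or ECMP (4) details")
    # ECMP details are to be specified only for the outgoing interface.
    if sub_code == SUBCODE_ECMP_DETAILS and code != CODE_OUTGOING_INTERFACE:
        raise ValueError("ECMP details apply only to the outgoing interface")
    body = b"".join(members)
    length = DETAILS_HEADER.size + len(body)  # assumption: header included
    header = DETAILS_HEADER.pack(TYPE_INFORMATION_RESPONSE, code, sub_code,
                                 length, len(members))
    return header + body


def decode_member_details(data: bytes) -> tuple[int, int, int, bytes]:
    """Return (code, sub_code, member_count, component_link_bytes)."""
    tlv_type, code, sub_code, length, count = DETAILS_HEADER.unpack_from(data)
    if tlv_type != TYPE_INFORMATION_RESPONSE:
        raise ValueError(f"not an Information Response TLV: type {tlv_type}")
    if length < DETAILS_HEADER.size or length > len(data):
        raise ValueError("Length field inconsistent with buffer size")
    return code, sub_code, count, data[DETAILS_HEADER.size:length]


# Usage: two hypothetical 12-byte component link entries on an outgoing LAG.
links = [bytes(12), bytes([1]) * 12]
tlv = encode_member_details(CODE_OUTGOING_INTERFACE, SUBCODE_LAG_DETAILS, links)
print(decode_member_details(tlv)[:3])  # (2, 3, 2)
```

   A receiver would still need the per-entry layout to split the
   returned component link bytes into individual members; keeping them
   opaque here avoids guessing at widths beyond the fixed header.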
   The frame format for the "Component Link Information" portion of the
   TLV is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         SNMP ifIndex                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          SNMP ifType                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Flags             |      SNMP ifName length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SNMP ifName...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   SNMP ifIndex: The ifIndex of the component link being specified.

   SNMP ifType: The ifType of the component link being specified.

   Flags:

      0x1: If set, the component link is administratively down.

      0x2: If set, the component link is operationally down.

      If the above cannot be determined, the flags SHOULD be set to 0.

      The remaining bits in the Flags field are reserved.

3.3.1.1  Utilization Anomaly TLV

   An optional TLV is also defined to report a LAG utilization anomaly.
   The user can configure a divergence threshold for utilization
   between the least utilized and the most utilized members of the LAG.
   If, for example, the threshold is configured as 80% and the
   difference in utilization between the least utilized and the most
   utilized members of the LAG exceeds it, an anomaly TLV is sent to
   report the condition. On receiving this Utilization Anomaly TLV, the
   originator device can report it to the user, and a subsequent NMS
   query to the appropriate device can reveal more information about
   the anomaly.

   The format of the Utilization Anomaly TLV is as follows.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |Configured Divergence Threshold|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     SNMP ifIndex of Least used component link in the LAG      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      SNMP ifIndex of Most used component link in the LAG      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Actual Divergence in percentage|          Padding...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that a device reporting this optional TLV to the originator
   should track the rate at which each member of the LAG is forwarding
   traffic and report the divergence in terms of that rate. If the
   implementation cannot track the rate, it has to report the
   divergence in terms of packet counts instead; however, counts can be
   misinterpreted after link up/down events or similar conditions.

   The following format is used for the Hash algorithm Sub-Code; it
   specifies the packet fields that are used by the hash algorithm
   configured on the device.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    No. of hash parameters     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         byte-offset-1         |         no. of bytes          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         byte-offset-2         |         no. of bytes          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Encapsulated Packet...                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   No. of hash parameters: The number of packet fields used by the hash
   algorithm to calculate the egress port.

   Byte-offset-N: The offset to the start of the Nth field used by the
   hash algorithm to calculate the egress port.

   No. of bytes: For the byte offset specified, the number of bytes,
   starting at that offset, that are used by the hash algorithm.

   Encapsulated Packet: The encapsulated packet that the device received
   on its input port in the Flow Discovery Request packet is returned
   in the response packet. This should be the packet that the device
   processing the Flow Discovery Request packet used in its egress
   component link calculations.

   Note that the hash algorithm mask TLVs can be included in the
   response packet, but the actual hash algorithm need not be specified
   in the response packet.

   The following TLV is mandatory.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |              Next Hop Address...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The next hop address is encoded as shown above.

   Code:  3  Global information

   Sub-Code:  1  Next Hop Address

   Address-type:  0x1: IPv4 Address
                  0x2: IPv6 Address

   Next Hop Address: This field carries the next hop address.

3.3.2.  Result TLV

   The device processing the Flow Discovery Request packet includes a
   Result TLV in the response to the originator device to indicate the
   result of the processing. This TLV is mandatory.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Result Code  |   Sub-code    |      Diagnostic Data...       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Diagnostic Data...               |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  7  Result TLV

   Result Code: A value indicating the result of processing the Flow
   Discovery Request packet.

   Sub-code: Further qualifies the "Result Code" field and provides
   more information about the result of processing the Flow Discovery
   Request packet.

   Diagnostic Data: Used in conjunction with the "Result Code" and
   "Sub-code" fields to return any information that may be useful to
   the originator of the Flow Discovery Request packet. Its format is
   defined by the "Result Code" and "Sub-code" fields.

   Result Code: 1 Success

   Result Sub-code: 0

   Result Code: 2 Administratively disabled

   Result Sub-code: 0

   Diagnostic Data: A list of the Information Request Sub-Codes that
   are not being fulfilled. These Sub-Codes can indicate whether the
   outgoing interface is currently disabled. If the hardware forwarding
   tables point to an interface that has been administratively
   disabled, this indicates an error in those tables and suggests that
   the software state is out of sync with the hardware.

   Result Code: 3 Routing failure

   Result Sub-code: 1 No route in table

   Result Sub-code: 2 RPF check failed

   Result Sub-code: 3 ARP failure
Result Code: 4 Packet Error

   Result Sub-code: 1 hopcount = 0

   This may be the case where the TTL has counted down to 0 in IPv4, or the Hop Limit has counted down to 0 in IPv6. It gives the Originator a way to determine that the TTL counted down to zero even if the ICMP "Time Exceeded" messages are dropped on the way back.

Result Code: 5 Malformed packet

   Result Sub-code: 1 Unknown TLVs for this version

   In this case the packet is not dropped but is forwarded with the unknown TLVs. This allows devices running older versions of the protocol to report back to the originator that the packet was processed despite containing one or more unknown TLVs, and that it was forwarded to the next transit device with those TLVs intact.

Result Code: 6 Data-path Error

   Result Sub-code: 1 Fragmentation needed but not allowed by the Flow Information TLV in the Flow Discovery Request packet

Result Code: 7 Generic Error

   Result Sub-code: 0 (TBD: Sub-codes to identify the type of error may need to be defined)

3.3.3. Additional Informational Code TLV

This TLV may accompany the Result TLV if the device processing the Flow Discovery Request packet has additional information that the originator device may be interested in. This TLV is optional.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Status Code  |   Sub-code    |       Additional Data..       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Additional Data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 8 Additional Informational Code

Status Code: 1 ACL drop

   Status Sub-code: 1 Ingress ACL drop

   Status Sub-code: 2 Egress ACL drop

Status Code: 2 Dataplane failure

   Status Sub-code: 1 Switch fabric failure

   Status Sub-code: 2 Linecard failure

   Status Sub-code: 3 Port failure

Status Code: 3 Generic Information

   Status Sub-code: 1 TTL/Hopcount mismatch noticed

   Status Sub-code: 2 Default route used to forward packet

   Status Sub-code: 3 Per-packet load-balancing enabled

In the case of a TTL/Hopcount mismatch, the "Additional Data" field carries the difference between the Hopcount and IP TTL field values. This may indicate the number of previous-hop routers that did not support the TraceFlow protocol.

3.4. TLVs common to Flow Discovery Request and Response

3.4.1. Encapsulated Packet TLV

This TLV is included in the Flow Discovery Request and is returned in the Flow Discovery Response packet by devices processing the request packet. In the response packet, this TLV contains the encapsulated packet as it was received from the previous-hop device. It helps the originator keep track of how the data packet gets modified along the way. This TLV is mandatory.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |   First Hdr   |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Encapsulated Packet...                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 1 Flow Discovery Request

Code: 1 Encapsulated traffic flow data packet

Encapsulated Packet: The first 256 bytes of a data packet belonging to the flow are encapsulated in this field.

Flags:

   0x1: Fan-out option. If set, the transit node SHOULD forward the Flow Discovery Request packet on all possible egress links for the specified flow. Because the fan-out option can create multiple instances of the packet, one through each possible egress link in a LAG or ECMP situation, it should be used with caution. An administrative knob should be available on the device to turn this option on or off. Thus, even if fan-out is requested in the Flags field, fan-out discovery is performed only if the transit device permits it through that administrative knob.

First Hdr: Specifies the first header that appears in the encapsulated packet. The values defined by this document are:

   0x1: Layer 2 MAC Header

   0x2: IPv4 Header

   0x3: IPv6 Header

   0x4: MPLS Header

3.4.2. Encapsulated Packet Mask TLV

This TLV allows the operator to specify which portion of the encapsulated packet carries flow data and which portion is left unspecified. This allows intermediate nodes to determine whether they have enough information to calculate an egress interface on which to forward the Flow Discovery Request packet. If this TLV is omitted from the Flow Discovery Request packet, no portion of the packet is left unspecified and the transit device may use any of the fields to make the forwarding decision. This TLV is optional.

This TLV includes a sequence of (byte-offset, no. of bytes, byte-mask) tuples.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         No. of tuples         |         byte-offset-1         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         no. of bytes          |          byte-mask-1          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         byte-offset-2         |         no. of bytes          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          byte-mask-2          |              ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No. of tuples: The total number of tuples carried in this TLV.

Byte-offset: The byte offset of the field being specified.

No. of bytes: The number of bytes, starting at the byte-offset, to consider.

Mask: The mask to be applied to the bytes starting at the byte-offset. It selects which bits, within the "No. of bytes" bytes starting at the byte-offset, are used to determine the egress interface on which to forward the Flow Discovery Request packet.

3.4.3. Record Route TLV

This TLV records information about the path taken by a Flow Discovery Request packet as it traverses the network. It is included by the originator, and each transit device processing the Flow Discovery Request packet adds information about its incoming interface to this TLV. The transit nodes (in trace-route mode) include this TLV in the response they send to the originator of the Flow Discovery Request packet. This TLV is optional. However, if the originator node includes it in the Flow Discovery Request packet, subsequent nodes SHOULD prepend their addresses to the list.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |         Incoming interface Address...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 9 Record-Route TLV

Code: 1

Address-type:

   0x1: IPv4 Address

   0x2: IPv6 Address

Incoming interface Address: This field carries the address of the incoming interface at the device processing the Flow Discovery Request packet. Each node receiving the request packet with this TLV should prepend its incoming interface address to this TLV.

The device SHOULD include the Record-Route TLV, as received on its input interface, in the Flow Discovery Response packet it sends out.

4. Protocol Operation

A Flow Discovery Request packet is a UDP packet addressed to a well-known destination port. The source UDP port in the packet is ephemeral. The packet contains a "Flow Descriptor" TLV that allows the originator of the request to encode a flow data packet. On Layer 3 or multi-layer devices that perform Layer 3 forwarding, using a UDP port is the most practical approach. Hardware support is needed in the form of a filter that inspects packets for the specific UDP destination port and punts matching packets to software. Layer 2 devices in L2 clouds pass the packet through, as do MPLS LSRs. On pure L3 devices, the filter that enables Traceflow should be turned on by a per-device knob.

Certain fields in a traffic flow data packet get modified by the transit devices as the data packet traverses the network. A transit device that processes a Flow Discovery Request packet would need to edit those fields in the encapsulated data packet that represents the flow.
Some such fields are the source and destination MAC addresses and the MPLS label stack.

Consider a transit device that uses the source or destination MAC address of a data packet to determine the egress port. The transit device could choose to take the MAC addresses either from the external header of the Flow Discovery Request packet or from the encapsulated packet.

TraceFlow can operate in two separate modes:

1) Trace-route mode: Each transit device and the end node respond to the Flow Discovery Request packet by sending a Flow Discovery Response.

2) Ping mode: Transit nodes do not send a response message to the originator. The rest of the behavior is the same as in traceroute mode.

The following applies to both Ping and Traceroute mode unless otherwise specified.

The destination address of the Flow Discovery Request packet is the destination address of the desired traffic flow. In Ping mode, a separate Termination TLV may be included that specifies a list of addresses. If a device processing the Flow Discovery Request packet finds that one of its IP addresses matches one of the addresses specified in the Termination TLV, the device MUST NOT forward the Flow Discovery Request packet further and sends a response packet to the originator.

The Flow Discovery Request packet travels exactly the same path that a data packet for the specified traffic flow would have followed, including the exact physical or logical interface that belongs to a LAG or a set of ECMP paths. It is important to note that this relies on the hardware supporting a mechanism that determines where the packet would be forwarded, sends the result to software, and injects the packet toward the next hop on the way to the destination.
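The per-node decision described above (respond, forward, or both) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a normative algorithm from this draft: the `Mode` enum, the `decide` function, and its parameters are hypothetical names, and a device is assumed to act as the end node when the request's destination address is one of its own addresses.

```python
from enum import Enum


class Mode(Enum):
    TRACEROUTE = 1  # every transit node and the end node respond
    PING = 2        # only the terminating node responds


def decide(mode, local_addrs, dest_addr, termination_addrs=()):
    """Return (send_response, forward) for a received Flow Discovery Request.

    local_addrs:       IP addresses configured on this device (hypothetical input).
    dest_addr:         destination address of the Flow Discovery Request packet.
    termination_addrs: addresses carried in a Termination TLV (Ping mode only).
    """
    local = set(local_addrs)
    at_destination = dest_addr in local
    # Termination TLV match: MUST NOT forward further, send a response instead.
    terminated = mode is Mode.PING and bool(local & set(termination_addrs))
    if at_destination or terminated:
        return True, False
    # Transit node: always forwards; responds only in traceroute mode.
    return mode is Mode.TRACEROUTE, True
```

In this sketch a transit node in Ping mode forwards silently, while a Termination TLV match makes it behave like the end node.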
If per-packet load balancing is enabled on the way to the destination, returning a Flow Discovery Response would be ambiguous, because another iteration of flow discovery packets through the node could be forwarded across a different interface (logical or otherwise) than in the previous iteration. So if per-packet load balancing is enabled on the existing multipaths (ECMP or otherwise), it is important for the response packet to indicate that the node is so configured; a status code (Generic Information, Status Sub-code 3) is reserved to flag this anomaly. When two or more ECMP or UCMP paths exist, per-packet load balancing may cause the path taken by a Traceflow packet to differ entirely from the path taken by an actual data packet.

The device interested in receiving information about the traffic flow originates a Flow Discovery Request packet. The Flow Descriptor TLV in this packet specifies the flow of interest, whereas a Requested Information TLV specifies the flow-related information that the originator device is requesting from each transit router. The Flow Discovery packet needs to be processed by all routers along the path to the destination. This is achieved by using a well-known UDP port as the destination port in the UDP header. When a transit device receives a Flow Discovery Request packet, it reads the flow information from the Flow Descriptor TLV, looks up the local forwarding database(s), and determines an egress port or ports for this traffic flow. The transit device forwards the Flow Discovery packet on the egress port calculated by this lookup. The egress port is calculated from the flow information in the Flow Descriptor TLV of the request packet, not from the destination IP address in the IP header of the Flow Discovery Request packet.
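The key point, that the egress member is chosen from the encapsulated flow packet rather than from the outer header, can be illustrated with the sketch below. It assumes a hypothetical hash that concatenates (byte-offset, no. of bytes) fields, in the style of the hash algorithm TLV, and reduces them with CRC-32 modulo the number of members; real platforms use their own, usually undisclosed, hash functions. The field offsets assume an IPv4 header without options followed by a UDP or TCP header, and all function and member names are illustrative.

```python
import socket
import struct
import zlib

# Hypothetical hash fields as (byte-offset, no. of bytes) pairs. Offsets assume
# an IPv4 header without options followed by UDP/TCP: protocol (9), source and
# destination address (12..19), source and destination port (20..23).
IPV4_5TUPLE_FIELDS = [(9, 1), (12, 8), (20, 4)]


def hash_key(encap_pkt: bytes, fields) -> bytes:
    """Concatenate the packet bytes the (assumed) hash algorithm uses."""
    return b"".join(encap_pkt[off:off + n] for off, n in fields)


def select_member(encap_pkt: bytes, fields, members):
    """Pick the egress LAG/ECMP member from the encapsulated flow packet.

    The outer header of the Flow Discovery Request is deliberately not an
    input: only the flow packet carried inside the request is hashed.
    CRC-32 is a stand-in for the platform's real hash function.
    """
    if not members:
        raise ValueError("no egress members available")
    return members[zlib.crc32(hash_key(encap_pkt, fields)) % len(members)]


def make_ipv4_udp(src, dst, sport, dport, ttl=64):
    """Build a minimal 28-byte IPv4/UDP packet (checksums left at zero)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, ttl, 17, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return ip + struct.pack("!HHHH", sport, dport, 8, 0)
```

Because the IP TTL (offset 8) is not among the hashed fields in this sketch, rewriting that field in the encapsulated packet at each hop does not change the selected member.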
When processing the Flow Discovery Request packet, the transit node MUST consider the packet length specified in the encapsulated packet in the Flow Descriptor TLV.

The transit device also gathers relevant information for the flow, which could include details such as:

1. Incoming and outgoing interface details such as ifIndex, IP address, and LAG and ECMP related information.

2. Next-hop router information.

The transit device processing the Flow Discovery Request packet may choose to respond to only a subset of the information requested in the Flow Discovery Request packet.

The transit device includes additional information related to the incoming or outgoing LAG or ECMP interface. This additional information includes the number of LAG or ECMP links that are configured, their operational status, and the parameters used by the hashing algorithm to select an egress port for the traffic flow.

This information is sent back to the IP address specified as the Originator IP Address in the Flow Discovery Request packet. If an Indirect Request is used, the Originator TLV specifies this address; otherwise, the source IP address in the outer header is the Originator address.

The Flow Discovery Request packet includes a hop count field that is initialized to the same value as the IP header's TTL field. This hop count field is decremented by one at each intermediate router that processes the Flow Discovery Request. Because every router decrements the IP TTL but only TraceFlow-capable routers decrement the hop count, comparing the two fields can reveal whether there are intermediate routers that do not support the TraceFlow protocol. When an intermediate router detects that the hop count field is greater than the IP header TTL field, this indicates that one or more previous-hop routers do not support the TraceFlow protocol.
This information is added to the response sent to the Originator IP Address. Thus, an intermediate router that follows one or more devices not supporting Traceflow will determine that such devices exist. The output at the Originator can be customized to display in the following format:

   Device 1: (Description) (Traceflow capable)

   Unknown Devices : n (where n >= 1)

   Device 2: (Description) (Traceflow capable)

The IP TTL field as well as the hop count field SHOULD be initialized to values that limit the Flow Discovery Request packet to the desired network boundary. This may be required to restrict Traceflow packets to specific boundaries within an administrative domain, where such boundaries are well defined.

A router can originate periodic Flow Discovery Requests for a traffic flow. The Query ID field in the Flow Discovery Request packet helps the originator match the responses from the transit routers to the request they processed.

While processing a Flow Discovery Request packet, a device along the path towards the destination may encounter an error condition that prevents it from continuing to process the packet. Some examples of such error conditions are:

1. The TraceFlow protocol has been administratively disabled.

2. The unicast RPF check failed for the flow specified in the Flow Discovery Request packet.

3. No route exists in the routing table for the flow specified in the Flow Descriptor TLV.

4. The IP TTL or the Hop Count field in the Flow Discovery Request packet becomes zero.

The "Result TLV" is used to carry this information back to the originator of the Flow Discovery Request packet.
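The mapping from the error conditions listed above to the Result TLV codes defined in Section 3.3.2 might look as follows in a transit device. This is a sketch only: the order in which the checks are applied, the function name `process_request`, and its boolean inputs are assumptions made for illustration, not requirements of this draft.

```python
from typing import NamedTuple


class Result(NamedTuple):
    code: int
    sub_code: int


# (Result Code, Sub-code) pairs taken from the Result TLV definitions.
SUCCESS = Result(1, 0)
ADMIN_DISABLED = Result(2, 0)
NO_ROUTE = Result(3, 1)
RPF_FAILED = Result(3, 2)
HOPCOUNT_ZERO = Result(4, 1)


def process_request(enabled, ip_ttl, hop_count, rpf_ok, has_route):
    """Return the Result TLV codes a transit device would report.

    ip_ttl and hop_count are the values as received; this hop decrements
    both. The order of the checks below is an illustrative assumption.
    """
    if not enabled:
        return ADMIN_DISABLED
    if ip_ttl - 1 <= 0 or hop_count - 1 <= 0:
        return HOPCOUNT_ZERO
    if not rpf_ok:
        return RPF_FAILED
    if not has_route:
        return NO_ROUTE
    return SUCCESS
```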
It is also possible that the device successfully processes the Flow Discovery Request packet but encounters a condition during processing that may be of interest to the originator. Some examples of such conditions are:

1. The flow specified in the Flow Descriptor TLV would be dropped due to Ingress ACL or Egress ACL policies.

2. A dataplane failure may prevent the specified flow from being successfully switched or routed.

3. The IP TTL and the hop count field in the Flow Discovery Request packet do not match, possibly because one or more previous-hop routers do not support the TraceFlow protocol.

4. The specified flow would be routed using the default route in the routing table.

This information is returned to the originator of the Flow Discovery Request packet using the "Additional Informational Code TLV".

The originator of the Flow Discovery Request packet may set the fan-out bit in the Flow Descriptor TLV to request the transit node to forward the request packet through all possible egress ports for the specified flow. The transit device processes the Flow Discovery Request packet as described above and forwards it out of all possible egress ports in multipath scenarios. When the fan-out option is selected and the egress is a LAG interface, the received Flow Discovery Request packet is forwarded only on the primary port of the LAG. The primary port selected may differ from vendor to vendor. This helps reduce the number of redundant request packets generated by the fan-out behavior. The originator of a request packet with the fan-out option enabled may nevertheless receive redundant responses in certain circumstances.

Note that LAG details are provided in the response packet only if the LAG exists on an L3 device.
This is because L2 devices supporting LAG cannot currently process the Traceflow protocol. L2 support may be added to the Traceflow protocol in future drafts, at which point it may be dealt with in detail.

4.0.1 Assessing why redundant responses come through.

If a fan-out happens at an early point in the path towards the destination, the paths may diverge and traverse a few transit devices before re-converging at one or more points on the way to the destination. In this case, the multiple fan-out Discovery packets may produce redundant responses from the same re-converged transit devices along the way. This can be used to find out whether totally disjoint paths to the destination exist. If the redundant responses come only from the ultimate destination, it is reasonably easy to conclude that there are totally disjoint paths to the destination. If, however, redundant responses arise from transit devices well before the destination, one must assume that the paths re-converged (the partially disjoint case) before the ultimate destination. This is the most opportune use of this feature: finding all possible paths by correlating the information received at the originator, using a network management station running on an appliance or elsewhere.

The Flow Discovery Request packet SHOULD pass through the Layer 2 or MPLS segments along the path in pass-through mode, as data packets. The appendix discusses the possibility of extending the TraceFlow protocol to allow the devices in the Layer 2 and MPLS segments along the path of the traffic flow to respond to the Flow Discovery Request packet, but this is left for future work.
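The correlation described in Section 4.0.1 can be sketched as follows. It assumes that the originator has already grouped the fan-out responses into one hop-ordered list of device identifiers per branch (for example, using the Record Route TLV); this input format and the function name `reconvergence_points` are hypothetical.

```python
from collections import defaultdict


def reconvergence_points(branches, destination):
    """Return transit devices that responded on more than one fan-out branch.

    branches: one list per fan-out branch of responding device IDs, in hop
              order, starting after the device that performed the fan-out
              (hypothetical pre-grouped input).
    An empty result suggests totally disjoint paths up to the destination.
    """
    branch_ids = defaultdict(set)
    for branch_id, devices in enumerate(branches):
        for device in devices:
            branch_ids[device].add(branch_id)

    points = []  # shared transit devices, in order of first appearance
    for devices in branches:
        for device in devices:
            shared = len(branch_ids[device]) > 1
            if device != destination and shared and device not in points:
                points.append(device)
    return points
```

For example, branches `[["B", "D", "Z"], ["C", "D", "Z"]]` toward destination `"Z"` yield `["D"]` (partially disjoint paths re-converging at D), whereas `[["B", "Z"], ["C", "Z"]]` yield an empty list.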
The discussion so far has assumed that the Flow Discovery Request packet originates on one device (say device A) and terminates on another device (say device B). A third device (say device C) may be interested in obtaining the flow-related information for a flow traversing from device A to device B. In this case, device C sends a Flow Discovery Request packet to device A. The Flow Discovery Request type specified in the packet indicates to device A that this is an indirect request from device C to obtain information relevant to the flow specified in the Flow Descriptor TLV. Device A then generates a new Flow Discovery Request packet with the destination IP address set to device B and the Originator IP Address set to device C. All transit routers that process this request send their responses to device C. See the Security Considerations section for more information on issues with the indirect mode and ways to mitigate them.

4.1. Using Hardware to gather details for the response packet.

It is RECOMMENDED that the TLVs be filled with as much information as possible gathered directly by reading the hardware elements that are used to forward the flow.

4.2 Interaction with MPLS based transit devices.

The current MPLS ping standard supports ping and traceroute between ingress and egress LSRs only. There is a need for a single probe that traces all types of hops, including MPLS LSRs, which this protocol can address. However, we intend to support only the pipe mode (pass-through) of tracing, in which the entire MPLS LSP is treated as a single interface. Uniform mode, in which every hop along the way is traced, is excluded from this scheme. It may, however, be taken up as future work.
In the MPLS case, the difference in the TTL values allows one to conclude that the MPLS network in the middle passed the packet through. The egress LER can then begin sending Discovery responses, continuing from where the ingress LER left off.

4.3 Applicability to Layer 2 devices.

In this version of the draft, Layer 2 devices are bypassed entirely with respect to Traceflow; L2 devices are expected merely to forward Traceflow frames. Future work may extend Traceflow support to Layer 2 devices.

4.4 Applicability to platforms that have trouble determining incoming Interface.

Platforms that have trouble determining which interface a packet arrived on need appropriate hardware assists to indicate the incoming interface to software.

4.5 Applicability to Network Address Translators

This aspect has not yet been studied well, and future revisions of this draft or addendum documents may clarify the behavior. The concern is returning the response packet to the originator when the outer IP header is subject to translation. Both the encapsulated packet and the outer IP header may need to undergo translation. Firewalls that surround NATs, or that have built-in NAT capability, may drop packets whose ports are not set up for pass-through or translation, so some hole punching on the firewall may be required to get the response packet back to the originator. As stated, this aspect has to be thought through and documented in subsequent versions, or in additional drafts that modify the behavior to enable NAT traversal of Traceflow packets.
One advantage, though, is that since the request and response are not ICMP packets, Traceflow packets may be treated as ordinary data packets and pass through without a hitch. Trust boundaries enforced by firewalls may, however, not welcome the intrusion.

5. Application Scenarios

This section discusses applications of this proposal. The application scenarios can broadly be divided into two categories:

1. Troubleshooting network failures

2. Network planning

5.1. Troubleshooting network failures

Several network monitoring tools make it possible to monitor the health of a network by polling information from the network devices (primarily through SNMP). They help in detecting network failures, imminent failures, or other anomalies in the network.

To troubleshoot these failures, network operators typically rely first on tools such as ping and traceroute. Unfortunately, these tools do not provide detailed information about the affected traffic flow, for the following reasons:

1. Ping and traceroute control packets are likely to follow a different path through the network than the traffic flow being investigated, for example when policy-based routing is in effect or when there are one or more ECMP segments along the path of the traffic flow.

2. Ping and traceroute do not provide details about the constituent members of a port-channel trunk through which the affected flow would have traversed.

3. It is common practice to rate-limit ping and traceroute traffic at the router, which makes responses to ping and traceroute non-deterministic.
Being able to trace the exact path that a particular flow might have taken through the network, and to obtain all relevant information about the hops along that path, gives the network operator enough information to troubleshoot a network failure quickly.

By setting the fan-out bit in the Flow Descriptor TLV, the operator should be able to determine all possible paths through the network that traffic to a particular destination may take. Along with the paths, the operator should also be able to obtain information relevant to the traffic flow from transit devices along those paths. This might prove useful in troubleshooting certain types of network problems.

5.2. Network flow planning

During production, it may be useful to know which ephemeral source port can be used to steer a flow onto a suitable LAG member or ECMP component link, by sending Traceflow packets with different ephemeral source ports or with ports drawn from a range.

It would also be useful to verify that the network access lists are properly configured and that the traffic would not be blocked inadvertently by an access list somewhere.

Typically, the issues listed above are discovered only once the network is in production.

Being able to exercise the traffic flow's data path before it starts handling production traffic would help the operator to:

1. Rectify any configuration issues such as ACL policies.

2. Modify the ephemeral source port so that the flow's traffic crosses a specific constituent member of a port-channel trunk or a specific ECMP path.

Note that this application of the Traceflow protocol may not be relevant to all types of networks. Campus networks, enterprise networks, and data centers with well-defined traffic flow patterns may benefit from the capability to detect the above problems.
However, for tier 1 providers this application of TraceFlow has limited relevance, as their traffic flows are not well defined.

The operator may use the fan-out bit in the Flow Descriptor TLV to request the transit devices to report all the paths that traffic to a certain destination address would take. This allows the operator to validate the ECMP or LAG configuration in the network.

5.2.1 Programmatic migration to mitigate LAG link polarization

In later versions of the OpenFlow specification, virtual ports such as LAGs are exposed to the OpenFlow forwarding path. It is therefore imperative that the controller have a standards-based ability to discover LAG hashing behavior. Through the Traceflow discovery and fan-out process, the controller can proactively determine which action to take to move flows from one LAG member to another. This will aid automated troubleshooting of link polarization problems.

6. Security Considerations

This section discusses threats to which TraceFlow might be vulnerable and the means by which those threats might be mitigated.

There is a concern that this protocol might allow an external user to probe the detailed path that a flow takes through a network.

The network operator can associate multiple access levels with the different types of information included in the response to a Flow Discovery Request packet. For example, only the "Next Hop Router" information may be marked as publicly accessible, whereas everything else may be marked as private. On receiving a Flow Discovery Request packet originating outside the local network, the device includes only the publicly accessible information in the response to the originator. However, if the request was originated locally, the device includes all requested information in the response.
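The access-level policy described above could be applied when building a response roughly as follows. This is a minimal sketch: the item names, the `VISIBILITY` table, and the use of configured internal prefixes to classify the originator are assumptions, and such a classification is only meaningful if spoofed internal originator addresses are filtered at the boundary (see the RPF discussion in Section 7.5).

```python
import ipaddress

# Hypothetical operator policy: visibility level per response item.
# Items not listed here are treated as private.
VISIBILITY = {
    "next_hop_router": "public",
    "incoming_interface": "private",
    "lag_members": "private",
    "hash_parameters": "private",
}


def is_internal(originator, internal_prefixes):
    """True if the originator address falls inside a configured prefix."""
    addr = ipaddress.ip_address(originator)
    return any(addr in ipaddress.ip_network(p) for p in internal_prefixes)


def build_response(items, originator, internal_prefixes):
    """Return only the response items that may be disclosed to this originator."""
    if is_internal(originator, internal_prefixes):
        return dict(items)
    return {k: v for k, v in items.items()
            if VISIBILITY.get(k, "private") == "public"}
```

Defaulting unlisted items to private keeps newly added TLV information from leaking to external originators until the operator explicitly classifies it.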
The Result TLV and the Additional Informational Code TLV provide detailed information about the processing of the Flow Discovery Request packet and may leak information about locally configured policies. The amount of information included in these TLVs should also depend on whether the request originated externally or internally. If the request originated externally, the network operator may choose to silently drop the Flow Discovery Request packet without giving any indication of the reason.

Today, most network operators throttle conventional OAM traffic (for example, ping and traceroute) serviced by the device to protect against denial-of-service attacks. Such mechanisms should be employed for TraceFlow packets for the same reason. Rate limiting of packets punted to software can cover traffic destined to the management plane. Many platforms can rate-limit such traffic to M packets per second or per minute. Facilities like these can be used to admit only a rate-limited quantum of traffic to the management plane, as would be the case for Traceflow traffic. M would be a user-configurable option with a suitable default.

Hardware-assisted rate limiting would be a prerequisite for this feature.

7. Hardware pre-requisites for implementing Traceflow.

7.1 Filter to trap packets with UDP destination port

Filters with a corresponding PUNT-to-software action should be programmable in hardware to trap packets whose UDP destination port signifies Traceflow. Platforms that support hardware-based filtering would benefit most from this filter support. All Layer 3 devices are appropriate candidates for programming this filter.
However, the UDP port based filter will not, and SHOULD NOT, be
applied to MPLS packets or IP-in-IP tunneled packets. Tunneled
packets, whether MPLS or IP-in-IP (including IP-GRE), are out of
scope of this document.

7.2 Packet injection mode directly to egress port

For TraceFlow to use the correct output member of a LAG or ECMP
group, the hardware should support a packet injection mode. Once the
TraceFlow software control plane receives the packet, the updated
packet should be sent to the appropriate next-hop transit device
through the LAG or ECMP member calculated by the hardware algorithm.
For this purpose, the hardware should support injecting packets
directly to an egress port without interference from the hardware
forwarding engine. In this mode, the software control plane sends the
packet directly to the egress port, bypassing the hardware forwarding
engine, so that the packet takes the appropriate LAG or ECMP member.

7.3 Packet injection mode through hardware engine but not to output
port

So that TraceFlow can report which LAG / ECMP member the packet will
go out on, the hardware should assist the CPU by allowing it to
inject a packet and obtain the forwarding result without routing or
switching the packet onto the next hop.

7.4 Hardware rate limiter support (preventing DoS attacks)

There should be support for a filter-based hardware rate limiter so
that DoS attacks cannot be mounted on the control plane, that is, the
software part of the TraceFlow engine. Normally the control plane of
the TraceFlow engine resides in the Route Processor Module of the
transit devices (for TraceFlow traceroute) or of the end device (for
TraceFlow ping).
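Section 6 describes admitting at most M TraceFlow packets to the control plane per second or per minute. That behavior can be sketched as a fixed-window counter. This is a minimal Python illustration that assumes the caller supplies a monotonic timestamp in seconds. The class name `WindowRateLimiter` is hypothetical, and a real device would enforce the limit in its hardware policer rather than in software.

```python
from typing import Optional


class WindowRateLimiter:
    """Fixed-window counter admitting at most `limit` packets per window.

    `limit` plays the role of the operator-configurable M, and `window`
    is the unit of time in seconds (for example 1.0 or 60.0).
    """

    def __init__(self, limit: int, window: float = 60.0) -> None:
        if limit < 0 or window <= 0:
            raise ValueError("limit must be >= 0 and window must be > 0")
        self.limit = limit
        self.window = window
        self._window_start: Optional[float] = None
        self._count = 0

    def admit(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may be punted."""
        if self._window_start is None or now - self._window_start >= self.window:
            # Open a new window starting at this packet's arrival time.
            self._window_start = now
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        # Over the limit for this window: the packet is not punted.
        return False
```

In practice, `now` would come from a clock such as `time.monotonic()`. A fixed window is the most literal reading of "count the number of packets per unit time". However, it can admit up to 2M packets in a short burst that straddles a window boundary, which is one reason hardware policers often use token buckets instead.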
The hardware rate limiter uses the filter to count the number of
TraceFlow packets per unit of time (for example, per minute) to
determine whether too many TraceFlow packets are being sent to the
control plane in the Route Processor Module. This is another
requirement on the hardware.

7.5 RPF check support in hardware (security consideration)

To enforce security across trust boundaries, a Reverse Path
Forwarding (RPF) check should be enabled on the domain's boundary
devices.

This ensures that IP addresses internal to the domain are not used by
outside entities to initiate a TraceFlow from outside the boundary of
the domain.

7.6 Regular security ACLs at the boundary of the network

In addition to the RPF check, which detects an internal Originator IP
address being spoofed by an entity outside the boundary, regular
security ACLs should be programmed at the boundary to ensure that
outside entities are not allowed to send TraceFlow packets across the
boundary into the network domain.

7.7 Implementing the LAG / ECMP selection using software state

Previously, the exact hashing function or functions implemented by
the hardware had to be reimplemented in the software control plane
of the TraceFlow engine in the Route Processor Module, in order to
determine the LAG or ECMP member through which the packet would be
forwarded if sent through hardware. Such mimicking is not sufficient,
because hardware and software state may not be synchronized at that
point in time. When they are out of sync, the result computed by
mimicking the hardware in software can differ from the result the
hardware would actually produce.
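The following minimal Python sketch illustrates this failure mode. It uses a toy selection function in place of any real vendor hash: the function XOR-folds the flow fields and takes the result modulo the number of members. It also assumes a hypothetical situation in which the hardware has already removed a failed LAG member while the software copy of the member list still contains it. The flow values and interface names are made up.

```python
from typing import List, Tuple


def select_member(flow_key: Tuple[int, ...], members: List[str]) -> str:
    """Toy LAG member selection: XOR-fold the flow fields, then modulo.

    This is an illustrative stand-in, not any platform's hash.
    """
    h = 0
    for field in flow_key:
        h ^= field
    return members[h % len(members)]


# Hypothetical flow: (src IP, dst IP, src port, dst port, protocol).
FLOW = (0x0A000001, 0x0A000002, 1234, 80, 6)

# The hardware has already removed the failed member Te0/4;
# the software copy of the member list has not been updated yet.
hw_members = ["Te0/1", "Te0/2", "Te0/3"]
sw_members = ["Te0/1", "Te0/2", "Te0/3", "Te0/4"]

hw_choice = select_member(FLOW, hw_members)  # "Te0/2"
sw_choice = select_member(FLOW, sw_members)  # "Te0/4"
```

Here, the software answer names a member that the hardware no longer uses, so a TraceFlow response built from it would be wrong. Querying the hardware forwarding path directly avoids the need to keep two copies of the state in step.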
Hardware assistance is therefore required: the hardware should
support packet injection from the CPU and return the required
forwarding results to the TraceFlow control process on the CPU.

7.8 Implementation considerations

Hardware commonly uses internal packet headers to record attributes
of an incoming packet, such as its ingress port or whether it was
dropped by an ACL. The reason codes explaining why a packet is
dropped should be obtained through the hardware's packet injection
mode, in part by using these internal headers. This is necessary
because, when a packet that is meant to be forwarded is actually
dropped in hardware, the reason (for example, an ACL drop or
policing) should be available to the software control plane so that
it can populate the corresponding fields of the TraceFlow response
packet.

7.8.1 Using the ingress port as part of the LAG/ECMP hashing function

On certain platforms, the LAG / ECMP hashing function also uses the
ingress port to select the LAG / ECMP member on which the packet is
forwarded. Platforms that support packet injection normally allow a
packet to be injected into the hardware forwarding engine so that it
appears to have arrived on a specific ingress port; on some vendor
platforms, however, this may not be possible. Platforms whose hashing
function does not include the ingress port can support TraceFlow with
normal packet injection.

When the ingress port is part of the hash, CPU injection MAY still be
used. In that case, the LAG or ECMP member selected for the injected
packet may differ from the one that would have been chosen had the
actual ingress port been taken into account.
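A minimal Python sketch can show the effect of omitting the ingress port. The selection function below is a toy stand-in, not any vendor's hash: it XOR-folds the flow fields, optionally includes the ingress port, and takes the result modulo the member count. The flow values, the ingress port index, and the interface names are hypothetical.

```python
from typing import List, Optional, Tuple


def select_member(flow_key: Tuple[int, ...], members: List[str],
                  ingress_port: Optional[int] = None) -> str:
    """Toy LAG/ECMP selection that optionally folds in the ingress port.

    XOR-folding is an illustrative stand-in for a vendor hash function.
    """
    h = 0
    for field in flow_key:
        h ^= field
    if ingress_port is not None:
        h ^= ingress_port
    return members[h % len(members)]


FLOW = (0x0A000001, 0x0A000002, 1234, 80, 6)  # hypothetical 5-tuple
MEMBERS = ["Te0/1", "Te0/2", "Te0/3", "Te0/4"]

# Real data packet of the flow, arriving on hypothetical ingress port 5.
actual = select_member(FLOW, MEMBERS, ingress_port=5)   # "Te0/3"
# CPU-injected probe on a platform that cannot emulate the ingress port.
injected = select_member(FLOW, MEMBERS)                 # "Te0/4"
```

Whether the two answers differ depends on the hash and on the values involved. With four members, this toy function leaves the choice unchanged for any ingress port index that is a multiple of four. A mismatch of this kind can therefore appear only for some ingress ports and go unnoticed in limited testing.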
The root cause is that the ingress port is an input to the hashing
function that selects the LAG / ECMP member, while some platforms do
not support injecting a packet from software with a specified
ingress port.

8. IANA Considerations

The TraceFlow protocol requires a UDP port assignment, to be used as
the destination port in TraceFlow packets.

9. Contributors

The original version of this document was submitted to the IETF on
August 16, 2008 by A. Viswanathan, S. Krishnamurthy, R. Manur, and
V. Zinjuvadia, who at that time were with Force10 Networks, with
input and suggestions from Shane Amante. We acknowledge their
contribution to the original version of this draft.

This document was prepared using the Nroff Internet Draft Editor.

APPENDIX A:

A.1. Encapsulation Format Choices

A.1.1. Carrying a separate Flow Descriptor TLV inside the Flow
Discovery Request packet

This is the approach selected for this proposal. To specify a flow,
the originating device encapsulates the entire data packet belonging
to the traffic flow of interest in the Flow Descriptor TLV.

If a data packet from the traffic flow is not readily available, the
operator may have to generate a data packet from the available
traffic flow information and encapsulate it in the Flow Descriptor
TLV.

Future revisions of this document may update the Flow Descriptor TLV
to allow it to carry individual flow parameters (such as the Source
IP Address, Destination IP Address, and UDP/TCP port numbers) in
sub-TLV format rather than as an encapsulated data packet.

A.1.2. Using the traffic flow's parameter values in the external
header

In this approach, the traffic flow's parameter values are used to
encapsulate the Flow Discovery Request packet.
This approach uses the traffic flow's header as the outer header of
the Flow Discovery Request packet. This ensures that the Flow
Discovery Request packet takes the same path that the traffic flow
would take. A Layer 2 EtherType could be used to distinguish this OAM
packet from the data packets belonging to the traffic flow.

This approach was not selected because it would require intermediate
devices to process a new EtherType, which their hardware may not
support. Moreover, the OAM packet would likely have to stop at each
intermediate device anyway in order to gather the relevant
information for the specified traffic flow.

If the Flow Discovery Request packet did not use a special EtherType,
it would be difficult for the network operator to filter these OAM
packets, as they would be indistinguishable from the traffic flow.
Moreover, such TraceFlow OAM packets might be considered 'spoofed'
packets.

Even though this approach is not selected for the TraceFlow protocol
in this document, it could help TraceFlow support certain networks
with legacy devices (not supporting TraceFlow). This approach may be
reconsidered in future revisions of this document.

A.2. Layer 4 Protocol Choices and Router Alert option

A.2.1. UDP Encapsulation

This approach has been selected in this proposal. TraceFlow packets
are UDP packets with a well-known destination port number (to be
requested from IANA).

A.2.2. ICMP Encapsulation

This approach involves sending TraceFlow packets as ICMP packets. It
was not selected in this proposal because the UDP approach is
simpler.

A.3. Legacy Devices (Not supporting TraceFlow)

It is necessary that the entire flow information available in the
packet encapsulated in the Flow Discovery Request packet be used in
determining the egress port. If the Flow Discovery Request packet
reaches a legacy device that does not support TraceFlow, the request
packet is likely to be forwarded on a different egress link from the
one on which the data packets of the traffic flow would have been
forwarded. Hence the information received in a TraceFlow probe from
transit routers beyond the legacy device may not be useful. If,
however, the legacy device does not employ LAGs, ECMP paths, or
policy-based routing, the TraceFlow packet may proceed in the
direction that the traffic flow would have taken, and subsequent
transit nodes may still be able to provide useful and relevant
information to the originator of the Flow Discovery Request packet.

A.4. TTL Scoping

Conventional traceroute employs TTL scoping as a means to determine
the path followed by a packet under destination-address-based
hop-by-hop routing.

The TraceFlow protocol does not employ TTL scoping in the current
specification. However, using TraceFlow with TTL scoping has certain
applications in networks that contain legacy devices that do not
support TraceFlow. This may be explored in future revisions of this
document if the community is interested in solving this problem.

An implementation may allow the operator to send out TraceFlow
packets with TTL scoping, just like conventional traceroute.
In such a mode, the following points should be noted:

1) The originator node may receive multiple packets from a transit
node: an ICMP 'TTL Expired' packet and a TraceFlow response packet.

2) In this mode, transit devices SHOULD send the TraceFlow response
packet only if the TTL has also expired for that Flow Discovery
Request packet on that device. This is needed to prevent a transit
node from sending a duplicate Flow Discovery Response packet for each
request packet that the originator sends while performing TTL
scoping.

A.5. Additional Information in the Flow Discovery Response

This document lists the information that the originator of a
TraceFlow Flow Discovery Request packet can request and that transit
devices may include in their response. Future revisions of this
document may modify this list based on feedback from the community.
For example, QoS-related statistics and queue depth information for
the traffic flow being investigated may be included in Flow Discovery
Response packets.

A.6. Choices for supporting remote TraceFlow requests

A.6.1. Terminating the request at the Proxy device and re-originating
it

This approach was selected in this proposal. For indirect Flow
Discovery Requests, the originating device sends the request to a
proxy device that is the intended starting point for probing the flow
and gathering relevant information about it. The proxy device
receives the Flow Discovery Request packet, processes it, and
re-originates a Flow Discovery Request towards the destination of the
flow.

A.6.2. Source-Routing the request through the Proxy device

This approach involved sending the Flow Discovery Request with the IP
Source Routing option, forcing the packet to be received by the proxy
device that is the intended starting point for probing the flow and
gathering relevant information about it. It was not selected for this
proposal.

A.7. Applicability to Multicast

Multicast networks have also evolved into more complex,
heterogeneous networks in recent years. These advances place a
greater burden on the multicast OAM tools employed by network
operators. Troubleshooting network problems, monitoring network
performance, and network planning and provisioning become difficult
because of the gap between the complexity of the network and the
capabilities of the OAM tools. Mtrace [4] has evolved into a useful
OAM tool that addresses some of the problems faced in multicast
networks. However, it does not address all the problems discussed in
this document. We believe that the TraceFlow protocol can be extended
to assist network operators with their multicast deployments. The
specific mechanics of any such extensions may be defined in later
versions of the draft.

A.8. Applicability to Layer 2 networks

Layer 2 devices in the path taken by TraceFlow packets would need to
snoop on the higher-layer headers in the packet to determine that it
is a TraceFlow Flow Discovery Request packet. Most of the TraceFlow
packet processing and operations discussed in this document should
also apply to Layer 2 devices. However, the current version of the
draft treats Layer 2 devices as pass-through. Refer to Section 4.3
for further discussion of this issue.

The specific mechanics of any separate extensions needed for Layer 2
networks may be defined in later versions of the protocol.

A.9. Applicability to IPv6

The TraceFlow protocol described in this document should apply to
IPv6 networks or IPv4-IPv6 dual-stack networks with straightforward
extensions.

The specific mechanics of extensions for IPv6 networks may be defined
in later versions of the draft.

A.10. Applicability to MPLS

MPLS networks will be considered at a later time. Revisions of, or
addenda to, this proposal that cover MPLS networks are currently out
of scope of this document.

A.11. Flow Discovery and Response packet fragmentation

It is highly RECOMMENDED that the network allow the Flow Discovery
Request packet to travel to the destination without fragmentation.
The Flow Discovery Response packet originated by the transit devices
processing the request packet may be fragmented on its way to the
originator device.

10. References

10.1. Normative References

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC1776]  Crocker, S., "The Address is the Message", RFC 1776,
           April 1995.

[TRUTHS]   Callon, R., "The Twelve Networking Truths", RFC 1925,
           April 1996.

10.2. Informative References

[EVILBIT]  Bellovin, S., "The Security Flag in the IPv4 Header",
           RFC 3514, April 2003.

[RFC5513]  Farrel, A., "IANA Considerations for Three Letter
           Acronyms", RFC 5513, April 2009.

[RFC5514]  Vyncke, E., "IPv6 over Social Networks", RFC 5514,
           April 2009.

Author's Addresses

Janardhanan Narasimhan.P,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

Email: Pathangi_janardhanan@dell.com

Balaji Venkat Venkataswami,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446

Email: BALAJI_VENKAT_VENKAT@dell.com

Richard Groves,
Microsoft Corporation,
One Microsoft Way,
Redmond, WA 98052

Email: rgroves@microsoft.com

Peter Hoose,
Facebook,
Willow Rd.,
Menlo Park, CA 94025

Email: phoose@fb.com