idnits 2.17.1 draft-fenner-traceroute-ipm-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1.a on line 17. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** The document seems to lack an RFC 3978 Section 5.5 (updated by RFC 4748) Disclaimer -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 3 IPR Disclosure Invitation. ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. ** The abstract seems to contain references ([Brad97]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 2005) is 6828 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'Brad97' on line 50 looks like a reference -- Missing reference section? 'Brad88' on line 156 looks like a reference -- Missing reference section? 'Katz97' on line 665 looks like a reference -- Missing reference section? 'Pusa99' on line 738 looks like a reference -- Missing reference section? 'Thal99' on line 738 looks like a reference -- Missing reference section? 'Thal00' on line 738 looks like a reference Summary: 12 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Individual submission 3 INTERNET-DRAFT W. Fenner 4 draft-fenner-traceroute-ipm-01.txt AT&T Research 5 S. Casner 6 Packet Design 7 February 11, 2005 8 Expires August 2005 10 A "traceroute" facility for IP Multicast. 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware have 16 been or will be disclosed, and any of which he or she becomes aware will 17 be disclosed, in accordance with Section 6 of RFC 3668. 19 Internet-Drafts are working documents of the Internet Engineering Task 20 Force (IETF), its areas, and its working groups. Note that other groups 21 may also distribute working documents as Internet-Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference material 26 or to cite them other than as "work in progress." 
28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.html. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 Distribution of this document is unlimited. 36 Abstract 38 This draft describes the IGMP multicast traceroute facility. 39 Unlike unicast traceroute, multicast traceroute requires a special 40 packet type and implementation on the part of routers. This speci- 41 fication describes the required functionality in multicast routers, 42 as well as how management applications can use the new router func- 43 tionality. 45 Key Words 47 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 48 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 49 document are to be interpreted as described in RFC 2119 [Brad97]. 51 1. Introduction 53 The unicast "traceroute" program allows the tracing of a path from one 54 machine to another, using a mechanism that already existed in IP. 55 Unfortunately, no such existing mechanism can be applied to IP multicast 56 paths. The key mechanism for unicast traceroute is the ICMP TTL 57 exceeded message, which is specifically precluded as a response to mul- 58 ticast packets. Thus, we specify the multicast "traceroute" facility to 59 be implemented in multicast routers and accessed by diagnostic programs. 60 While it is a disadvantage that a new mechanism is required, the multi- 61 cast traceroute facility can provide additional information about packet 62 rates and losses that the unicast traceroute cannot, and generally 63 requires fewer packets to be sent. 65 Goals: 67 o To be able to trace the path that a packet would take from some 68 source to some destination. 70 o To be able to isolate packet loss problems (e.g., congestion). 72 o To be able to isolate configuration problems (e.g., TTL threshold). 74 o To minimize packets sent (e.g. no flooding, no implosion). 76 2. 
Overview 78 Given a multicast distribution tree, tracing from a source to a multi- 79 cast destination is hard, since you don't know down which branch of the 80 multicast tree the destination lies. This means that you have to flood 81 the whole tree to find the path from one source to one destination. 82 However, walking up the tree from destination to source is easy, as most 83 existing multicast routing protocols know the previous hop for each 84 source. Tracing from destination to source can involve only routers on 85 the direct path. 87 The party requesting the traceroute (which need be neither the source 88 nor the destination) sends a traceroute Query packet to the last-hop 89 multicast router for the given destination. The last-hop router turns 90 the Query into a Request packet by adding a response data block contain- 91 ing its interface addresses and packet statistics, and then forwards the 92 Request packet via unicast to the router that it believes is the proper 93 previous hop for the given source and group. Each hop adds its response 94 data to the end of the Request packet, then unicast forwards it to the 95 previous hop. The first hop router (the router that believes that pack- 96 ets from the source originate on one of its directly connected networks) 97 changes the packet type to indicate a Response packet and sends the com- 98 pleted response to the response destination address. The response may 99 be returned before reaching the first hop router if a fatal error condi- 100 tion such as "no route" is encountered along the path. 102 Multicast traceroute uses any information available to it in the router 103 to attempt to determine a previous hop to forward the trace towards. 104 Multicast routing protocols vary in the type and amount of state they 105 keep; multicast traceroute endeavors to work with all of them by using 106 whatever is available. 
For example, if a DVMRP router has no active 107 state for a particular source but does have a DVMRP route, it chooses 108 the parent of the DVMRP route as the previous hop. If a PIM-SM router 109 is on the (*,G) tree, it chooses the parent towards the RP as the previ- 110 ous hop. In these cases, no source/group-specific state is available, 111 but the path may still be traced. 113 3. Multicast Traceroute header 115 The header for all multicast traceroute packets is as follows. The 116 header is only filled in by the originator of the traceroute Query; 117 intermediate hops MUST NOT modify any of the fields. 119 0 1 2 3 120 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 121 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 122 | IGMP Type | # hops | checksum | 123 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 124 | Multicast Group Address | 125 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 126 | Source Address | 127 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 128 | Destination Address | 129 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 130 | Response Address | 131 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 132 | resp ttl | Query ID | 133 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 135 3.1. IGMP Type: 8 bits 137 The IGMP type field is defined to be 0x1F for traceroute queries 138 and requests. The IGMP type field is changed to 0x1E when the 139 packet is completed and sent as a response from the first hop 140 router to the querier. Two codes are required so that multicast 141 routers won't attempt to process a completed response in those 142 cases where the initial query was issued from a router or the 143 response is sent via multicast. 145 3.2. # hops: 8 bits 147 This field specifies the maximum number of hops that the requester 148 wants to trace. 
If there is some error condition in the middle of 149 the path that keeps the traceroute request from reaching the first- 150 hop router, this field can be used to perform an expanding-length 151 search to trace the path to just before the problem. 153 3.3. Checksum: 16 bits 155 The checksum is the 16-bit one's complement of the one's complement 156 sum of the whole IGMP message (the entire IP payload)[Brad88]. 157 When computing the checksum, the checksum field is set to zero. 158 When transmitting packets, the checksum MUST be computed and 159 inserted into this field. When receiving packets, the checksum 160 MUST be verified before processing a packet. 162 3.4. Group address 164 This field specifies the group address to be traced, or zero if no 165 group-specific information is desired. Note that non-group-spe- 166 cific traceroutes may not be possible with certain multicast rout- 167 ing protocols. 169 3.5. Source address 171 This field specifies the IP address of the multicast source for the 172 path being traced, or 0xFFFFFFFF if no source-specific information 173 is desired. Note that non-source-specific traceroutes may not be 174 possible with certain multicast routing protocols. 176 3.6. Destination address 178 This field specifies the IP address of the multicast receiver for 179 the path being traced. The trace starts at this destination and 180 proceeds toward the traffic source. 182 3.7. Response Address 184 This field specifies where the completed traceroute response packet 185 gets sent. It can be a unicast address or a multicast address, as 186 explained in section 6.2. 188 3.8. resp ttl: 8 bits 190 This field specifies the TTL at which to multicast the response, if 191 the response address is a multicast address. 193 3.9. Query ID: 24 bits 195 This field is used as a unique identifier for this traceroute 196 request so that duplicate or delayed responses may be detected and 197 to minimize collisions when a multicast response address is used. 199 4. 
Definitions 201 Since multicast traceroutes flow in the opposite direction to the data 202 flow, we always refer to "upstream" and "downstream" with respect to 203 data, unless explicitly specified. 205 Incoming Interface 206 The interface on which traffic is expected from the specified 207 source and group. 209 Outgoing Interface 210 The interface on which traffic is forwarded from the specified 211 source and group towards the destination. Also called the "Recep- 212 tion Interface", since it is the interface on which the multicast 213 traceroute Request was received. 215 Previous-Hop Router 216 The router, on the Incoming Interface, which is responsible for 217 forwarding traffic for the specified source and group. 219 5. Response data 221 Each router adds a "response data" segment to the traceroute packet 222 before it forwards it on. The response data looks like this: 224 0 1 2 3 225 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 226 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 227 | Query Arrival Time | 228 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 229 | Incoming Interface Address | 230 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 231 | Outgoing Interface Address | 232 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 233 | Previous-Hop Router Address | 234 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 235 | Input packet count on incoming interface | 236 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 237 | Output packet count on outgoing interface | 238 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 239 | Total number of packets for this source-group pair | 240 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 241 | | |M| | | | 242 | Rtg Protocol | FwdTTL |B|S| Src Mask |Forwarding Code| 243 | | |Z| | | | 244 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 246 5.1. 
Query Arrival Time 248 The Query Arrival Time is a 32-bit NTP timestamp specifying the 249 arrival time of the traceroute request packet at this router. The 250 32-bit form of an NTP timestamp consists of the middle 32 bits of 251 the full 64-bit form; that is, the low 16 bits of the integer part 252 and the high 16 bits of the fractional part. 254 The following formula converts from a UNIX timeval to a 32-bit NTP 255 timestamp: 257 query_arrival_time = ((tv.tv_sec + 32384) << 16) + ((tv.tv_usec << 258 10) / 15625) 260 The constant 32384 is the number of seconds from Jan 1, 1900 to Jan 261 1, 1970 truncated to 16 bits. ((tv.tv_usec << 10) / 15625) is a 262 reduction of ((tv.tv_usec << 16) / 1000000). 264 5.2. Incoming Interface Address 266 This field specifies the address of the interface on which packets 267 from this source and group are expected to arrive, or 0 if unknown. 269 5.3. Outgoing Interface Address 271 This field specifies the address of the interface on which packets 272 from this source and group flow to the specified destination, or 0 273 if unknown. 275 5.4. Previous-Hop Router Address 277 This field specifies the router from which this router expects 278 packets from this source. This may be a multicast group (e.g. 279 ALL-[protocol]-ROUTERS.MCAST.NET) if the previous hop is not known 280 because of the workings of the multicast routing protocol. How- 281 ever, it should be 0 if the incoming interface address is unknown. 283 5.5. Packet counts 285 Note that these packet counts SHOULD be as up to date as possible. 286 If packet counts are not being maintained on the processor that 287 handles the traceroute request in a multi-processor router archi- 288 tecture, the packet SHOULD be delayed while the counters are gath- 289 ered from the remote processor(s). If this occurs, the Query 290 Arrival Time should be updated to reflect the time at which the 291 packet counts were learned. 293 5.6. 
Input packet count on incoming interface 295 This field contains the number of multicast packets received for 296 all groups and sources on the incoming interface, or 0xffffffff if 297 no count can be reported. This counter should have the same value 298 as ifInMulticastPkts from the IF-MIB for this interface. 300 5.7. Output packet count on outgoing interface 302 This field contains the number of multicast packets that have been 303 transmitted or queued for transmission for all groups and sources 304 on the outgoing interface, or 0xffffffff if no count can be 305 reported. This counter should have the same value as ifOutMulti- 306 castPkts from the IF-MIB for this interface. 308 5.8. Total number of packets for this source-group pair 310 This field counts the number of packets from the specified source 311 forwarded by this router to the specified group, or 0xffffffff if 312 no count can be reported. If the S bit is set, the count is for 313 the source network, as specified by the Src Mask field. If the S 314 bit is set and the Src Mask field is 63, indicating no source-spe- 315 cific state, the count is for all sources sending to this group. 316 This counter should have the same value as ipMRoutePkts from the 317 IPMROUTE-STD-MIB for this forwarding entry. 319 5.9. Rtg Protocol: 8 bits 321 This field describes the routing protocol in use between this 322 router and the previous-hop router. Specified values include: 324 1 DVMRP 325 2 MOSPF 326 3 PIM 327 4 CBT 328 5 PIM using special routing table 329 6 PIM using a static route 330 7 DVMRP using a static route 331 8 PIM using MBGP (aka BGP4+) route 332 9 CBT using special routing table 333 10 CBT using a static route 334 11 PIM using state created by Assert processing 336 5.10. FwdTTL: 8 bits 338 This field contains the TTL that a packet is required to have 339 before it will be forwarded over the outgoing interface. 341 5.11. MBZ: 1 bit 343 Must be zeroed on transmission and ignored on reception. 345 5.12. 
S: 1 bit 347 If this bit is set, it indicates that the packet count for the 348 source-group pair is for the source network, as determined by mask- 349 ing the source address with the Src Mask field. 351 5.13. Src Mask: 6 bits 353 This field contains the number of 1's in the netmask this router 354 has for the source (i.e. a value of 24 means the netmask is 355 0xffffff00). If the router is forwarding solely on group state, 356 this field is set to 63 (0x3f). 358 5.14. Forwarding Code: 8 bits 360 This field contains a forwarding information/error code. Defined 361 values include: 363 Value Name Description 364 -------------------------------------------------------------------- 365 0x00 NO_ERROR No error 366 0x01 WRONG_IF Traceroute request arrived on an interface to 367 which this router would not forward for this 368 source,group,destination. 369 0x02 PRUNE_SENT This router has sent a prune upstream which 370 applies to the source and group in the tracer- 371 oute request. 372 0x03 PRUNE_RCVD This router has stopped forwarding for this 373 source and group in response to a request from 374 the next hop router. 375 0x04 SCOPED The group is subject to administrative scoping 376 at this hop. 377 0x05 NO_ROUTE This router has no route for the source or 378 group and no way to determine a potential 379 route. 380 0x06 WRONG_LAST_HOP This router is not the proper last-hop router. 381 0x07 NOT_FORWARDING This router is not forwarding this 382 source,group out the outgoing interface for an 383 unspecified reason. 384 0x08 REACHED_RP Reached Rendez-vous Point or Core 385 0x09 RPF_IF Traceroute request arrived on the expected RPF 386 interface for this source,group. 387 0x0A NO_MULTICAST Traceroute request arrived on an interface 388 which is not enabled for multicast. 389 0x0B INFO_HIDDEN One or more hops have been hidden from this 390 trace. 391 0x81 NO_SPACE There was not enough room to insert another 392 response data block in the packet. 
393 0x82 OLD_ROUTER The previous hop router does not understand 394 traceroute requests. 395 0x83 ADMIN_PROHIB Traceroute is administratively prohibited. 397 Note that if a router discovers there is not enough room in a 398 packet to insert its response, it puts the 0x81 error code in the 399 previous router's Forwarding Code field, overwriting any error the 400 previous router placed there. A multicast traceroute client, upon 401 receiving this error, MAY restart the trace at the last hop listed 402 in the packet. 404 The 0x80 bit of the Forwarding Code is used to indicate a fatal 405 error. A fatal error is one where the router may know the previous 406 hop but cannot forward the message to it. 408 6. Router Behavior 410 All of these actions are performed in addition to (NOT instead of) for- 411 warding the packet, if applicable. E.g. a multicast packet that has TTL 412 remaining MUST be forwarded normally, as MUST a unicast packet that has 413 TTL remaining and is not addressed to this router. 415 6.1. Traceroute Query 417 A traceroute Query message is a traceroute message with no response 418 blocks filled in, and uses IGMP type 0x1F. 420 6.1.1. Packet Verification 422 Upon receiving a traceroute Query message, a router must examine 423 the Query to see if it is the proper last-hop router for the desti- 424 nation address in the packet. It is the proper last-hop router if 425 it has a multicast-capable interface on the same subnet as the Des- 426 tination Address and is the router that would forward traffic from 427 the given source onto that subnet. 429 If the router determines that it is not the proper last-hop router, 430 or it cannot make that determination, it does one of two things 431 depending on whether the Query was received via multicast or unicast. If 432 the Query was received via multicast, then it MUST be silently 433 dropped. If it was received via unicast, a forwarding code of 434 WRONG_LAST_HOP is noted and processing continues as in section 6.2. 
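As an illustration only, the Query verification rule above can be sketched in C. The enum, the function names, and the parameter names are hypothetical and are not defined by this draft; only the values 0x00 (NO_ERROR), 0x06 (WRONG_LAST_HOP), and the fatal-error bit 0x80 are taken from section 5.14. The sketch assumes the caller has already decided whether this router is the proper last-hop router, with "cannot make that determination" treated the same as "is not".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the Forwarding Code table in section 5.14. */
#define MTRACE_NO_ERROR       0x00
#define MTRACE_WRONG_LAST_HOP 0x06
#define MTRACE_FATAL_BIT      0x80

/* Hypothetical action names; the draft does not define them. */
enum query_action {
    QUERY_DROP,    /* silently drop the Query */
    QUERY_PROCESS  /* continue processing as a Request (section 6.2) */
};

/*
 * Decide what to do with a received traceroute Query.
 *   is_last_hop:   true only if this router determined that it is the
 *                  proper last-hop router for the destination.
 *   via_multicast: true if the Query arrived as an IP multicast packet.
 *   fwd_code:      receives the forwarding code to note for this hop.
 */
enum query_action verify_query(bool is_last_hop, bool via_multicast,
                               uint8_t *fwd_code)
{
    assert(fwd_code != NULL);
    *fwd_code = MTRACE_NO_ERROR;

    if (is_last_hop)
        return QUERY_PROCESS;

    /* Not the last hop (or unknown): multicast Queries are dropped. */
    if (via_multicast)
        return QUERY_DROP;

    /* Unicast Queries are still answered, flagged WRONG_LAST_HOP. */
    *fwd_code = MTRACE_WRONG_LAST_HOP;
    return QUERY_PROCESS;
}

/* A forwarding code indicates a fatal error when its 0x80 bit is set. */
bool forwarding_code_is_fatal(uint8_t code)
{
    return (code & MTRACE_FATAL_BIT) != 0;
}
```

A router implementation would call verify_query() once per received Query and, on QUERY_PROCESS, carry the returned forwarding code into the Request processing of section 6.2.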
436 Duplicate Query messages as identified by the tuple (IP Source, 437 Query ID) SHOULD be ignored. This MAY be implemented using a sim- 438 ple 1-back cache (i.e. remembering the IP source and Query ID of 439 the previous Query message that was processed, and ignoring future 440 messages with the same IP Source and Query ID). Duplicate Request 441 messages MUST NOT be ignored in this manner. 443 6.1.2. Normal Processing 445 When a router receives a traceroute Query and it determines that it 446 is the proper last-hop router, it treats it like a traceroute 447 Request and performs the steps listed in section 6.2. 449 6.2. Traceroute Request 451 A traceroute Request is a traceroute message with some number of 452 response blocks filled in, and also uses IGMP type 0x1F. Routers 453 can tell the difference between Queries and Requests by checking 454 the length of the packet. 456 6.2.1. Packet Verification 458 If the traceroute Request is not addressed to this router, or if 459 the Request is addressed to a multicast group which is not a link- 460 scoped group (e.g. 224.0.0.x), it MUST be silently ignored. 462 6.2.2. Normal Processing 464 When a router receives a traceroute Request, it performs the fol- 465 lowing steps. Note that it is possible to have multiple situations 466 covered by the Forwarding Codes. The first one encountered is the 467 one that is reported, i.e. all "note forwarding code N" should be 468 interpreted as "if forwarding code is not already set, set forward- 469 ing code to N". 471 1. If there is room in the current buffer (or the router can effi- 472 ciently allocate more space to use), insert a new response 473 block into the packet and fill in the Query Arrival Time, Out- 474 going Interface Address, Output Packet Count, and FwdTTL. If 475 there was no room, fill in the response code "NO_SPACE" in the 476 *previous* hop's response block, and forward the packet to the 477 requester as described in "Forwarding Traceroute Requests". 479 2. 
Attempt to determine the forwarding information for the source 480 and group specified, using the same mechanisms as would be used 481 when a packet is received from the source destined for the 482 group. State need not be instantiated; it can be "phantom" 483 state created only for the purpose of the trace. 485 If using a shared-tree protocol and there is no source-specific 486 state, or if the source is specified as 0xFFFFFFFF, group state 487 should be used. If there is no group state or the group is 488 specified as 0, potential source state (i.e. the path that 489 would be followed for a source-specific Join) should be used. 490 If this router is the Core or RP and no source-specific infor- 491 mation is available, note an error code of REACHED_RP. 493 3. If no forwarding information can be determined, the router 494 notes an error code of NO_ROUTE, sets the remaining fields that 495 have not yet been filled in to zero, and then forwards the 496 packet to the requester as described in "Forwarding Traceroute 497 Requests". 499 4. Fill in the Incoming Interface Address, Previous-Hop Router 500 Address, Input Packet Count, Total Number of Packets, Routing 501 Protocol, S, and Src Mask from the forwarding information that 502 was determined. 504 5. If traceroute is administratively prohibited or the previous 505 hop router does not understand traceroute requests, note the 506 appropriate forwarding code (ADMIN_PROHIB or OLD_ROUTER). If 507 traceroute is administratively prohibited and any of the fields 508 as filled in step 4 are considered private information, zero 509 out the applicable fields. Then the packet is forwarded to the 510 requester as described in "Forwarding Traceroute Requests". 512 6. If the reception interface is not enabled for multicast, note 513 forwarding code NO_MULTICAST. If the reception interface is 514 the interface from which the router would expect data to arrive 515 from the source, note forwarding code RPF_IF. 
Otherwise, if 516 the reception interface is not one to which the router would 517 forward data from the source to the group, a forwarding code of 518 WRONG_IF is noted. 520 7. If the group is subject to administrative scoping on either the 521 Outgoing or Incoming interfaces, a forwarding code of SCOPED is 522 noted. 524 8. If this router is the Rendez-vous Point or Core for the group, 525 a forwarding code of REACHED_RP is noted. 527 9. If this router has sent a prune upstream which applies to the 528 source and group in the traceroute Request, it notes forwarding 529 code PRUNE_SENT. If the router has stopped forwarding down- 530 stream in response to a prune sent by the next hop router, it 531 notes forwarding code PRUNE_RCVD. If the router should nor- 532 mally forward traffic for this source and group downstream but 533 is not, it notes forwarding code NOT_FORWARDING. 535 10. The packet is then sent on to the previous hop or the requester 536 as described in "Forwarding Traceroute Requests". 538 6.3. Traceroute response 540 A router must forward all traceroute response packets normally, 541 with no special processing. If a router has initiated a traceroute 542 with a Query or Request message, it may listen for Responses to 543 that traceroute but MUST still forward them as well. 545 6.4. Forwarding Traceroute Requests 547 If the Previous-hop router is known for this request and the number 548 of response blocks is less than the number requested, the packet is 549 sent to that router. If the Incoming Interface is known but the 550 Previous-hop router is not known, the packet is sent to an appro- 551 priate multicast address on the Incoming Interface. The appropri- 552 ate multicast address may depend on the routing protocol in use, 553 MUST be a link-scoped group (i.e. 224.0.0.x), MUST NOT be ALL-SYS- 554 TEMS.MCAST.NET (224.0.0.1) and MAY be ALL-ROUTERS.MCAST.NET 555 (224.0.0.2) if the routing protocol in use does not define a more 556 appropriate group. 
Otherwise, it is sent to the Response Address 557 in the header, as described in "Sending Traceroute Responses". 558 Note that it is not an error for the number of response blocks to 559 be greater than the number requested; such a packet should simply 560 be forwarded to the requester as described in "Sending Traceroute 561 Responses". 563 6.5. Sending Traceroute Responses 565 6.5.1. Destination Address 567 A traceroute response must be sent to the Response Address in the 568 traceroute header. 570 6.5.2. TTL 572 If the Response Address is unicast, the router inserts its normal 573 unicast TTL in the IP header. If the Response Address is multi- 574 cast, the router copies the Response TTL from the traceroute header 575 into the IP header. 577 6.5.3. Source Address 579 If the Response Address is unicast, the router may use any of its 580 interface addresses as the source address. Since some multicast 581 routing protocols forward based on source address, if the Response 582 Address is multicast, the router MUST use an address that is known 583 in the multicast routing topology if it can make that determina- 584 tion. 586 6.5.4. Sourcing Multicast Responses 588 When a router sources a multicast response, the response packet 589 MUST be sent on a single interface, then forwarded as if it were 590 received on that interface. It MUST NOT source the response packet 591 individually on each interface, in order to avoid duplicate pack- 592 ets. 594 6.6. Hiding information 596 Information about a domain's topology and connectivity may be hid- 597 den from multicast traceroute requests. The exact mechanism is not 598 specified here; however, the INFO_HIDDEN forwarding code may be 599 used to note that, for example, the incoming interface address and 600 packet count are for the entrance to the domain and the outgoing 601 interface address and packet count are the exit from the domain. 
602 The source-group packet count may be from either router or not 603 specified (0xffffffff). 605 7. Using multicast traceroute 607 7.1. Sample Client 609 This section describes the behavior of an example multicast traceroute 610 client. 612 7.1.1. Sending Initial Query 614 When the destination of the trace is the machine running the 615 client, the traceroute Query packet can be sent to the ALL-ROUTERS 616 multicast group (224.0.0.2). This will ensure that the packet is 617 received by the last-hop router on the subnet. Otherwise, if the 618 proper last-hop router is known for the trace destination, the 619 Query could be unicasted to that router. Otherwise, the Query 620 packet should be multicasted to the group being queried; if the 621 destination of the trace is a member of the group this will get the 622 Query to the proper last-hop router. In this final case, the 623 packet should contain the Router Alert option, to make sure that 624 routers that are not members of the multicast group notice the 625 packet. See also section 7.2 on determining the last-hop router. 627 7.1.2. Determining the Path 629 The client could send a small number of Initial Query messages with 630 a large "# hops" field, in order to try to trace the full path. If 631 this attempt fails, one strategy is to perform a linear search (as 632 the traditional unicast traceroute program does); set the "# hops" 633 field to 1 and try to get a response, then 2, and so on. If no 634 response is received at a certain hop, the hop count can continue 635 past the non-responding hop, in the hopes that further hops may 636 respond. These attempts should continue until a user-defined time- 637 out has occurred. 639 See also sections 7.3 and 7.4 on receiving the results of a trace. 641 7.1.3. 
Collecting Statistics 643 After a client has determined that it has traced the whole path or 644 as much as it can expect to (see section 7.5), it might collect 645 statistics by waiting a short time and performing a second trace. 646 If the path is the same in the two traces, statistics can be dis- 647 played as described in section 8.3 and 8.4. 649 Details of performing a multicast traceroute: 651 7.2. Last hop router 653 The traceroute querier may not know which is the last hop router, 654 or that router may be behind a firewall that blocks unicast packets 655 but passes multicast packets. In these cases, the traceroute 656 request should be multicasted to the group being traced (since the 657 last hop router listens to that group). All routers except the 658 correct last hop router should ignore any multicast traceroute 659 request received via multicast. Traceroute requests which are mul- 660 ticasted to the group being traced must include the Router Alert IP 661 option [Katz97]. 663 Another alternative is to unicast to the trace destination. 664 Traceroute requests which are unicasted to the trace destination 665 must include the Router Alert IP option [Katz97], in order that the 666 last-hop router is aware of the packet. 668 If the traceroute querier is attached to the same router as the 669 destination of the request, the traceroute request may be multicas- 670 ted to 224.0.0.2 (ALL-ROUTERS.MCAST.NET) if the last-hop router is 671 not known. 673 7.3. First hop router 675 The traceroute querier may not be unicast reachable from the first 676 hop router. In this case, the querier should set the traceroute 677 response address to a multicast address, and should set the 678 response TTL to a value sufficient for the response from the first 679 hop router to reach the querier. It may be appropriate to start 680 with a small TTL and increase in subsequent attempts until a suffi- 681 cient TTL is reached, up to an appropriate maximum (such as 192). 
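   As an illustration of the expanding response TTL search described
   above, the following Python sketch tries successively larger TTL
   values. It is not part of the protocol. The starting value of 8,
   the doubling schedule, and the try_trace callback (a hypothetical
   function that sends one Query with the given response TTL and
   reports whether a response arrived) are all assumptions made for
   the example; only the maximum of 192 is the example value from the
   text.

```python
from typing import Callable, Optional

MAX_RESPONSE_TTL = 192  # example maximum suggested in the text


def find_response_ttl(
    try_trace: Callable[[int], bool],
    start_ttl: int = 8,  # assumed starting value, not from the draft
    max_ttl: int = MAX_RESPONSE_TTL,
) -> Optional[int]:
    """Return the first response TTL for which try_trace() succeeds.

    try_trace is a hypothetical callback: it sends one traceroute Query
    with the given Response TTL and returns True if a response arrived.
    The TTL is doubled after each failure (an assumed schedule) and is
    capped at max_ttl. Returns None if no attempt succeeds.
    """
    if not 1 <= start_ttl <= max_ttl:
        raise ValueError("start_ttl must be between 1 and max_ttl")
    ttl = start_ttl
    while True:
        if try_trace(ttl):
            return ttl
        if ttl >= max_ttl:
            return None  # even the maximum TTL produced no response
        ttl = min(ttl * 2, max_ttl)


if __name__ == "__main__":
    # Stand-in for a real probe: pretend responses need a TTL of 50 or more.
    print(find_response_ttl(lambda ttl: ttl >= 50))  # prints 64
```

   Doubling keeps the number of attempts small (six at most with
   these defaults), which matters because each failed attempt costs a
   full timeout. A client that prefers a tighter TTL could use a
   smaller step instead.
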
   The IANA has assigned 224.0.1.32, MTRACE.MCAST.NET, as the default
   multicast group for multicast traceroute responses. Other groups
   may be used if needed, e.g. when using mtrace to diagnose problems
   with the IANA-assigned group.

7.4. Broken intermediate router

   A broken intermediate router might simply not understand
   traceroute packets, and drop them. The querier would then get no
   response at all from its traceroute requests. It should then
   perform a hop-by-hop search by setting the number of responses
   field to successive values until it gets a response (both linear
   and binary search are options, but binary is likely to be slower
   because a failure requires waiting for a timeout).

7.5. Trace termination

   When performing an expanding hop-by-hop trace, it is necessary to
   determine when to stop expanding.

7.5.1. Arriving at source

   A trace can be determined to have arrived at the source if the
   Incoming Interface of the last router in the trace is non-zero,
   but the Previous Hop router is zero.

7.5.2. Fatal Error

   A trace has encountered a fatal error if the last Forwarding Error
   in the trace has the 0x80 bit set.

7.5.3. No Previous Hop

   A trace cannot continue if the last Previous Hop in the trace is
   set to 0.

7.5.4. Trace shorter than requested

   If the trace that is returned is shorter than requested (i.e. the
   number of Response blocks is smaller than the "# hops" field), the
   trace encountered an error and could not continue.

7.6. Continuing after an error

   When the NO_SPACE error occurs, the client might try to continue
   the trace by starting it at the last hop in the trace. It can do
   this by unicasting to this router's outgoing interface address,
   keeping all fields the same. If this results in a single hop and
   a "WRONG_IF" error, the client may try setting the trace
   destination to the same outgoing interface address.

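   The stopping rules of section 7.5 can be sketched in Python as
   follows. This is an illustrative sketch, not a reference
   implementation: the ResponseBlock class, its field names, the
   order in which the rules are checked, and the returned labels are
   assumptions made for the example. The conditions themselves are
   those of sections 7.5.1 through 7.5.4.

```python
from dataclasses import dataclass
from typing import List, Optional

FATAL_BIT = 0x80  # section 7.5.2: forwarding codes with this bit are fatal


@dataclass
class ResponseBlock:
    """Hypothetical, already-parsed view of one response block."""
    incoming_interface: int  # address as a 32-bit integer, 0 if unset
    previous_hop: int        # address as a 32-bit integer, 0 if unset
    forwarding_code: int     # forwarding code reported by this hop


def termination_reason(blocks: List[ResponseBlock],
                       hops_requested: int) -> Optional[str]:
    """Return why a trace should stop, or None if it may be expanded.

    blocks holds the response blocks in the order they appear in the
    reply; the last element is the hop farthest from the destination.
    """
    if not blocks:
        return None  # nothing received; timeouts are handled elsewhere
    last = blocks[-1]
    if last.forwarding_code & FATAL_BIT:
        return "fatal-error"                # section 7.5.2
    if last.previous_hop == 0:
        if last.incoming_interface != 0:
            return "arrived-at-source"      # section 7.5.1
        return "no-previous-hop"            # section 7.5.3
    if len(blocks) < hops_requested:
        return "shorter-than-requested"     # section 7.5.4
    return None


if __name__ == "__main__":
    middle = ResponseBlock(0x0A000101, 0x0A000102, 0)
    first_hop = ResponseBlock(0x0A000001, 0, 0)
    print(termination_reason([middle, first_hop], 2))  # arrived-at-source
```

   Checking the fatal bit first is a design choice of this sketch: it
   reports an error in preference to a zero Previous Hop, which may
   simply be a side effect of that error.
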
   If a trace times out, it is likely to be because a router in the
   middle of the path does not support multicast traceroute. That
   router's address will be in the Previous Hop field of the last
   entry in the last reply packet received. A client may be able to
   determine (via mrinfo [Pusa99] or SNMP [Thal99, Thal00]) a list of
   neighbors of the non-responding router. If desired, each of those
   neighbors could be probed to determine the remainder of the path.
   Unfortunately, this heuristic may end up with multiple paths,
   since there is no way of knowing what the non-responding router's
   algorithm for choosing a previous-hop router is. However, if all
   paths but one flow back towards the non-responding router, it is
   possible to be sure that the remaining path is the correct one.

7.7. Multicast Traceroute and shared-tree routing protocols

   When using shared-tree routing protocols like PIM-SM and CBT, a
   more advanced client may use multicast traceroute to determine
   paths or potential paths.

7.7.1. PIM-SM

   When a multicast traceroute reaches a PIM-SM RP and the RP does
   not forward the trace on, it means that the RP has not performed a
   source-specific join, so there is no more state to trace.
   However, the path that traffic would use if the RP did perform a
   source-specific join can be traced by setting the trace
   destination to the RP, the trace source to the traffic source, and
   the trace group to 0. This trace Query may be unicasted to the
   RP.

7.7.2. CBT

   When a multicast traceroute reaches a CBT Core, it must simply
   stop, since CBT does not have source-specific state. However, a
   second trace can be performed, setting the trace destination to
   the traffic source, the trace group to the group being traced, and
   the trace source to the Core (or to 0, since CBT does not have
   source-specific state). This trace Query may be unicasted to the
   Core.
   There are two possibilities when combining the two traces:

7.7.2.1. No overlap

   If there is no overlap between the two traces, the second trace
   can be reversed and appended to the first trace. This composite
   trace shows the full path from the source to the destination.

7.7.2.2. Overlapping paths

   If there is a portion of the path that is common to the ends of
   the two traces, that portion is removed from both traces. Then,
   as in the no-overlap case, the second trace is reversed and
   appended to the first trace, and the composite trace again
   contains the full path.

   This algorithm works whether the source has joined the CBT tree or
   not.

7.8. Protocol-specific considerations

7.8.1. DVMRP

   DVMRP's dominant router election and route exchange guarantee that
   DVMRP routers know whether or not they are the last-hop forwarder
   for the link and who the previous hop is.

7.8.2. PIM Dense Mode

   Routers running PIM Dense Mode do not know the path packets would
   take unless traffic is flowing. Without some extra protocol
   mechanism, this means that in an environment with multiple
   possible paths with branch points on shared media, multicast
   traceroute can only trace existing paths, not potential paths.
   When there are multiple possible paths but the branch points are
   not on shared media, the previous hop router is known, but the
   last hop router may not know that it is the appropriate last hop.

   When traffic is flowing, PIM Dense Mode routers know whether or
   not they are the last-hop forwarder for the link (because they won
   or lost an Assert battle) and know who the previous hop is
   (because it won an Assert battle). Therefore, multicast
   traceroute is always able to follow the proper path when traffic
   is flowing.

8. Problem Diagnosis

8.1. Forwarding Inconsistencies

   The forwarding error code can indicate whether a group is
   unexpectedly pruned or administratively scoped.

8.2. TTL problems

   By taking the maximum of (hops from source + forwarding TTL
   threshold) over all hops, you can discover the TTL required for
   the source to reach the destination.

8.3. Packet Loss

   By taking two traces, you can find packet loss information by
   comparing the difference in input packet counts to the difference
   in output packet counts at the previous hop. On a point-to-point
   link, any difference in these numbers implies packet loss. Since
   the packet counts may be changing as the trace query is
   propagating, there may be small errors (off by 1 or 2) in these
   statistics. However, these errors will not accumulate if multiple
   traces are taken to expand the measurement period. On a shared
   link, the count of input packets can be larger than the number of
   output packets at the previous hop, due to other routers or hosts
   on the link injecting packets. This appears as "negative loss",
   which may mask real packet loss.

   In addition to the counts of input and output packets for all
   multicast traffic on the interfaces, the response data includes a
   count of the packets forwarded by a node for the specified
   source-group pair. Taking the difference in this count between
   two traces and then comparing those differences between two hops
   gives a measure of packet loss just for traffic from the specified
   source to the specified receiver via the specified group. This
   measure is not affected by shared links.

   On a point-to-point link that is a multicast tunnel, packet loss
   is usually due to congestion in unicast routers along the path of
   that tunnel. On native multicast links, loss is more likely in
   the output queue of one hop, perhaps due to priority dropping, or
   in the input queue at the next hop.
   The counters in the response data do not allow these cases to be
   distinguished. Differences in packet counts between the incoming
   and outgoing interfaces on one node cannot generally be used to
   measure queue overflow in the node.

8.4. Link Utilization

   Again, with two traces, you can divide the difference in the input
   or output packet counts at some hop by the difference in
   timestamps from the same hop to obtain the packet rate over the
   link. If the average packet size is known, then the link
   utilization can also be estimated to see whether packet loss may
   be due to the rate limit or the physical capacity on a particular
   link being exceeded.

8.5. Time delay

   If the routers have synchronized clocks, it is possible to
   estimate propagation and queuing delay from the differences
   between the timestamps at successive hops. However, this delay
   includes control processing overhead, so it is not necessarily
   indicative of the delay that data traffic would experience.

9. Implementation-specific Caveats

   Some routers with distributed forwarding architectures may not
   update the main processor's packet counts often enough for the
   packet counters to be meaningful on a small time scale. This can
   be recognized during a periodic trace by seeing positive loss in
   one trace and negative loss in the next, with no (or small) net
   loss over a longer interval. The suggested solution to this
   problem is to simply collect statistics over a longer interval.

   In the multicast extensions for SunOS 4.1.x from Xerox PARC, which
   are the basis for many UNIX-based multicast routers, both the
   output packet count and the packet forwarding count for the
   source-group pair are incremented before priority dropping for
   rate limiting occurs and before the packets are put onto the
   interface output queue, which may overflow.
   These drops will appear as (positive) loss on the link even though
   they occur within the router.

   In release 3.3/3.4 of the UNIX multicast extensions, a multicast
   packet generated on a router will be counted as having come in on
   an interface even though it did not. This can create the
   appearance of negative loss even on a point-to-point link.

   In releases up through 3.5/3.6, packets were not counted as input
   on an interface if the reverse-path forwarding check decided that
   the packets should be dropped. That causes the packets to appear
   as lost on the link if they were output by the upstream hop. This
   situation can arise when two routers on the path for the group
   being traced are connected by a shared link, and the path for some
   other group does not flow between those two routers because the
   downstream router receives packets for the other group on another
   interface, but the upstream router is the elected forwarder to
   other routers or hosts on the shared link.

   The packet counts for source/group pairs are generally kept in
   router forwarding caches. These cache entries may be occasionally
   garbage-collected on routers, so a multicast traceroute client
   should be prepared to see packet counts decrease. If a
   long-running traceroute is keeping a "base" to compare against, it
   should use the post-reset trace as the new "base", as previous
   values returned by this hop are no longer valid. In addition, it
   may choose to discard the data for all other hops, so that the
   statistics cover the same amount of time for all hops.

   Some routers (notably the obsolete mrouted 3.3 and 3.4) can
   constantly reset these packet counts. A client might want to
   detect routers that are constantly resetting and simply not
   collect statistics for that hop (instead of allowing it to cause
   all other data to be discarded).

   Some routers send byte-swapped counter values.
   If the difference between a pair of measurements is extremely
   large, a traceroute client may want to see if the difference is
   more reasonable when byte-swapped. Note that this heuristic may
   start misfiring when packet rates get high, so implementations may
   want to only attempt this heuristic when the packet rate is much
   different on one router than on surrounding routers.

   Some implementations (e.g. UNIX mrouted 3.8 and before) return
   incorrect time values; the difference between the time values for
   the same hop in two traces may have no relationship with the
   amount of time that passed between making the traces.
   Implementations should check that time values look valid before
   using them.

10. Acknowledgments

   This specification started largely as a transcription of Van
   Jacobson's slides from the 30th IETF, and the implementation in
   mrouted 3.3 by Ajit Thyagarajan. Van's original slides credit
   Steve Casner, Steve Deering, Dino Farinacci and Deb Agrawal. A
   multicast traceroute client, mtrace, has been implemented by Ajit
   Thyagarajan, Steve Casner and Bill Fenner.

   The idea of unicasting a multicast traceroute Query to the
   destination of the trace with Router Alert set is due to Tony
   Ballardie. The idea of the "S" bit to allow statistics for a
   source subnet is due to Tom Pusateri.

11. IANA Considerations

11.1. Routing Protocols

   The IANA is responsible for allocating new Routing Protocol codes.
   The Routing Protocol code is somewhat problematic, since in the
   case of protocols like CBT and PIM it must encode both a unicast
   routing algorithm and a multicast tree-building protocol. The
   space was not divided into two fields because it was already small
   and some combinations (e.g. DVMRP) would be wasted.

   Routing Protocol codes should be allocated for any combination of
   protocols that are in common use in the Internet.

11.2. Forwarding Codes

   New Forwarding codes must only be created by an RFC that modifies
   this document's section 7, fully describing the conditions under
   which the new forwarding code is used. The IANA may act as a
   central repository so that there is a single place to look up
   forwarding codes and the document in which they are defined.

12. Security Considerations

12.1. Topology discovery

   mtrace can be used to discover any actively-used topology. If
   your network topology is a secret, mtrace may be restricted at the
   border of your domain, using the ADMIN_PROHIB forwarding code.

12.2. Traffic rates

   mtrace can be used to discover what sources are sending to what
   groups and at what rates. If this information is a secret, mtrace
   may be restricted at the border of your domain, using the
   ADMIN_PROHIB forwarding code.

12.3. Unicast replies

   The "Response address" field may be used to send a single packet
   (the traceroute Reply packet) to an arbitrary unicast address. It
   is possible to use this facility as a packet amplifier, as a small
   multicast traceroute Query may turn into a large Reply packet.

13. References

   Brad88   Braden, B., D. Borman, C. Partridge, "Computing the
            Internet Checksum", RFC 1071, ISI, September 1988.

   Brad97   Bradner, S., "Key words for use in RFCs to Indicate
            Requirement Levels", RFC 2119/BCP 14, Harvard University,
            March 1997.

   Katz97   Katz, D., "IP Router Alert Option", RFC 2113, Cisco
            Systems, February 1997.

   Pusa99   Pusateri, T., "DVMRP Version 3", work in progress,
            September 1999.

   Thal00   Thaler, D., "PIM MIB", work in progress, July 2000.

   Thal99   Thaler, D., "DVMRP MIB", work in progress, October 1999.

14. Authors' Addresses

   William C. Fenner
   AT&T Labs -- Research
   75 Willow Rd.
   Menlo Park, CA 94025
   United States
   Email: fenner@research.att.com

   Stephen L. Casner
   Packet Design, Inc.
   3400 Hillview Avenue, Building 3
   Palo Alto, CA 94304
   Email: casner@packetdesign.com

15. Change History

   (To be removed before publication as RFC)

15.1. Changes from draft-fenner-traceroute-ipm-00.txt:

   - Updated boilerplate

   - None, just republish

15.2. Changes from draft-ietf-idmr-traceroute-ipm-07.txt:

   - Renamed to individual submission
     draft-fenner-traceroute-ipm-00.txt

   - None, just republish

15.3. Changes from draft-ietf-idmr-traceroute-ipm-06.txt:

   - Added implementation-specific notes as suggested by Dave Thaler:

      - Forwarding cache entries going away while traffic is flowing,
        causing reset counters.

      - mrouted 3.3 and 3.4 constant resets

      - byte-swapped counters

      - bogus time due to missed ntohl() parenthesis in mrouted <= 3.8

   - Add example of ALL-[protocol]-ROUTERS.MCAST.NET for the
     multicast-on-prev-hop. (Maybe this isn't important any more;
     PIM used to be allowed to not know the proper prev hop but
     that's not true any more)

15.4. Changes from draft-ietf-idmr-traceroute-ipm-05.txt:

   - Changes section added.

   - Updated abstract

   - Added mention of up-to-date packet counts, in particular
     allowing the delay of an mtrace packet while the counts are
     fetched in a distributed architecture.

   - Added mention of ifInMulticastPkts, ifOutMulticastPkts, and
     ipMRoutePkts for clarification of what counts should be used.

   - Note that the dropping of duplicate Queries MAY be a 1-back
     cache and that duplicate Requests MUST NOT be dropped

   - Add no-space processing rule

   - Note that it's not an error for there to be more blocks than
     requested, just send it back after adding yours.

   - Clean up some of section 8 - move implementation-specific stuff
     to a separate section, rename "Congestion" to "Packet Loss",
     note that time delay isn't actually that useful.

   Copyright (C) The Internet Society (2005). This document is
   subject to the rights, licenses and restrictions contained in BCP
   78, and except as set forth therein, the authors retain all their
   rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.