INTERNET-DRAFT                                                 N. Elkins
Intended Status: Informational                           Inside Products
                                                            M. Ackermann
                                                           BCBS Michigan
                                                              K. Haining
                                                                 US Bank
                                                              S. Perdomo
                                                                    DTCC
                                                               W. Jouris
                                                         Inside Products
                                                                D. Boyes
                                                             Sine Nomine
Expires: November 30, 2013                                  May 30, 2013

                   IPv6 Packet Sequence Number Needed
           draft-elkins-v6ops-ipv6-packet-sequence-needed-00

Abstract

   For a number of Enterprise Data Center Operators (EDCO) both
   real-time and after the fact problem resolution is critical.
   Two metrics are critical for timely end-to-end problem resolution,
   without impacting an operational production network. They are: packet
   sequence number and packet timestamp. Packet sequence number is
   required for diagnostics. Packet timestamp is required to calculate
   end-to-end response time. Current methods are inadequate for these
   purposes because they assume unreasonable access to intermediate
   devices, are cost prohibitive, require infeasible changes to a
   running production network, or do not provide timely data. This
   document provides the background and rationale for the packet
   sequence number which is a part of the IPv6 Performance and
   Diagnostic Metrics Destination Option (PDM).

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1  Background
     1.1  Why Packet Sequence Number
     1.2  IPv4 IPID : DeFacto Sequence Number
       1.2.1  Description of IPID in IPv4
       1.2.2  DeFacto Use of IPID
       1.2.3  Merits of DeFacto Usage
       1.2.4  Use Cases of IPv4 IPID in Diagnostics
     1.3  TCP sequence number is not enough
     1.4  Inadequacy of current measurement techniques
       1.4.1  SNMP / CMIP Counters
       1.4.2  Router / Firewall Logs
       1.4.3  Netflow
       1.4.4  Access to Intermediate Devices
       1.4.5  Modifications to an Operational Production Network
   2  Solution Parameters
     2.1  Packet Trace Meets Criteria
       2.1.1  Limitations of Packet Capture
       2.1.2  Problem Scenario 1
       2.1.3  Problem Scenario 2
   3  Rationale for Proposed Solution (PDM)
   4  Backward Compatibility
   5  Security Considerations
   6  IANA Considerations
   7  References
     7.1  Normative References
   8  Acknowledgments
   Authors' Addresses

1  Background

   To diagnose problems for a number of Enterprise Data Center Operators
   (EDCO) two metrics are critical for timely end-to-end problem
   resolution, both real-time and after the fact, without impacting an
   operational production network. They are: packet sequence number and
   packet timestamp. Packet sequence number is required for diagnostics.
   Packet timestamp is required to calculate end-to-end response time.

   This document provides the background and rationale for the packet
   sequence number which is a part of the IPv6 Performance and
   Diagnostic Metrics destination option (PDM).

   For background, please see Draft-Elkins-6MAN-IPv6-PDM-Dest-Option-00
   [PDMELK], Draft-Elkins-End-To-End-Response-Time-00 [RSPELK], and
   Draft-Elkins-PDM-Recommended-Usage-00 [USEELK]. These drafts are
   companion documents to this document. All four documents should be
   read together.

   As discussed in the above Internet Drafts, current methods are
   inadequate for these purposes because they assume unreasonable access
   to intermediate devices, are cost prohibitive, require infeasible
   changes to a running production network, or do not provide timely
   data. The IPv6 Performance and Diagnostic Metrics destination option
   (PDM) provides a solution to these problems. This document will
   detail the background and need for the packet sequence number.

1.1  Why Packet Sequence Number

   In many EDCO networks, during network diagnostics of an end-to-end
   connection, it becomes necessary to find the device along the network
   path creating problems.
   Diagnostic data may be collected at multiple
   places along the path (if possible), or at the source and
   destination. Then, the diagnostic data must be matched. Packet
   sequence number is critical in this matching process. The timestamp
   or even the IP addresses may be different at different devices. In
   IPv4 networks, the IPID field was used as a de facto sequence number.
   This will be discussed at greater length in section 1.2.

   This method of data collection along the path is of special use on
   large multi-tier networks to determine where packet loss or packet
   corruption is happening. Multi-tier networks are those which have
   multiple routers or switches on the path between the sender and the
   receiver.

1.2  IPv4 IPID : DeFacto Sequence Number

   With IPv4 networks, on many stack implementations, but not all, the
   IPID field has the property of sequentiality.

1.2.1  Description of IPID in IPv4

   In IPv4, the 16-bit IP Identification (IPID) field is located at an
   offset of 4 bytes into the IPv4 header and is described in RFC0791
   [RFC0791]. In IPv6, the IPID field is a 32-bit field contained in the
   Fragment Header defined by section 4.5 of RFC2460 [RFC2460].
   Unfortunately, unless fragmentation is being done by the source node,
   the IPv6 packet will not contain this Fragment Header, and therefore
   will have no Identification field.

   The intended purpose of the IPID field, in both IPv4 and IPv6, is to
   enable fragmentation and reassembly. As currently specified, it is
   required to be unique within the maximum segment lifetime (MSL) on
   all datagrams. The MSL is often 2 minutes.

1.2.2  DeFacto Use of IPID

   In many EDCO networks, the IPID field is used for more than
   fragmentation. During network diagnostics, packet traces may be
   taken at multiple places along the path, or at the source and
   destination. Then, packets can be matched by looking at the IPID.
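
   As an illustration of this matching, the following is a minimal
   sketch in Python, not taken from any existing tool. It assumes each
   trace has already been reduced to a list of IPID values for a single
   flow; that input format and the function name match_by_ipid are
   assumptions made for this example. Because the 16-bit IPID wraps,
   the sketch is only meaningful over a capture window short enough
   that IPID values do not repeat.

```python
from collections import Counter

def match_by_ipid(sent, received):
    """Compare two traces of the same flow, keyed on the IPv4 IPID.

    sent, received: lists of integer IPID values (0..65535) seen at the
    sending-side and receiving-side trace points (hypothetical inputs).
    Returns (matched, missing, loss_fraction).
    """
    sent_counts = Counter(sent)
    recv_counts = Counter(received)
    # An IPID counts as matched once per occurrence seen at both points.
    matched = sum((sent_counts & recv_counts).values())
    # IPIDs seen at the sender but absent (or seen fewer times) downstream.
    missing = sorted((sent_counts - recv_counts).elements())
    loss = len(missing) / len(sent) if sent else 0.0
    return matched, missing, loss

# Ten packets leave the sender; the packet with IPID 102 never arrives.
sent = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
received = [100, 101, 103, 104, 105, 106, 107, 108, 109]
matched, missing, loss = match_by_ipid(sent, received)
print(matched, missing, loss)
```

   Running the same comparison between each pair of adjacent trace
   points would indicate the segment of the path on which packets
   disappear.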

   The inclusion of the IPID makes it easier for a device in the
   middle of the network, or on the receiving end of the network, to
   identify flows belonging to a single node, even if that node might
   have a different IP address, for example, when sessions go through
   a NAT or proxy server.

   For its de-facto diagnostic mode usage, the IPID field needs to be
   available whether or not fragmentation occurs. It also needs to be
   unique in the context of the session, and across all the connections
   controlled by the stack. In IPv4, the IPID is in the main header, so
   it is available for all packets. As it is a 16-bit field, it can wrap
   during the course of a session, which imposes some limitations.

   Even with these limitations, the IPID has been valuable and useful in
   IPv4 for diagnostics and problem resolution. It is a practical
   solution that is 'good enough' in many instances. Not having it
   available in IPv6 may be a major detriment to new IPv6 deployments
   and contribute to protracted downtimes in existing IPv6 operations.

1.2.3  Merits of DeFacto Usage

   As network technology evolves, the uses to which fields are put can
   change as well. De-facto use is powerful, and should not be lightly
   ignored. In fact, it is a testament to the power and pervasiveness
   of the protocol that users create new uses for the original
   technology.

   For example, the use of the IPID goes beyond the vision of the
   original authors. This sort of thing has happened with numerous
   other technologies and protocols.

   The implementation of the traceroute command sends probe packets
   (ICMP echo or UDP, depending on the implementation) with a varying
   TTL. This is very useful for diagnostics yet departs from the
   original purpose of TTL.

   Similarly, cell phones have evolved to be more than just a means of
   vocal communication, including Internet communications,
   photo-sharing, stock exchange transactions, etc.
   Indeed, the Internet
   itself has evolved, from a small network for researchers and the
   military to share files into the pervasive global information
   superhighway that it is today.

1.2.4  Use Cases of IPv4 IPID in Diagnostics

   Use Case #1 --- Large Insurance Company
   - (estimated time saved by use of IPID: 7 hours)

   Performance Tool produces extraneous packets

   - Issue was whether a performance tool was accurately replicating
     session flow during performance testing.
   - Trace IPIDs showed more unique packets within same flow from
     performance tool compared to IE Browser.
   - Having the clear IPID sequence numbers also showed where and why
     the extra packets were being generated.
   - Solution: Problem rectified in subsequent version of performance
     tool.
   - Without IPID, it was not clear if there was an issue at all.

   Use Case #2 --- Large Bank
   - (estimated time saved by use of IPID: 4 hours)

   Batch transfer duration increases 12x

   - A data transfer which formerly took 30 minutes to complete
     started taking 6-8 hours to complete.
   - Was there packet loss? All the vendors said no.
   - The other applications on the network did not report any
     problems.
   - 4 trace points were used, and the IPIDs in the packets were
     compared.
   - The comparison showed 7% packet loss.
   - Solution: WAN hardware was replaced and problem fixed.
   - Without IPID, no one would agree a problem existed.

   Use Case #3 --- Large Bank
   - (estimated time saved by use of IPID: 6 hours)

   Very slow interactive performance

   - All network links looked good.
   - Traces showed duplicated small packets (which can be OK).
   - We saw that the IPID was the same in both packets but the TTL
     of one copy was always higher by 1.
   - A network device was "splitting" only small packets over two
     interfaces.
   - The small packets were control info, telling other side to slow
     down.
   - It erroneously looked like network congestion.
   - Solution: Network device replaced and good interactive
     performance restored.
   - Without IPID, flows would have appeared OK.

   Use Case #4 --- Large Government Agency
   - (estimated time saved by use of IPID: 9 hours)

   VPN drops

   - Cell phone connections to law enforcement were being dropped.
     The connections were going through a VPN.
   - All parties (both sides of VPN connection, application, etc.) said
     it was not their problem. The problem went on for weeks.
   - Finally, we took a trace which showed packets with IPID and TTL
     that did not match others in the flow AT ALL coming from the
     router nearest the application server end of VPN.
   - Solution: Provider for VPN for application server changed.
     Problem resolved.
   - Without IPID, much harder to diagnose problem.
   - (Same case also happened with large corporation. Again, all
     parties saying not their fault until proven via packet trace.)

1.3  TCP sequence number is not enough

   TCP Sequence number is defined in RFC0793 [RFC0793]. Some have
   proposed that this field will meet the needs of EDCO networks for a
   packet sequence number. Indeed, the TCP Sequence Number along with
   the TCP Acknowledgment number can be used to calculate dropped
   packets, duplicate packets, out-of-order packets etc. That is, IF the
   packet flow itself reflects accurately what happened on the wire!

   See Scenario 1 (Section 2.1.2) and Scenario 2 (Section 2.1.3) for
   what happens with packet trace capture in real networks.

   The TCP Sequence Number is, obviously, available only for TCP and not
   other transport protocols.

1.4  Inadequacy of current measurement techniques

   The question arises of whether current methods of instrumentation
   can be used without a change to the protocol.
   Current methods of
   measuring network data, other than packet traces, are inadequate
   because they assume unreasonable access to intermediate devices, are
   cost prohibitive, require infeasible changes to a running production
   network, or do not provide timely data. This section will discuss
   each of these in detail.

   Current methods include both instrumentation and third party
   products. These include SNMP, CMIP, router logs, and firewall logs.

1.4.1  SNMP / CMIP Counters

   The traditional network performance counters measured by SNMP or CMIP
   do not provide information at the granularity desired on the behavior
   of application flows across the network. The problem is that such
   counters do not contain enough data to provide a detailed and
   realistic view of the end-to-end behavior of a connection.

1.4.2  Router / Firewall Logs

   Router and firewall logs may provide some information for
   diagnostics. However, as discussed in section 1.4.5, routers and
   firewalls in a production network are generally set to do minimal
   logging and diagnostics to allow maximum efficiency and throughput.
   Such devices cannot be asked to collect detailed data for an
   operational problem, as this requires a change to a production
   network.

1.4.3  Netflow

   Netflow is instrumentation which is available from some middle
   devices. As discussed in detail in section 1.4.5, such devices are
   generally set to do minimal logging and diagnostics to allow maximum
   efficiency and throughput.

   Correlations to produce some level of response time data may be
   possible from Netflow. However, this does not give an adequate
   picture of end-to-end response time, as Netflow runs in an
   intermediate device and is not in a position to know what has
   happened at a client.

1.4.4  Access to Intermediate Devices

   The above current methods require access to the transport
   infrastructure - that is, the routers, switches or other intermediate
   devices. In some cases, this is possible; in others, the connections
   in question may cross a number of administrative entities (both in
   the transport and in the endpoints). When it is the enterprise at
   the endpoint which is interested in the diagnostics, the
   administrative entities who own the devices in the middle of the path
   have no stake in operational measurement at the enterprise or
   application level. They have no reason to provide the necessary
   data or to impact the basic transport with the instrumentation
   necessary to capture flow-oriented data as a continuous stream
   suitable for general consumption.

   In other words, if you don't own the path end-to-end, you will not be
   able to get the data you need if you are required to get it from the
   devices in the middle. Not only that, the devices in the middle do
   not have the instrumentation necessary to make it easy to do
   end-to-end diagnostics, because their owners are not responsible for
   that and so do not want to burden their devices with those kinds of
   functions.

   Many EDCO networks may not own the path end-to-end. They may be
   working with a business partner's network or crossing the Internet.

1.4.5  Modifications to an Operational Production Network

   Even when the enterprise does own all the devices along the entire
   path, to get enough data to adequately resolve a problem means
   changing the device configuration to do detailed diagnostics. In a
   production network, devices are generally set to do minimal logging
   and diagnostics. This is to allow maximum efficiency and throughput.
   The more logging and diagnostics such devices do, the fewer resources
   they have for actually transmitting traffic across the network.

   So, if devices are to be asked to collect more data for an
   operational problem, this requires a change to a production network.
   This is generally not possible as it destabilizes a critical network
   during business hours, thus potentially disrupting many customers.
   Making changes is usually a lengthy process requiring change control,
   testing on a test network, etc. On networks which are critical to
   the business function, such as the networks we are discussing, it is
   hardly likely that changing configuration "in flight" is an option.

2  Solution Parameters

   What is needed is:

   1) A method to identify and/or track the behavior of a connection
   without assuming access to the transport devices.

   2) A method to observe a connection in flight without introducing
   agents at endpoints.

   3) A method to observe arbitrary flows at multiple points within a
   network and correlate the results of those observations in a
   consistent manner.

   4) A method to signal and correlate transport issues to application
   end-to-end behavior.

   5) A method which does not require changes to a production network in
   real time.

   6) Adequate granularity in the measurement technique to provide the
   needed metrics.

2.1  Packet Trace Meets Criteria

   The only instrumentation which provides enough detail to diagnose
   end-to-end problems is a packet trace. Packet traces do not require
   changes to devices in production mode because in many large EDCO
   networks, products are available to capture packets in passive mode.
   Such products continuously monitor network traffic. Often, they are
   used not for diagnostic reasons but for regulatory reasons. For
   example, there may be legal requirements to log all stock exchange
   transactions.

   Products for packet tracing are available freely and can be used at a
   client host without disrupting major portions of the network.

2.1.1  Limitations of Packet Capture

   Even though packets are the only reliable way to provide data at the
   needed granularity, there are limitations with collecting packet
   traces in some situations. They are as follows:

2.1.2  Problem Scenario 1

   1. Packets are captured for analysis at places like large core
   switches. All packets are kept. Again, not necessarily for
   diagnostic reasons but for regulatory ones. For example, records of
   all stock trades may need to be kept for a certain number of years.

   2. When there is a problem, an analyst extracts the needed
   information.

   3. If the extract is done incorrectly, as often happens, or the
   packet capture itself is incorrect, then there may be false duplicate
   packets which can be quite misleading and can lead to wrong
   conclusions. Are these real TCP duplicates? Is there congestion on
   the subnet? Are these retransmissions? Situations have been seen
   where routers incorrectly send two packets instead of one - is this
   such a situation?

2.1.3  Problem Scenario 2

   1. In this scenario, packets are captured for analysis at places like
   a middleware box. It may be because problems are suspected with the
   box itself or it is a central point of the suspected failure.

   2. The box may not offer any way to tailor the packet capture. "You
   will get what we give you, how we give it to you!" is their
   philosophy.

   3. The packet capture incorrectly duplicates only packets going to
   certain nodes.

   4. Again, there are false duplicate packets which can be misleading
   and can lead to wrong conclusions. Are these real TCP duplicates? Is
   there congestion on the subnet? Situations have been seen where
   routers incorrectly send two packets instead of one - is this such a
   situation?
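
   One way to reason about the questions raised in both scenarios can
   be sketched as follows. This is a minimal illustration in Python
   under stated assumptions, not a description of any capture product.
   Each captured TCP packet is assumed to be reduced to a record
   holding a TCP sequence number, an IP-level packet sequence number
   (such as the IPv4 IPID on stacks that increment it per packet) and
   the TTL seen at the capture point. The record type Pkt and the
   function classify_duplicate are hypothetical names. The heuristic
   assumed here is that a real TCP retransmission is sent as a new IP
   packet and so carries a new IP-level sequence number, while a copy
   made by a network device or by the capture process carries the same
   one.

```python
from typing import NamedTuple

class Pkt(NamedTuple):
    tcp_seq: int  # TCP sequence number
    ip_seq: int   # IP-level packet sequence number (e.g. IPv4 IPID)
    ttl: int      # TTL (hop limit) as seen at the capture point

def classify_duplicate(a: Pkt, b: Pkt) -> str:
    """Heuristic label for two packets that look like duplicates."""
    if a.tcp_seq != b.tcp_seq:
        return "not a duplicate"
    if a.ip_seq != b.ip_seq:
        # Same TCP data, new IP packet: the sender transmitted it again.
        return "TCP retransmission"
    if a.ttl != b.ttl:
        # Same IP packet seen after a different number of hops.
        return "duplicated in the network"
    # Identical in every field compared here.
    return "possible capture or extract artifact"

print(classify_duplicate(Pkt(1000, 7, 61), Pkt(1000, 8, 61)))
print(classify_duplicate(Pkt(1000, 7, 61), Pkt(1000, 7, 62)))
print(classify_duplicate(Pkt(1000, 7, 61), Pkt(1000, 7, 61)))
```

   Without an IP-level sequence number, the three cases above cannot
   be told apart from the TCP sequence number alone.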

3  Rationale for Proposed Solution (PDM)

   The current IPv6 specification does not provide a packet sequence
   number or similar field in the IPv6 main header. One option might be
   to force all IPv6 packets to contain a Fragment Header. In packets
   which are entire in and of themselves, the fragment offset and the M
   (more fragments) flag would both be zero - that is, the packet would
   be an atomic fragment. Why was a new destination option header
   defined rather than recommending that Fragment Header be used?

   Our reasoning was that the PDM destination option header would
   provide multiple benefits: the packet sequence number and the
   timestamp to calculate response time. See
   Draft-Elkins-End-To-End-Response-Time-Needed-00 [RSPELK].

4  Backward Compatibility

   The scheme proposed in this document is backward compatible with all
   the currently defined IPv6 extension headers. According to RFC2460
   [RFC2460], if the destination node does not recognize this option, it
   should skip over this option and continue processing the header.

5  Security Considerations

   No security considerations are seen.

6  IANA Considerations

   There are no IANA considerations.

7  References

7.1  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September
              1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [PDMELK]   Elkins, N., "Draft-Elkins-IPv6-PDM-Dest-Option-00",
              Internet Draft, May 2013.

   [RSPELK]   Elkins, N., "Draft-Elkins-End-To-End-Response-Time-00",

   [USEELK]   Elkins, N., "Draft-Elkins-PDM-Recommended-Usage-00",

8  Acknowledgments

   The authors would like to thank Rick Troth and Fred Baker
   for their comments.

Authors' Addresses

   Nalini Elkins
   Inside Products, Inc.
   36A Upper Circle
   Carmel Valley, CA 93924
   United States
   Phone: +1 831 659 8360
   Email: nalini.elkins@insidethestack.com
   http://www.insidethestack.com

   Michael S. Ackermann
   Blue Cross Blue Shield of Michigan
   P.O. Box 2888
   Detroit, Michigan 48231
   United States
   Phone: +1 310 460 4080
   Email: mackermann@bcbsmi.com
   http://www.bcbsmi.com

   Keven Haining
   US Bank
   16900 W Capitol Drive
   Brookfield, WI 53005
   United States
   Phone: +1 262 790 3551
   Email: keven.haining@usbank.com
   http://www.usbank.com

   Sigfrido Perdomo
   Depository Trust and Clearing Corporation
   55 Water Street
   New York, NY 10055
   United States
   Phone: +1 917 842 7375
   Email: s.perdomo@dtcc.com
   http://www.dtcc.com

   William Jouris
   Inside Products, Inc.
   36A Upper Circle
   Carmel Valley, CA 93924
   United States
   Phone: +1 925 855 9512
   Email: bill.jouris@insidethestack.com
   http://www.insidethestack.com

   David Boyes
   Sine Nomine Associates
   43596 Blacksmith Square
   Ashburn, VA 20147
   United States
   Phone: +1 703 723 6673
   Email: dboyes@sinenomine.net
   http://www.sinenomine.net