INTERNET-DRAFT                                                 N. Elkins
                                                               B. Jouris
                                                         Inside Products
                                                              K. Haining
                                                              U. S. Bank
                                                            M. Ackermann
Intended Status: Proposed Standard                         BCBS Michigan
Expires: July 2014                                      January 30, 2014

         IPPM Considerations for the IPv6 PDM Extension Header
                    draft-elkins-ippm-pdm-metrics-04

Table of Contents

   1  Background
     1.1  Terminology
     1.2  Why End-to-end Response Time is Needed
     1.3  Trending of Response Time Data
     1.4  What to measure?
     1.5  TCP Timestamp not enough
     1.6  Inadequacy of Current Instrumentation Technology
       1.6.1  Synthetic transactions
       1.6.2  PING
       1.6.3  Estimates of Network Time
       1.6.4  Server / Client Agents
   2  Solution Parameters
     2.1  Rationale for proposed solution
     2.2  Merits of timestamp / delta in PDM
     2.3  What kind of timestamp?
   2  Why Packet Sequence Number
     2.1  IPv4 IPID : DeFacto Sequence Number
       2.1.1  Description of IPID in IPv4
       2.1.2  DeFacto Use of IPID
       2.1.3  Merits of DeFacto Usage
       2.1.4  Use Cases of IPv4 IPID in Diagnostics
     2.2  TCP sequence number is not enough
     2.3  Inadequacy of current measurement techniques
       2.3.1  SNMP / CMIP Counters
       2.3.2  Router / Firewall Logs
       2.3.3  Netflow
       2.3.4  Access to Intermediate Devices
       2.3.4  Modifications to an Operational Production Network
   3  Solution Parameters
     3.1  Packet Trace Meets Criteria
       3.1.1  Limitations of Packet Capture
       3.1.2  Problem Scenario 1
       3.1.2  Problem Scenario 2
   4  Rationale for Proposed Solution (PDM)
   5  Performance and Diagnostic Metrics Destination Option Layout
     5.1  Destination Options Header
     5.2  PDM Types
     5.3  Performance and Diagnostic Metrics Destination Option (Type 1)
     5.4  Performance and Diagnostic Metrics Destination Option (Type 2)
   6  Use of the PDM
     6.1  Packet Identification Data
     6.2  Data in the PDM Destination Option Headers
   7  Metrics Derived from the PDM Destination Options
   8  Base Derived Metrics
     8.1  One-Way Delay
     8.2  Round-Trip Delay
     8.3  Server Delay
   9  Sample Implementation Flow (PDM Type 1)
     9.1  Step 1 (PDM Type 1)
     9.2  Step 2 (PDM Type 1)
     9.3  Step 3 (PDM Type 1)
     9.4  Step 4 (PDM Type 1)
     9.5  Step 5 (PDM Type 1)
   10  Sample Implementation Flow (PDM 2)
     10.1  Step 1 (PDM Type 2)
     10.2  Step 2 (PDM Type 2)
     10.3  Step 3 (PDM Type 2)
     10.4  Step 4 (PDM Type 2)
     10.5  Step 5 (PDM Type 2)
   11  Derived Metrics : Advanced
     11.1  Advanced Derived Metrics : Triage
     11.2  Advanced Derived Metrics : Network Diagnostics
       11.2.1  Retransmit Duplication (RD)
       11.2.2  ACK Lag (AL)
       11.2.3  Third-party Connection Reset (TPCR)
       11.2.4  Potential Hang (PH)
     11.3  Advanced Metrics : Session Classification
   12  Use Cases
   13  Security Considerations
   14  IANA Considerations
   15  References
     15.1  Normative References
     15.2  Informative References
   16  Acknowledgments
   Authors' Addresses

Abstract

   To diagnose performance and connectivity problems, metrics on real
   (non-synthetic) transmission are critical for timely end-to-end
   problem resolution. Such diagnostics may be real-time or after the
   fact, but must not impact an operational production network. These
   metrics are defined in the IPv6 Performance and Diagnostic Metrics
   Destination Option (PDM). The base metrics are: packet sequence
   number and packet timestamp. Other metrics may be derived from these
   for use in diagnostics. This document specifies such metrics, their
   calculation, and usage.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1 Background

   To diagnose performance and connectivity problems, metrics on real
   (non-synthetic) transmission are critical for timely end-to-end
   problem resolution. Such diagnostics may be real-time or after the
   fact, but must not impact an operational production network. The base
   metrics are: packet sequence number and packet timestamp. Metrics
   derived from these will be described separately.
   This document starts with the background and rationale for the
   requirement for end-to-end response time and packet sequence
   number(s).

   Current methods are inadequate for these purposes because they assume
   unreasonable access to intermediate devices, are cost prohibitive,
   require infeasible changes to a running production network, or do not
   provide timely data. The IPv6 Performance and Diagnostic Metrics
   destination option (PDM) provides a solution to these problems.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2 Why End-to-end Response Time is Needed

   The timestamps or delta values in the PDM, which travel along with
   the packet, are used to calculate end-to-end response time without
   requiring agents in devices along the path. In many networks,
   end-to-end response times are a critical component of Service Level
   Agreements (SLAs).

   End-to-end response is what the user of a network system actually
   experiences. When the end user is an individual, he is generally
   indifferent to what is happening along the network; what he really
   cares about is how long it takes to get a response back. But this is
   not just a matter of individuals' personal convenience. In many
   cases, rapid response is critical to the business being conducted.

   When the end user is a device (e.g., in the Internet of Things),
   what matters is the speed with which requested data can be
   transferred -- specifically, whether the requested data can be
   transferred in time to accomplish the desired actions. This can be
   important when the relevant external conditions are subject to rapid
   change.

   Response time and consistency are not just "nice to have".
   On many networks, poor response time can cause financial hardship or
   endanger human life. In some cities, the emergency police contact
   system operates over IP, law enforcement uses TCP/IP networks, and
   transactions on stock exchanges are settled using IP networks. The
   critical nature of such activities to our daily lives and financial
   well-being demands a solution. Section 1.6 details the current state
   of end-to-end response time monitoring.

1.3 Trending of Response Time Data

   In addition to the need for tracking current service, end-to-end
   response time is valuable for capacity planning. By tracking
   response times, and identifying trends, it becomes possible to
   determine when network capacity is being approached. This allows
   additional capacity to be obtained before service levels fall below
   requirements. Without that kind of tracking, the only option is to
   wait until there is a problem, and then scramble to get additional
   capacity on an emergency (and probably high cost) basis.

1.4 What to measure?

   End-to-end response time can be broken down into three parts:

   - Network delay
   - Application (or server) delay
   - Client delay

   Network delay may be one-way delay [RFC2679] or round-trip delay
   [RFC2681].

   Additionally, network delay may include multiple hops. Application
   and server delay include operating system and stack time. By and
   large, the three timings are 'good enough' measurements to allow
   rapid triage into the failing component.

   Ways are available (provided by operating systems) to measure
   Application and Client times. Network time can also be measured in
   isolation via some of the measurement techniques described in
   Section 1.6. The most difficult portion is to integrate network time
   with the server or application times.
   Products exist to do this, but they are available only at exorbitant
   cost, require agents, and will likely become more cost-prohibitive
   as the speed of networks grows and as the world becomes more
   connected via mobile devices.

   Measuring network time needs to occur at the end-points of the
   transactions being measured. The time needs to be available
   regardless of the upper layer protocol being used by the transaction.
   That is, it cannot be for just TCP packets.

1.5 TCP Timestamp not enough

   Some suggest that the TCP Timestamp option might be sufficient to
   calculate end-to-end response time.

   The TCP Timestamp option is defined in RFC1323 [RFC1323]. The TCP
   Timestamp option allows segments to be discarded when the TCP
   sequence number wraps (Protection Against Wrapped Sequences, PAWS).

   The problems with the TCP Timestamp option are:

   1. Not everyone turns this on.
   2. It is only available for TCP applications.
   3. There is no indication of date in long-running connections (that
      is, connections which last longer than one day).
   4. The granularity of the timestamp is at best at millisecond level.

   In the future, as speeds of devices and networks grow and network
   types proliferate, TCP timestamp values, both in terms of granularity
   and date specification, will become more and more inadequate. Even
   today, on many networks, timings are at the microsecond level, not
   the millisecond level. Newer networks, such as Delay Tolerant
   Networks, may have connection times which are very large indeed -
   hours or even days.

1.6 Inadequacy of Current Instrumentation Technology

   The current technology includes:

   1. Synthetic transactions
   2. Pings
   3. Estimates of network time
   4. Server / Client Agents

   Let us discuss each of these in detail.

1.6.1 Synthetic transactions

   Synthetic transactions, also known as active measurement, can be
   extremely useful.
   However, in a dynamic network, the routes taken by the packet or the
   current load on the application may not be the same for the real
   transaction as when the active test was performed. For example, if
   you time how long it takes for me to drive to work at 2:00 am, that
   may not be the same as how long it takes me to drive to work during
   rush hour at 8:00 am. So, it is important to have embedded
   measurement in the actual packet.

1.6.2 PING

   An ICMP ping measures network time. First, you PING the remote
   device. Then you assume that the time it takes to get a response to
   a PING is the same as the time that a transaction would take to
   traverse the network. However, QoS rules, firewalls, etc. may mean
   that a PING (and other synthetic transactions) is not subject to the
   same conditions as real traffic. PINGs, though extremely useful,
   also measure only network delays. Server delays must also be
   provided.

1.6.3 Estimates of Network Time

   If a packet trace is done, it is possible to look at the time between
   when a response was seen to be sent at the packet capture device and
   when the ACK for the response comes back.

   If you assume that the ACK took the same amount of time as the
   original query, you have the network time. Unfortunately, the time
   for the ACK may not be the same as the time for a much larger query
   transaction to traverse the network.

   The biggest problem with this method is that of TCP delayed
   acknowledgements. If the client is doing delayed ACKs, then the ACK
   will be held until the next request is ready to go out. In this
   case, the time to receive the ACK has no correlation with network
   time.

1.6.4 Server / Client Agents

   There are also products which claim that they can determine
   end-to-end response times, integrating server and network times -
   and indeed they can do so.
   But they require agents which must be placed at each point which is
   to be monitored. That is, it is necessary to add those agents
   everywhere around the network, at very high cost in manpower,
   knowledge, and money. These kinds of products can be afforded by
   only the wealthiest corporations. As the speed of networks grows,
   and as the world becomes more connected via mobile devices, such
   products will only become more expensive, if indeed their technology
   can keep up.

   There are many situations where agents cannot be deployed and which
   demand a lightweight, cost-effective solution. Consider an ISP with
   many customers. If the customer complains of poor response time, it
   is much more cost-effective for the ISP to simply take a packet
   trace with embedded diagnostics than to instrument the entire
   customer network.

   TCP/IP networks, including the Internet, are used throughout the
   world. If there is not a scalable and affordable way to measure
   performance bottlenecks and failures, the growth of these networks
   will suffer and indeed may reach a plateau where further growth
   becomes impossible.

2 Solution Parameters

   What is needed is:

   1) A method to identify and/or track the behavior of a connection
   without assuming access to the transport devices.

   2) A method to observe a connection in flight without introducing
   agents.

   3) A method to observe arbitrary flows at multiple points within a
   network and correlate the results of those observations in a
   consistent manner.

   4) A method to signal and correlate transport issues to application
   end-to-end behavior.

   5) A method which does not require changes to a production network in
   real time.

   6) Adequate granularity in the measurement technique to provide the
   needed metrics.

   7) A method that is scalable to very large networks.

   8) A method that is affordable to all.

2.1 Rationale for proposed solution

   The current IPv6 specification does not provide a timestamp or
   similar field in the IPv6 main header or in any extension header. So,
   we propose the IPv6 Performance and Diagnostic Metrics destination
   option (PDM) [ELKPDM].

2.2 Merits of timestamp / delta in PDM

   Advantages include:

   1. Less overhead than other alternatives.
   2. Real measure of actual transactions.
   3. Less cost to provide solutions.
   4. More accurate and complete information.
   5. Independence from transport layer protocols.
   6. Ability to span organizational boundaries with consistent
      instrumentation.

   In other words, this is a solution to a long-standing problem. The
   PDM will provide a metric which will allow those responsible for
   network support to determine what is happening in their network
   without expensive equipment (agents) at each device.

   The PDM does not solve every response time issue for every situation.
   Network connections with multiple hops will still need more granular
   metrics, as will the differentiation between multiple components at
   each host. That is, TCP/IP stack time vs. application time will
   still need to be broken out by client software. What the PDM does
   provide is the ability to do rapid triage. That is, to determine
   quickly if the problem is in the network or in the server or
   application.

2.3 What kind of timestamp?

   Questions arise about exactly the kind of timestamp to use. Both the
   Network Time Protocol (NTP) [RFC5905] and Precision Time Protocol
   (PTP) [IEEE1588] are used to provide timing on TCP/IP networks.

   NTP has evolved within the IETF structure while PTP has evolved
   within the Institute of Electrical and Electronics Engineers (IEEE)
   community. By and large, operating systems such as Windows, Linux,
   and those of IBM mainframe computers use NTP.
   These are the source and destination systems for packets.
   Intermediate nodes such as routers and switches may prefer PTP.

   Since we are describing a new extension header for destination
   systems, the timestamp to be used will be in accordance with NTP. The
   document, draft-ackermann-ntp-pdm-ntp-usage [NTPPDM], discusses
   guidelines for implementing NTP for use with the PDM. The timestamp
   is only relevant for PDM type 1. PDM type 2 uses delta values and
   requires no time synchronization.

2 Why Packet Sequence Number

   While performing network diagnostics of an end-to-end connection, it
   often becomes necessary to find the device along the network path
   that is creating problems. Diagnostic data may be collected at
   multiple places along the path (if possible), or at the source and
   destination. Then, the diagnostic data must be matched. The packet
   sequence number is critical in this matching process. The timestamp
   or even the IP addresses may be different at different devices. In
   IPv4 networks, the IPID field was used as a de facto sequence number.

   This method of data collection along the path is of special use on
   large multi-tier networks to determine where packet loss or packet
   corruption is happening. Multi-tier networks are those which have
   multiple routers or switches on the path between the sender and the
   receiver.

2.1 IPv4 IPID : DeFacto Sequence Number

   With IPv4 networks, on many stack implementations, but not all, the
   IPID field has the property of sequentiality. That is, the sending
   IP stack numbers the packets it sends in increasing numerical order.
   This was not a requirement for the field, but an implementation
   choice which turned out to be quite useful in diagnostics.

2.1.1 Description of IPID in IPv4

   In IPv4, the 16-bit IP Identification (IPID) field is located at an
   offset of 4 bytes into the IPv4 header and is described in RFC0791
   [RFC0791].
   In IPv6, the IPID field is a 32-bit field contained in the Fragment
   Header defined by section 4.5 of RFC2460 [RFC2460]. Unfortunately,
   unless fragmentation is being done by the source node, the IPv6
   packet will not contain this Fragment Header, and therefore will
   have no Identification field.

   The intended purpose of the IPID field, in both IPv4 and IPv6, is to
   enable fragmentation and reassembly. As currently specified, it is
   required to be unique within the maximum segment lifetime (MSL) on
   all datagrams. The MSL is often 2 minutes.

2.1.2 DeFacto Use of IPID

   In a number of networks, the IPID field is used for more than
   fragmentation. During network diagnostics, packet traces may be
   taken at multiple places along the path, or at the source and
   destination. Then, packets can be matched by looking at the IPID.

   The inclusion of the IPID makes it easier to identify flows belonging
   to a single node, even if that node might have a different IP
   address, for example, in the case of sessions going through a NAT or
   proxy server.

   For its de facto diagnostic usage, the IPID field needs to be
   available whether or not fragmentation occurs. It also needs to be
   unique in the context of the session, and across all the connections
   controlled by the stack. In IPv4, the IPID is in the main header, so
   it is available for all packets. As it is a 16-bit field, it wraps
   during the course of a long session and thus has some limitations.

   Even with these limitations, the IPID has been valuable and useful in
   IPv4 for diagnostics and problem resolution. It is a practical
   solution that is 'good enough' in many instances. Not having it
   available in IPv6 may be a major detriment to new IPv6 deployments
   and contribute to protracted downtimes in existing IPv6 operations.
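
   As an illustration of the matching described in this section, the
   following Python sketch compares the IPv4 IPIDs of one flow as seen
   at two trace points. It is a minimal sketch under stated
   assumptions, not part of any specification: the function names
   (compare_trace_points, ipid_distance) and the input format (a plain
   list of IPID values per trace point, already filtered to one flow)
   are hypothetical and chosen only for this example. Because the IPID
   is a 16-bit field, the comparison is meaningful only for a window
   shorter than 65536 packets.

```python
from collections import Counter

# The IPv4 Identification field is 16 bits wide, so values wrap at 65536.
IPID_MODULUS = 1 << 16


def ipid_distance(earlier, later):
    """Forward distance from one IPID to a later one, allowing for wrap."""
    return (later - earlier) % IPID_MODULUS


def compare_trace_points(upstream_ipids, downstream_ipids):
    """Match packets of one flow seen at two trace points by IPv4 IPID.

    Both arguments are lists of IPID values (hypothetical input format)
    taken from traces covering the same short window.  Returns a dict
    with the IPIDs missing downstream, the IPIDs seen more than once
    downstream, and the fraction of upstream IPIDs that went missing.
    """
    if len(upstream_ipids) >= IPID_MODULUS:
        # With this many packets the 16-bit field must have repeated,
        # so IPIDs alone can no longer identify packets uniquely.
        raise ValueError("window too long: IPID values may have wrapped")

    sent = Counter(upstream_ipids)
    seen = Counter(downstream_ipids)

    missing = sorted(ipid for ipid in sent if ipid not in seen)
    duplicated = sorted(ipid for ipid, count in seen.items() if count > 1)
    loss_ratio = len(missing) / len(sent) if sent else 0.0

    return {
        "missing": missing,
        "duplicated": duplicated,
        "loss_ratio": loss_ratio,
    }
```

   For example, with upstream IPIDs 65534, 65535, 0, 1 and downstream
   IPIDs 65534, 0, 0, 1, this sketch reports 65535 as missing, 0 as
   duplicated, and a loss ratio of 0.25.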

2.1.3 Merits of DeFacto Usage

   As network technology evolves, the uses to which fields are put can
   change as well. De facto use is powerful, and should not be lightly
   ignored. In fact, it is a testament to the power and pervasiveness
   of the protocol that users create new uses for the original
   technology.

   For example, the use of the IPID goes beyond the vision of the
   original authors. This sort of thing has happened with numerous
   other technologies and protocols.

   The implementation of the traceroute command sends probe packets
   (ICMP echo or UDP, depending on the implementation) with a varying
   TTL. This is very useful for diagnostics yet departs from the
   original purpose of TTL.

   Similarly, cell phones have evolved to be more than just a means of
   vocal communication, including Internet communications, photo-
   sharing, stock exchange transactions, etc. Indeed, the Internet
   itself has evolved, from a small network for researchers and the
   military to share files into the pervasive global information
   superhighway that it is today.

2.1.4 Use Cases of IPv4 IPID in Diagnostics

   Use Case #1 --- Large Insurance Company

   - (estimated time saved by use of IPID: 7 hours)

   Performance Tool produces extraneous packets

   - Issue was whether a performance tool was accurately replicating
     session flow during performance testing.

   - Trace IPIDs showed more unique packets within same flow from
     performance tool compared to IE Browser.

   - Having the clear IPID sequence numbers also showed where and why
     the extra packets were being generated.

   - Solution: Problem rectified in subsequent version of performance
     tool.

   - Without IPID, it was not clear if there was an issue at all.

   Use Case #2 --- Large Bank

   - (estimated time saved by use of IPID: 4 hours)

   Batch transfer duration increases 12x

   - A data transfer which formerly took 30 minutes to complete started
     taking 6-8 hours to complete.

   - Was there packet loss? All the vendors said no.

   - The other applications on the network did not report any problems.

   - 4 trace points were used, and the IPIDs in the packets were
     compared.

   - The comparison showed 7% packet loss.

   - Solution: WAN hardware was replaced and problem fixed.

   - Without IPID, no one would agree a problem existed.

   Use Case #3 --- Large Bank

   - (estimated time saved by use of IPID: 6 hours)

   Very slow interactive performance

   - All network links looked good.

   - Traces showed duplicated small packets (which can be OK).

   - We saw that the IPID was the same in both packets but the TTL
     always differed by 1.

   - A network device was "splitting" only small packets over two
     interfaces.

   - The small packets were control info, telling other side to slow
     down.

   - It erroneously looked like network congestion.

   - Solution: Network device replaced and good interactive performance
     restored.

   - Without IPID, flows would have appeared OK.

   Use Case #4 --- Large Government Agency

   - (estimated time saved by use of IPID: 9 hours)

   VPN drops

   - Cell phone connections to law enforcement were being dropped. The
     connections were going through a VPN.

   - All parties (both sides of VPN connection, application, etc.) said
     it was not their problem. The problem went on for weeks.

   - Finally, we took a trace which showed packets with IPID and TTL
     that did not match others in the flow at all, coming from the
     router nearest the application server end of VPN.

   - Solution: Provider for VPN for application server changed. Problem
     resolved.

   - Without IPID, much harder to diagnose problem. (The same situation
     also occurred at a large corporation. Again, all parties said it
     was not their fault until it was proven via packet trace.)

2.2 TCP sequence number is not enough

   TCP Sequence number is defined in RFC0793 [RFC0793].
   Some have proposed that this field will meet the needs of
   diagnostics for a packet sequence number. Indeed, the TCP Sequence
   Number along with the TCP Acknowledgment number can be used to
   calculate dropped packets, duplicate packets, out-of-order packets
   etc. That is, if the packet trace itself accurately reflects what
   happened on the wire.

   See Problem Scenario 1 and Problem Scenario 2 in Section 3.1 for
   what happens with packet trace capture in real networks.

   The TCP Sequence Number is, obviously, available only for TCP and not
   other higher layer protocols.

2.3 Inadequacy of current measurement techniques

   The question arises of whether current methods of instrumentation
   can be used without a change to the protocol. Current methods of
   measuring network data, other than packet traces, are inadequate
   because they assume unreasonable access to intermediate devices, are
   cost prohibitive, require infeasible changes to a running production
   network, or do not provide timely data. This section will discuss
   each of these in detail.

   Current methods include both instrumentation and third party
   products. These include SNMP, CMIP, router logs, and firewall logs.

2.3.1 SNMP / CMIP Counters

   The traditional network performance counters measured by SNMP or CMIP
   do not provide information at the granularity desired on the behavior
   of application flows across the network. The problem is that such
   counters do not contain enough data to be able to provide a detailed
   and realistic view of the end-to-end behavior of a connection.

2.3.2 Router / Firewall Logs

   Router and firewall logs may provide some information for
   diagnostics. Routers and firewalls in a production network are
   generally set to do minimal logging and diagnostics to allow maximum
   efficiency and throughput.
   Such devices cannot be asked to collect detailed data for an
   operational problem, as this requires a change to a production
   network.

2.3.3 Netflow

   Netflow is instrumentation which is available from some intermediate
   devices. In production networks, such devices are generally set to
   do minimal logging and diagnostics to allow maximum efficiency and
   throughput.

   It is often also not possible to start data collection in the middle
   of the day on a production network.

2.3.4 Access to Intermediate Devices

   The above current methods require access to the transport
   infrastructure - that is, the routers, switches or other intermediate
   devices. In some cases, this is possible; in others, the connections
   in question may cross a number of administrative entities (both in
   the transport and in the endpoints). When it is the enterprise at
   the endpoint which is interested in the diagnostics, the
   administrative entities who own the devices in the middle of the path
   have no stake in operational measurement at the enterprise or
   application level. They have no reason to provide the necessary
   data or to impact the basic transport with the instrumentation
   necessary to capture flow-oriented data as a continuous stream
   suitable for general consumption.

   In other words, if you don't own the path end-to-end, you will not be
   able to get the data you need if you are required to get it from the
   devices in the middle. Not only that, the devices in the middle do
   not have the instrumentation necessary to make end-to-end
   diagnostics easy, because their owners are not responsible for such
   diagnostics and so do not want to burden their devices with those
   kinds of functions.

   Many networks may not own the path end-to-end. They may be working
   with a business partner's network or crossing the Internet.
2.3.5 Modifications to an Operational Production Network

   Even when the enterprise does own all the devices along the entire
   path, getting enough data to adequately resolve a problem means
   changing the device configuration to do detailed diagnostics. In a
   production network, devices are generally set to do minimal logging
   and diagnostics. This is to allow maximum efficiency and throughput.
   The more logging and diagnostics such devices do, the fewer resources
   they have for actually transmitting traffic across the network.

   So, if devices are to be asked to collect more data for an
   operational problem, this requires a change to a production network.
   This is generally not possible as it destabilizes a critical network
   during business hours, thus potentially disrupting many customers.
   Making changes is usually a lengthy process requiring change control,
   testing on a test network, etc. On networks which are critical to
   the business function, changing configuration "in flight" is
   generally not an option.

3 Solution Parameters

   What is needed is:

   1) A method to identify and/or track the behavior of a connection
   without assuming access to the transport devices.

   2) A method to observe a connection in flight without introducing
   agents at endpoints.

   3) A method to observe arbitrary flows at multiple points within a
   network and correlate the results of those observations in a
   consistent manner.

   4) A method to signal and correlate transport issues to application
   end-to-end behavior.

   5) A method which does not require changes to a production network in
   real time.

   6) Adequate granularity in the measurement technique to provide the
   needed metrics.

3.1 Packet Trace Meets Criteria

   The only instrumentation which provides enough detail to diagnose
   end-to-end problems is a packet trace.
   Packet traces do not require changes to devices in production mode
   because in many networks, products are available to capture packets
   in passive mode. Such products continuously monitor network traffic.
   Often, they are used not for diagnostic reasons but for regulatory
   reasons. For example, there may be legal requirements to log all
   stock exchange transactions.

   Products for packet tracing are freely available and can be used at a
   client host without disrupting major portions of the network.

3.1.1 Limitations of Packet Capture

   Even though packet traces are the only reliable way to provide data
   at the needed granularity, there are limitations with collecting
   packet traces in some situations. The following two scenarios
   illustrate them.

3.1.2 Problem Scenario 1

   1. Packets are captured for analysis at places like large core
   switches. All packets are kept. Again, not necessarily for
   diagnostic reasons but for regulatory ones. For example, records of
   all stock trades may need to be kept for a certain number of years.

   2. When there is a problem, an analyst extracts the needed
   information.

   3. If the extract is done incorrectly, as often happens, or the
   packet capture itself is incorrect, then there may be false duplicate
   packets which can be quite misleading and can lead to wrong
   conclusions. Are these real TCP duplicates? Is there congestion on
   the subnet? Are these retransmissions? Situations have been seen
   where routers incorrectly send two packets instead of one - is this
   such a situation?

   4. This is the type of problem that can be solved by having an IP
   packet sequence number.

3.1.3 Problem Scenario 2

   1. In this scenario, packets are captured for analysis at places like
   a middleware box. It may be because problems are suspected with the
   box itself or it is a central point of the suspected failure.

   2. The box may not offer any way to tailor the packet capture.
"You 764 will get what we give you, how we give it to you!" is their 765 philosophy. 767 3. The packet capture incorrectly duplicates only packets going to 768 certain nodes. 770 4. Again, there are false duplicate packets which can be misleading 771 and can lead to wrong conclusions. Are these real TCP duplicates? Is 772 there congestion on the subnet? Situations have been seen where 773 routers incorrectly send two packets instead of one - is this such a 774 situation? 776 4 Rationale for Proposed Solution (PDM) 778 The current IPv6 specification does not provide a packet sequence 779 number or similar field in the IPv6 main header. One option might be 780 to force all IPv6 packets to contain a Fragment Header. In packets 781 which are entire in and of themselves, the fragment ID would be zero- 782 that is, an atomic fragment. Why was a new destination option header 783 defined rather than recommending that Fragment Header be used? 785 Our reasoning was that the PDM destination option header would 786 provide multiple benefits : the packet sequence number and the 787 timings to calculate response time. 789 As defined in RFC2460 [RFC2460], destination options are carried by 790 the IPv6 Destination Options extension header. Destination options 791 include optional information that need be examined only by the IPv6 792 node given as the destination address in the IPv6 header, not by 793 routers in between. 795 The PDM DOH will be carried by each packet in the network, if this is 796 configured. That is, the PDM DOH is optional. If the user of the OS 797 configures the PDM DOH to be used, then it will be carried in the 798 packet. 800 The metrics in the PDM are for 'real' or passive data. That is, they 801 are of the traffic actually traveling on the network. 
5 Performance and Diagnostic Metrics Destination Option Layout

5.1 Destination Options Header

   The IPv6 Destination Options Header is used to carry optional
   information that need be examined only by a packet's destination
   node(s). The Destination Options Header is identified by a Next
   Header value of 60 in the immediately preceding header and is defined
   in RFC2460 [RFC2460].

5.2 PDM Types

   The IPv6 Performance and Diagnostic Metrics Destination Option (PDM)
   is an implementation of the Destination Options Header (Next Header
   value = 60). Two types of PDM are defined. PDM type 1 requires time
   synchronization. PDM type 2 does not require time synchronization.

   PDM type 1 and PDM type 2 are mutually exclusive. That is, both ends
   of a 5-tuple send either PDM type 1 or PDM type 2.

5.3 Performance and Diagnostic Metrics Destination Option (Type 1)

   PDM type 1 is used to facilitate diagnostics by including a packet
   sequence number and timestamp.

   The PDM type 1 is encoded in type-length-value (TLV) format as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |        PSN This Packet        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                TimeStamp This Packet (64-bit)                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PSN Last Packet        |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                TimeStamp Last Packet (64-bit)                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type

   TBD = 0xXX (TBD) [To be assigned by IANA] [RFC2780]

   Option Length

   8-bit unsigned integer.
   Length of the option, in octets, excluding the Option Type and
   Option Length fields. This field MUST be set to 22.

   Packet Sequence Number This Packet (PSNTP)

   16-bit unsigned integer. This field will wrap. It is intended for
   human use.

   Initialized at a random number and monotonically incremented for
   each packet on the 5-tuple. The 5-tuple consists of the source and
   destination IP addresses, the source and destination ports, and the
   upper layer protocol (ex. TCP, ICMP, etc).

   Operating systems MUST implement a separate packet sequence number
   counter per 5-tuple. Operating systems MUST NOT implement a single
   counter for all connections.

   Note: This is consistent with the current implementation of the IPID
   field in IPv4 for many, but not all, stacks.

   TimeStamp This Packet (TSTP)

   A 64-bit unsigned integer field containing the time at which this
   packet was sent by the source node. The value indicates the number
   of seconds since January 1, 1970, 00:00 UTC, by using a fixed point
   format. In this format, the integer number of seconds is contained
   in the first 32 bits of the field, and the remaining 32 bits hold
   the fraction of a second, resolving to picoseconds.

   This follows timestamp formats used in Network Time Protocol (NTP)
   [RFC5905] and SEND [RFC3971]. A discussion of how to implement NTP
   for use with PDM header type 1 is in
   draft-ackermann-ntp-pdm-ntp-usage-00 [NTPPDM].

   Implementation note: This format is compatible with the usual
   representation of time under UNIX, although the number of bits
   available for the integer and fraction parts varies between Unix
   implementations.

   Packet Sequence Number Last Received (PSNLR)

   16-bit unsigned integer. This is the PSN of the packet last received
   on the 5-tuple.

   TimeStamp Last Received (TSLR)

   A 64-bit unsigned integer field containing a timestamp. This is the
   timestamp of the packet last received on the 5-tuple.
   Format is the same as TSTP.

5.4 Performance and Diagnostic Metrics Destination Option (Type 2)

   The second type of IPv6 Performance and Diagnostic Metrics
   Destination Option (PDM) is as follows. PDM type 1 and PDM type 2
   are mutually exclusive. That is, both ends of a 5-tuple send either
   PDM type 1 or PDM type 2.

   PDM type 2 contains the following fields:

   PSNTP    : Packet Sequence Number This Packet
   PSNLR    : Packet Sequence Number Last Received
   DELTALR  : Delta Last Received
   PSNLS    : Packet Sequence Number Last Sent
   DELTALS  : Delta Last Sent

   PDM destination option type 2 is encoded in type-length-value (TLV)
   format as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |        PSN This Packet        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       PSN Last Received       |         PSN Last Sent         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Delta Last Received      |        Delta Last Sent        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | TType |
   +-+-+-+-+

   Option Type

   TBD = 0xXX (TBD) [To be assigned by IANA] [RFC2780]

   Option Length

   8-bit unsigned integer. Length of the option, in octets, excluding
   the Option Type and Option Length fields. This field MUST be set to
   22.

   Packet Sequence Number This Packet (PSNTP)

   16-bit unsigned integer. This field will wrap. It is intended for
   human use.

   Initialized at a random number and monotonically incremented for
   each packet on the 5-tuple. The 5-tuple consists of the source and
   destination IP addresses, the source and destination ports, and the
   upper layer protocol (ex. TCP, ICMP, etc).

   Operating systems MUST implement a separate packet sequence number
   counter per 5-tuple.
   Operating systems MUST NOT implement a single counter for all
   connections.

   Note: This is consistent with the current implementation of the IPID
   field in IPv4 for many, but not all, stacks.

   Packet Sequence Number Last Received (PSNLR)

   16-bit unsigned integer. This is the PSN of the packet last received
   on the 5-tuple.

   Packet Sequence Number Last Sent (PSNLS)

   16-bit unsigned integer. This is the PSN of the packet last sent on
   the 5-tuple.

   Delta TimeStamp Type (TIMETYPE)

   4-bit unsigned integer. This is the type of time contained in the
   delta fields below.

   0 - unknown
   1 - time is in units of nanoseconds
   2 - time is in units of microseconds
   3 - time is in units of milliseconds
   4 - time is in units of seconds
   5 - time is in units of minutes
   6 - time is in units of hours
   7 - time is in units of days

   The values 5 - 7 are relevant for Delay Tolerant Networks (DTN) which
   may operate with long delays between packets.

   Delta Last Received (DELTALR)

   A 16-bit unsigned integer field. This is server delay.

   DELTALR = Send time packet 2 - Receive time packet 1

   The value is according to the scale in TIMETYPE.

   Delta Last Sent (DELTALS)

   A 16-bit unsigned integer field. This is round trip or end-to-end
   time.

   DELTALS = Receive time packet 2 - Send time packet 1

   The value is according to the scale in TIMETYPE.

   Option Type

   The two highest-order bits of the Option Type field are encoded to
   indicate specific processing of the option; for the PDM destination
   option, these two bits MUST be set to 00. This indicates the
   following processing requirements:

   00 - skip over this option and continue processing the header.

   RFC2460 [RFC2460] defines other values for the Option Type field.
   These MUST NOT be used in the PDM. The other values are as follows:

   01 - discard the packet.
   10 - discard the packet and, regardless of whether or not the
   packet's Destination Address was a multicast address, send an ICMP
   Parameter Problem, Code 2, message to the packet's Source Address,
   pointing to the unrecognized Option Type.

   11 - discard the packet and, only if the packet's Destination Address
   was not a multicast address, send an ICMP Parameter Problem, Code 2,
   message to the packet's Source Address, pointing to the unrecognized
   Option Type.

   In keeping with RFC2460 [RFC2460], the third-highest-order bit of the
   Option Type specifies whether or not the Option Data of that option
   can change en-route to the packet's final destination.

   In the PDM, the value of the third-highest-order bit MUST be 0. The
   possible values are as follows:

   0 - Option Data does not change en-route

   1 - Option Data may change en-route

   The three high-order bits described above are to be treated as part
   of the Option Type, not independent of the Option Type. That is, a
   particular option is identified by a full 8-bit Option Type, not just
   the low-order 5 bits of an Option Type.

6 Use of the PDM

6.1 Packet Identification Data

   Each packet contains information about the sender and receiver. In
   the IP protocol, the identifying information is called a "5-tuple".
   The flows described below are for the set of packets flowing between
   A and B without consideration of any other packets sent to any other
   device from Host A or Host B.

   The 5-tuple consists of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP, etc.)
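   As an illustration only, the 5-tuple and the requirement for a
   separate packet sequence number counter per 5-tuple could be
   sketched as follows. The names FiveTuple, PsnTable and next_psn are
   hypothetical and do not come from this specification. The sketch
   also assumes that "monotonically incremented" means an increment of
   one per packet, and that the 16-bit field wraps from 65535 to 0.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional

PSN_MODULUS = 1 << 16  # the PSN is a 16-bit unsigned field and wraps


@dataclass(frozen=True)
class FiveTuple:
    """Packet identification data (field names follow Section 6.1)."""
    saddr: str   # IP address of the sender
    sport: int   # port for sender
    daddr: str   # IP address of the destination
    dport: int   # port for destination
    protc: str   # upper layer protocol, e.g. "TCP"


class PsnTable:
    """Hypothetical helper: one independent PSN counter per 5-tuple."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        self._next: Dict[FiveTuple, int] = {}

    def next_psn(self, flow: FiveTuple) -> int:
        # First packet on this 5-tuple: start at a random 16-bit value.
        if flow not in self._next:
            self._next[flow] = self._rng.randrange(PSN_MODULUS)
        psn = self._next[flow]
        # Assumption: increment by one per packet, wrapping at 2**16.
        self._next[flow] = (psn + 1) % PSN_MODULUS
        return psn
```

   Keying the table on the whole 5-tuple means that two connections
   between the same pair of hosts never share a counter, which is the
   property the MUST / MUST NOT requirements in Section 5 ask for.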
6.2 Data in the PDM Destination Option Headers

   The IPv6 Performance and Diagnostic Metrics Destination Option (PDM)
   is an implementation of the Destination Options Header (Next Header
   value = 60). Two types of PDM are defined. PDM type 1 requires time
   synchronization. PDM type 2 does not require time synchronization.

   PDM type 1 and PDM type 2 are mutually exclusive. That is, both ends
   of a 5-tuple send either PDM type 1 or PDM type 2.

   PDM type 1 contains the following fields:

   PSNTP    : Packet Sequence Number This Packet
   TSTP     : Timestamp This Packet
   PSNLR    : Packet Sequence Number Last Received
   TSLR     : Timestamp Last Received

   PDM type 2 contains the following fields:

   PSNTP    : Packet Sequence Number This Packet
   PSNLR    : Packet Sequence Number Last Received
   DELTALR  : Delta Last Received
   PSNLS    : Packet Sequence Number Last Sent
   DELTALS  : Delta Last Sent

   The metrics which may be derived from these fields will be discussed
   in the following sections.

7 Metrics Derived from the PDM Destination Options

   A number of metrics may be derived from the data contained in the
   PDM. Some are relationships between two packets, others require
   analysis of multiple packets or multiple protocols.

   These metrics fall into the following categories:

   1. Base derived metrics
   2. Metrics used for triage
   3. Metrics used for network diagnostics
   4. Metrics used for session classification
   5. Metrics used for end user performance optimization

   It must be understood that when a metric is discussed, it includes
   the average, median, and other statistical variations of that metric.

   In the next section, we will discuss the base metrics. In later
   sections, we will discuss the more advanced metrics and their uses.

8 Base Derived Metrics

   The base metrics which may be derived from the PDM are:

   1. One-way delay
   2.
   Round-trip delay
   3. Server delay

8.1 One-Way Delay

   One-way delay is the time taken to traverse the path one way from
   one network device to another. The path from A to B is distinguished
   from the path from B to A. For many reasons, the paths may have
   different characteristics and may have different delays. One-way
   delay is discussed in "A One-way Delay Metric for IPPM" [RFC2679].

8.2 Round-Trip Delay

   Round-trip delay is the time taken to traverse the path both ways
   between one network device and another. The entire delay to travel
   from A to B and B to A is used. Round-trip delay cannot tell if one
   path is quite different from the other. Round-trip delay is discussed
   in "A Round-trip Delay Metric for IPPM" [RFC2681].

8.3 Server Delay

   Server delay is the interval between when a packet is received by a
   device and when a subsequent packet is sent back in response. This
   may be "Server Processing Time". It may also be a delay caused by
   acknowledgements. Server processing time includes the time taken by
   the combination of the stack and application to return the response.

9 Sample Implementation Flow (PDM Type 1)

   Following is a sample simple flow with one packet sent from Host A
   and one packet received by Host B.

   Time synchronization is required between Host A and Host B. See
   draft-ackermann-ntp-pdm-ntp-usage-00 [NTPPDM] for a description of
   how an NTP implementation may be set up to achieve good time
   synchronization.

   Each packet, in addition to the PDM, contains information on the
   sender and receiver. This is the 5-tuple consisting of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP, etc.)

   It should be understood that the packet identification information is
   in each packet.
   We will not repeat that in each of the following steps.

9.1 Step 1 (PDM Type 1)

   Packet 1 is sent from Host A to Host B. The time for Host A is set
   initially to 10:00AM.

   The timestamp and packet sequence number are sent in the PDM.

   The initial PSNTP from Host A starts at a random number, in this
   case 25. The sub-second portion of the timestamp has been omitted
   for the sake of simplicity.

   Packet 1

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   ---------->   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Contents:

   PSNTP : Packet Sequence Number This Packet:    25
   TSTP  : Timestamp This Packet:                 10:00:00
   PSNLR : Packet Sequence Number Last Received:  -
   TSLR  : Timestamp Last Received:               -

   There are no derived statistics after packet 1.

9.2 Step 2 (PDM Type 1)

   Packet 1 is received by Host B. The time for Host B was synchronized
   with Host A. Both were set initially to 10:00AM.

   The timestamp and PSN for the received packet are placed in the PSNLR
   and TSLR fields. These are from the point of view of B. That is,
   they indicate when the packet from A was received and which packet it
   was.

   The PDM is not sent at this point. It is only prepared. It will be
   sent when the response to packet 1 is sent by Host B.

   Packet 1 Received

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   ---------->   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Contents:

   PSNTP : Packet Sequence Number This Packet:    -
   TSTP  : Timestamp This Packet:                 -
   PSNLR : Packet Sequence Number Last Received:  25
   TSLR  : Timestamp Last Received:               10:00:03

   At this point, the following metric may be derived: one-way delay. In
   fact, we now know the one-way delay and the path. We will call this
   path 1.
   This will be the outbound path from the point of view of Host A and
   the inbound path from the point of view of Host B.

   The calculation of one-way delay (path 1) is as follows:

   One-way delay (path 1) = Time packet 1 was received by B - Time
   packet 1 was sent by A

   If we make the substitutions from our sample case above, then:

   One-way delay (path 1) = 10:00:03 - 10:00:00 or 3 seconds

9.3 Step 3 (PDM Type 1)

   Packet 2 is sent from Host B to Host A. The initial PSNTP from Host
   B starts at a random number, in this case 12.

   Packet 2

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   <----------   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Contents:

   PSNTP : Packet Sequence Number This Packet:    12
   TSTP  : Timestamp This Packet:                 10:00:07
   PSNLR : Packet Sequence Number Last Received:  25
   TSLR  : Timestamp Last Received:               10:00:03

   After Packet 2 is sent, the following metric may be derived: server
   delay.

   The calculation of server delay is as follows:

   Server delay = Time packet 2 was sent by B - Time packet 1 was
   received by B

   Again, making the substitutions from the sample case:

   Server delay = 10:00:07 - 10:00:03 or 4 seconds

   Server delay may be refined further by considering only packets with
   a data length greater than 1. Some protocols, for example, TCP, have
   acknowledgements with a data length of 0 or keep-alive packets with a
   data length of 1. An ACK may precede the actual response data
   packet. Keep-alives may be interspersed within the data flow.

9.4 Step 4 (PDM Type 1)

   Packet 2 is received by Host A.

   The timestamp and PSN for the received packet are placed in the PSNLR
   and TSLR fields. These are from the point of view of A. That is,
   they indicate when the packet from B was received and which packet it
   was.

   The PDM is not sent at this point. It is only prepared.
   It will be sent when the NEXT packet to Host B is sent by Host A.

   Packet 2 Received

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   <----------   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Contents:

   PSNTP : Packet Sequence Number This Packet:    -
   TSTP  : Timestamp This Packet:                 -
   PSNLR : Packet Sequence Number Last Received:  12
   TSLR  : Timestamp Last Received:               10:00:10

   However, at this point, the following metric may be derived: one-way
   delay (path 2).

   The calculation of one-way delay (path 2) is as follows:

   One-way delay (path 2) = Time packet 2 was received by A - Time
   packet 2 was sent by B

   If we make the substitutions from our sample case above, then:

   One-way delay (path 2) = 10:00:10 - 10:00:07 or 3 seconds

9.5 Step 5 (PDM Type 1)

   Packet 3 is sent from Host A to Host B.

   Packet 3

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   ---------->   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Contents:

   PSNTP : Packet Sequence Number This Packet:    26
   TSTP  : Timestamp This Packet:                 10:00:50
   PSNLR : Packet Sequence Number Last Received:  12
   TSLR  : Timestamp Last Received:               10:00:10

   At this point the PDM flows across the network revealing the last
   received timestamp and PSN.

10 Sample Implementation Flow (PDM Type 2)

   Following is a sample simple flow for PDM type 2 with one packet sent
   from Host A and one packet received by Host B. PDM type 2 does not
   require time synchronization between Host A and Host B. The
   calculations to derive meaningful metrics for network diagnostics are
   shown below each packet sent or received.

   Each packet, in addition to the PDM, contains information on the
   sender and receiver.
   As discussed before, a 5-tuple consists of:

   SADDR : IP address of the sender
   SPORT : Port for sender
   DADDR : IP address of the destination
   DPORT : Port for destination
   PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP)

   It should be understood that the packet identification information is
   in each packet. We will not repeat that in each of the following
   steps.

10.1 Step 1 (PDM Type 2)

   Packet 1 is sent from Host A to Host B. The time for Host A is set
   initially to 10:00AM.

   The timestamp and packet sequence number are noted by the sender
   internally. Only the packet sequence number is sent in the packet;
   in PDM type 2 the timestamp itself does not travel in the packet.

   Packet 1

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   ---------->   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Type 2 Contents:

   PSNTP   : Packet Sequence Number This Packet:    25
   PSNLR   : Packet Sequence Number Last Received:  -
   DELTALR : Delta Last Received:                   -
   PSNLS   : Packet Sequence Number Last Sent:      -
   DELTALS : Delta Last Sent:                       -

   Internally, the sender, Host A, must keep:

   PSNTP   : Packet Sequence Number This Packet:    25
   TSTP    : Timestamp This Packet:                 10:00:00

   Note, the initial PSNTP from Host A starts at a random number, in
   this case 25. The sub-second portion of the timestamp has been
   omitted for the sake of simplicity.

   There are no derived statistics after packet 1.

10.2 Step 2 (PDM Type 2)

   Packet 1 is received at Host B. Its time is set to one hour later
   than Host A, in this case 11:00AM.

   Internally, the receiver, Host B, must keep:

   PSNLR   : Packet Sequence Number Last Received:  25
   TSLR    : Timestamp Last Received:               11:00:03

   Note, this timestamp is in Host B time. It has nothing whatsoever to
   do with Host A time.

   At this point, we have no derived statistics.
   In PDM type 1, the derived statistic one-way delay (path 1) could
   have been calculated. In PDM type 2, this is not possible because
   there is no time synchronization.

10.3 Step 3 (PDM Type 2)

   Packet 2 is sent by Host B to Host A. Note, the initial PSNTP from
   Host B starts at a random number, in this case 12. Before sending
   the packet, Host B does a calculation of deltas. Since Host B knows
   when it is sending the packet, and it knows when it received the
   previous packet, it can do the following calculation:

   Sending time (packet 2) - receive time (packet 1)

   We will call the result of this calculation: Delta Last Received.

   That is:

   DELTALR = Sending time (packet 2) - receive time (packet 1)

   Note, both sending time and receive time are saved internally in Host
   B. They do not travel in the packet. Only the Delta is in the
   packet.

   Assume that within Host B is the following:

   PSNLR   : Packet Sequence Number Last Received:  25
   TSLR    : Timestamp Last Received:               11:00:03
   PSNTP   : Packet Sequence Number This Packet:    12
   TSTP    : Timestamp This Packet:                 11:00:07

   Hence, DELTALR becomes:

   4 seconds = 11:00:07 - 11:00:03

   Let us look at the PDM, and then we will look at the derived metrics
   at this point.

   Packet 2

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   <----------   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Type 2 Contents:

   PSNTP   : Packet Sequence Number This Packet:    12
   PSNLR   : Packet Sequence Number Last Received:  25
   DELTALR : Delta Last Received:                   4
   PSNLS   : Packet Sequence Number Last Sent:      -
   DELTALS : Delta Last Sent:                       -

   After Packet 2, the following metric may be derived:

   Server delay = DELTALR

   The metric left to be calculated is the path delay for path 2. This
   may be calculated when Packet 3 is sent.
   Clearly, if there is NO next packet for the 5-tuple, then this value
   will be missing.

10.4 Step 4 (PDM Type 2)

   Packet 2 is received at Host A. Remember, its time is set to one
   hour earlier than Host B. It will keep internally:

   PSNLR   : Packet Sequence Number Last Received:  12
   TSLR    : Timestamp Last Received:               10:00:12

   Note, this timestamp is in Host A time. It has nothing whatsoever to
   do with Host B time.

   At this point, we have two derived metrics:

   1. Two-way delay or Round Trip time
   2. Total end-to-end time

   The formula for end-to-end time is:

   Time Last Received - Time Last Sent

   For example, packet 25 was sent by Host A at 10:00:00. Packet 12 was
   received by Host A at 10:00:12 so:

   End-to-End response time = 10:00:12 - 10:00:00 or 12

   This derived metric we will call DELTALS or Delta Last Sent.

   To calculate two-way delay, the formula is:

   Two-way delay = DELTALS - DELTALR

   Or:

   Two-way delay = 12 - 4 or 8

   Now, the only problem is that at this point all metrics are in the
   Host and not exposed in a packet. To do that, we need a third packet.

10.5 Step 5 (PDM Type 2)

   Packet 3 is sent from Host A to Host B.

   Packet 3

   +----------+                 +----------+
   |          |                 |          |
   |   Host   |   ---------->   |   Host   |
   |    A     |                 |    B     |
   |          |                 |          |
   +----------+                 +----------+

   PDM Type 2 Contents:

   PSNTP   : Packet Sequence Number This Packet:    26
   PSNLR   : Packet Sequence Number Last Received:  12
   DELTALR : Delta Last Received:                   *
   PSNLS   : Packet Sequence Number Last Sent:      25
   DELTALS : Delta Last Sent:                       12

11 Derived Metrics : Advanced

   A number of more advanced metrics may be derived from the data
   contained in the PDM. Some are relationships between two packets,
   others require analysis of multiple packets. The more advanced
   metrics fall into the categories shown below:

   1. Metrics used for triage
   2.
   Metrics used for network diagnostics
   3. Metrics used for session classification
   4. Metrics used for end user performance optimization

   We will discuss each of these in turn.

11.1 Advanced Derived Metrics : Triage

   In this case, triage means to distinguish between problems occurring
   on the network paths and problems occurring on the server. The PDM
   provides one-way delay and server delay. This makes it possible to
   tell which path is a bottleneck as well as whether the server is a
   bottleneck.

11.2 Advanced Derived Metrics : Network Diagnostics

   The data provided by the PDM may be used in combination with data
   fields in other protocols. We will call this Inter-Protocol Network
   Diagnostics (IPND).

   The PDM also allows us to use only a single trace point for a number
   of diagnostic situations where today we need to trace at multiple
   points to get the required data. In diagnostics, the question often
   arises whether the end device really sent the packet and it was lost
   in the network, or whether it did not send it at all.

   Today, diagnostic traces are run at both client and server to get
   the required data. With the data provided by the PDM, in a number of
   cases, this will not be necessary.

   For example, taking PDM values along with data fields in the TCP
   protocol, the following may be found:

   1. Retransmit duplication (RD)
   2. ACK lag (AL)
   3. Third-party connection reset (TPCR)
   4. Elapsed time connection reset (ETCR)

   A description of these follows.

11.2.1 Retransmit Duplication (RD)

   The TCP protocol will retransmit segments given indications from the
   partner that it has not received them. The retransmitted segments
   contain the TCP sequence number and acknowledgement. The sequence
   number is started at a random number and increased by the amount of
   data sent in each packet.

   Consider the following scenario.
   There is a packet sequence number in the packet at the IP layer.
   This is in the PDM that we have defined. The TCP sequence number
   already exists in the protocol.

   Host A sends the following packets:

   IP PSN 20, TCP SEQ 10
   IP PSN 21, TCP SEQ 11
   IP PSN 22, TCP SEQ 12

   Host B receives:

   IP PSN 20, TCP SEQ 10
   IP PSN 22, TCP SEQ 12

   Host B indicates to Host A that the packet with TCP SEQ 11 must be
   resent. Retransmits are done at the TCP layer.

   Host A sends the following packet:

   IP PSN 23, TCP SEQ 11

   The packet never reaches B. B waits until a timeout for retransmits
   expires. It asks for the packet again.

   Host A sends the following packet:

   IP PSN 24, TCP SEQ 11

   This time, it reaches Host B. Having the combination of PSN (as
   provided in the PDM) and the TCP sequence number allows us to see
   whether the network is losing the packet or the sender is somehow
   not sending the packet correctly. Here, PSN 23 and PSN 24 both carry
   TCP SEQ 11, so a trace at Host B that shows PSN 24 but not PSN 23
   indicates that the first retransmission was sent and then lost in
   the network.

   As we said before, this also allows us to use a single trace point,
   rather than tracing at both the client and the server, to get the
   required data.

11.2.2 ACK Lag (AL)

   Some protocols, such as TCP, acknowledge packets. The PDM will allow
   for a calculation of the rate of ACKs. Clients can be reconfigured to
   optimize acknowledgements and to speed traffic flow.

11.2.3 Third-party Connection Reset (TPCR)

   Connections may be aborted by a packet containing a particular flag.
   In the TCP protocol, this is the RESET flag. Sometimes a third
   party, for example a VPN router, will abort the connection. This
   may happen because the router is overloaded, the traffic is too
   noisy, or for other reasons. This can also be quite hard to detect
   because the third party will spoof the address of the sender.

   Much time can be spent by the two endpoints pointing fingers at each
   other for having dropped the connection.
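
   One way to settle such a dispute from a single trace is to check each
   reset against the PDM state already recorded for the flow. The
   following is a minimal sketch, not part of this specification,
   assuming that a capture tool keeps the last PSNTP seen from the peer
   on a 5-tuple. The function name, the max_gap tolerance for loss and
   reordering, and the choice to ignore PSN wraparound are all
   assumptions made for illustration. A reset that carries no PDM, or
   whose PSN does not continue the peer's sequence, is flagged as a
   possible third-party reset.

```python
from typing import Optional


def reset_looks_third_party(last_psn: int,
                            rst_psn: Optional[int],
                            max_gap: int = 16) -> bool:
    """Return True if a TCP RESET may not have come from the real peer.

    last_psn: PSNTP of the last packet seen from the peer on this 5-tuple.
    rst_psn:  PSNTP carried by the RESET packet, or None if it had no PDM.
    max_gap:  hypothetical tolerance for packets lost or reordered
              between the last packet seen and the RESET.

    PSN wraparound is deliberately ignored in this sketch.
    """
    if rst_psn is None:
        # The RESET carried no PDM Destination Option at all.
        return True
    gap = rst_psn - last_psn
    # A genuine RESET should continue the peer's PSN sequence:
    # strictly ahead of the last PSN seen, but not implausibly far ahead.
    return not (0 < gap <= max_gap)


# Last packet seen from the peer carried PSN 25.
print(reset_looks_third_party(25, 26))    # continues the sequence
print(reset_looks_third_party(25, None))  # no PDM present
print(reset_looks_third_party(25, 900))   # PSN far outside the sequence
```

   A flag raised by such a check is only a hint for the diagnostician.
   Heavy loss or reordering could also produce an unexpected PSN, so the
   result should be weighed together with the timestamps in the PDM.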

   Such a third-party spoofer would likely not have the PDM Destination
   Option. Routers and other middle boxes are not required to support
   the Destination Options Extension Header. Even if a PDM DOH were
   generated, it would most likely violate the pattern of PSNs and time
   stamps being used. This would be a clue to the diagnostician that
   the TPCR event has occurred.

11.2.4 Potential Hang (PH)

   Connections may be aborted by a packet containing a particular flag.
   In the TCP protocol, this is the RESET flag. Sometimes this is done
   because a set amount of time has elapsed without activity. The PSN in
   the PDM can be used to determine the last packet sent by the partner
   and if a response is required -- a "hang" situation.

   This can be distinguished from connections which are set to be
   aborted after a certain period of inactivity.

11.3 Advanced Metrics : Session Classification

   The PDM may be used to classify sessions as follows:

   One way traffic flow
   Two way traffic flow
   One way traffic flow with keep-alive
   Two way traffic flow with keep-alive
   Multiple send traffic flow
   Multiple receive traffic flow
   Full duplex traffic flow
   Half duplex traffic flow

   Immediate ACK data flow
   Delayed ACK data flow
   Proxied ACK data flow

   A session classification system will assist the network
   diagnostician. This system will also help in categorizing the server
   delay.

12 Use Cases

   The scheme outlined above can also handle the following types of
   cases:

   1. Host clocks not synchronized (shown above)
   2. IP fragmentation
   3. Multiple sends from one side (multiple segments)
   4. Out of order segments
   5. Retransmits
   6. One-way transmit only (e.g., FTP)
   7. One-way transmit only
      (e.g., real-time transports and streaming protocols)
   8. Duplicate ACKs
   9. Duplicate segments
   10. Delayed ACKs
   11. ACKs preceding send for another reason
   12. Proxy servers
   13. Full duplex traffic
   14. Keep alive (0 / 1 byte segments, larger segments)
   15. No response from other side
   16. Drop without retransmit (real-time transports)
   17. Looped packets (where the same packet may pass the same point
       multiple times without duplication)
   18. Multihoming via SHIM6

13 Security Considerations

   There are no security considerations.

14 IANA Considerations

   There are no IANA considerations.

15 References

15.1 Normative References

   [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
   1981.

   [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
   793, September 1981.

   [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
   for High Performance", RFC 1323, May 1992.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
   (IPv6) Specification", RFC 2460, December 1998.

   [RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
   Delay Metric for IPPM", RFC 2679, September 1999.

   [RFC2681] Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
   Delay Metric for IPPM", RFC 2681, September 1999.

   [RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines For
   Values In the Internet Protocol and Related Headers", BCP 37, RFC
   2780, March 2000.

   [RFC3971] Arkko, J., Ed., Kempf, J., Zill, B., and P. Nikander,
   "SEcure Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
   "Network Time Protocol Version 4: Protocol and Algorithms
   Specification", RFC 5905, June 2010.

15.2 Informative References

   [NTPPDM] Ackermann, M., "draft-ackermann-ntp-pdm-ntp-usage-00",
   Internet Draft, January 2014.

   [ELKPDM] Elkins, N., "draft-elkins-6man-ipv6-pdm-dest-option-05",
   Internet Draft, January 2014.

   [IEEE1588] IEEE 1588-2002 standard, "Standard for a Precision Clock
   Synchronization Protocol for Networked Measurement and Control
   Systems".

16 Acknowledgments

   The authors would like to thank Al Morton, Brian Trammell, David
   Boyes, and Rick Troth for their comments and assistance.

Authors' Addresses

   Nalini Elkins
   Inside Products, Inc.
   36A Upper Circle
   Carmel Valley, CA 93924
   United States
   Phone: +1 831 659 8360
   Email: nalini.elkins@insidethestack.com
   http://www.insidethestack.com

   William Jouris
   Inside Products, Inc.
   36A Upper Circle
   Carmel Valley, CA 93924
   United States
   Phone: +1 925 855 9512
   Email: bill.jouris@insidethestack.com
   http://www.insidethestack.com

   Michael S. Ackermann
   Blue Cross Blue Shield of Michigan
   P.O. Box 2888
   Detroit, Michigan 48231
   United States
   Phone: +1 310 460 4080
   Email: mackermann@bcbsmi.com
   http://www.bcbsmi.com

   Keven Haining
   US Bank
   16900 W Capitol Drive
   Brookfield, WI 53005
   United States
   Phone: +1 262 790 3551
   Email: keven.haining@usbank.com
   http://www.usbank.com