idnits 2.17.1

draft-ietf-ippm-6man-pdm-option-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 13, 2017) is 2601 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET-DRAFT                                                 N. Elkins
                                                         Inside Products
                                                             R. Hamilton
                                              Chemical Abstracts Service
                                                            M. Ackermann
Intended Status: Proposed Standard                         BCBS Michigan
Expires: September 14, 2017                               March 13, 2017

    IPv6 Performance and Diagnostic Metrics (PDM) Destination Option
                   draft-ietf-ippm-6man-pdm-option-09

Abstract

   To assess performance problems, measurements based on optional
   sequence numbers and timing may be embedded in each packet.
   Such measurements may be interpreted in real-time or after the fact.
   This document specifies the Performance and Diagnostic Metrics (PDM)
   Destination Options extension header, an implementation of the
   existing IPv6 Destination Options extension header, together with the
   field limits, calculations, and usage of the PDM in measurement.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph
   3: This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1  Background
     1.1  Terminology
     1.2  End User Quality of Service (QoS)
     1.3  Need for a Packet Sequence Number (PSN)
     1.4  Rationale for defined solution
     1.5  PDM Works in Collaboration with Other Headers
     1.6  IPv6 Transition Technologies
   2  Measurement Information Derived from PDM
     2.1  Round-Trip Delay
     2.2  Server Delay
   3  Performance and Diagnostic Metrics Destination Option Layout
     3.1  Destination Options Header
     3.2  Performance and Diagnostic Metrics Destination Option
       3.2.1  PDM Layout
       3.2.2  Base Unit for Time Measurement
       3.2.3  Considerations of this time-differential representation
         3.2.3.1  Limitations with this encoding method
         3.2.3.2  Loss of precision induced by timer value truncation
     3.3  Header Placement
     3.4  Header Placement Using IPSec ESP Mode
       3.4.1  Using ESP Transport Mode
       3.4.2  Using ESP Tunnel Mode
     3.5  Implementation Considerations
       3.5.1  PDM Activation
       3.5.2  PDM Timestamps
     3.6  Dynamic Configuration Options
     3.6  5-tuple Aging
   4  Security Considerations
     4.1.  SYN Flood and Resource Consumption Attacks
     4.2  Pervasive monitoring
     4.3  PDM as a Covert Channel
     4.4  Timing Attacks
   5  IANA Considerations
   6  References
     6.1  Normative References
     6.2  Informative References
   Appendix A : Timing Time Differential Calculations
   Appendix B: Sample Packet Flows
     B.1  PDM Flow - Simple Client Server
       B.1.1  Step 1
       B.1.2  Step 2
       B.1.3  Step 3
       B.1.4  Step 4
       B.1.5  Step 5
     B.2  Other Flows
       B.2.1  PDM Flow - One Way Traffic
       B.2.2  PDM Flow - Multiple Send Traffic
       B.2.3  PDM Flow - Multiple Send with Errors
   Appendix C: Potential Overhead Considerations
   Acknowledgments
   Authors' Addresses

1 Background

   To assess performance problems, measurements based on optional
   sequence numbers and timing may be embedded in each packet.  Such
   measurements may be interpreted in real-time or after the fact.

   As defined in RFC2460 [RFC2460], destination options are carried by
   the IPv6 Destination Options extension header.
   Destination options include optional information that need be
   examined only by the IPv6 node given as the destination address in
   the IPv6 header, not by routers or other "middle boxes".  This
   document specifies a new destination option, the Performance and
   Diagnostic Metrics (PDM) destination option.  This document specifies
   the layout, field limits, calculations, and usage of the PDM in
   measurement.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2 End User Quality of Service (QoS)

   The timing values in the PDM embedded in the packet will be used to
   estimate QoS as experienced by an end user device.

   For many applications, the key user performance indicator is response
   time.  When the end user is an individual, he is generally
   indifferent to what is happening along the network; what he really
   cares about is how long it takes to get a response back.  But this is
   not just a matter of individuals' personal convenience.  In many
   cases, rapid response is critical to the business being conducted.

   When the end user is a device (e.g., with the Internet of Things),
   what matters is the speed with which requested data can be
   transferred -- specifically, whether the requested data can be
   transferred in time to accomplish the desired actions.  This can be
   important when the relevant external conditions are subject to rapid
   change.

   Low, reliable and acceptable response times are not just "nice to
   have".  On many networks, the impact can be financial hardship or can
   endanger human life.  In some cities, the emergency police contact
   system operates over IP; law enforcement, at all levels, uses IP
   networks; transactions on our stock exchanges are settled using IP
   networks.
   The critical nature of such activities to our daily lives and
   financial well-being demands a simple solution to support response
   time measurements.

1.3 Need for a Packet Sequence Number (PSN)

   While performing network diagnostics of an end-to-end connection, it
   often becomes necessary to isolate the factors along the network path
   responsible for problems.  Diagnostic data may be collected at
   multiple places along the path (if possible), or at the source and
   destination.  Then, in post-collection processing, the diagnostic
   data corresponding to each packet at different observation points
   must be matched for proper measurements.  A sequence number in each
   packet provides sufficient basis for the matching process.  If need
   be, the timing fields may be used along with the sequence number to
   ensure uniqueness.

   This method of data collection along the path is of special use to
   determine where packet loss or packet corruption is happening.

   The packet sequence number needs to be unique in the context of the
   session (5-tuple).  See section 2 for a definition of 5-tuple.

1.4 Rationale for defined solution

   The current IPv6 specification does not provide timing or a similar
   field in the IPv6 main header or in any extension header.  So, we
   define the IPv6 Performance and Diagnostic Metrics destination option
   (PDM).

   Advantages include:

   1. Real measure of actual transactions.

   2. Independence from transport layer protocols.

   3. Ability to span organizational boundaries with consistent
      instrumentation.

   4. No time synchronization needed between session partners.

   5. Ability to handle all transport protocols (TCP, UDP, SCTP, etc.)
      in a uniform way.

   The PDM provides the ability to determine quickly if the (latency)
   problem is in the network or in the server (application).  That is,
   it is a fast way to do triage.

   One of the important functions of PDM is to allow you to quickly
   dispatch the right set of diagnosticians.  Within network or server
   latency, there may be many components.  The job of the diagnostician
   is to rule each one out until the culprit is found.

   How PDM fits into this diagnostic picture is that PDM will quickly
   tell you how to escalate.  PDM will point to either the network area
   or the server area.  Within the server latency, PDM does not tell
   you if the bottleneck is in the IP stack or the application or buffer
   allocation.  Within the network latency, PDM does not tell you which
   of the network segments or middle boxes is at fault.

   What PDM will tell you is whether the problem is in the network or
   the server.  In our experience, a different group is often involved
   in troubleshooting depending on the nature of the problem.  That is,
   the problem may be escalated to the application developers or the
   team that deals with the routers and infrastructure.  Both the
   network group and the application group have quite a few specialized
   tools at their disposal to further investigate their own areas.  What
   is missing is the first step, which PDM provides.

   In our experience, valuable time is often lost at this first stage of
   triage.  PDM is expected to reduce this time substantially.

1.5 PDM Works in Collaboration with Other Headers

   The purpose of the PDM is not to supplant all the variables present
   in all other headers but to provide data which is not available or
   very difficult to get.  The way PDM would be used is by a technician
   (or tool) looking at a packet capture.  Within the packet capture,
   they would have available to them the layer 2 header, IP header (v6
   or v4), TCP, UDP, ICMP, SCTP or other headers.  All information
   would be looked at together to make sense of the packet flow.
   The technician or processing tool could analyze, report or ignore the
   data from PDM, as necessary.

   For an example of how PDM can help with TCP retransmit problems,
   please look at section 8.

1.6 IPv6 Transition Technologies

   In the path to full implementation of IPv6, transition technologies
   such as translation or tunneling may be employed.  The PDM header is
   not expected to work in such scenarios.  It is likely that an IPv6
   packet containing PDM will be dropped if using IPv6 transition
   technologies.

2 Measurement Information Derived from PDM

   Each packet contains information about the sender and receiver.  In
   IP, the identifying information is called a "5-tuple".

   The 5-tuple consists of:

      SADDR : IP address of the sender
      SPORT : Port for sender
      DADDR : IP address of the destination
      DPORT : Port for destination
      PROTC : Protocol for upper layer (e.g., TCP, UDP, ICMP)

   The PDM contains the following base fields:

      PSNTP    : Packet Sequence Number This Packet
      PSNLR    : Packet Sequence Number Last Received
      DELTATLR : Delta Time Last Received
      DELTATLS : Delta Time Last Sent

   Other fields for storing time scaling factors are also in the PDM and
   will be described in section 3.

   This information, combined with the 5-tuple, allows the measurement
   of the following metrics:

      1. Round-trip delay
      2. Server delay

2.1 Round-Trip Delay

   Round-trip *Network* delay is the delay for packet transfer from a
   source host to a destination host and then back to the source host.
   This measurement has been defined, and the advantages and
   disadvantages discussed in "A Round-trip Delay Metric for IPPM"
   [RFC2681].

2.2 Server Delay

   Server delay is the interval between when a packet is received by a
   device and the first corresponding packet is sent back in response.
   This may be "Server Processing Time".
   It may also be a delay caused by acknowledgements.  Server processing
   time includes the time taken by the combination of the stack and
   application to return the response.  The stack delay may be related
   to network performance.  If this aggregate time is seen as a problem,
   and there is a need to make a clear distinction between application
   processing time and stack delay, including that caused by the
   network, then more client based measurements are needed.

3 Performance and Diagnostic Metrics Destination Option Layout

3.1 Destination Options Header

   The IPv6 Destination Options Header is used to carry optional
   information that needs to be examined only by a packet's destination
   node(s).  The Destination Options Header is identified by a Next
   Header value of 60 in the immediately preceding header and is defined
   in RFC2460 [RFC2460].  The IPv6 Performance and Diagnostic Metrics
   Destination Option (PDM) is an implementation of the Destination
   Options Header.  The PDM does not require time synchronization.
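
   As an illustration of why no time synchronization is needed, the
   following Python sketch models the 5-tuple of section 2 and derives
   server delay and round-trip network delay from timestamps that each
   host reads from its own clock.  The class and function names, the
   integer time units, and the step of subtracting the server delay
   from the requester's total turnaround are assumptions made for this
   example; they are not defined by this document.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Flow identifier from section 2 (field names follow the text)."""
    saddr: str   # IP address of the sender
    sport: int   # port for sender
    daddr: str   # IP address of the destination
    dport: int   # port for destination
    protc: int   # upper-layer protocol number (6 = TCP, 17 = UDP)


def server_delay(recv_request: int, send_response: int) -> int:
    """Interval between receiving a packet and sending the first response.

    Both timestamps are read from the responding host's clock.
    """
    return send_response - recv_request


def round_trip_network_delay(send_request: int, recv_response: int,
                             reported_server_delay: int) -> int:
    """Requester's total turnaround minus the delay spent in the server.

    Both timestamps are read from the requesting host's clock.
    """
    return (recv_response - send_request) - reported_server_delay


# Hypothetical exchange; the two clocks are deliberately unrelated.
flow = FiveTuple("2001:db8::1", 49152, "2001:db8::2", 443, 6)
delay_at_server = server_delay(recv_request=5_000_040,
                               send_response=5_000_140)
network_delay = round_trip_network_delay(send_request=1_000,
                                         recv_response=1_250,
                                         reported_server_delay=delay_at_server)
```

   Each subtraction uses two readings of the same clock, so the offset
   between the two hosts never enters the result.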

3.2 Performance and Diagnostic Metrics Destination Option

3.2.1 PDM Layout

   The IPv6 Performance and Diagnostic Metrics Destination Option (PDM)
   contains the following fields:

      SCALEDTLR: Scale for Delta Time Last Received
      SCALEDTLS: Scale for Delta Time Last Sent
      PSNTP    : Packet Sequence Number This Packet
      PSNLR    : Packet Sequence Number Last Received
      DELTATLR : Delta Time Last Received
      DELTATLS : Delta Time Last Sent

   The PDM destination option is encoded in type-length-value (TLV)
   format as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |   ScaleDTLR   |   ScaleDTLS   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PSN This Packet        |       PSN Last Received       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Delta Time Last Received    |     Delta Time Last Sent      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type

   TBD = 0xXX (TBD) [To be assigned by IANA] [RFC2780]

   Option Length

   8-bit unsigned integer.  Length of the option, in octets, excluding
   the Option Type and Option Length fields.  This field MUST be set to
   10, the combined size of the six fields that follow it in the layout
   above.

   Scale Delta Time Last Received (SCALEDTLR)

   8-bit unsigned integer.  This is the scaling value for the Delta Time
   Last Received (DELTATLR) field.  The possible values are from 0 to
   255.  See sections 3.2.2 and 3.2.3 for further discussion on timing
   considerations and formatting of the scaling values.

   Scale Delta Time Last Sent (SCALEDTLS)

   8-bit unsigned integer.  This is the scaling value for the Delta Time
   Last Sent (DELTATLS) field.  The possible values are from 0 to 255.

   Packet Sequence Number This Packet (PSNTP)

   16-bit unsigned integer.  This field will wrap.  It is intended for
   use while analyzing packet traces.

   Initialized at a random number and incremented monotonically for each
   packet of the session flow of the 5-tuple.  The random number
   initialization is intended to make it harder to spoof and insert such
   packets.

   Operating systems MUST implement a separate packet sequence number
   counter per 5-tuple.

   Packet Sequence Number Last Received (PSNLR)

   16-bit unsigned integer.  This is the PSNTP of the packet last
   received on the 5-tuple.

   Delta Time Last Received (DELTATLR)

   A 16-bit unsigned integer field.  The value is set according to the
   scale in SCALEDTLR.

   Delta Time Last Received = (Send time packet 2 - Receive time packet
   1)

   Delta Time Last Sent (DELTATLS)

   A 16-bit unsigned integer field.  The value is set according to the
   scale in SCALEDTLS.

   Delta Time Last Sent = (Receive time packet 2 - Send time packet 1)

   Option Type

   In keeping with RFC2460 [RFC2460], the two high order bits of the
   Option Type field are encoded to indicate specific processing of the
   option; for the PDM destination option, these two bits MUST be set to
   00.

   The third high order bit of the Option Type specifies whether or not
   the Option Data of that option can change en-route to the packet's
   final destination.

   In the PDM, the value of the third high order bit MUST be 0.

3.2.2 Base Unit for Time Measurement

   A time differential is always a whole number in a CPU; it is the unit
   specification -- hours, seconds, nanoseconds -- that determines what
   the numeric value means.  For PDM, we establish the base time unit as
   1 attosecond (asec).  This allows for a common unit and scaling of
   the time differential among all IP stacks and hardware
   implementations.

   Note that we are trying to provide the ability to measure both time
   differentials that are extremely small, and time differentials in a
   DTN-type environment where the delays may be very great.
   Storing a time differential that must span this range in just 16 bits
   requires some scaling of the time differential value.

   One issue is the conversion from the native time base in the CPU
   hardware of whatever device is in use to some number of attoseconds.
   It might seem this will be an astronomical number, but the conversion
   is straightforward.  It involves multiplication by an appropriate
   power of 10 to change the value into a number of attoseconds.  Then,
   to scale the value so that it fits into DELTATLR or DELTATLS, the
   value is shifted by a number of bits, retaining the 16 high-order
   or most significant bits.  The number of bits shifted becomes the
   scaling factor, stored as SCALEDTLR or SCALEDTLS, respectively.  For
   a full description of this process, including examples, please see
   Appendix A.

3.2.3 Considerations of this time-differential representation

   There are a few considerations to be taken into account with this
   representation of a time differential.  The first is whether there
   are any limitations on the maximum or minimum time differential that
   can be expressed using this method of a delta value and a scaling
   factor.  The second is the amount of imprecision introduced by this
   method.

3.2.3.1 Limitations with this encoding method

   The DELTATLS and DELTATLR fields store only the 16 most-significant
   bits of the time differential value.  Thus the range, excluding the
   scaling factor, is from 0 to 65535, or a maximum of 2**16-1.  This
   method is further described in [TRAM-TCPM].

   The actual magnitude of the time differential is determined by the
   scaling factor.  SCALEDTLR and SCALEDTLS are 8-bit unsigned integers,
   so the scaling factor ranges from 0 to 255.  The smallest number that
   can be represented would have a value of 1 in the delta field and a
   value of 0 in the associated scale field.  This is the representation
   for 1 attosecond.
   Clearly this allows PDM to measure extremely small time
   differentials.

   On the other end of the scale, the maximum delta value is 65535, or
   FFFF in hexadecimal.  If the maximum scale value of 255 is used, the
   time differential represented is 65535*2**255, which is over 3*10**55
   years, essentially, forever.  So there appears to be no real
   limitation to the time differential that can be represented.

3.2.3.2 Loss of precision induced by timer value truncation

   As PDM specifies the DELTATLR and DELTATLS values as 16-bit unsigned
   integers, any time the precision is greater than those 16 bits, there
   will be truncation of the trailing bits, with an accompanying loss of
   precision in the value.

   Any time differential value smaller than 65536 asec can be stored
   exactly in DELTATLR or DELTATLS, because the representation of this
   value requires at most 16 bits.

   Since the time differential values in PDM are measured in
   attoseconds, the range of values that would be truncated to the same
   encoded value is 2**(Scale)-1 asec.

   For example, the smallest time differential that would be truncated
   to fit into a delta field is

      1 0000 0000 0000 0000 = 65536 asec

   This value would be encoded as a delta value of 8000 (hexadecimal)
   with a scaling factor of 1.  The value

      1 0000 0000 0000 0001 = 65537 asec

   would also be encoded as a delta value of 8000 with a scaling factor
   of 1.  This actually is the largest value that would be truncated to
   that same encoded value.  When the scale value is 1, the value range
   is calculated as 2**1 - 1, or 1 asec, which you can see is the
   difference between these minimum and maximum values.

   The scaling factor is defined as the number of low-order bits
   truncated to reduce the size of the resulting value so it fits into a
   16-bit delta field.
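
   The truncation just described can be sketched in Python.  This is a
   minimal illustration, assuming the scale is chosen as the smallest
   shift that makes the value fit in 16 bits; the function names and
   the nanosecond helper are hypothetical and are not defined by this
   document.

```python
from typing import Tuple

ASEC_PER_NSEC = 10**9  # 1 nanosecond = 10**9 attoseconds


def nsec_to_asec(nsec: int) -> int:
    """Convert a native nanosecond reading to the attosecond base unit."""
    return nsec * ASEC_PER_NSEC


def encode_delta(asec: int) -> Tuple[int, int]:
    """Return (delta, scale): the 16 high-order bits and the bits dropped."""
    if asec < 0:
        raise ValueError("time differential must be non-negative")
    # Number of low-order bits that must be dropped to leave 16 bits.
    scale = max(0, asec.bit_length() - 16)
    if scale > 255:
        raise ValueError("value does not fit an 8-bit scaling factor")
    return asec >> scale, scale


def decode_delta(delta: int, scale: int) -> int:
    """Recover the (possibly truncated) differential in attoseconds."""
    return delta << scale


def max_precision_loss(scale: int) -> int:
    """Largest value the truncated low-order bits could have held."""
    return (1 << scale) - 1
```

   With these definitions, encode_delta(65536) and encode_delta(65537)
   both return a delta of 0x8000 with a scale of 1, matching the
   example above.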
   If, for example, you had to truncate 12 bits, the loss of precision
   would depend on the bits you truncated.  The range of these values
   would be

      0000 0000 0000 = 0 asec
   to
      1111 1111 1111 = 4095 asec

   So the minimum loss of precision would be 0 asec, where the delta
   value exactly represents the time differential, and the maximum loss
   of precision would be 4095 asec.  As stated above, the scaling factor
   of 12 means the maximum loss of precision is 2**12-1 asec, or 4095
   asec.

   Compare this loss of precision to the actual time differential.  The
   range of actual time differential values that would incur this loss
   of precision is from

      1000 0000 0000 0000 0000 0000 0000 = 2**27 asec or 134217728 asec
   to
      1111 1111 1111 1111 1111 1111 1111 = 2**28-1 asec or 268435455 asec

   Granted, these are small values, but the point is, any value between
   these two values will have a maximum loss of precision of 4095 asec,
   or about 0.00305% for the first value, as encoded, and at most
   0.001526% for the second.  These maximum-loss percentages are
   consistent for all scaling values.

3.3 Header Placement

   The PDM Destination Option is placed as defined in RFC2460 [RFC2460].
   There may be a choice of where to place the Destination Options
   header.  If using ESP mode, please see section 3.4 of this document
   for placement of the PDM Destination Options header.

   For each IPv6 packet header, the PDM MUST NOT appear more than once.
   However, an encapsulated packet MAY contain a separate PDM associated
   with each encapsulated IPv6 header.

3.4 Header Placement Using IPSec ESP Mode

   IPSec Encapsulating Security Payload (ESP) is defined in [RFC4303]
   and is widely used.  Section 3.1.1 of [RFC4303] discusses placement
   of Destination Options Headers.

   The placement of PDM is different depending on whether ESP is used in
   tunnel or transport mode.
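
   The placement rules of section 3.3, together with the transport-mode
   rule that section 3.4.1 states, can be sketched as a check over a
   packet's header chain.  The chain representation (a list of header
   names, each flagged if it carries a PDM option), the mode strings,
   and the function name are hypothetical choices for this example, not
   part of this specification.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, bool]  # (header name, True if it carries a PDM option)


def pdm_placement_errors(chain: List[Header],
                         esp_mode: Optional[str] = None) -> List[str]:
    """Return a list of placement problems; an empty list means none found."""
    errors = []
    pdm_count_for_ip_header = 0
    esp_seen = False
    for name, carries_pdm in chain:
        if name == "ipv6":
            # Each outer or encapsulated IPv6 header starts a new count.
            pdm_count_for_ip_header = 0
        elif name == "esp":
            esp_seen = True
        if not carries_pdm:
            continue
        pdm_count_for_ip_header += 1
        if pdm_count_for_ip_header > 1:
            errors.append("PDM appears more than once for one IPv6 header")
        if esp_mode == "transport" and not esp_seen:
            errors.append("PDM placed before ESP in transport mode")
    return errors


after_esp = [("ipv6", False), ("esp", False), ("dstopts", True),
             ("tcp", False)]
before_esp = [("ipv6", False), ("dstopts", True), ("esp", False),
              ("tcp", False)]
tunnel = [("ipv6", False), ("dstopts", True), ("esp", False),
          ("ipv6", False), ("dstopts", True), ("tcp", False)]
```

   In this sketch a tunnel-mode packet may carry one PDM for the outer
   header and one for the inner header without being flagged, because
   the count restarts at each IPv6 header.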

3.4.1 Using ESP Transport Mode

   Below is the diagram from [RFC4303] discussing placement of headers.
   Note that Destination Options MAY be placed before or after ESP or
   both.  If using PDM in ESP transport mode, PDM MUST be placed after
   the ESP header so as not to leak information.

                      BEFORE APPLYING ESP
            ---------------------------------------
      IPv6  |             | ext hdrs |     |      |
            | orig IP hdr |if present| TCP | Data |
            ---------------------------------------

                      AFTER APPLYING ESP
            ---------------------------------------------------------
      IPv6  | orig |hop-by-hop,dest*,|   |dest|   |    | ESP   | ESP|
            |IP hdr|routing,fragment.|ESP|opt*|TCP|Data|Trailer| ICV|
            ---------------------------------------------------------
                                         |<--- encryption ---->|
                                     |<------ integrity ------>|

                * = if present, could be before ESP, after ESP, or both

3.4.2 Using ESP Tunnel Mode

   Below is the diagram from [RFC4303] discussing placement of headers.

   Note that Destination Options MAY be placed before or after ESP or
   both in both the outer set of IP headers and the inner set of IP
   headers.

   In ESP tunnel mode, PDM MAY be placed before or after the ESP header
   or both.

                      BEFORE APPLYING ESP

            ---------------------------------------
      IPv6  |             | ext hdrs |     |      |
            | orig IP hdr |if present| TCP | Data |
            ---------------------------------------

                      AFTER APPLYING ESP

            ------------------------------------------------------------
      IPv6  | new* |new ext |   | orig*|orig ext |   |    | ESP   | ESP|
            |IP hdr| hdrs*  |ESP|IP hdr| hdrs *  |TCP|Data|Trailer| ICV|
            ------------------------------------------------------------
                                |<--------- encryption ---------->|
                            |<------------ integrity ------------>|

            * = if present, construction of outer IP hdr/extensions and
                modification of inner IP hdr/extensions is discussed in
                the Security Architecture document.

   As a completely new IP packet will be made, the PDM information for
   that packet does not contain any information from the inner packet;
   i.e., the PDM information will NOT be based on the transport layer
   (TCP, UDP, etc.) ports in the inner header, but will be specific to
   the ESP flow.

   If PDM information for the inner packet is desired, the original host
   sending the inner packet needs to put a PDM header in the tunneled
   packet, and then the PDM information will be specific for that
   stream.

3.5 Implementation Considerations

3.5.1 PDM Activation

   The PDM destination options extension header MUST be explicitly
   turned on by each stack on a host node by administrative action.  The
   default value of PDM is off.

   PDM MUST NOT be turned on merely because a packet is received with a
   PDM header.  The received packet could be spoofed by another device.

3.5.2 PDM Timestamps

   The PDM timestamps are intended to isolate wire time from server or
   host time, but may necessarily attribute some host processing time to
   network latency.

   RFC2330 [RFC2330] "Framework for IP Performance Metrics" describes
   two notions of wire time in section 10.2.  These notions are only
   defined in terms of an Internet host H observing an Internet link L
   at a particular location:

   +  For a given IP packet P, the 'wire arrival time' of P at H on L
      is the first time T at which any bit of P has appeared at H's
      observational position on L.

   +  For a given IP packet P, the 'wire exit time' of P at H on L is
      the first time T at which all the bits of P have appeared at H's
      observational position on L.

   This specification does not define the exact observing position of H
   on L.  That is left for the deployment setups to define.  However,
   the position where PDM timestamps are taken SHOULD be as close to the
   physical network interface as possible.
   Not all implementations will be able to achieve the ideal level of
   measurement.

3.6 Dynamic Configuration Options

   If implemented, each operating system MUST have a default
   configuration parameter, e.g., diag_header_sys_default_value=yes/no.
   The operating system MAY also have a dynamic configuration option to
   change the configuration setting as needed.

   If the PDM destination options extension header is used, then it MAY
   be turned on for all packets flowing through the host, applied to an
   upper-layer protocol (TCP, UDP, SCTP, etc.), a local port, or IP
   address only.  These are at the discretion of the implementation.

3.6 5-tuple Aging

   Within the operating system, metrics must be kept on a 5-tuple basis.

   The question arises of when to stop keeping data or to restart the
   numbering for a 5-tuple.  For example, in the case of TCP, at some
   point, the connection will terminate.  Keeping data in control blocks
   forever will have unfortunate consequences for the operating system.

   So, the recommendation is to use a known aging parameter such as Max
   Segment Lifetime (MSL) as defined in Transmission Control Protocol
   [RFC0793] to reuse or drop the control block.  The choice of aging
   parameter is left up to the implementation.

4 Security Considerations

   PDM may introduce some new security weaknesses.

4.1. SYN Flood and Resource Consumption Attacks

   PDM needs to calculate the deltas for time and keep track of the
   sequence numbers.  This means that control blocks must be kept at the
   end hosts per 5-tuple.  Any time a control block is kept, an
   attacker can try to misuse the control blocks such that there is a
   compromise of the end host.

   PDM is used only at the end hosts and the control blocks are only
   kept at the end host and not at routers or middle boxes.  Remember,
   PDM is an implementation of the Destination Option extension header.
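
   A per-5-tuple control block table with aging might look like the
   following Python sketch.  It combines the aging idea of the "5-tuple
   Aging" section with a cap on the number of entries, which this
   section goes on to recommend.  The class name, the default limits
   (the 120-second age mirrors the 2-minute MSL of [RFC0793]), and the
   choice to return None when the table is full are all assumptions
   made for illustration; the caller supplies the current time so that
   the behavior is easy to follow.

```python
import random
from typing import Dict, Hashable, Optional


class ControlBlockTable:
    """Sequence-number state kept per 5-tuple, with aging and a size cap."""

    def __init__(self, max_entries: int = 1024, max_age: float = 120.0,
                 rng: Optional[random.Random] = None) -> None:
        self.max_entries = max_entries
        self.max_age = max_age          # seconds of idleness before reuse
        self._rng = rng or random.Random()
        self._blocks: Dict[Hashable, dict] = {}

    def _expire(self, now: float) -> None:
        """Drop control blocks that have been idle longer than max_age."""
        stale = [key for key, block in self._blocks.items()
                 if now - block["last_used"] > self.max_age]
        for key in stale:
            del self._blocks[key]

    def next_psn(self, five_tuple: Hashable, now: float) -> Optional[int]:
        """Return the PSNTP for the next packet, or None if the table is full."""
        self._expire(now)
        block = self._blocks.get(five_tuple)
        if block is None:
            if len(self._blocks) >= self.max_entries:
                return None  # assumed policy: send without PDM, do not grow
            # New flow: start at a random 16-bit sequence number.
            block = {"psn": self._rng.randrange(0x10000), "last_used": now}
            self._blocks[five_tuple] = block
        else:
            # Existing flow: increment, wrapping at 16 bits.
            block["psn"] = (block["psn"] + 1) & 0xFFFF
        block["last_used"] = now
        return block["psn"]
```

   Bounding the table keeps the memory an attacker can consume fixed,
   at the cost of leaving some flows unmeasured while the table is
   full.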
   A "SYN flood" type of attack succeeds because a TCP SYN packet is
   small but it causes the end host to start creating a place holder for
   the session such that quite a bit of control block and other storage
   is used. This is an asymmetric type of attack in that a small
   amount of work by the attacker creates a large amount of work by the
   resource attacked.

   For PDM, the amount of data to be kept is quite small. That is, the
   control block is quite lightweight. Concerns about SYN Flood and
   other types of resource consumption attacks (memory, processing
   power, etc.) can be alleviated by having a limit on the number of
   control block entries.

   We recommend that implementations of PDM SHOULD have a limit on the
   number of control block entries.

4.2 Pervasive monitoring

   Since PDM passes in the clear, a concern arises as to whether the
   data can be used to fingerprint the system or somehow obtain
   information about the contents of the payload.

   Let us discuss fingerprinting of the end host first. It is possible
   that seeing the pattern of deltas or the absolute values could give
   some information as to the speed of the end host - that is, if it is
   a very fast system or an older, slow device. This may be useful to
   the attacker. However, if the attacker has access to PDM, the
   attacker also has access to the entire packet and could make such a
   deduction based merely on the time frames elapsed between packets
   WITHOUT PDM.

   As far as deducing the content of the payload, it appears to us that
   PDM is quite unhelpful in this regard.

4.3 PDM as a Covert Channel

   PDM provides a set of fields in the packet which could be used to
   leak data. But, there is no real reason to suspect that PDM would
   be chosen rather than another part of the payload or another
   Extension Header.
   A firewall or another device could sanity check the fields within the
   PDM but randomly assigned sequence numbers and delta times might be
   expected to vary widely. The biggest problem though is how an
   attacker would get access to PDM in the first place to leak data.
   The attacker would have to either compromise the end host or be a Man
   in the Middle (MitM). It is possible that either one could change
   the fields. But, then the other end host would get sequence numbers
   and deltas that don't make any sense. Presumably, one is using PDM
   and doing packet tracing for diagnostic purposes, so the changes
   would be obvious. It is conceivable that someone could compromise
   an end host and make it start sending packets with PDM without the
   knowledge of the host. But, again, the bigger problem is the
   compromise of the end host. Once that is done, the attacker
   probably has better ways to leak data.

   Having said that, an implementation SHOULD stop using PDM if it gets
   some number of "nonsensical" sequence numbers.

4.4 Timing Attacks

   The fact that PDM can help in the separation of node processing time
   from network latency brings value to performance monitoring. Yet, it
   is this very characteristic of PDM which may be misused to make
   certain new types of timing attacks against protocols and
   implementations possible.

   Depending on the nature of the cryptographic protocol used, it may be
   possible to leak the long term credentials of the device. For
   example, if an attacker is able to create an attack which causes the
   enterprise to turn on PDM to diagnose the attack, then the attacker
   might use PDM during that debugging time to launch a timing attack
   against the long term keying material used by the cryptographic
   protocol.

   An implementation may want to be sure that PDM is enabled only for
   certain IP addresses, or only for some ports.
   Additionally, we recommend that the implementation SHOULD require an
   explicit restart of monitoring after a certain time period (for
   example, 1 hour), to make sure that PDM is not accidentally left on
   after debugging has been done.

   Even so, we introduce the concept of user "Consent to be Measured"
   as a prerequisite for using PDM. Consent is common in enterprises
   and with some subscription services. So, with PDM, we recommend that
   the user SHOULD consent to its use.

5 IANA Considerations

   This draft requests an Option Type assignment in the Destination
   Options and Hop-by-Hop Options sub-registry of Internet Protocol
   Version 6 (IPv6) Parameters [ref to RFCs and URL below].

   http://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2

   Hex Value     Binary Value      Description            Reference
                 act  chg  rest
   -------------------------------------------------------------------
   TBD           TBD               Performance and        [This draft]
                                   Diagnostic Metrics
                                   (PDM)

6 References

6.1 Normative References

   [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
   793, September 1981.

   [RFC1122] Braden, R., "Requirements for Internet Hosts --
   Communication Layers", RFC 1122, October 1989.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
   (IPv6) Specification", RFC 2460, December 1998.

   [RFC2681] Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
   Delay Metric for IPPM", RFC 2681, September 1999.

   [RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines
   For Values In the Internet Protocol and Related Headers", BCP 37, RFC
   2780, March 2000.

   [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", RFC
   4303, December 2005.
6.2 Informative References

   [RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
   "Framework for IP Performance Metrics", RFC 2330, May 1998.

   [TRAM-TCPM] Trammel, B., "Encoding of Time Intervals for the TCP
   Timestamp Option-01", Internet Draft, July 2013. [Work in Progress]

Appendix A: Timing Time Differential Calculations

   The time counter in a CPU is a binary whole number, representing a
   number of milliseconds (msec), microseconds (usec) or even
   picoseconds (psec). Representing one of these values as attoseconds
   (asec) means multiplying by 10 raised to some exponent. Refer to this
   table of equalities:

   Base value      = # of sec     = # of asec     1000s of asec
   --------------- -------------  -------------   -------------
   1 second        1 sec          10**18 asec     1000**6 asec
   1 millisecond   10**-3  sec    10**15 asec     1000**5 asec
   1 microsecond   10**-6  sec    10**12 asec     1000**4 asec
   1 nanosecond    10**-9  sec    10**9  asec     1000**3 asec
   1 picosecond    10**-12 sec    10**6  asec     1000**2 asec
   1 femtosecond   10**-15 sec    10**3  asec     1000**1 asec

   For example, if you have a time differential expressed in
   microseconds, since each microsecond is 10**12 asec, you would
   multiply your time value by 10**12 to obtain the number of
   attoseconds. If your time differential is expressed in nanoseconds,
   you would multiply by 10**9 to get the number of attoseconds.

   The result is a binary value that will need to be shortened by a
   number of bits so it will fit into the 16-bit PDM DELTA field.

   The next step is to divide by 2 until the value is contained in just
   16 significant bits. The exponent of the value in the last column of
   the table is useful here; the initial scaling factor is that
   exponent multiplied by 10. This is the minimum number of low-order
   bits to be shifted out or discarded. It represents dividing the time
   value by 1024 raised to that exponent.
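   The conversion to attoseconds and the initial scaling factor
   described above can be sketched in Python as follows. This is a
   minimal illustration, not part of the specification: the names
   UNIT_EXPONENT, to_attoseconds, initial_scale and apply_initial_scale
   are hypothetical, and the unit labels are chosen only for this
   example.

```python
# Exponent k from the last column of the table: 1 unit = 1000**k asec.
UNIT_EXPONENT = {
    "sec": 6,
    "msec": 5,
    "usec": 4,
    "nsec": 3,
    "psec": 2,
    "fsec": 1,
}


def to_attoseconds(value, unit):
    """Convert a whole-number counter value in `unit` to attoseconds."""
    return value * 1000 ** UNIT_EXPONENT[unit]


def initial_scale(unit):
    """Initial scaling factor: the table exponent multiplied by 10."""
    return 10 * UNIT_EXPONENT[unit]


def apply_initial_scale(value, unit):
    """Discard the low-order bits named by the initial scaling factor.

    Shifting right by 10*k bits is the same as dividing by 1024**k.
    """
    return to_attoseconds(value, unit) >> initial_scale(unit)
```

   For a microsecond counter, initial_scale("usec") is 40, so a value of
   39838 usec becomes 39838 * 10**12 asec and, after discarding 40 bits,
   36232.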
   The resulting value may still be too large to fit into 16 bits, but
   can be normalized by shifting out more bits (dividing by 2) until the
   value fits into the 16-bit DELTA field. The number of extra bits
   shifted out is then added to the scaling factor. The scaling factor,
   the total number of low-order bits dropped, is the SCALEDTL value.

   For example: say an application has these start and finish timer
   values (hexadecimal values, in microseconds):

       Finish:      27C849234 usec    (02:57:58.997556)
      -Start:       27C83F696 usec    (02:57:58.957718)
       ==========   =========         ===============
       Difference   9B9E      usec    00.039838 sec or 39838 usec

   To convert this differential value to binary attoseconds, multiply
   the number of microseconds by 10**12. Divide by 1024**4, or simply
   discard 40 bits from the right. The result is 36232, or 8D88 in hex,
   with a scaling factor or SCALEDTL value of 40.

   For another example, presume the time differential is larger, say
   32.311072 seconds, which is 32311072 usec. Each microsecond is 10**12
   asec, so multiply by 10**12, giving the hexadecimal value
   1C067FCCAE8120000. Using the initial scaling factor of 40, drop the
   last 10 characters (40 bits) from that string, giving 1C067FC. This
   will not fit into a DELTA field, as it is 25 bits long. Shifting the
   value to the right another 9 bits results in a DELTA value of E033,
   with a resulting scaling factor of 49.

   When the time differential value is a small number, regardless of the
   time unit, the exponent trick given above is not useful in
   determining the proper scaling value.
   For example, if the time differential is 3 seconds and you want to
   convert that directly, you would follow this path:

      3 seconds = 3*10**18 asec (decimal)
                = 29A2241AF62C0000 asec (hexadecimal)

   If you just truncate the last 60 bits, you end up with a delta value
   of 2 and a scaling factor of 60, when what you really wanted was a
   delta value with more significant digits. The most precision with
   which you can store this value in 16 bits is A688, with a scaling
   factor of 46.

Appendix B: Sample Packet Flows

B.1 PDM Flow - Simple Client Server

   Following is a sample simple flow for the PDM with one packet sent
   from Host A and one packet received by Host B. The PDM does not
   require time synchronization between Host A and Host B. The
   calculations to derive meaningful metrics for network diagnostics are
   shown below each packet sent or received.

B.1.1 Step 1

   Packet 1 is sent from Host A to Host B. The time for Host A is set
   initially to 10:00AM.

   The time and packet sequence number are saved by the sender
   internally. The packet sequence number and delta times are sent in
   the packet.

                          Packet 1

           +----------+             +----------+
           |          |             |          |
           |   Host   | ----------> |   Host   |
           |    A     |             |    B     |
           |          |             |          |
           +----------+             +----------+

   PDM Contents:

   PSNTP    : Packet Sequence Number This Packet:     25
   PSNLR    : Packet Sequence Number Last Received:   -
   DELTATLR : Delta Time Last Received:               -
   SCALEDTLR: Scale of Delta Time Last Received:      0
   DELTATLS : Delta Time Last Sent:                   -
   SCALEDTLS: Scale of Delta Time Last Sent:          0

   Internally, within the sender, Host A, it must keep:

   Packet Sequence Number of the last packet sent:     25
   Time the last packet was sent:                10:00:00

   Note, the initial PSNTP from Host A starts at a random number. In
   this case, 25. The time in these examples is shown in seconds for
   the sake of simplicity.
B.1.2 Step 2

   Packet 1 is received at Host B. Its time is set to one hour later
   than Host A. In this case, 11:00AM.

   Internally, within the receiver, Host B, it must note:

   Packet Sequence Number of the last packet received:    25
   Time the last packet was received                 :    11:00:03

   Note, this timestamp is in Host B time. It has nothing whatsoever to
   do with Host A time. The Packet Sequence Number of the last packet
   received will become PSNLR which will be sent out in the packet sent
   by Host B in the next step. The time last received will be used to
   calculate the DELTATLR value to be sent out in the packet sent by
   Host B in the next step.

B.1.3 Step 3

   Packet 2 is sent by Host B to Host A. Note, the initial packet
   sequence number (PSNTP) from Host B starts at a random number. In
   this case, 12. Before sending the packet, Host B does a calculation
   of deltas. Since Host B knows when it is sending the packet, and it
   knows when it received the previous packet, it can do the following
   calculation:

   Sending time : packet 2 - receive time : packet 1

   We will call the result of this calculation: Delta Time Last Received
   (DELTATLR)

   Note, both sending time and receive time are saved internally in Host
   B. They do not travel in the packet. Only the Delta is in the
   packet.

   Assume that within Host B is the following:

   Packet Sequence Number of the last packet received:     25
   Time the last packet was received:                      11:00:03
   Packet Sequence Number of this packet:                  12
   Time this packet is being sent:                         11:00:07

   We can now calculate a delta value to be sent out in the packet.
   DELTATLR becomes:

   4 seconds = 11:00:07 - 11:00:03 = 3782DACE9D900000 asec

   This is the derived metric: Server Delay. The time and scaling
   factor must be converted; in this case, the time differential is
   DE0B, and the scaling factor is 2E, or 46 in decimal.
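   The conversion of a time differential into a 16-bit delta and a
   scaling factor can be sketched in Python as follows. This is an
   illustrative sketch under stated assumptions, not the normative
   procedure: the function names encode_delta and decode_delta are
   hypothetical, and instead of starting from the unit-based initial
   scaling factor of Appendix A, the sketch shifts out just enough
   low-order bits to leave 16 significant bits, which keeps the most
   precision.

```python
def encode_delta(asec):
    """Return (delta, scale) for a time differential in attoseconds.

    `scale` is the number of low-order bits discarded so that the
    remaining value fits in the 16-bit DELTA field.
    """
    if asec < 0:
        raise ValueError("time differential must be non-negative")
    scale = max(0, asec.bit_length() - 16)
    return asec >> scale, scale


def decode_delta(delta, scale):
    """Approximate attosecond value recovered from (delta, scale).

    The discarded low-order bits are lost, so the result is a lower
    bound that is less than 2**scale below the original value.
    """
    return delta << scale


# Server Delay from Step 3: 4 seconds, expressed in attoseconds.
delta, scale = encode_delta(4 * 10**18)
print(f"{delta:04X} {scale}")   # prints: DE0B 46
```

   The same function reproduces the Appendix A examples: 39838 usec
   gives 8D88 with scale 40, 32311072 usec gives E033 with scale 49, and
   3 seconds gives A688 with scale 46.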
   Then, these values, along with the packet sequence numbers, will be
   sent to Host A as follows:

                          Packet 2

           +----------+             +----------+
           |          |             |          |
           |   Host   | <---------- |   Host   |
           |    A     |             |    B     |
           |          |             |          |
           +----------+             +----------+

   PDM Contents:

   PSNTP    : Packet Sequence Number This Packet:    12
   PSNLR    : Packet Sequence Number Last Received:  25
   DELTATLR : Delta Time Last Received:              DE0B (4 seconds)
   SCALEDTLR: Scale of Delta Time Last Received:     2E (46 decimal)
   DELTATLS : Delta Time Last Sent:                  -
   SCALEDTLS: Scale of Delta Time Last Sent:         0

   The metric left to be calculated is the Round-Trip Delay. This will
   be calculated by Host A when it receives Packet 2.

B.1.4 Step 4

   Packet 2 is received at Host A. Remember, its time is set to one
   hour earlier than Host B. Internally, it must note:

   Packet Sequence Number of the last packet received:  12
   Time the last packet was received                 :  10:00:12

   Note, this timestamp is in Host A time. It has nothing whatsoever to
   do with Host B time.

   So, now, Host A can calculate total end-to-end time. That is:

   End-to-End Time = Time Last Received - Time Last Sent

   For example, packet 25 was sent by Host A at 10:00:00. Packet 12 was
   received by Host A at 10:00:12 so:

   End-to-End time = 10:00:12 - 10:00:00 or 12 (Server and Network RT
   delay combined). This time may also be called total Overall
   Round-Trip Time (RTT) which includes Network RTT and Host Response
   Time.

   This derived metric we will call Delta Time Last Sent (DELTATLS).

   We can now also calculate round trip delay. The formula is:

   Round trip delay = (Delta Time Last Sent - Delta Time Last Received)

   Or:

   Round trip delay = 12 - 4 or 8

   Now, the only problem is that at this point all metrics are in Host A
   only and not exposed in a packet. To do that, we need a third packet.
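   The arithmetic of Steps 3 and 4 can be reproduced with a short Python
   sketch. It is illustrative only; the helper name elapsed_seconds is
   hypothetical, and times are handled in whole seconds as in the
   example rather than in scaled attoseconds. Note that each subtraction
   uses readings from a single host's clock, which is why the one-hour
   offset between Host A and Host B does not matter.

```python
from datetime import datetime


def elapsed_seconds(earlier, later):
    """Difference between two HH:MM:SS readings of the SAME host clock."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())


# Host B, using only Host B's clock: receive of packet 1, send of packet 2.
# This value travels to Host A in Packet 2 as DELTATLR.
delta_tlr = elapsed_seconds("11:00:03", "11:00:07")

# Host A, using only Host A's clock: send of packet 1, receive of packet 2.
delta_tls = elapsed_seconds("10:00:00", "10:00:12")

# Round trip delay = Delta Time Last Sent - Delta Time Last Received.
round_trip_delay = delta_tls - delta_tlr

print(delta_tlr, delta_tls, round_trip_delay)   # prints: 4 12 8
```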
   Note: this simple example assumes one send and one receive. That
   is done only for purposes of explaining the function of the PDM. In
   cases where there are multiple packets returned, one would take the
   time in the last packet in the sequence. The calculations of such
   timings and intelligent processing is the function of post-processing
   of the data.

B.1.5 Step 5

   Packet 3 is sent from Host A to Host B.

           +----------+             +----------+
           |          |             |          |
           |   Host   | ----------> |   Host   |
           |    A     |             |    B     |
           |          |             |          |
           +----------+             +----------+

   PDM Contents:

   PSNTP    : Packet Sequence Number This Packet:    26
   PSNLR    : Packet Sequence Number Last Received:  12
   DELTATLR : Delta Time Last Received:              0
   SCALEDTLR: Scale of Delta Time Last Received:     0
   DELTATLS : Delta Time Last Sent:                  A688 (scaled value)
   SCALEDTLS: Scale of Delta Time Last Sent:         30 (48 decimal)

   To calculate Two-Way Delay, any packet capture device may look at
   these packets and do what is necessary.

B.2 Other Flows

   What we have discussed so far is a simple flow with one packet sent
   and one returned. Let's look at how PDM may be useful in other
   types of flows.

B.2.1 PDM Flow - One Way Traffic

   The flow on a particular session may not be a send-receive paradigm.
   Let us consider some other situations. In the case of a one-way
   flow, one might see the following:

   Note: The time is expressed in generic units for simplicity. That
   is, these values do not represent a number of attoseconds,
   microseconds or any other real units of time.

   Packet   Sender      PSN            PSN        Delta Time  Delta Time
                      This Packet    Last Recvd   Last Recvd  Last Sent
   =====================================================================
   1        Server       1              0              0           0
   2        Server       2              0              0           5
   3        Server       3              0              0          12
   4        Server       4              0              0          20

   What does this mean and how is it useful?
   In a one-way flow, only the Delta Time Last Sent will be seen as
   used. Recall, Delta Time Last Sent is the difference between the
   send of one packet from a device and the next. This is a measure of
   throughput for the sender - according to the sender's point of view.
   That is, it is a measure of how fast the application itself (with
   stack time included) is able to send packets.

   How might this be useful? If one is having a performance issue at
   the client and sees that packet 2, for example, is sent after 5 time
   units from the server but takes 10 times that long to arrive at the
   destination, then one may safely conclude that there are delays in
   the path other than at the server which may be causing the delivery
   issue of that packet. Such delays may include the network links,
   middle-boxes, etc.

   Now, true one-way traffic is quite rare. What people often mean by
   "one-way" traffic is an application such as FTP where a group of
   packets (for example, a TCP window size worth) is sent, then the
   sender waits for acknowledgment. This type of flow would actually
   fall into the "multiple-send" traffic model.

B.2.2 PDM Flow - Multiple Send Traffic

   Assume that two packets are sent for each ACK from the server. For
   example, a TCP flow will do this, per RFC1122 [RFC1122] Section
   4.2.3.

   Packet   Sender      PSN            PSN        Delta Time  Delta Time
                      This Packet    Last Recvd   Last Recvd  Last Sent
   =====================================================================
   1        Server       1              0              0           0
   2        Server       2              0              0           5
   3        Client       1              2             20           0
   4        Server       3              1             10          15

   How might this be used?

   Notice that in packet 3, the client has a value of Delta Time Last
   Received of 20. Recall that Delta Time Last Received is the send
   time of packet 3 - receive time of packet 2. So, what does one know
   now?
   In this case, Delta Time Last Received is the processing time for
   the Client to send the next packet.

   How to interpret this depends on what is actually being sent.
   Remember, PDM is not being used in isolation, but to supplement the
   fields found in other headers. Let's take some examples:

   1. Client is sending a standalone TCP ACK. One would find this by
   looking at the payload length in the IPv6 header and the TCP
   Acknowledgement field in the TCP header. So, in this case, the
   client is taking 20 units to send back the ACK. This may or may not
   be interesting.

   2. Client is sending data with the packet. Again, one would find
   this by looking at the payload length in the IPv6 header and the TCP
   Acknowledgement field in the TCP header. So, in this case, the
   client is taking 20 units to send back data. This may represent
   "User Think Time". Again, this may or may not be interesting, in
   isolation. But, if there is a performance problem receiving data at
   the server, then taken in conjunction with RTT or other packet timing
   information, this information may be quite interesting.

   Of course, one also needs to look at the PSN Last Received field to
   make sure of the interpretation of this data. That is, to make
   sure that the Delta Time Last Received corresponds to the packet of
   interest.

   The benefit of PDM is that we have such information available in a
   uniform manner for all applications and all protocols without
   extensive changes required to applications.

B.2.3 PDM Flow - Multiple Send with Errors

   Let us now look at how PDM may be able to help in a case of TCP
   retransmission and add to the information that is sent in the TCP
   header.

   Assume that three packets are sent with each send from the server.

   From the server, this is what is seen.
   Pkt Sender    PSN       PSN     Delta Time  Delta Time  TCP   Data
               This Pkt  LastRecvd  LastRecvd   LastSent   SEQ   Bytes
   =====================================================================
   1   Server     1         0          0           0       123   100
   2   Server     2         0          0           5       223   100
   3   Server     3         0          0           5       333   100

   The client, however, does not receive all the packets. From the
   client, this is what is seen for the packets sent from the server.

   Pkt Sender    PSN       PSN     Delta Time  Delta Time  TCP   Data
               This Pkt  LastRecvd  LastRecvd   LastSent   SEQ   Bytes
   =====================================================================
   1   Server     1         0          0           0       123   100
   2   Server     3         0          0           5       333   100

   Let's assume that the server now retransmits the packet. (Obviously,
   a duplicate acknowledgment sequence for fast retransmit or a
   retransmit timeout would occur. To illustrate the point, these
   packets are being left out.)

   So, then if a TCP retransmission is done, then from the client, this
   is what is seen for the packets sent from the server.

   Pkt Sender    PSN       PSN     Delta Time  Delta Time  TCP   Data
               This Pkt  LastRecvd  LastRecvd   LastSent   SEQ   Bytes
   =====================================================================
   1   Server     4         0          0          30       223   100

   The server has resent the old packet 2 with TCP sequence number of
   223. The retransmitted packet now has a PSN This Packet value of 4.

   The Delta Last Sent is 30 - the time between sending the packet with
   PSN of 3 and this current packet.

   Let's say that packet 4 is lost again. Then, after some amount of
   time (RTO) then the packet with TCP sequence number of 223 is resent.

   From the client, this is what is seen for the packets sent from the
   server.
   Pkt Sender    PSN       PSN     Delta Time  Delta Time  TCP   Data
               This Pkt  LastRecvd  LastRecvd   LastSent   SEQ   Bytes
   =====================================================================
   1   Server     5         0          0          60       223   100

   If now, this packet arrives at the destination, one has a very good
   idea that packets exist which are being sent from the server as
   retransmissions and not arriving at the client. This is because the
   PSN of the resent packet from the server is 5 rather than 4. If we
   had used TCP sequence number alone, we would never have seen this
   situation. The TCP sequence number in all situations is 223.

   This situation would be experienced by the user of the application
   (the human being actually sitting somewhere) as a "hang" or a long
   delay between packets. On large networks, to diagnose problems such
   as these where packets are lost somewhere on the network, one has to
   take multiple traces to find out exactly where.

   The first thing is to start with doing a trace at the client and the
   server. So, we can see if the server sent a particular packet and
   the client received it. If the client did not receive it, then we
   start tracking back to trace points at the router right after the
   server and the router right before the client. Did they get these
   packets which the server has sent? This is a time-consuming
   activity.

   With PDM, we can speed up the diagnostic time because we may be able
   to use only the trace taken at the client to see what the server is
   sending.

Appendix C: Potential Overhead Considerations

   One might wonder as to the potential overhead of PDM. First, PDM is
   entirely optional. That is, a site may choose to implement PDM or
   not as they wish. If they are happy with the costs of PDM vs. the
   benefits, then the choice should be theirs.
   Below is a table outlining the potential overhead in terms of
   additional time to deliver the response to the end user for various
   assumed RTTs.

   Bytes      RTT         Bytes         Bytes    New       Overhead
   in Packet              Per Millisec  in PDM   RTT
   =====================================================================
   1000       1000 milli     1          16       1016.000  16.000 milli
   1000        100 milli    10          16        101.600   1.600 milli
   1000         10 milli   100          16         10.160    .160 milli
   1000          1 milli  1000          16          1.016    .016 milli

   Below are some examples of actual RTTs for packets traversing large
   enterprise networks. The first example is for packets going to
   multiple business partners.

   Bytes      RTT         Bytes         Bytes    New       Overhead
   in Packet              Per Millisec  in PDM   RTT
   =====================================================================
   1000         17 milli    58          16         17.360    .360 milli

   The second example is for packets at a large enterprise customer
   within a data center. Notice that the scale is now in microseconds
   rather than milliseconds.

   Bytes      RTT         Bytes         Bytes    New       Overhead
   in Packet              Per Microsec  in PDM   RTT
   =====================================================================
   1000         20 micro    50          16         20.320    .320 micro

Acknowledgments

   The authors would like to thank Keven Haining, Al Morton, Brian
   Trammel, David Boyes, Bill Jouris, Richard Scheffenegger, and Rick
   Troth for their comments and assistance. We would also like to thank
   Tero Kivinen and Jouni Korhonen for their detailed and perceptive
   reviews.

Authors' Addresses

   Nalini Elkins
   Inside Products, Inc.
   36A Upper Circle
   Carmel Valley, CA 93924
   United States
   Phone: +1 831 659 8360
   Email: nalini.elkins@insidethestack.com
   http://www.insidethestack.com

   Robert M. Hamilton
   Chemical Abstracts Service
   A Division of the American Chemical Society
   2540 Olentangy River Road
   Columbus, Ohio 43202
   United States
   Phone: +1 614 447 3600 x2517
   Email: rhamilton@cas.org
   http://www.cas.org

   Michael S. Ackermann
   Blue Cross Blue Shield of Michigan
   P.O. Box 2888
   Detroit, Michigan 48231
   United States
   Phone: +1 310 460 4080
   Email: mackermann@bcbsm.com
   http://www.bcbsm.com