Network Working Group                                           C. Hopps
Internet-Draft                                   LabN Consulting, L.L.C.
Intended status: Standards Track                       December 18, 2020
Expires: June 21, 2021

      IP Traffic Flow Security Using Aggregation and Fragmentation
                      draft-ietf-ipsecme-iptfs-04

Abstract

   This document describes a mechanism to enhance IPsec traffic flow
   security by adding traffic flow confidentiality to encrypted IP
   encapsulated traffic. Traffic flow confidentiality is provided by
   obscuring the size and frequency of IP traffic using a fixed-sized,
   constant-send-rate IPsec tunnel. The solution allows for congestion
   control as well as non-constant send-rate usage.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 21, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology & Concepts
   2.  The IP-TFS Tunnel
     2.1.  Tunnel Content
     2.2.  Payload Content
       2.2.1.  Data Blocks
       2.2.2.  No Implicit End Padding Required
       2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads
       2.2.4.  Empty Payload
       2.2.5.  IP Header Value Mapping
     2.3.  Exclusive SA Use
     2.4.  Modes of Operation
       2.4.1.  Non-Congestion Controlled Mode
       2.4.2.  Congestion Controlled Mode
   3.  Congestion Information
     3.1.  ECN Support
   4.  Configuration
     4.1.  Bandwidth
     4.2.  Fixed Packet Size
     4.3.  Congestion Control
   5.  IKEv2
     5.1.  USE_AGGFRAG Notification Message
   6.  Packet and Data Formats
     6.1.  AGGFRAG_PAYLOAD Payload
       6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.3.  Data Blocks
       6.1.4.  IKEv2 USE_AGGFRAG Notification Message
   7.  IANA Considerations
     7.1.  AGGFRAG_PAYLOAD Sub-Type Registry
     7.2.  USE_AGGFRAG Notify Message Status Type
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Example Of An Encapsulated IP Packet Flow
   Appendix B.  A Send and Loss Event Rate Calculation
   Appendix C.  Comparisons of IP-TFS
     C.1.  Comparing Overhead
       C.1.1.  IP-TFS Overhead
       C.1.2.  ESP with Padding Overhead
     C.2.  Overhead Comparison
     C.3.  Comparing Available Bandwidth
       C.3.1.  Ethernet
   Appendix D.  Acknowledgements
   Appendix E.  Contributors
   Author's Address

1.  Introduction

   Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting
   information about data being sent through a network. While one may
   directly obscure the data through the use of encryption [RFC4303],
   the traffic pattern itself exposes information due to variations in
   its shape and timing ([I-D.iab-wire-image], [AppCrypt]). Hiding the
   size and frequency of traffic is referred to as Traffic Flow
   Confidentiality (TFC) per [RFC4303].

   [RFC4303] provides for TFC by allowing padding to be added to
   encrypted IP packets and allowing for transmission of all-pad packets
   (indicated using protocol 59). This method has the major limitation
   that it can significantly under-utilize the available bandwidth.

   The IP-TFS solution provides for full TFC without the aforementioned
   bandwidth limitation. This is accomplished by using a
   constant-send-rate IPsec [RFC4303] tunnel with fixed-sized
   encapsulating packets; however, these fixed-sized packets can contain
   partial, whole or multiple IP packets to maximize the bandwidth of
   the tunnel. A non-constant send-rate is allowed, but the
   confidentiality properties of its use are outside the scope of this
   document.

   For a comparison of the overhead of IP-TFS with the RFC4303
   prescribed TFC solution see Appendix C.

   Additionally, IP-TFS provides for dealing with network congestion
   [RFC2914]. This is important for when the IP-TFS user is not in full
   control of the domain through which the IP-TFS tunnel path flows.

1.1.  Terminology & Concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

   This document assumes familiarity with IP security concepts described
   in [RFC4301].

2.  The IP-TFS Tunnel

   As mentioned in Section 1, IP-TFS utilizes an IPsec [RFC4303] tunnel
   (SA) as its transport. To provide for full TFC, fixed-sized
   encapsulating packets are sent at a constant rate on the tunnel.

   The primary input to the tunnel algorithm is the requested bandwidth
   used by the tunnel. Two values are then required to provide for this
   bandwidth: the fixed size of the encapsulating packets, and the rate
   at which to send them.

   The fixed packet size may either be specified manually or can be
   determined through the use of Path MTU discovery [RFC1191] and
   [RFC8201].

   Given the encapsulating packet size and the requested tunnel
   bandwidth, the corresponding packet send rate can be calculated.
   The packet send rate is the requested bandwidth divided by the size
   of the encapsulating packet.

   The egress of the IP-TFS tunnel MUST allow for and expect the ingress
   (sending) side of the IP-TFS tunnel to vary the size and rate of sent
   encapsulating packets, unless constrained by other policy.

2.1.  Tunnel Content

   As previously mentioned, one issue with the TFC padding solution in
   [RFC4303] is the large amount of wasted bandwidth as only one IP
   packet can be sent per encapsulating packet. In order to maximize
   bandwidth IP-TFS breaks this one-to-one association.

   IP-TFS aggregates as well as fragments the inner IP traffic flow into
   fixed-sized encapsulating IPsec tunnel packets. Padding is only
   added to the tunnel packets if there is no data available to be
   sent at the time of tunnel packet transmission, or if fragmentation
   has been disabled by the receiver.

   This is accomplished using a new Encapsulating Security Payload (ESP,
   [RFC4303]) type which is identified by the number AGGFRAG_PAYLOAD
   (Section 6.1).

   Other non-IP-TFS uses of this aggregation and fragmentation
   encapsulation have been identified, such as increased performance
   through packet aggregation, as well as handling MTU issues using
   fragmentation. These uses are not defined here, but are also not
   restricted by this document.

2.2.  Payload Content

   The AGGFRAG_PAYLOAD payload content defined in this document is
   comprised of a 4 or 24 octet header followed by either a partial, a
   full or multiple partial or full data blocks. The following diagram
   illustrates this payload within the ESP packet. See Section 6.1 for
   the exact formats of the AGGFRAG_PAYLOAD payload.

   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . Outer Encapsulating Header ...                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . ESP Header...                                                 .
   +---------------------------------------------------------------+
   |   ...            :           BlockOffset                      |
   +---------------------------------------------------------------+
   :                  [Optional Congestion Info]                   :
   +---------------------------------------------------------------+
   |       DataBlocks ...                                          ~
   ~                                                               ~
   ~                                                               |
   +---------------------------------------------------------------|
   . ESP Trailer...                                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

              Figure 1: Layout of an IP-TFS IPsec Packet

   The "BlockOffset" value is either zero or some offset into or past
   the end of the "DataBlocks" data.

   If the "BlockOffset" value is zero it means that the "DataBlocks"
   data begins with a new data block.

   Conversely, if the "BlockOffset" value is non-zero it points to the
   start of the new data block, and the initial "DataBlocks" data
   belongs to a previous data block that is still being re-assembled.

   The "BlockOffset" can point past the end of the "DataBlocks" data
   which indicates that the next data block occurs in a subsequent
   encapsulating packet.

   Having the "BlockOffset" always point at the next available data
   block allows for recovering the next full inner packet in the
   presence of outer encapsulating packet loss.

   An example IP-TFS packet flow can be found in Appendix A.

2.2.1.  Data Blocks

   +---------------------------------------------------------------+
   | Type  | rest of IPv4, IPv6 or pad.
   +--------

                 Figure 2: Layout of IP-TFS data block

   A data block is defined by a 4-bit type code followed by the data
   block data. The type values have been carefully chosen to coincide
   with the IPv4/IPv6 version field values so that no per-data block
   type overhead is required to encapsulate an IP packet. Likewise, the
   length of the data block is extracted from the encapsulated IPv4 or
   IPv6 packet's length field.

2.2.2.  No Implicit End Padding Required

   It's worth noting that since a data block type is identified by its
   first octet there is never a need for an implicit pad at the end of
   an encapsulating packet. Even when the start of a data block occurs
   near the end of an encapsulating packet such that there is no room
   for the length field of the encapsulated header to be included in the
   current encapsulating packet, the fact that the length comes at a
   known location and is guaranteed to be present is enough to fetch the
   length field from the subsequent encapsulating packet payload. Only
   when there is no data to encapsulate is end padding required, and
   then an explicit "Pad Data Block" would be used to identify the
   padding.

2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads

   In order for a receiver to be able to reassemble fragmented
   inner-packets, the sender MUST send the inner-packet fragments
   back-to-back in the logical outer packet stream (i.e., using
   consecutive ESP sequence numbers). However, the sender is allowed to
   insert "all-pad" payloads (i.e., payloads with a "BlockOffset" of
   zero and a single pad "DataBlock") in between the packets carrying
   the inner-packet fragment payloads. This possible interleaving of
   all-pad payloads allows the sender to always be able to send a tunnel
   packet, regardless of the encapsulation computational requirements.

   When a receiver is reassembling an inner-packet, and it receives an
   "all-pad" payload, it increments the expected sequence number that
   the next inner-packet fragment is expected to arrive in.

2.2.4.  Empty Payload

   In order to support reporting of congestion control information
   (described later) on a non-AGGFRAG_PAYLOAD enabled SA, IP-TFS allows
   for the sending of an AGGFRAG_PAYLOAD payload with no data blocks
   (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header
   length).
   This special payload is called an empty payload.

2.2.5.  IP Header Value Mapping

   [RFC4301] provides some direction on when and how to map various
   values from an inner IP header to the outer encapsulating header,
   namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the
   Differentiated Services (DS) field [RFC2474] and the Explicit
   Congestion Notification (ECN) field [RFC3168]. Unlike [RFC4301],
   IP-TFS may and often will be encapsulating more than one IP packet
   per ESP packet. To deal with this, these mappings are restricted
   further. In particular IP-TFS never maps the inner DF bit as it is
   unrelated to the IP-TFS tunnel functionality; IP-TFS never IP
   fragments the inner packets and the inner packets will not affect the
   fragmentation of the outer encapsulation packets. Likewise, the ECN
   value need not be mapped as any congestion related to the
   constant-send-rate IP-TFS tunnel is unrelated (by design!) to the
   inner traffic flow. Finally, by default the DS field SHOULD NOT be
   copied although an implementation MAY choose to allow for
   configuration to override this behavior. An implementation SHOULD
   also allow the DS value to be set by configuration.

2.3.  Exclusive SA Use

   It is not the intention of this specification to allow for mixed use
   of an AGGFRAG_PAYLOAD enabled SA. In other words, an SA that has
   AGGFRAG_PAYLOAD enabled MUST NOT have non-AGGFRAG_PAYLOAD payloads
   such as IP (IP protocol 4), TCP transport (IP protocol 6), or ESP pad
   packets (protocol 59) intermixed with non-empty AGGFRAG_PAYLOAD
   payloads. While it's possible to envision making the algorithm work
   in the presence of sequence number skips in the AGGFRAG_PAYLOAD
   payload stream, the added complexity is not deemed worthwhile. Other
   IPsec uses can configure and use their own SAs.

2.4.  Modes of Operation

   Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are
   unidirectional.
   Bidirectional IP-TFS functionality is achieved by setting up 2 IP-TFS
   tunnels, one in each direction.

   An IP-TFS tunnel can operate in 2 modes, a non-congestion controlled
   mode and a congestion controlled mode.

2.4.1.  Non-Congestion Controlled Mode

   In the non-congestion controlled mode IP-TFS sends fixed-sized
   packets at a constant rate. The packet send rate is constant and is
   not automatically adjusted regardless of any network congestion
   (e.g., packet loss).

   For similar reasons as given in [RFC7510] the non-congestion
   controlled mode should only be used where the user has full
   administrative control over the path the tunnel will take. This is
   required so the user can guarantee the bandwidth and also be sure
   that the tunnel is not negatively affecting network congestion
   [RFC2914]. In this case packet loss should be reported to the
   administrator (e.g., via syslog, YANG notification, SNMP traps, etc.)
   so that any failures due to a lack of bandwidth can be corrected.

2.4.2.  Congestion Controlled Mode

   With the congestion controlled mode, IP-TFS adapts to network
   congestion by lowering the packet send rate to accommodate the
   congestion, as well as raising the rate when congestion subsides.
   Since overhead is per packet, by allowing for maximal fixed-size
   packets and varying the send rate, transport overhead is minimized.

   The output of the congestion control algorithm will adjust the rate
   at which the ingress sends packets. While this document does not
   require a specific congestion control algorithm, best current
   practice RECOMMENDS that the algorithm conform to [RFC5348].
   Congestion control principles are documented in [RFC2914] as well.
   An example of an implementation of the [RFC5348] algorithm which
   matches the requirements of IP-TFS (i.e., designed for fixed-size
   packets with a send rate varied based on congestion) is documented in
   [RFC4342].
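
   As a non-normative illustration of how the output of a congestion
   control algorithm could set the ingress send rate, the following
   Python sketch evaluates the [RFC5348] throughput equation in packets
   per second, which is a natural unit for IP-TFS because the packet
   size is fixed. The function and parameter names are hypothetical.
   The sketch assumes the simplifications b = 1 and t_RTO = 4 * R
   discussed in [RFC5348], decodes the loss event rate from its inverse
   encoding as described in Section 6.1.2, and caps the result at a
   configured maximum rate; that cap is an assumption of this sketch,
   not a requirement of this document.

```python
import math


def loss_event_rate_from_header(value: int) -> float:
    """Decode a "LossEventRate" header value into a loss event rate p.

    Per Section 6.1.2 the field carries the inverse of the loss event
    rate, and zero means no loss has been reported.
    """
    if value < 0:
        raise ValueError("LossEventRate is an unsigned field")
    return 0.0 if value == 0 else 1.0 / value


def tfrc_send_rate_pps(rtt_s: float, loss_event_rate: float,
                       max_rate_pps: float = math.inf) -> float:
    """Allowed send rate in packets/second from the RFC 5348 equation.

    rtt_s           -- round-trip time estimate R, in seconds
    loss_event_rate -- loss event rate p, in the range 0..1
    max_rate_pps    -- hypothetical configured ceiling on the rate
    """
    if rtt_s <= 0.0:
        raise ValueError("an RTT estimate is required")
    p = loss_event_rate
    if p <= 0.0:
        # No loss reported: the equation gives no bound, so fall back
        # to the configured ceiling (sketch assumption).
        return max_rate_pps
    b = 1                  # assumption: one packet per acknowledgement
    t_rto = 4.0 * rtt_s    # assumption: t_RTO = 4 * R simplification
    denom = (rtt_s * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0))
             * p * (1.0 + 32.0 * p * p))
    # X_Bps = s / denom, so in packets/second the size s cancels out.
    return min(1.0 / denom, max_rate_pps)
```

   With a 100 ms RTT and a loss event rate of 1%, this sketch yields
   roughly 112 packets per second; multiplying by the fixed
   encapsulating packet size gives the corresponding tunnel bandwidth.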

   The required inputs for the TCP friendly rate control algorithm
   described in [RFC5348] are the receiver's loss event rate and the
   sender's estimated round-trip time (RTT). These values are provided
   by IP-TFS using the congestion information header fields described in
   Section 3. In particular these values are sufficient to implement
   the algorithm described in [RFC5348].

   At a minimum, the congestion information must be sent, from the
   receiver and from the sender, at least once per RTT. Prior to
   establishing an RTT the information SHOULD be sent constantly from
   the sender and the receiver so that an RTT estimate can be
   established. The lack of receiving this information over multiple
   consecutive RTT intervals should be considered a congestion event
   that causes the sender to lower its sending rate. For example,
   [RFC4342] calls this the "no feedback timeout" and it is equal to 4
   RTT intervals. When a "no feedback timeout" has occurred [RFC4342]
   halves the sending rate.

   An implementation MAY choose to always include the congestion
   information in its IP-TFS payload header if sending on an IP-TFS
   enabled SA. Since IP-TFS normally will operate with a large packet
   size, the congestion information should represent a small portion of
   the available tunnel bandwidth. An implementation choosing to always
   send the data MAY, however, choose to update the "LossEventRate" and
   "RTT" header field values it sends only once every "RTT".

   When an implementation is choosing a congestion control algorithm (or
   a selection of algorithms) one should remember that IP-TFS is not
   providing for reliable delivery of IP traffic, and so per packet ACKs
   are not required and are not provided.
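
   The no-feedback handling described above can be sketched as follows.
   This is a minimal, non-normative illustration in Python that assumes
   the [RFC4342] behaviour quoted in the text: a timeout of 4 RTT
   intervals, after which the sending rate is halved. The SenderState
   type, its field names and the minimum-rate floor are hypothetical
   and are not defined by this document.

```python
from dataclasses import dataclass

# "No feedback timeout" length in RTT intervals, as quoted from RFC 4342.
NO_FEEDBACK_RTTS = 4


@dataclass
class SenderState:
    rate_pps: float            # current send rate, packets/second
    rtt_s: float               # current RTT estimate, seconds (0 = none)
    last_feedback_s: float     # local time congestion info last arrived
    min_rate_pps: float = 1.0  # hypothetical floor, a sketch assumption


def on_feedback(state: SenderState, now_s: float) -> None:
    """Record that congestion information arrived from the receiver."""
    state.last_feedback_s = now_s


def check_no_feedback(state: SenderState, now_s: float) -> bool:
    """Halve the send rate if no feedback arrived for 4 RTT intervals.

    Returns True when the timeout fired and the rate was lowered.
    """
    if state.rtt_s <= 0.0:
        # No RTT estimate yet, so the timeout cannot be evaluated.
        return False
    if now_s - state.last_feedback_s < NO_FEEDBACK_RTTS * state.rtt_s:
        return False
    state.rate_pps = max(state.rate_pps / 2.0, state.min_rate_pps)
    # Restart the timer so the rate is halved at most once per timeout.
    state.last_feedback_s = now_s
    return True
```

   A sender would call on_feedback() whenever a congestion information
   header is received and call check_no_feedback() from its periodic
   send timer.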

   It's worth noting that the variable send-rate of a congestion
   controlled IP-TFS tunnel is not private; however, this send-rate is
   being driven by network congestion, and as long as the encapsulated
   (inner) traffic flow shape and timing are not directly affecting the
   (outer) network congestion, the variations in the tunnel rate will
   not weaken the provided inner traffic flow confidentiality.

2.4.2.1.  Circuit Breakers

   In addition to congestion control, implementations MAY choose to
   define and implement circuit breakers [RFC8084] as a recovery method
   of last resort. Enabling circuit breakers is also a reason a user
   may wish to enable congestion information reports even when using the
   non-congestion controlled mode of operation. The definition of
   circuit breakers is outside the scope of this document.

3.  Congestion Information

   In order to support the congestion control mode, the sender needs to
   know the loss event rate and also be able to approximate the RTT
   ([RFC5348]). In order to obtain these values the receiver sends
   congestion control information on its SA back to the sender. Thus,
   in order to support congestion control the receiver must have a
   paired SA back to the sender (this is always the case when the tunnel
   was created using IKEv2). If the SA back to the sender is a
   non-AGGFRAG_PAYLOAD enabled SA then an AGGFRAG_PAYLOAD empty payload
   (i.e., header only) is used to convey the information.

   In order to calculate a loss event rate compatible with [RFC5348],
   the receiver needs to have a round-trip time estimate. Thus the
   sender communicates this estimate in the "RTT" header field. On
   startup this value will be zero as no RTT estimate is yet known.

   In order for the sender to estimate its "RTT" value, the sender
   places a timestamp value in the "TVal" header field.
   On first receipt of this "TVal", the receiver records the new "TVal"
   value along with the time it arrived locally; subsequent receipt of
   the same "TVal" MUST NOT update the recorded time. When the receiver
   sends its CC header it places this latest recorded value in the
   "TEcho" header field, along with 2 delay values, "Echo Delay" and
   "Transmit Delay". The "Echo Delay" value is the time delta between
   the recorded arrival time of "TVal" and the current clock, in
   microseconds. The second value, "Transmit Delay", is the receiver's
   current transmission delay on the tunnel (i.e., the average time
   between sending packets on its half of the IP-TFS tunnel). When the
   sender receives back its "TVal" in the "TEcho" header field it
   calculates 2 RTT estimates. The first is the actual delay found by
   subtracting the "TEcho" value from its current clock and then
   subtracting "Echo Delay" as well. The second RTT estimate is found
   by adding the received "Transmit Delay" header value to the sender's
   own transmission delay (i.e., the average time between sending
   packets on its half of the IP-TFS tunnel). The larger of these 2
   RTT estimates SHOULD be used as the "RTT" value. The two estimates
   are required to handle different combinations of faster or slower
   tunnel packet paths with faster or slower fixed tunnel rates.
   Choosing the larger of the two values guarantees that the "RTT" is
   never considered faster than the aggregate transmission delay based
   on the IP-TFS tunnel rate (the second estimate), as well as never
   being considered faster than the actual RTT along the tunnel packet
   path (the first estimate).

   The receiver also calculates, and communicates in the "LossEventRate"
   header field, the loss event rate for use by the sender.
   This is slightly different from [RFC4342] which periodically sends
   all the loss interval data back to the sender so that it can do the
   calculation. See Appendix B for a suggested way to calculate the
   loss event rate value. Initially this value will be zero (indicating
   no loss) until enough data has been collected by the receiver to
   update it.

3.1.  ECN Support

   In addition to normal packet loss information IP-TFS supports use
   of the ECN bits in the encapsulating IP header [RFC3168] for
   identifying congestion. If ECN use is enabled and a packet arrives
   at the egress endpoint with the Congestion Experienced (CE) value
   set, then the receiver considers that packet as being dropped,
   although it does not drop it. The receiver MUST set the E bit in any
   AGGFRAG_PAYLOAD payload header containing a "LossEventRate" value
   derived from a CE value being considered.

   As noted in [RFC3168] the ECN bits are not protected by IPsec and
   thus may constitute a covert channel. For this reason ECN use SHOULD
   NOT be enabled by default.

4.  Configuration

   IP-TFS is meant to be deployable with a minimal amount of
   configuration. All IP-TFS specific configuration should be able to
   be specified at the unidirectional tunnel ingress (sending) side. It
   is intended that non-IKEv2 operation is supported, at least, with
   local static configuration.

4.1.  Bandwidth

   Bandwidth is a local configuration option. For non-congestion
   controlled mode the bandwidth SHOULD be configured. For congestion
   controlled mode one can configure the bandwidth or have no
   configuration and let congestion control discover the maximum
   bandwidth available. No standardized configuration method is
   required.

4.2.  Fixed Packet Size

   The fixed packet size to be used for the tunnel encapsulation packets
   can be configured manually or can be automatically determined using
   Path MTU discovery (see [RFC1191] and [RFC8201]). No standardized
   configuration method is required.

4.3.  Congestion Control

   Congestion control is a local configuration option. No standardized
   configuration method is required.

5.  IKEv2

5.1.  USE_AGGFRAG Notification Message

   As mentioned previously IP-TFS tunnels utilize ESP payloads of type
   AGGFRAG_PAYLOAD.

   When using IKEv2, a new "USE_AGGFRAG" Notification Message is used to
   enable use of the AGGFRAG_PAYLOAD payload on a child SA pair. The
   method used is similar to how USE_TRANSPORT_MODE is negotiated, as
   described in [RFC7296].

   To request using the AGGFRAG_PAYLOAD payload on the Child SA pair,
   the initiator includes the USE_AGGFRAG notification in an SA payload
   requesting a new Child SA (either during the initial IKE_AUTH or
   during non-rekeying CREATE_CHILD_SA exchanges). If the request is
   accepted then the response MUST also include a notification of type
   USE_AGGFRAG. If the responder declines the request the child SA will
   be established without AGGFRAG_PAYLOAD payload use enabled. If this
   is unacceptable to the initiator, the initiator MUST delete the child
   SA.

   The USE_AGGFRAG notification MUST NOT be sent, and MUST be ignored,
   during a CREATE_CHILD_SA rekeying exchange as it is not allowed to
   change use of the AGGFRAG_PAYLOAD payload type during rekeying.

   The USE_AGGFRAG notification contains a 1 octet payload of flags that
   specify any requirements from the sender of the message.
   If any requirement flags are not understood or cannot be supported by
   the receiver then the receiver should not enable use of the
   AGGFRAG_PAYLOAD payload type (either by not responding with the
   USE_AGGFRAG notification, or in the case of the initiator, by
   deleting the child SA if the now established non-AGGFRAG_PAYLOAD
   using SA is unacceptable).

   The notification type and payload flag values are defined in
   Section 6.1.4.

6.  Packet and Data Formats

6.1.  AGGFRAG_PAYLOAD Payload

   ESP Payload Type: 0x5

   An IP-TFS payload is identified by the ESP payload type
   AGGFRAG_PAYLOAD which has the value 0x5. The first octet of this
   payload indicates the format of the remaining payload data.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-
   |   Sub-type    | ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An 8 bit value indicating the payload format.

   This specification defines 2 payload sub-types. These payload
   formats are defined in the following sections.

6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format

   The non-congestion control AGGFRAG_PAYLOAD payload is comprised of a
   4 octet header followed by a variable amount of "DataBlocks" data as
   shown below.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sub-Type (0) |   Reserved    |          BlockOffset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       DataBlocks ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An octet indicating the payload format. For this non-congestion
      control format, the value is 0.

   Reserved:
      An octet set to 0 on generation, and ignored on receipt.

   BlockOffset:
      A 16 bit unsigned integer counting the number of octets of
      "DataBlocks" data before the start of a new data block.
      "BlockOffset" can count past the end of the "DataBlocks" data, in
      which case all the "DataBlocks" data belongs to the previous data
      block being re-assembled.  If the "BlockOffset" extends into
      subsequent packets, it continues to only count subsequent
      "DataBlocks" data (i.e., it does not count subsequent packets'
      non-"DataBlocks" octets).

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block, or the continuation of a previous data block, followed by
      zero or more additional data blocks.

6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format

   The congestion control AGGFRAG_PAYLOAD payload is comprised of a 24
   octet header followed by a variable amount of "DataBlocks" data as
   shown below.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sub-type (1) |  Reserved   |E|          BlockOffset          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         LossEventRate                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTT                  |   Echo Delay ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ... Echo Delay   |           Transmit Delay                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TVal                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TEcho                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       DataBlocks ...
   +-+-+-+-+-+-+-+-+-+-+-

   Sub-type:
      An octet indicating the payload format.  For this congestion
      control format, the value is 1.

   Reserved:
      A 7 bit field set to 0 on generation, and ignored on receipt.

   E:
      A 1 bit value that, if set, indicates that Congestion Experienced
      (CE) ECN bits were received and used in deriving the reported
      "LossEventRate".

   BlockOffset:
      The same value as the non-congestion controlled payload format
      value.

   LossEventRate:
      A 32 bit value specifying the inverse of the current loss event
      rate as calculated by the receiver.  A value of zero indicates no
      loss.  Otherwise the loss event rate is "1/LossEventRate".

   RTT:
      A 22 bit value specifying the sender's current round-trip time
      estimate in microseconds.  The value MAY be zero prior to the
      sender having calculated a round-trip time estimate.  The value
      SHOULD be set to zero on non-AGGFRAG_PAYLOAD enabled SAs.  If the
      value is equal to or larger than "0x3FFFFF" it MUST be set to
      "0x3FFFFF".

   Echo Delay:
      A 21 bit value specifying the delay in microseconds incurred
      between the receiver first receiving the "TVal" value and sending
      that value back in "TEcho".  If the value is equal to or larger
      than "0x1FFFFF" it MUST be set to "0x1FFFFF".

   Transmit Delay:
      A 21 bit value specifying the transmission delay in microseconds.
      This is the fixed (or average) delay on the receiver between it
      sending packets on the IP-TFS tunnel.  If the value is equal to or
      larger than "0x1FFFFF" it MUST be set to "0x1FFFFF".

   TVal:
      An opaque 32 bit value that will be echoed back by the receiver in
      later packets in the "TEcho" field, along with an "Echo Delay"
      value of how long that echo took.

   TEcho:
      The opaque 32 bit value from a received packet's "TVal" field.
      The received "TVal" is placed in "TEcho" along with an "Echo
      Delay" value indicating how long it has been since receiving the
      "TVal" value.

   DataBlocks:
      Variable number of octets that begins with the start of a data
      block, or the continuation of a previous data block, followed by
      zero or more additional data blocks.  For the special case of
      sending congestion control information on a non-IP-TFS enabled SA
      this value MUST be empty (i.e., be zero octets long).

6.1.3.
Data Blocks

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type  | IPv4, IPv6 or pad...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   Type:
      A 4 bit field where 0x0 identifies a pad data block, 0x4 indicates
      an IPv4 data block, and 0x6 indicates an IPv6 data block.

6.1.3.1.  IPv4 Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x4  |  IHL  | TypeOfService |          TotalLength          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Rest of the inner packet ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   These values are the actual values within the encapsulated IPv4
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4 bit value of 0x4 indicating IPv4 (i.e., first nibble of the
      IPv4 packet).

   TotalLength:
      The 16 bit unsigned integer "Total Length" field of the IPv4 inner
      packet.

6.1.3.2.  IPv6 Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x6  | TrafficClass  |               FlowLabel               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         PayloadLength         | Rest of the inner packet ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   These values are the actual values within the encapsulated IPv6
   header.  In other words, the start of this data block is the start of
   the encapsulated IP packet.

   Type:
      A 4 bit value of 0x6 indicating IPv6 (i.e., first nibble of the
      IPv6 packet).

   PayloadLength:
      The 16 bit unsigned integer "Payload Length" field of the inner
      IPv6 packet.

6.1.3.3.
Pad Data Block

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  0x0  | Padding ...
   +-+-+-+-+-+-+-+-+-+-+-

   Type:
      A 4 bit value of 0x0 indicating a padding data block.

   Padding:
      Extends to the end of the encapsulating packet.

6.1.4.  IKEv2 USE_AGGFRAG Notification Message

   As discussed in Section 5.1, a notification message USE_AGGFRAG is
   used to negotiate use of the ESP AGGFRAG_PAYLOAD payload type.

   The USE_AGGFRAG Notification Message Status Type is (TBD2).

   The notification payload contains 1 octet of requirement flags.
   There are currently 2 requirement flags defined.  This may be revised
   by later specifications.

   +-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|C|D|
   +-+-+-+-+-+-+-+-+

   0:
      6 bits - reserved, MUST be zero on send, unless defined by later
      specifications.

   C:
      Congestion Control bit.  If set, then the sender is requiring that
      congestion control information MUST be returned to it periodically
      as defined in Section 3.

   D:
      Don't Fragment bit.  If set, it indicates that the sender of the
      notify message does not support receiving packet fragments (i.e.,
      inner packets MUST be sent using a single "Data Block").  This
      value only applies to what the sender is capable of receiving; the
      sender MAY still send packet fragments unless similarly restricted
      by the receiver in its USE_AGGFRAG notification.

7.  IANA Considerations

7.1.  AGGFRAG_PAYLOAD Sub-Type Registry

   This document requests IANA create a registry called "AGGFRAG_PAYLOAD
   Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD
   Parameters".  The registration policy for this registry is "Standards
   Action" ([RFC8126] and [RFC7120]).

   Name:
      AGGFRAG_PAYLOAD Sub-Type Registry

   Description:
      AGGFRAG_PAYLOAD Payload Formats.

   Reference:
      This document

   The initial content for this registry is as follows:

      Sub-Type  Name                           Reference
      --------------------------------------------------------
             0  Non-Congestion Control Format  This document
             1  Congestion Control Format      This document
         3-255  Reserved

7.2.  USE_AGGFRAG Notify Message Status Type

   This document requests a status type USE_AGGFRAG be allocated from
   the "IKEv2 Notify Message Types - Status Types" registry.

   Value:
      TBD2

   Name:
      USE_AGGFRAG

   Reference:
      This document

8.  Security Considerations

   This document describes a mechanism to add Traffic Flow
   Confidentiality to IP traffic.  Use of this mechanism is expected to
   increase the security of the traffic being transported.  Other than
   the additional security afforded by using this mechanism, IP-TFS
   utilizes the security protocols [RFC4303] and [RFC7296] and so their
   security considerations apply to IP-TFS as well.

   As noted previously in Section 2.4.2, for TFC to be fully maintained,
   the encapsulated traffic flow should not affect network congestion
   in a predictable way; if it would, use of the non-congestion
   controlled mode should be considered instead.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, DOI 10.17487/RFC4303, December 2005.

   [RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
              Kivinen, "Internet Key Exchange Protocol Version 2
              (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296,
              October 2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

9.2.
Informative References

   [AppCrypt]
              Schneier, B., "Applied Cryptography: Protocols,
              Algorithms, and Source Code in C", November 2017.

   [I-D.iab-wire-image]
              Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", draft-iab-wire-image-01 (work in
              progress), November 2018.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
              DOI 10.17487/RFC4342, March 2006.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, DOI 10.17487/RFC5348, September 2008.

   [RFC7120]  Cotton, M., "Early IANA Allocation of Standards Track Code
              Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120,
              January 2014.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017.

Appendix A.  Example Of An Encapsulated IP Packet Flow

   Below, an example inner IP packet flow within the encapsulating
   tunnel packet stream is shown.  Notice how encapsulated IP packets
   can start and end anywhere, and more than one or less than one may
   occur in a single encapsulating packet.

    Offset: 0       Offset: 100     Offset: 2900    Offset: 1400
    [ ESP1  (1500) ][ ESP2  (1500) ][ ESP3  (1500) ][ ESP4  (1500) ]
    [--800--][--800--][60][-240-][--4000----------------------][pad]

                  Figure 3: Inner and Outer Packet Flow

   The encapsulated IP packet flow (lengths include IP header and
   payload) is as follows: an 800 octet packet, an 800 octet packet, a
   60 octet packet, a 240 octet packet, a 4000 octet packet.

   The "BlockOffset" values in the 4 IP-TFS payload headers for this
   packet flow would thus be: 0, 100, 2900, 1400 respectively.  The
   first encapsulating packet ESP1 has a zero "BlockOffset" which points
   at the IP data block immediately following the IP-TFS header.  The
   following packet ESP2's "BlockOffset" points inward 100 octets to the
   start of the 60 octet data block.  The third encapsulating packet
   ESP3 contains the middle portion of the 4000 octet data block so the
   offset points past its end and into the fourth encapsulating packet.
   The fourth packet ESP4's offset is 1400, pointing at the padding
   which follows the completion of the continued 4000 octet packet.

Appendix B.  A Send and Loss Event Rate Calculation

   The current best practice indicates that congestion control SHOULD be
   done in a TCP friendly way.  A TCP friendly congestion control
   algorithm is described in [RFC5348].  For this IP-TFS use case (as
   with [RFC4342]) the (fixed) packet size is used as the segment size
   for the algorithm.  The main formula in the algorithm for the send
   rate is then as follows:

                                 1
      X = -----------------------------------------------
          R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))

   Where "X" is the send rate in packets per second, "R" is the round
   trip time estimate and "p" is the loss event rate (the inverse of
   which is provided by the receiver).

   In addition, the algorithm in [RFC5348] also uses an "X_recv" value
   (the receiver's receive rate).  For IP-TFS one MAY set this value
   according to the sender's current tunnel send-rate ("X").

   The IP-TFS receiver, having the RTT estimate from the sender, can use
   the same method as described in [RFC5348] and [RFC4342] to collect
   the loss intervals and calculate the loss event rate value using the
   weighted average as indicated.  The receiver communicates the inverse
   of this value back to the sender in the AGGFRAG_PAYLOAD payload
   header field "LossEventRate".

   The IP-TFS sender now has both the "R" and "p" values and can
   calculate the correct sending rate.  If following [RFC5348] the
   sender SHOULD also use the slow start mechanism described therein
   when the IP-TFS SA is first established.

Appendix C.  Comparisons of IP-TFS

C.1.  Comparing Overhead

C.1.1.  IP-TFS Overhead

   The overhead of IP-TFS is 40 bytes per outer packet.  Therefore the
   octet overhead per inner packet is 40 multiplied by the number of
   outer packets required (fractional allowed).
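
   As a rough illustration of the per-inner-packet figure just
   described, the following Python sketch (not part of the
   specification) scales the 40 octet per-outer-packet overhead by the
   fraction of an outer packet that one inner packet occupies.  That
   reading is an assumption; it is the one that reproduces the IP-TFS
   columns of the comparison tables in this appendix.  The function and
   parameter names are illustrative only.

```python
def iptfs_overhead_octets(inner_size, outer_payload, per_outer_overhead=40):
    """Octets of IP-TFS overhead attributed to one inner packet.

    Assumption: the per-outer-packet overhead is shared in proportion to
    how much of the outer payload the inner packet occupies.
    """
    # Fractional number of outer packets needed to carry the inner packet.
    outer_packets = inner_size / outer_payload
    return per_outer_overhead * outer_packets


def iptfs_overhead_percent(outer_payload, per_outer_overhead=40):
    """Overhead as a percentage of inner packet size.

    The inner packet size cancels out, so the result depends only on the
    outer payload size.
    """
    return 100.0 * per_outer_overhead / outer_payload
```

   For a 1500 octet inner packet and a 1460 octet outer payload this
   gives roughly 41.1 octets of overhead, or about 2.74% for any inner
   packet size, in line with the comparison tables in this appendix.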
   The overhead as a percentage of inner packet size is a constant
   based on the Outer MTU size.

      OH = 40 / (Outer Payload Size / Inner Packet Size)
      OH % of Inner Packet Size = 100 * OH / Inner Packet Size
      OH % of Inner Packet Size = 4000 / Outer Payload Size

               Type     IP-TFS   IP-TFS   IP-TFS
               MTU         576     1500     9000
               PSize       536     1460     8960
               ---------------------------------
               40        7.46%    2.74%    0.45%
               576       7.46%    2.74%    0.45%
               1500      7.46%    2.74%    0.45%
               9000      7.46%    2.74%    0.45%

      Figure 4: IP-TFS Overhead as Percentage of Inner Packet Size

C.1.2.  ESP with Padding Overhead

   The overhead per inner packet for constant-send-rate padded ESP
   (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless
   fragmentation is required.

   When fragmentation of the inner packet is required to fit in the
   outer IPsec packet, overhead is the number of outer packets required
   to carry the fragmented inner packet times the sum of the inner IP
   overhead (20) and the outer packet overhead (36), minus the initial
   inner IP overhead, plus any required tail padding in the last
   encapsulation packet.  The required tail padding is the number of
   required packets times the difference of the Outer Payload Size and
   the IP Overhead, minus the Inner Payload Size.  So:

      Inner Payload Size = IP Packet Size - IP Overhead
      Outer Payload Size = MTU - IPsec Overhead

                    Inner Payload Size
      NF0 = ----------------------------------
             Outer Payload Size - IP Overhead

      NF = CEILING(NF0)

      OH = NF * (IP Overhead + IPsec Overhead)
           - IP Overhead
           + NF * (Outer Payload Size - IP Overhead)
           - Inner Payload Size

      OH = NF * (IPsec Overhead + Outer Payload Size)
           - (IP Overhead + Inner Payload Size)

      OH = NF * (IPsec Overhead + Outer Payload Size)
           - Inner Packet Size

C.2.
Overhead Comparison

   The following tables collect the overhead values for some common L3
   MTU sizes in order to compare them.  The first table is the number of
   octets of overhead for a given L3 MTU sized packet.  The second table
   is the percentage of overhead in the same MTU sized packet.

   Type    ESP+Pad  ESP+Pad  ESP+Pad   IP-TFS   IP-TFS   IP-TFS
   L3 MTU      576     1500     9000      576     1500     9000
   PSize       540     1464     8964      536     1460     8960
   ------------------------------------------------------------
   40          500     1424     8924      3.0      1.1      0.2
   128         412     1336     8836      9.6      3.5      0.6
   256         284     1208     8708     19.1      7.0      1.1
   536           4      928     8428     40.0     14.7      2.4
   576         576      888     8388     43.0     15.8      2.6
   1460        268        4     7504    109.0     40.0      6.5
   1500        228     1500     7464    111.9     41.1      6.7
   8960       1408     1540        4    668.7    245.5     40.0
   9000       1368     1500     9000    671.6    246.6     40.2

                Figure 5: Overhead comparison in octets

   Type    ESP+Pad  ESP+Pad  ESP+Pad   IP-TFS   IP-TFS   IP-TFS
   MTU         576     1500     9000      576     1500     9000
   PSize       540     1464     8964      536     1460     8960
   ------------------------------------------------------------
   40      1250.0%  3560.0% 22310.0%    7.46%    2.74%    0.45%
   128      321.9%  1043.8%  6903.1%    7.46%    2.74%    0.45%
   256      110.9%   471.9%  3401.6%    7.46%    2.74%    0.45%
   536        0.7%   173.1%  1572.4%    7.46%    2.74%    0.45%
   576      100.0%   154.2%  1456.2%    7.46%    2.74%    0.45%
   1460      18.4%     0.3%   514.0%    7.46%    2.74%    0.45%
   1500      15.2%   100.0%   497.6%    7.46%    2.74%    0.45%
   8960      15.7%    17.2%     0.0%    7.46%    2.74%    0.45%
   9000      15.2%    16.7%   100.0%    7.46%    2.74%    0.45%

         Figure 6: Overhead as Percentage of Inner Packet Size

C.3.  Comparing Available Bandwidth

   Another way to compare the two solutions is to look at the amount of
   available bandwidth each solution provides.  The following sections
   consider and compare the percentage of available bandwidth.  For the
   sake of providing a well understood baseline, normal (unencrypted)
   Ethernet as well as normal ESP values are included.

C.3.1.
Ethernet

   In order to calculate the available bandwidth, the per packet
   overhead is calculated first.  The total overhead of Ethernet is 14+4
   octets of header and CRC plus an additional 20 octets of framing
   (preamble, start, and inter-packet gap) for a total of 38 octets.
   Additionally, the minimum payload is 46 octets.

   Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
   MTU     590   1514   9014    590   1514   9014    any    any
   OH       74     74     74     78     78     78     38     74
   ------------------------------------------------------------
   40      614   1538   9038     45     42     40     84    114
   128     614   1538   9038    146    134    129    166    202
   256     614   1538   9038    293    269    258    294    330
   536     614   1538   9038    614    564    540    574    610
   576    1228   1538   9038    659    606    581    614    650
   1460   1842   1538   9038   1672   1538   1472   1498   1534
   1500   1842   3076   9038   1718   1580   1513   1538   1574
   8960  11052  10766   9038  10263   9438   9038   8998   9034
   9000  11052  10766  18076  10309   9480   9078   9038   9074

                    Figure 7: L2 Octets Per Packet

   Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
   MTU     590   1514   9014    590   1514   9014    any    any
   OH       74     74     74     78     78     78     38     74
   ------------------------------------------------------------
   40     2.0M   0.8M   0.1M  27.3M  29.7M  31.0M  14.9M  11.0M
   128    2.0M   0.8M   0.1M   8.5M   9.3M   9.7M   7.5M   6.2M
   256    2.0M   0.8M   0.1M   4.3M   4.6M   4.8M   4.3M   3.8M
   536    2.0M   0.8M   0.1M   2.0M   2.2M   2.3M   2.2M   2.0M
   576    1.0M   0.8M   0.1M   1.9M   2.1M   2.2M   2.0M   1.9M
   1460   678K   812K   138K   747K   812K   848K   834K   814K
   1500   678K   406K   138K   727K   791K   826K   812K   794K
   8960   113K   116K   138K   121K   132K   138K   138K   138K
   9000   113K   116K    69K   121K   131K   137K   138K   137K

              Figure 8: Packets Per Second on 10G Ethernet

   Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS   Enet    ESP
   MTU     590   1514   9014    590   1514   9014    any    any
   OH       74     74     74     78     78     78     38     74
   ------------------------------------------------------------
   40     6.51%  2.60%  0.44% 87.30% 94.93% 99.14% 47.62% 35.09%
   128   20.85%  8.32%  1.42% 87.30% 94.93%
                                           99.14% 77.11% 63.37%
   256   41.69% 16.64%  2.83% 87.30% 94.93% 99.14% 87.07% 77.58%
   536   87.30% 34.85%  5.93% 87.30% 94.93% 99.14% 93.38% 87.87%
   576   46.91% 37.45%  6.37% 87.30% 94.93% 99.14% 93.81% 88.62%
   1460  79.26% 94.93% 16.15% 87.30% 94.93% 99.14% 97.46% 95.18%
   1500  81.43% 48.76% 16.60% 87.30% 94.93% 99.14% 97.53% 95.30%
   8960  81.07% 83.22% 99.14% 87.30% 94.93% 99.14% 99.58% 99.18%
   9000  81.43% 83.60% 49.79% 87.30% 94.93% 99.14% 99.58% 99.18%

           Figure 9: Percentage of Bandwidth on 10G Ethernet

   A sometimes unexpected result of using IP-TFS (or any packet
   aggregating tunnel) is that, for small to medium sized packets, the
   available bandwidth is actually greater than native Ethernet.  This
   is due to the reduction in Ethernet framing overhead.  This increased
   bandwidth is paid for with an increase in latency.  This latency is
   the time to send the unrelated octets in the outer tunnel frame.  The
   following table illustrates the latency for some common values on a
   10G Ethernet link.  The table also includes latency introduced by
   padding if using ESP with padding.

            ESP+Pad   ESP+Pad    IP-TFS    IP-TFS
               1500      9000      1500      9000
   ----------------------------------------------
   40       1.14 us   7.14 us   1.17 us   7.17 us
   128      1.07 us   7.07 us   1.10 us   7.10 us
   256      0.97 us   6.97 us   1.00 us   7.00 us
   536      0.74 us   6.74 us   0.77 us   6.77 us
   576      0.71 us   6.71 us   0.74 us   6.74 us
   1460     0.00 us   6.00 us   0.04 us   6.04 us
   1500     1.20 us   5.97 us   0.00 us   6.00 us

                       Figure 10: Added Latency

   Notice that the latency values are very similar between the two
   solutions; however, whereas IP-TFS provides for constant high
   bandwidth, in some cases even exceeding native Ethernet, ESP with
   padding often greatly reduces available bandwidth.

Appendix D.  Acknowledgements

   We would like to thank Don Fedyk for help in reviewing and editing
   this work.
   We would also like to thank Valery Smyslov for reviews and
   suggestions for improvements.

Appendix E.  Contributors

   The following people made significant contributions to this document.

   Lou Berger
   LabN Consulting, L.L.C.

   Email: lberger@labn.net

Author's Address

   Christian Hopps
   LabN Consulting, L.L.C.

   Email: chopps@chopps.org
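
   Supplementary sketch for Appendix A (illustrative only, not part of
   the specification): the "BlockOffset" values quoted in that example
   can be reproduced by laying the inner packets end to end and, for
   each outer packet, counting the octets up to the next point where a
   new data block begins.  The sketch assumes every outer packet carries
   exactly 1500 octets of "DataBlocks" data, as Figure 3 is drawn, and
   that padding begins where the last inner packet ends.  The function
   and parameter names are hypothetical.

```python
def block_offsets(inner_sizes, datablocks_size, num_outer):
    """Return the BlockOffset for each of num_outer outer packets.

    inner_sizes: lengths of the inner IP packets, in sending order.
    datablocks_size: octets of DataBlocks data per outer packet (assumed
    constant; header octets are not counted, as in the figure).
    """
    # Stream positions at which a new data block starts: each inner
    # packet, then the trailing pad block after the last inner packet.
    starts = []
    position = 0
    for size in inner_sizes:
        starts.append(position)
        position += size
    starts.append(position)

    offsets = []
    for index in range(num_outer):
        begin = index * datablocks_size
        # First block start at or after the beginning of this outer
        # packet; an all-padding packet starts a pad block at offset 0.
        next_start = next((s for s in starts if s >= begin), begin)
        offsets.append(next_start - begin)
    return offsets
```

   Called as block_offsets([800, 800, 60, 240, 4000], 1500, 4), the
   sketch yields 0, 100, 2900 and 1400, the values given in Appendix A.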