Network Working Group                                           C. Hopps
Internet-Draft                                   LabN Consulting, L.L.C.
Intended status: Standards Track                        January 19, 2021
Expires: July 23, 2021


  IP-TFS: IP Traffic Flow Security Using Aggregation and Fragmentation
                      draft-ietf-ipsecme-iptfs-06

Abstract

   This document describes a mechanism to enhance IPsec traffic flow
   security by adding traffic flow confidentiality to encrypted IP
   encapsulated traffic. Traffic flow confidentiality is provided by
   obscuring the size and frequency of IP traffic using a fixed-sized,
   constant-send-rate IPsec tunnel. The solution allows for congestion
   control as well as non-constant send-rate usage.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 23, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology & Concepts
   2.  The IP-TFS Tunnel
     2.1.  Tunnel Content
     2.2.  Payload Content
       2.2.1.  Data Blocks
       2.2.2.  No Implicit End Padding Required
       2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads
       2.2.4.  Empty Payload
       2.2.5.  IP Header Value Mapping
       2.2.6.  IP Time-To-Live (TTL) and Tunnel errors
       2.2.7.  Effective MTU of the Tunnel
     2.3.  Exclusive SA Use
     2.4.  Modes of Operation
       2.4.1.  Non-Congestion Controlled Mode
       2.4.2.  Congestion Controlled Mode
   3.  Congestion Information
     3.1.  ECN Support
   4.  Configuration
     4.1.  Bandwidth
     4.2.  Fixed Packet Size
     4.3.  Congestion Control
   5.  IKEv2
     5.1.  USE_AGGFRAG Notification Message
   6.  Packet and Data Formats
     6.1.  AGGFRAG_PAYLOAD Payload
       6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format
       6.1.3.  Data Blocks
       6.1.4.  IKEv2 USE_AGGFRAG Notification Message
   7.  IANA Considerations
     7.1.  AGGFRAG_PAYLOAD Sub-Type Registry
     7.2.  USE_AGGFRAG Notify Message Status Type
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Example Of An Encapsulated IP Packet Flow
   Appendix B.  A Send and Loss Event Rate Calculation
   Appendix C.  Comparisons of IP-TFS
     C.1.  Comparing Overhead
       C.1.1.  IP-TFS Overhead
       C.1.2.  ESP with Padding Overhead
     C.2.  Overhead Comparison
     C.3.  Comparing Available Bandwidth
       C.3.1.  Ethernet
   Appendix D.  Acknowledgements
   Appendix E.  Contributors
   Author's Address

1.  Introduction

   Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting
   information about data being sent through a network. While one may
   directly obscure the data through the use of encryption [RFC4303],
   the traffic pattern itself exposes information due to variations in
   its shape and timing ([I-D.iab-wire-image], [AppCrypt]). Hiding the
   size and frequency of traffic is referred to as Traffic Flow
   Confidentiality (TFC) per [RFC4303].

   [RFC4303] provides for TFC by allowing padding to be added to
   encrypted IP packets and allowing for transmission of all-pad packets
   (indicated using protocol 59). This method has the major limitation
   that it can significantly under-utilize the available bandwidth.

   The IP-TFS solution provides for full TFC without the aforementioned
   bandwidth limitation. This is accomplished by using a constant-send-
   rate IPsec [RFC4303] tunnel with fixed-sized encapsulating packets;
   however, these fixed-sized packets can contain partial, whole or
   multiple IP packets to maximize the bandwidth of the tunnel. A non-
   constant send-rate is allowed, but the confidentiality properties of
   its use are outside the scope of this document.

   For a comparison of the overhead of IP-TFS with the TFC solution
   prescribed by [RFC4303], see Appendix C.

   Additionally, IP-TFS provides for dealing with network congestion
   [RFC2914]. This is important when the IP-TFS user is not in full
   control of the domain through which the IP-TFS tunnel path flows.

1.1.  Terminology & Concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

   This document assumes familiarity with IP security concepts described
   in [RFC4301].

2.  The IP-TFS Tunnel

   As mentioned in Section 1, IP-TFS utilizes an IPsec [RFC4303] tunnel
   (SA) as its transport. To provide for full TFC, fixed-sized
   encapsulating packets are sent at a constant rate on the tunnel.

   The primary input to the tunnel algorithm is the requested bandwidth
   used by the tunnel. Two values are then required to provide for this
   bandwidth: the fixed size of the encapsulating packets, and the rate
   at which to send them.
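
   As a rough illustration of how these three quantities relate, the
   following Python sketch derives the send rate and the inter-packet
   interval from a requested bandwidth and a fixed packet size. The
   function names and the choice of units (bits per second, octets,
   microseconds) are assumptions made for this example and are not
   defined by this document.

```python
def packet_send_rate(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Packets per second needed to fill `bandwidth_bps` with fixed-size packets.

    Units are an assumption of this sketch: bandwidth in bits per second,
    packet size in octets.
    """
    if packet_size_octets <= 0:
        raise ValueError("packet size must be positive")
    # Requested bandwidth divided by the size of one encapsulating packet.
    return bandwidth_bps / (packet_size_octets * 8)


def send_interval_us(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Time between consecutive tunnel packets, in microseconds."""
    return 1_000_000 / packet_send_rate(bandwidth_bps, packet_size_octets)


if __name__ == "__main__":
    # Hypothetical configuration: 12 Mbit/s tunnel, 1500-octet packets.
    print(packet_send_rate(12_000_000, 1500))
    print(send_interval_us(12_000_000, 1500))
```

   Whether the configured packet size counts the outer encapsulation
   overhead is a local decision that this sketch does not address.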

   The fixed packet size MAY be specified manually or determined
   through other methods such as Packetization Layer MTU Discovery
   (PLMTUD) ([RFC4821], [RFC8899]) or Path MTU Discovery (PMTUD)
   ([RFC1191], [RFC8201]). PMTUD is known to have issues, so PLMTUD is
   considered the more robust option.

   Given the encapsulating packet size and the requested tunnel
   bandwidth, the corresponding packet send rate can be calculated. The
   packet send rate is the requested bandwidth divided by the size of
   the encapsulating packet.

   The egress of the IP-TFS tunnel MUST allow for and expect the ingress
   (sending) side of the IP-TFS tunnel to vary the size and rate of sent
   encapsulating packets, unless constrained by other policy.

2.1.  Tunnel Content

   As previously mentioned, one issue with the TFC padding solution in
   [RFC4303] is the large amount of wasted bandwidth, as only one IP
   packet can be sent per encapsulating packet. In order to maximize
   bandwidth, IP-TFS breaks this one-to-one association.

   IP-TFS aggregates as well as fragments the inner IP traffic flow into
   fixed-sized encapsulating IPsec tunnel packets. Padding is only
   added to the tunnel packets if there is no data available to be
   sent at the time of tunnel packet transmission, or if fragmentation
   has been disabled by the receiver.

   This is accomplished using a new Encapsulating Security Payload (ESP,
   [RFC4303]) type which is identified by the number AGGFRAG_PAYLOAD
   (Section 6.1).

   Other non-IP-TFS uses of this aggregation and fragmentation
   encapsulation have been identified, such as increased performance
   through packet aggregation, as well as handling MTU issues using
   fragmentation. These uses are not defined here, but are also not
   restricted by this document.

2.2.  Payload Content

   The AGGFRAG_PAYLOAD payload content defined in this document
   consists of a 4- or 24-octet header followed by a partial data
   block, a full data block, or multiple partial or full data blocks.
   The following diagram illustrates this payload within the ESP packet.
   See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.

   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . Outer Encapsulating Header ...                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   . ESP Header...                                                 .
   +---------------------------------------------------------------+
   |   [AGGFRAG subtype/flags]     :           BlockOffset         |
   +---------------------------------------------------------------+
   :                  [Optional Congestion Info]                   :
   +---------------------------------------------------------------+
   |       DataBlocks ...                                          ~
   ~                                                               ~
   ~                                                               |
   +---------------------------------------------------------------|
   . ESP Trailer...                                                .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

              Figure 1: Layout of an IP-TFS IPsec Packet

   The "BlockOffset" value is either zero or some offset into or past
   the end of the "DataBlocks" data.

   If the "BlockOffset" value is zero, it means that the "DataBlocks"
   data begins with a new data block.

   Conversely, if the "BlockOffset" value is non-zero, it points to the
   start of the new data block, and the initial "DataBlocks" data
   belongs to a previous data block that is still being re-assembled.

   The "BlockOffset" can point past the end of the "DataBlocks" data,
   which indicates that the next data block occurs in a subsequent
   encapsulating packet.

   Having the "BlockOffset" always point at the next available data
   block allows for recovering the next inner packet in the presence of
   outer encapsulating packet loss.

   An example IP-TFS packet flow can be found in Appendix A.

2.2.1.  Data Blocks

   +---------------------------------------------------------------+
   | Type  | rest of IPv4, IPv6 or pad.
   +--------

                Figure 2: Layout of IP-TFS data block

   A data block is defined by a 4-bit type code followed by the data
   block data. The type values have been carefully chosen to coincide
   with the IPv4/IPv6 version field values so that no per-data block
   type overhead is required to encapsulate an IP packet. Likewise, the
   length of the data block is extracted from the encapsulated IPv4 or
   IPv6 packet's length field.

2.2.2.  No Implicit End Padding Required

   It's worth noting that since a data block type is identified by its
   first octet, there is never a need for an implicit pad at the end of
   an encapsulating packet. Even when the start of a data block occurs
   near the end of an encapsulating packet such that there is no room
   for the length field of the encapsulated header to be included in the
   current encapsulating packet, the fact that the length comes at a
   known location and is guaranteed to be present is enough to fetch the
   length field from the subsequent encapsulating packet payload. Only
   when there is no data to encapsulate is end padding required, and
   then an explicit "Pad Data Block" would be used to identify the
   padding.

2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads

   In order for a receiver to be able to reassemble fragmented inner-
   packets, the sender MUST send the inner-packet fragments back-to-back
   in the logical outer packet stream (i.e., using consecutive ESP
   sequence numbers). However, the sender is allowed to insert "all-
   pad" payloads (i.e., payloads with a "BlockOffset" of zero and a
   single pad "DataBlock") in between the packets carrying the inner-
   packet fragment payloads.
   This possible interleaving of all-pad payloads allows the sender to
   always be able to send a tunnel packet, regardless of the
   encapsulation computational requirements.

   When a receiver is reassembling an inner-packet, and it receives an
   "all-pad" payload, it increments the expected sequence number that
   the next inner-packet fragment is expected to arrive in.

   Given the above, the receiver will need to handle out-of-order
   arrival of outer ESP packets prior to reassembly processing. ESP
   already provides for optionally detecting replay attacks. Detecting
   replay attacks normally utilizes a window method. A similar sequence
   number based sliding window can be used to correct re-ordering of the
   outer packet stream. Receiving a larger (newer) sequence number
   packet advances the window, and received older ESP packets whose
   sequence numbers the window has passed by are dropped. A good choice
   for the size of this window depends on the amount of re-ordering the
   user may normally experience.

   As the amount of reordering that may be present is hard to predict,
   the window size SHOULD be configurable by the user. Implementations
   MAY also dynamically adjust the reordering window based on actual
   reordering seen in arriving packets. Finally, we note that as IP-TFS
   is sending a continuous stream of packets there is no requirement for
   timers (although there's no prohibition either) as newly arrived
   packets will cause the window to advance and older packets will then
   be processed as they leave the window. Implementations that are
   concerned about memory use when packets are delayed (e.g., when an SA
   deletion is delayed) can of course use timers to drop packets as
   well.
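
   The sliding-window behavior described above can be sketched as
   follows. This is a minimal illustration, not a prescribed design:
   the class name, its methods, and the use of "None" as a loss marker
   are hypothetical, and the sketch assumes sequence numbers that never
   wrap (i.e., it ignores 32-bit rollover and extended sequence
   numbers).

```python
class ReorderWindow:
    """Sequence-number based sliding window that restores outer packet order.

    Hypothetical helper for illustration only. Sequence numbers are
    assumed never to wrap.
    """

    def __init__(self, size: int, first_seq: int = 1):
        self.size = size            # configurable window size, in packets
        self.next_seq = first_seq   # lowest sequence number not yet released
        self.pending = {}           # seq -> payload held inside the window

    def receive(self, seq: int, payload):
        """Accept one outer packet.

        Returns a list of (seq, payload) tuples released in order; a payload
        of None marks a sequence number the window passed without a packet.
        """
        if seq < self.next_seq or seq in self.pending:
            return []  # the window has passed this packet (or duplicate): drop
        self.pending[seq] = payload
        released = []
        # A newer sequence number advances the window; slots pushed out of
        # the window are released, either as data or as a loss marker.
        while seq >= self.next_seq + self.size:
            released.append((self.next_seq, self.pending.pop(self.next_seq, None)))
            self.next_seq += 1
        # Release any contiguous run now sitting at the head of the window.
        while self.next_seq in self.pending:
            released.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return released


if __name__ == "__main__":
    window = ReorderWindow(size=3)
    for seq in (1, 3, 2, 7, 4, 5):
        print(seq, window.receive(seq, f"pkt{seq}"))
```

   No timers are used: only the arrival of newer packets moves the
   window, which matches the observation above that a continuous packet
   stream makes timers optional.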

   While ESP guarantees an increasing sequence number with subsequently
   sent packets, it does not actually require the sequence numbers to be
   generated with no gaps (e.g., sending only even numbered sequence
   numbers would be allowed as long as they are always increasing).
   Gaps in the sequence numbers will not work for this specification, so
   the sequence number stream is further restricted to not contain gaps
   (i.e., each subsequent outer packet must be sent with the sequence
   number incremented by 1).

   When using the AGGFRAG_PAYLOAD in conjunction with replay detection,
   the window size for both MAY be reduced to share the smaller of the
   two window sizes. This is because packets outside of the smaller
   window but inside the larger would still be dropped by the mechanism
   with the smaller window size.

   Finally, as sequence numbers are reset when switching SAs (e.g., when
   re-keying a child SA), an implementation SHOULD NOT send initial
   fragments of an inner packet using one SA and subsequent fragments in
   a different SA.

2.2.3.1.  Optional Extra Padding

   When the tunnel bandwidth is not being fully utilized, an
   implementation MAY pad-out the current encapsulating packet in order
   to deliver an inner packet un-fragmented in the following outer
   packet. The benefit would be to avoid inner-packet fragmentation in
   the presence of a bursty offered load (non-bursty traffic will
   naturally not fragment). An implementation MAY also choose to allow
   for a minimum fragment size to be configured (e.g., as a percentage
   of the AGGFRAG_PAYLOAD payload size) to avoid fragmentation at the
   cost of tunnel bandwidth. The cost with these methods is complexity
   and added delay of inner traffic. The main advantage to avoiding
   fragmentation is to minimize inner packet loss in the presence of
   outer packet loss.
   When this is worthwhile (e.g., how much loss and what type of loss
   is required, given different inner traffic shapes and utilization,
   for this to make sense), and what values to use for the
   allowable/added delay, are questions that may be worth researching
   but are outside the scope of this document.

   While use of padding to avoid fragmentation does not impact
   interoperability, used inappropriately it can reduce the effective
   throughput of a tunnel. Implementations implementing either of the
   above approaches will need to take care to not reduce the effective
   capacity, and overall utility, of the tunnel through the overuse of
   padding.

2.2.4.  Empty Payload

   In order to support reporting of congestion control information
   (described later) on a non-AGGFRAG_PAYLOAD enabled SA, IP-TFS allows
   for the sending of an AGGFRAG_PAYLOAD payload with no data blocks
   (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header
   length). This special payload is called an empty payload.

2.2.5.  IP Header Value Mapping

   [RFC4301] provides some direction on when and how to map various
   values from an inner IP header to the outer encapsulating header,
   namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the
   Differentiated Services (DS) field [RFC2474] and the Explicit
   Congestion Notification (ECN) field [RFC3168]. Unlike [RFC4301],
   IP-TFS may and often will be encapsulating more than one IP packet
   per ESP packet. To deal with this, these mappings are restricted
   further. In particular, IP-TFS never maps the inner DF bit as it is
   unrelated to the IP-TFS tunnel functionality; IP-TFS never IP
   fragments the inner packets and the inner packets will not affect the
   fragmentation of the outer encapsulation packets. Likewise, the ECN
   value need not be mapped as any congestion related to the constant-
   send-rate IP-TFS tunnel is unrelated (by design!) to the inner
   traffic flow.
   Finally, by default the DS field SHOULD NOT be copied, although an
   implementation MAY choose to allow for configuration to override
   this behavior. An implementation SHOULD also allow the DS value to
   be set by configuration.

   It is worth noting that an implementation MAY still set the ECN value
   of inner packets based on the normal ECN specification ([RFC3168]).

2.2.6.  IP Time-To-Live (TTL) and Tunnel errors

   [RFC4301] specifies how to modify the inner packet TTL ([RFC0791]).

   Any errors (e.g., ICMP errors arriving back at the tunnel ingress due
   to tunnel traffic) should be handled the same as with non IP-TFS
   IPsec tunnels.

2.2.7.  Effective MTU of the Tunnel

   Unlike [RFC4301], there is normally no effective MTU (EMTU) on an
   IP-TFS tunnel as all IP packet sizes are properly transmitted without
   requiring IP fragmentation prior to tunnel ingress. That said, an
   implementation MAY allow for explicitly configuring an MTU for the
   tunnel.

   If IP-TFS fragmentation has been disabled, then the tunnel's EMTU and
   behaviors are the same as normal IPsec tunnels ([RFC4301]).

2.3.  Exclusive SA Use

   It is not the intention of this specification to allow for mixed use
   of an AGGFRAG_PAYLOAD enabled SA. In other words, an SA that has
   AGGFRAG_PAYLOAD enabled MUST NOT have non-AGGFRAG_PAYLOAD payloads
   such as IP (IP protocol 4), TCP transport (IP protocol 6), or ESP pad
   packets (protocol 59) intermixed with non-empty AGGFRAG_PAYLOAD
   payloads. Empty AGGFRAG_PAYLOAD payloads (Section 2.2.4) are used to
   transmit congestion control information on non-IP-TFS enabled SAs, so
   intermixing is allowed in this specific case. While it's possible to
   envision making the algorithm work in the presence of sequence number
   skips in the AGGFRAG_PAYLOAD payload stream, the added complexity is
   not deemed worthwhile. Other IPsec uses can configure and use their
   own SAs.

2.4.  Modes of Operation

   Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are
   unidirectional. Bidirectional IP-TFS functionality is achieved by
   setting up 2 IP-TFS tunnels, one in each direction.

   An IP-TFS tunnel can operate in 2 modes: a non-congestion controlled
   mode and a congestion controlled mode.

2.4.1.  Non-Congestion Controlled Mode

   In the non-congestion controlled mode, IP-TFS sends fixed-sized
   packets at a constant rate. The packet send rate is constant and is
   not automatically adjusted regardless of any network congestion
   (e.g., packet loss).

   For similar reasons as given in [RFC7510], the non-congestion
   controlled mode should only be used where the user has full
   administrative control over the path the tunnel will take. This is
   required so the user can guarantee the bandwidth and also be sure
   not to negatively affect network congestion [RFC2914]. In this
   case, packet loss should be reported to the administrator (e.g., via
   syslog, YANG notification, SNMP traps, etc.) so that any failures due
   to a lack of bandwidth can be corrected.

2.4.2.  Congestion Controlled Mode

   With the congestion controlled mode, IP-TFS adapts to network
   congestion by lowering the packet send rate to accommodate the
   congestion, as well as raising the rate when congestion subsides.
   Since overhead is per packet, by allowing for maximal fixed-size
   packets and varying the send rate, transport overhead is minimized.

   The output of the congestion control algorithm will adjust the rate
   at which the ingress sends packets. While this document does not
   require a specific congestion control algorithm, best current
   practice RECOMMENDS that the algorithm conform to [RFC5348].
   Congestion control principles are documented in [RFC2914] as well.
   An example of an implementation of the [RFC5348] algorithm which
   matches the requirements of IP-TFS (i.e., designed for fixed-size
   packet and send rate varied based on congestion) is documented in
   [RFC4342].

   The required inputs for the TCP friendly rate control algorithm
   described in [RFC5348] are the receiver's loss event rate and the
   sender's estimated round-trip time (RTT). These values are provided
   by IP-TFS using the congestion information header fields described in
   Section 3. In particular, these values are sufficient to implement
   the algorithm described in [RFC5348].

   At a minimum, the congestion information must be sent, from the
   receiver and from the sender, at least once per RTT. Prior to
   establishing an RTT, the information SHOULD be sent constantly from
   the sender and the receiver so that an RTT estimate can be
   established. The lack of receiving this information over multiple
   consecutive RTT intervals should be considered a congestion event
   that causes the sender to adjust its sending rate lower. For
   example, [RFC4342] calls this the "no feedback timeout" and it is
   equal to 4 RTT intervals. When a "no feedback timeout" has occurred,
   [RFC4342] halves the sending rate.

   An implementation MAY choose to always include the congestion
   information in its IP-TFS payload header if sending on an IP-TFS
   enabled SA. Since IP-TFS normally will operate with a large packet
   size, the congestion information should represent a small portion of
   the available tunnel bandwidth. An implementation choosing to always
   send the data MAY, however, update the "LossEventRate" and "RTT"
   header field values it sends only once per "RTT".
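
   The "no feedback timeout" behavior attributed to [RFC4342] above
   (a timeout of 4 RTT intervals, followed by halving the sending rate)
   can be sketched as below. The function name, the parameters, and the
   minimum-rate floor are assumptions made for this illustration; they
   are not taken from this document or from [RFC4342].

```python
NO_FEEDBACK_RTTS = 4  # timeout length in RTT intervals, as described above


def rate_after_feedback_check(rate_pps: float, rtt_s: float, now_s: float,
                              last_feedback_s: float,
                              min_rate_pps: float = 1.0) -> float:
    """Return the send rate to use after checking for a no-feedback timeout.

    `min_rate_pps` is a hypothetical floor added by this sketch so the rate
    never reaches zero.
    """
    if rtt_s <= 0:
        # No RTT estimate yet, so no timeout can be evaluated.
        return rate_pps
    if now_s - last_feedback_s >= NO_FEEDBACK_RTTS * rtt_s:
        # Missing congestion information is treated as a congestion event:
        # halve the sending rate.
        return max(rate_pps / 2, min_rate_pps)
    return rate_pps


if __name__ == "__main__":
    # Hypothetical numbers: 1000 packets/s, 50 ms RTT, last report 300 ms ago.
    print(rate_after_feedback_check(1000.0, 0.05, now_s=10.0, last_feedback_s=9.7))
```

   A real sender would also restart the timeout after each reduction;
   that bookkeeping is omitted here.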

   When an implementation is choosing a congestion control algorithm (or
   a selection of algorithms), one should remember that IP-TFS is not
   providing for reliable delivery of IP traffic, and so per packet ACKs
   are not required and are not provided.

   It's worth noting that the variable send-rate of a congestion
   controlled IP-TFS tunnel is not private; however, this send-rate is
   being driven by network congestion, and as long as the encapsulated
   (inner) traffic flow shape and timing are not directly affecting the
   (outer) network congestion, the variations in the tunnel rate will
   not weaken the provided inner traffic flow confidentiality.

2.4.2.1.  Circuit Breakers

   In addition to congestion control, implementations MAY choose to
   define and implement circuit breakers [RFC8084] as a recovery method
   of last resort. Enabling circuit breakers is also a reason a user
   may wish to enable congestion information reports even when using the
   non-congestion controlled mode of operation. The definition of
   circuit breakers is outside the scope of this document.

3.  Congestion Information

   In order to support the congestion control mode, the sender needs to
   know the loss event rate and also be able to approximate the RTT
   ([RFC5348]). In order to obtain these values, the receiver sends
   congestion control information on its SA back to the sender. Thus,
   in order to support congestion control, the receiver must have a
   paired SA back to the sender (this is always the case when the tunnel
   was created using IKEv2). If the SA back to the sender is a non-
   AGGFRAG_PAYLOAD enabled SA, then an AGGFRAG_PAYLOAD empty payload
   (i.e., header only) is used to convey the information.

   In order to calculate a loss event rate compatible with [RFC5348],
   the receiver needs to have a round-trip time estimate. Thus the
   sender communicates this estimate in the "RTT" header field.
   On startup this value will be zero as no RTT estimate is yet known.

   In order for the sender to estimate its "RTT" value, the sender
   places a timestamp value in the "TVal" header field. On first
   receipt of this "TVal", the receiver records the new "TVal" value
   along with the time it arrived locally; subsequent receipt of the
   same "TVal" MUST NOT update the recorded time. When the receiver
   sends its CC header, it places this latest recorded value in the
   "TEcho" header field, along with 2 delay values, "Echo Delay" and
   "Transmit Delay". The "Echo Delay" value is the time delta between
   the recorded arrival time of "TVal" and the current clock, in
   microseconds. The second value, "Transmit Delay", is the receiver's
   current transmission delay on the tunnel (i.e., the average time
   between sending packets on its half of the IP-TFS tunnel). When the
   sender receives back its "TVal" in the "TEcho" header field, it
   calculates 2 RTT estimates. The first is the actual delay found by
   subtracting the "TEcho" value from its current clock and then
   subtracting "Echo Delay" as well. The second RTT estimate is found
   by adding the received "Transmit Delay" header value to the sender's
   own transmission delay (i.e., the average time between sending
   packets on its half of the IP-TFS tunnel). The larger of these 2
   RTT estimates SHOULD be used as the "RTT" value. The two estimates
   are required to handle different combinations of faster or slower
   tunnel packet paths with faster or slower fixed tunnel rates.
   Choosing the larger of the two values guarantees that the "RTT" is
   never considered faster than the aggregate transmission delay based
   on the IP-TFS tunnel rate (the second estimate), as well as never
   being considered faster than the actual RTT along the tunnel packet
   path (the first estimate).
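
   The exchange above can be sketched in Python as follows. This is an
   illustrative reading of the paragraph, not normative text: the class
   and function names are hypothetical, all times are assumed to be
   microsecond values taken from a clock that does not wrap, and the
   encoding of the header fields is ignored.

```python
class TValRecorder:
    """Receiver side: remembers the latest "TVal" and when it first arrived."""

    def __init__(self):
        self.tval = None        # latest recorded "TVal"
        self.arrival_us = None  # local arrival time of its first receipt

    def on_tval(self, tval: int, now_us: int) -> None:
        # Only the first receipt of a given "TVal" records the arrival time;
        # repeats of the same value MUST NOT update it.
        if tval != self.tval:
            self.tval, self.arrival_us = tval, now_us

    def echo(self, now_us: int):
        """Return ("TEcho", "Echo Delay") for the next CC header.

        Assumes at least one "TVal" has already been received.
        """
        return self.tval, now_us - self.arrival_us


def estimate_rtt_us(now_us: int, techo_us: int, echo_delay_us: int,
                    peer_transmit_delay_us: int,
                    own_transmit_delay_us: int) -> int:
    """Sender side: the larger of the two RTT estimates described above."""
    # First estimate: measured delay along the tunnel packet path.
    path_rtt = now_us - techo_us - echo_delay_us
    # Second estimate: aggregate transmission delay of both tunnel halves.
    rate_rtt = peer_transmit_delay_us + own_transmit_delay_us
    return max(path_rtt, rate_rtt)


if __name__ == "__main__":
    receiver = TValRecorder()
    receiver.on_tval(1_000_000, now_us=5_000)
    receiver.on_tval(1_000_000, now_us=9_000)   # repeat: arrival time kept
    techo, echo_delay = receiver.echo(now_us=25_000)
    print(estimate_rtt_us(1_050_000, techo, echo_delay, 1_000, 1_000))
```

   With a fast path and a slow fixed tunnel rate, the second estimate
   dominates; with a slow path and a fast tunnel rate, the first one
   does.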
554 The receiver also calculates, and communicates in the "LossEventRate" 555 header field, the loss event rate for use by the sender. This is 556 slightly different from [RFC4342] which periodically sends all the 557 loss interval data back to the sender so that it can do the 558 calculation. See Appendix B for a suggested way to calculate the 559 loss event rate value. Initially this value will be zero (indicating 560 no loss) until enough data has been collected by the receiver to 561 update it. 563 3.1. ECN Support 565 In addition to normal packet loss information, IP-TFS supports use 566 of the ECN bits in the encapsulating IP header [RFC3168] for 567 identifying congestion. If ECN use is enabled and a packet arrives 568 at the egress endpoint with the Congestion Experienced (CE) value 569 set, then the receiver considers that packet as being dropped, 570 although it does not drop it. The receiver MUST set the E bit in any 571 AGGFRAG_PAYLOAD payload header containing a "LossEventRate" value 572 derived from a CE value being considered. 574 As noted in [RFC3168] the ECN bits are not protected by IPsec and 575 thus may constitute a covert channel. For this reason ECN use SHOULD 576 NOT be enabled by default. 578 4. Configuration 580 IP-TFS is meant to be deployable with a minimal amount of 581 configuration. All IP-TFS specific configuration should be able to 582 be specified at the unidirectional tunnel ingress (sending) side. It 583 is intended that non-IKEv2 operation is supported, at least, with 584 local static configuration. 586 4.1. Bandwidth 588 Bandwidth is a local configuration option. For non-congestion 589 controlled mode the bandwidth SHOULD be configured. For congestion 590 controlled mode one can configure the bandwidth or have no 591 configuration and let congestion control discover the maximum 592 bandwidth available. No standardized configuration method is 593 required. 595 4.2.
Fixed Packet Size 597 The fixed packet size to be used for the tunnel encapsulation packets 598 MAY be configured manually or can be automatically determined using 599 other methods such as PLMTUD ([RFC4821], [RFC8899]) or PMTUD 600 ([RFC1191], [RFC8201]). As PMTUD is known to have issues, PLMTUD is 601 considered the more robust option. No standardized configuration 602 method is required. 604 4.3. Congestion Control 606 Congestion control is a local configuration option. No standardized 607 configuration method is required. 609 5. IKEv2 611 5.1. USE_AGGFRAG Notification Message 613 As mentioned previously, IP-TFS tunnels utilize ESP payloads of type 614 AGGFRAG_PAYLOAD. 616 When using IKEv2, a new "USE_AGGFRAG" Notification Message is used to 617 enable use of the AGGFRAG_PAYLOAD payload on a child SA pair. The 618 method used is similar to how USE_TRANSPORT_MODE is negotiated, as 619 described in [RFC7296]. 621 To request using the AGGFRAG_PAYLOAD payload on the Child SA pair, 622 the initiator includes the USE_AGGFRAG notification in an SA payload 623 requesting a new Child SA (either during the initial IKE_AUTH or 624 during non-rekeying CREATE_CHILD_SA exchanges). If the request is 625 accepted then the response MUST also include a notification of type 626 USE_AGGFRAG. If the responder declines the request, the child SA will 627 be established without AGGFRAG_PAYLOAD payload use enabled. If this 628 is unacceptable to the initiator, the initiator MUST delete the child 629 SA. 631 The USE_AGGFRAG notification MUST NOT be sent, and MUST be ignored, 632 during a CREATE_CHILD_SA rekeying exchange as it is not allowed to 633 change use of the AGGFRAG_PAYLOAD payload type during rekeying. A 634 new child SA due to re-keying inherits the use of AGGFRAG_PAYLOAD 635 from the re-keyed child SA. 637 The USE_AGGFRAG notification contains a 1 octet payload of flags that 638 specify any requirements from the sender of the message.
If any 639 requirement flags are not understood or cannot be supported by the 640 receiver then the receiver should not enable use of AGGFRAG_PAYLOAD 641 payload type (either by not responding with the USE_AGGFRAG 642 notification, or in the case of the initiator, by deleting the child 643 SA if the now established non-AGGFRAG_PAYLOAD using SA is 644 unacceptable). 646 The notification type and payload flag values are defined in 647 Section 6.1.4. 649 6. Packet and Data Formats 651 6.1. AGGFRAG_PAYLOAD Payload 653 ESP Payload Type: 0x5 655 An IP-TFS payload is identified by the ESP payload type 656 AGGFRAG_PAYLOAD which has the value 0x5. The first octet of this 657 payload indicates the format of the remaining payload data. 659 0 1 2 3 4 5 6 7 660 +-+-+-+-+-+-+-+-+-+-+- 661 | Sub-type | ... 662 +-+-+-+-+-+-+-+-+-+-+- 664 Sub-type: 665 An 8 bit value indicating the payload format. 667 This specification defines 2 payload sub-types. These payload 668 formats are defined in the following sections. 670 6.1.1. Non-Congestion Control AGGFRAG_PAYLOAD Payload Format 672 The non-congestion control AGGFRAG_PAYLOAD payload is comprised of a 673 4 octet header followed by a variable amount of "DataBlocks" data as 674 shown below. 676 1 2 3 677 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 678 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 679 | Sub-Type (0) | Reserved | BlockOffset | 680 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 681 | DataBlocks ... 682 +-+-+-+-+-+-+-+-+-+-+- 684 Sub-type: 685 An octet indicating the payload format. For this non-congestion 686 control format, the value is 0. 688 Reserved: 689 An octet set to 0 on generation, and ignored on receipt. 691 BlockOffset: 692 A 16 bit unsigned integer counting the number of octets of 693 "DataBlocks" data before the start of a new data block. 
694 "BlockOffset" can count past the end of the "DataBlocks" data in 695 which case all the "DataBlocks" data belongs to the previous data 696 block being re-assembled. If the "BlockOffset" extends into 697 subsequent packets it continues to only count subsequent 698 "DataBlocks" data (i.e., it does not count subsequent packets 699 non-"DataBlocks" octets). 701 DataBlocks: 702 Variable number of octets that begins with the start of a data 703 block, or the continuation of a previous data block, followed by 704 zero or more additional data blocks. 706 6.1.2. Congestion Control AGGFRAG_PAYLOAD Payload Format 708 The congestion control AGGFRAG_PAYLOAD payload is comprised of a 24 709 octet header followed by a variable amount of "DataBlocks" data as 710 shown below. 712 1 2 3 713 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 714 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 715 | Sub-type (1) | Reserved |E| BlockOffset | 716 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 717 | LossEventRate | 718 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 719 | RTT | Echo Delay ... 720 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 721 ... Echo Delay | Transmit Delay | 722 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 723 | TVal | 724 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 725 | TEcho | 726 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 727 | DataBlocks ... 728 +-+-+-+-+-+-+-+-+-+-+- 730 Sub-type: 731 An octet indicating the payload format. For this congestion 732 control format, the value is 1. 734 Reserved: 735 A 7 bit field set to 0 on generation, and ignored on receipt. 737 E: 738 A 1 bit value if set indicates that Congestion Experienced (CE) 739 ECN bits were received and used in deriving the reported 740 "LossEventRate". 742 BlockOffset: 743 The same value as the non-congestion controlled payload format 744 value. 
746 LossEventRate: 747 A 32 bit value specifying the inverse of the current loss event 748 rate as calculated by the receiver. A value of zero indicates no 749 loss. Otherwise the loss event rate is "1/LossEventRate". 751 RTT: 752 A 22 bit value specifying the sender's current round-trip time 753 estimate in microseconds. The value MAY be zero prior to the 754 sender having calculated a round-trip time estimate. The value 755 SHOULD be set to zero on non-AGGFRAG_PAYLOAD enabled SAs. If the 756 value is equal to or larger than "0x3FFFFF" it MUST be set to 757 "0x3FFFFF". 759 Echo Delay: 761 A 21 bit value specifying the delay in microseconds incurred 762 between the receiver first receiving the "TVal" value and 763 sending it back in "TEcho". If the value is equal to or larger than 764 "0x1FFFFF" it MUST be set to "0x1FFFFF". 766 Transmit Delay: 767 A 21 bit value specifying the transmission delay in microseconds. 768 This is the fixed (or average) delay on the receiver between it 769 sending packets on the IP-TFS tunnel. If the value is equal to or 770 larger than "0x1FFFFF" it MUST be set to "0x1FFFFF". 772 TVal: 773 An opaque 32 bit value that will be echoed back by the receiver in 774 later packets in the "TEcho" field, along with an "Echo Delay" 775 value of how long that echo took. 777 TEcho: 778 The opaque 32 bit value from a received packet's "TVal" field. 779 The received "TVal" is placed in "TEcho" along with an "Echo 780 Delay" value indicating how long it has been since receiving the 781 "TVal" value. 783 DataBlocks: 784 Variable number of octets that begins with the start of a data 785 block, or the continuation of a previous data block, followed by 786 zero or more additional data blocks. For the special case of 787 sending congestion control information on a non-IP-TFS enabled SA 788 this value MUST be empty (i.e., be zero octets long). 790 6.1.3.
Data Blocks 792 1 2 3 793 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 794 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 795 | Type | IPv4, IPv6 or pad... 796 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 798 Type: 799 A 4 bit field where 0x0 identifies a pad data block, 0x4 indicates 800 an IPv4 data block, and 0x6 indicates an IPv6 data block. 802 6.1.3.1. IPv4 Data Block 803 1 2 3 804 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 805 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 806 | 0x4 | IHL | TypeOfService | TotalLength | 807 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 808 | Rest of the inner packet ... 809 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 811 These values are the actual values within the encapsulated IPv4 812 header. In other words, the start of this data block is the start of 813 the encapsulated IP packet. 815 Type: 816 A 4 bit value of 0x4 indicating IPv4 (i.e., first nibble of the 817 IPv4 packet). 819 TotalLength: 820 The 16 bit unsigned integer "Total Length" field of the IPv4 inner 821 packet. 823 6.1.3.2. IPv6 Data Block 825 1 2 3 826 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 827 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 828 | 0x6 | TrafficClass | FlowLabel | 829 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 830 | PayloadLength | Rest of the inner packet ... 831 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 833 These values are the actual values within the encapsulated IPv6 834 header. In other words, the start of this data block is the start of 835 the encapsulated IP packet. 837 Type: 838 A 4 bit value of 0x6 indicating IPv6 (i.e., first nibble of the 839 IPv6 packet). 841 PayloadLength: 842 The 16 bit unsigned integer "Payload Length" field of the inner 843 IPv6 packet. 845 6.1.3.3.
Pad Data Block 846 1 2 3 847 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 848 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 849 | 0x0 | Padding ... 850 +-+-+-+-+-+-+-+-+-+-+- 852 Type: 853 A 4 bit value of 0x0 indicating a padding data block. 855 Padding: 856 Extends to the end of the encapsulating packet. 858 6.1.4. IKEv2 USE_AGGFRAG Notification Message 860 As discussed in Section 5.1 a notification message USE_AGGFRAG is 861 used to negotiate use of the ESP AGGFRAG_PAYLOAD payload type. 863 The USE_AGGFRAG Notification Message Status Type is (TBD2). 865 The notification payload contains 1 octet of requirement flags. 866 There are currently 2 requirement flags defined. This may be revised 867 by later specifications. 869 +-+-+-+-+-+-+-+-+ 870 |0|0|0|0|0|0|C|D| 871 +-+-+-+-+-+-+-+-+ 873 0: 874 6 bits - reserved, MUST be zero on send, unless defined by later 875 specifications. 877 C: 878 Congestion Control bit. If set, then the sender is requiring that 879 congestion control information MUST be returned to it periodically 880 as defined in Section 3. 882 D: 883 Don't Fragment bit. If set, it indicates that the sender of the notify 884 message does not support receiving packet fragments (i.e., inner 885 packets MUST be sent using a single "Data Block"). This value 886 only applies to what the sender is capable of receiving; the 887 sender MAY still send packet fragments unless similarly restricted 888 by the receiver in its USE_AGGFRAG notification. 890 7. IANA Considerations 892 7.1. AGGFRAG_PAYLOAD Sub-Type Registry 894 This document requests IANA create a registry called "AGGFRAG_PAYLOAD 895 Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD 896 Parameters". The registration policy for this registry is "Standards 897 Action" ([RFC8126] and [RFC7120]). 899 Name: 900 AGGFRAG_PAYLOAD Sub-Type Registry 902 Description: 903 AGGFRAG_PAYLOAD Payload Formats.
905 Reference: 906 This document 908 This initial content for this registry is as follows: 910 Sub-Type Name Reference 911 -------------------------------------------------------- 912 0 Non-Congestion Control Format This document 913 1 Congestion Control Format This document 914 3-255 Reserved 916 7.2. USE_AGGFRAG Notify Message Status Type 918 This document requests a status type USE_AGGFRAG be allocated from 919 the "IKEv2 Notify Message Types - Status Types" registry. 921 Value: 922 TBD2 924 Name: 925 USE_AGGFRAG 927 Reference: 928 This document 930 8. Security Considerations 932 This document describes a mechanism to add Traffic Flow 933 Confidentiality to IP traffic. Use of this mechanism is expected to 934 increase the security of the traffic being transported. Other than 935 the additional security afforded by using this mechanism, IP-TFS 936 utilizes the security protocols [RFC4303] and [RFC7296] and so their 937 security considerations apply to IP-TFS as well. 939 As noted previously in Section 2.4.2, for TFC to be fully maintained 940 the encapsulated traffic flow should not be affecting network 941 congestion in a predictable way, and if it would be then non- 942 congestion controlled mode use should be considered instead. 944 9. References 946 9.1. Normative References 948 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 949 Requirement Levels", BCP 14, RFC 2119, 950 DOI 10.17487/RFC2119, March 1997, 951 . 953 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 954 RFC 4303, DOI 10.17487/RFC4303, December 2005, 955 . 957 [RFC7296] Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. 958 Kivinen, "Internet Key Exchange Protocol Version 2 959 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October 960 2014, . 962 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 963 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 964 May 2017, . 966 9.2. 
Informative References 968 [AppCrypt] 969 Schneier, B., "Applied Cryptography: Protocols, 970 Algorithms, and Source Code in C", 11 2017. 972 [I-D.iab-wire-image] 973 Trammell, B. and M. Kuehlewind, "The Wire Image of a 974 Network Protocol", draft-iab-wire-image-01 (work in 975 progress), November 2018. 977 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 978 DOI 10.17487/RFC0791, September 1981, 979 . 981 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 982 DOI 10.17487/RFC1191, November 1990, 983 . 985 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 986 "Definition of the Differentiated Services Field (DS 987 Field) in the IPv4 and IPv6 Headers", RFC 2474, 988 DOI 10.17487/RFC2474, December 1998, 989 . 991 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, 992 RFC 2914, DOI 10.17487/RFC2914, September 2000, 993 . 995 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 996 of Explicit Congestion Notification (ECN) to IP", 997 RFC 3168, DOI 10.17487/RFC3168, September 2001, 998 . 1000 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1001 Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, 1002 December 2005, . 1004 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1005 Datagram Congestion Control Protocol (DCCP) Congestion 1006 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1007 DOI 10.17487/RFC4342, March 2006, 1008 . 1010 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 1011 Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, 1012 . 1014 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1015 Friendly Rate Control (TFRC): Protocol Specification", 1016 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1017 . 1019 [RFC7120] Cotton, M., "Early IANA Allocation of Standards Track Code 1020 Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January 1021 2014, . 1023 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. 
Black, 1024 "Encapsulating MPLS in UDP", RFC 7510, 1025 DOI 10.17487/RFC7510, April 2015, 1026 . 1028 [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", 1029 BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, 1030 . 1032 [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 1033 Writing an IANA Considerations Section in RFCs", BCP 26, 1034 RFC 8126, DOI 10.17487/RFC8126, June 2017, 1035 . 1037 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1038 (IPv6) Specification", STD 86, RFC 8200, 1039 DOI 10.17487/RFC8200, July 2017, 1040 . 1042 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., 1043 "Path MTU Discovery for IP version 6", STD 87, RFC 8201, 1044 DOI 10.17487/RFC8201, July 2017, 1045 . 1047 [RFC8899] Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and 1048 T. Voelker, "Packetization Layer Path MTU Discovery for 1049 Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, 1050 September 2020, . 1052 Appendix A. Example Of An Encapsulated IP Packet Flow 1054 Below an example inner IP packet flow within the encapsulating tunnel 1055 packet stream is shown. Notice how encapsulated IP packets can start 1056 and end anywhere, and more than one or less than 1 may occur in a 1057 single encapsulating packet. 1059 Offset: 0 Offset: 100 Offset: 2900 Offset: 1400 1060 [ ESP1 (1500) ][ ESP2 (1500) ][ ESP3 (1500) ][ ESP4 (1500) ] 1061 [--800--][--800--][60][-240-][--4000----------------------][pad] 1063 Figure 3: Inner and Outer Packet Flow 1065 The encapsulated IP packet flow (lengths include IP header and 1066 payload) is as follows: an 800 octet packet, an 800 octet packet, a 1067 60 octet packet, a 240 octet packet, a 4000 octet packet. 1069 The "BlockOffset" values in the 4 IP-TFS payload headers for this 1070 packet flow would thus be: 0, 100, 2900, 1400 respectively. 
The 1071 first encapsulating packet ESP1 has a zero "BlockOffset" which points 1072 at the IP data block immediately following the IP-TFS header. The 1073 following packet ESP2's "BlockOffset" points inward 100 octets to the 1074 start of the 60 octet data block. The third encapsulating packet 1075 ESP3 contains the middle portion of the 4000 octet data block so the 1076 offset points past its end and into the fourth encapsulating packet. 1077 The fourth packet ESP4's offset is 1400 pointing at the padding which 1078 follows the completion of the continued 4000 octet packet. 1080 Appendix B. A Send and Loss Event Rate Calculation 1082 The current best practice indicates that congestion control SHOULD be 1083 done in a TCP friendly way. A TCP friendly congestion control 1084 algorithm is described in [RFC5348]. For this IP-TFS use case (as 1085 with [RFC4342]) the (fixed) packet size is used as the segment size 1086 for the algorithm. The main formula in the algorithm for the send 1087 rate is then as follows: 1089 1 1090 X = ----------------------------------------------- 1091 R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2)) 1093 Where "X" is the send rate in packets per second, "R" is the round 1094 trip time estimate and "p" is the loss event rate (the inverse of 1095 which is provided by the receiver). 1097 In addition the algorithm in [RFC5348] also uses an "X_recv" value 1098 (the receiver's receive rate). For IP-TFS one MAY set this value 1099 according to the sender's current tunnel send-rate ("X"). 1101 The IP-TFS receiver, having the RTT estimate from the sender, can use 1102 the same method as described in [RFC5348] and [RFC4342] to collect 1103 the loss intervals and calculate the loss event rate value using the 1104 weighted average as indicated. The receiver communicates the inverse 1105 of this value back to the sender in the AGGFRAG_PAYLOAD payload 1106 header field "LossEventRate".
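The two calculations above can be sketched as follows. This is a non-normative illustration. The function names are hypothetical, and the weights are assumed to be those used by [RFC5348] for a history of eight loss intervals, so they should be checked against that document before use. "R" is taken from the "RTT" field in microseconds and converted to seconds.

```python
import math

# Assumed RFC 5348 weights for a history of n = 8 loss intervals
# (most recent interval first).
WEIGHTS = (1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2)


def inverse_loss_event_rate(intervals):
    """Receiver side: value for the "LossEventRate" field.

    intervals holds loss interval lengths in packets, newest first, with
    intervals[0] being the still-open interval. Returns 0 for "no loss".
    """
    k = min(len(intervals) - 1, len(WEIGHTS))
    if k < 1:
        return 0  # no completed loss interval yet
    w = WEIGHTS[:k]
    # Weighted sums with and without the open interval; the larger one is
    # used so that a long loss-free run lowers the reported loss rate.
    with_open = sum(i * wi for i, wi in zip(intervals[:k], w))
    without_open = sum(i * wi for i, wi in zip(intervals[1:k + 1], w))
    return round(max(with_open, without_open) / sum(w))


def send_rate_pps(rtt_us, inv_loss):
    """Sender side: "X" in packets per second, or None if undefined."""
    if rtt_us <= 0 or inv_loss <= 0:
        return None  # no RTT estimate or no loss reported yet
    r = rtt_us / 1_000_000          # "R" in seconds
    p = 1.0 / inv_loss              # loss event rate "p"
    denom = r * (math.sqrt(2 * p / 3)
                 + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p * p))
    return 1.0 / denom
```

Returning None while "LossEventRate" is still zero reflects that the formula has no finite value for "p" equal to zero; in that state the rate would be set by other means, such as slow start or a configured maximum.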
1108 The IP-TFS sender now has both the "R" and "p" values and can 1109 calculate the correct sending rate. If following [RFC5348] the 1110 sender SHOULD also use the slow start mechanism described therein 1111 when the IP-TFS SA is first established. 1113 Appendix C. Comparisons of IP-TFS 1115 C.1. Comparing Overhead 1117 C.1.1. IP-TFS Overhead 1119 The overhead of IP-TFS is 40 octets per outer packet. Therefore the 1120 octet overhead per inner packet is 40 multiplied by the number of outer 1121 packets required (fractional allowed). The overhead as a percentage 1122 of inner packet size is a constant based on the Outer MTU size. 1124 OH = 40 * Inner Packet Size / Outer Payload Size 1125 OH % of Inner Packet Size = 100 * OH / Inner Packet Size 1126 OH % of Inner Packet Size = 4000 / Outer Payload Size 1127 Type IP-TFS IP-TFS IP-TFS 1128 MTU 576 1500 9000 1129 PSize 536 1460 8960 1130 ------------------------------- 1131 40 7.46% 2.74% 0.45% 1132 576 7.46% 2.74% 0.45% 1133 1500 7.46% 2.74% 0.45% 1134 9000 7.46% 2.74% 0.45% 1136 Figure 4: IP-TFS Overhead as Percentage of Inner Packet Size 1138 C.1.2. ESP with Padding Overhead 1140 The overhead per inner packet for constant-send-rate padded ESP 1141 (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless 1142 fragmentation is required. 1144 When fragmentation of the inner packet is required to fit in the 1145 outer IPsec packet, overhead is the number of outer packets required 1146 to carry the fragmented inner packet times both the inner IP overhead 1147 (20) and the outer packet overhead (36) minus the initial inner IP 1148 overhead plus any required tail padding in the last encapsulation 1149 packet. The required tail padding is the number of required packets 1150 times the difference of the Outer Payload Size and the IP Overhead 1151 minus the Inner Payload Size.
So: 1153 Inner Payload Size = IP Packet Size - IP Overhead 1154 Outer Payload Size = MTU - IPsec Overhead 1156 Inner Payload Size 1157 NF0 = ---------------------------------- 1158 Outer Payload Size - IP Overhead 1160 NF = CEILING(NF0) 1162 OH = NF * (IP Overhead + IPsec Overhead) 1163 - IP Overhead 1164 + NF * (Outer Payload Size - IP Overhead) 1165 - Inner Payload Size 1167 OH = NF * (IPsec Overhead + Outer Payload Size) 1168 - (IP Overhead + Inner Payload Size) 1170 OH = NF * (IPsec Overhead + Outer Payload Size) 1171 - Inner Packet Size 1173 C.2. Overhead Comparison 1175 The following tables collect the overhead values for some common L3 1176 MTU sizes in order to compare them. The first table is the number of 1177 octets of overhead for a given L3 MTU sized packet. The second table 1178 is the percentage of overhead in the same MTU sized packet. 1180 Type ESP+Pad ESP+Pad ESP+Pad IP-TFS IP-TFS IP-TFS 1181 L3 MTU 576 1500 9000 576 1500 9000 1182 PSize 540 1464 8964 536 1460 8960 1183 ----------------------------------------------------------- 1184 40 500 1424 8924 3.0 1.1 0.2 1185 128 412 1336 8836 9.6 3.5 0.6 1186 256 284 1208 8708 19.1 7.0 1.1 1187 536 4 928 8428 40.0 14.7 2.4 1188 576 576 888 8388 43.0 15.8 2.6 1189 1460 268 4 7504 109.0 40.0 6.5 1190 1500 228 1500 7464 111.9 41.1 6.7 1191 8960 1408 1540 4 668.7 245.5 40.0 1192 9000 1368 1500 9000 671.6 246.6 40.2 1194 Figure 5: Overhead comparison in octets 1196 Type ESP+Pad ESP+Pad ESP+Pad IP-TFS IP-TFS IP-TFS 1197 MTU 576 1500 9000 576 1500 9000 1198 PSize 540 1464 8964 536 1460 8960 1199 ----------------------------------------------------------- 1200 40 1250.0% 3560.0% 22310.0% 7.46% 2.74% 0.45% 1201 128 321.9% 1043.8% 6903.1% 7.46% 2.74% 0.45% 1202 256 110.9% 471.9% 3401.6% 7.46% 2.74% 0.45% 1203 536 0.7% 173.1% 1572.4% 7.46% 2.74% 0.45% 1204 576 100.0% 154.2% 1456.2% 7.46% 2.74% 0.45% 1205 1460 18.4% 0.3% 514.0% 7.46% 2.74% 0.45% 1206 1500 15.2% 100.0% 497.6% 7.46% 2.74% 0.45% 1207 8960 15.7% 17.2%
0.0% 7.46% 2.74% 0.45% 1208 9000 15.2% 16.7% 100.0% 7.46% 2.74% 0.45% 1210 Figure 6: Overhead as Percentage of Inner Packet Size 1212 C.3. Comparing Available Bandwidth 1214 Another way to compare the two solutions is to look at the amount of 1215 available bandwidth each solution provides. The following sections 1216 consider and compare the percentage of available bandwidth. For the 1217 sake of providing a well understood baseline normal (unencrypted) 1218 Ethernet as well as normal ESP values are included. 1220 C.3.1. Ethernet 1222 In order to calculate the available bandwidth the per packet overhead 1223 is calculated first. The total overhead of Ethernet is 14+4 octets 1224 of header and CRC plus an additional 20 octets of framing (preamble, 1225 start, and inter-packet gap) for a total of 38 octets. Additionally 1226 the minimum payload is 46 octets. 1228 Size E + P E + P E + P IPTFS IPTFS IPTFS Enet ESP 1229 MTU 590 1514 9014 590 1514 9014 any any 1230 OH 74 74 74 78 78 78 38 74 1231 ------------------------------------------------------------ 1232 40 614 1538 9038 45 42 40 84 114 1233 128 614 1538 9038 146 134 129 166 202 1234 256 614 1538 9038 293 269 258 294 330 1235 536 614 1538 9038 614 564 540 574 610 1236 576 1228 1538 9038 659 606 581 614 650 1237 1460 1842 1538 9038 1672 1538 1472 1498 1534 1238 1500 1842 3076 9038 1718 1580 1513 1538 1574 1239 8960 11052 10766 9038 10263 9438 9038 8998 9034 1240 9000 11052 10766 18076 10309 9480 9078 9038 9074 1242 Figure 7: L2 Octets Per Packet 1244 Size E + P E + P E + P IPTFS IPTFS IPTFS Enet ESP 1245 MTU 590 1514 9014 590 1514 9014 any any 1246 OH 74 74 74 78 78 78 38 74 1247 -------------------------------------------------------------- 1248 40 2.0M 0.8M 0.1M 27.3M 29.7M 31.0M 14.9M 11.0M 1249 128 2.0M 0.8M 0.1M 8.5M 9.3M 9.7M 7.5M 6.2M 1250 256 2.0M 0.8M 0.1M 4.3M 4.6M 4.8M 4.3M 3.8M 1251 536 2.0M 0.8M 0.1M 2.0M 2.2M 2.3M 2.2M 2.0M 1252 576 1.0M 0.8M 0.1M 1.9M 2.1M 2.2M 2.0M 1.9M 1253 1460 678K 812K 138K
747K 812K 848K 834K 814K 1254 1500 678K 406K 138K 727K 791K 826K 812K 794K 1255 8960 113K 116K 138K 121K 132K 138K 138K 138K 1256 9000 113K 116K 69K 121K 131K 137K 138K 137K 1258 Figure 8: Packets Per Second on 10G Ethernet 1260 Size E + P E + P E + P IPTFS IPTFS IPTFS Enet ESP 1261 590 1514 9014 590 1514 9014 any any 1262 74 74 74 78 78 78 38 74 1263 ---------------------------------------------------------------------- 1264 40 6.51% 2.60% 0.44% 87.30% 94.93% 99.14% 47.62% 35.09% 1265 128 20.85% 8.32% 1.42% 87.30% 94.93% 99.14% 77.11% 63.37% 1266 256 41.69% 16.64% 2.83% 87.30% 94.93% 99.14% 87.07% 77.58% 1267 536 87.30% 34.85% 5.93% 87.30% 94.93% 99.14% 93.38% 87.87% 1268 576 46.91% 37.45% 6.37% 87.30% 94.93% 99.14% 93.81% 88.62% 1269 1460 79.26% 94.93% 16.15% 87.30% 94.93% 99.14% 97.46% 95.18% 1270 1500 81.43% 48.76% 16.60% 87.30% 94.93% 99.14% 97.53% 95.30% 1271 8960 81.07% 83.22% 99.14% 87.30% 94.93% 99.14% 99.58% 99.18% 1272 9000 81.43% 83.60% 49.79% 87.30% 94.93% 99.14% 99.58% 99.18% 1274 Figure 9: Percentage of Bandwidth on 10G Ethernet 1276 A sometimes unexpected result of using IP-TFS (or any packet 1277 aggregating tunnel) is that, for small to medium sized packets, the 1278 available bandwidth is actually greater than native Ethernet. This 1279 is due to the reduction in Ethernet framing overhead. This increased 1280 bandwidth is paid for with an increase in latency. This latency is 1281 the time to send the unrelated octets in the outer tunnel frame. The 1282 following table illustrates the latency for some common values on a 1283 10G Ethernet link. The table also includes latency introduced by 1284 padding if using ESP with padding. 
1286 ESP+Pad ESP+Pad IP-TFS IP-TFS 1287 1500 9000 1500 9000 1289 ------------------------------------------ 1290 40 1.14 us 7.14 us 1.17 us 7.17 us 1291 128 1.07 us 7.07 us 1.10 us 7.10 us 1292 256 0.97 us 6.97 us 1.00 us 7.00 us 1293 536 0.74 us 6.74 us 0.77 us 6.77 us 1294 576 0.71 us 6.71 us 0.74 us 6.74 us 1295 1460 0.00 us 6.00 us 0.04 us 6.04 us 1296 1500 1.20 us 5.97 us 0.00 us 6.00 us 1298 Figure 10: Added Latency 1300 Notice that the latency values are very similar between the two 1301 solutions; however, whereas IP-TFS provides for constant high 1302 bandwidth, in some cases even exceeding native Ethernet, ESP with 1303 padding often greatly reduces available bandwidth. 1305 Appendix D. Acknowledgements 1307 We would like to thank Don Fedyk for help in reviewing and editing 1308 this work. We would also like to thank Valery Smyslov for reviews 1309 and suggestions for improvements as well as Joseph Touch for the 1310 transport area review and suggested improvements. 1312 Appendix E. Contributors 1314 The following people made significant contributions to this document. 1316 Lou Berger 1317 LabN Consulting, L.L.C. 1319 Email: lberger@labn.net 1321 Author's Address 1323 Christian Hopps 1324 LabN Consulting, L.L.C. 1326 Email: chopps@chopps.org