idnits 2.17.1 

draft-ietf-ipsecme-iptfs-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1229 has weird spacing: '...4   any   any...'

  == Line 1245 has weird spacing: '...4   any    any...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In order for the sender to estimate it's "RTT" value, the sender
     places a timestamp value in the "TVal" header field.  On first receipt of
     this "TVal", the receiver records the new "TVal" value along with the
     time it arrived locally, subsequent receipt of the same "TVal" MUST not
     update the recorded time.  When the receiver sends it's CC header it
     places this latest recorded value in the "TEcho" header field, along with
     2 delay values, "Echo Delay" and "Transmit Delay".  The "Echo Delay"
     value is the time delta from the recorded arrival time of "TVal" and the
     current clock in microseconds.  The second value, "Transmit Delay", is
     the receiver's current transmission delay on the tunnel (i.e., the
     average time between sending packets on it's half of the IP-TFS tunnel). 
     When the sender receives back it's "TVal" in the "TEcho" header field it
     calculates 2 RTT estimates.  The first is the actual delay found by
     subtracting the "TEcho" value from it's current clock and then
     subtracting "Echo Delay" as well.  The second RTT estimate is found by
     adding the received "Transmit Delay" header value to the senders own
     transmission delay (i.e., the average time between sending packets on
     it's half of the IP-TFS tunnel).  The larger of these 2 RTT estimates
     SHOULD be used as the "RTT" value.  The two estimates are required to
     handle different combinations of faster or slower tunnel packet paths
     with faster or slower fixed tunnel rates. Choosing the larger of the two
     values guarantees that the "RTT" is never considered faster than the
     aggregate transmission delay based on the IP-TFS tunnel rate (the second
     estimate), as well as never being considered faster than the actual RTT
     along the tunnel packet path (the first estimate).

  -- The document date (January 19, 2021) is 1186 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '--800--' is mentioned on line 1061, but not defined

  -- Looks like a reference, but probably isn't: '60' on line 1061

  == Missing Reference: '-240-' is mentioned on line 1061, but not defined

  == Missing Reference: '--4000----------------------' is mentioned on line
     1061, but not defined


     Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           C. Hopps
3	Internet-Draft                                   LabN Consulting, L.L.C.
4	Intended status: Standards Track                        January 19, 2021
5	Expires: July 23, 2021

7	  IP-TFS: IP Traffic Flow Security Using Aggregation and Fragmentation
8	                      draft-ietf-ipsecme-iptfs-06

10	Abstract

12	   This document describes a mechanism to enhance IPsec traffic flow
13	   security by adding traffic flow confidentiality to encrypted IP
14	   encapsulated traffic.  Traffic flow confidentiality is provided by
15	   obscuring the size and frequency of IP traffic using a fixed-sized,
16	   constant-send-rate IPsec tunnel.  The solution allows for congestion
17	   control as well as non-constant send-rate usage.

19	Status of This Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at https://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on July 23, 2021.

36	Copyright Notice

38	   Copyright (c) 2021 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (https://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
54	     1.1.  Terminology & Concepts  . . . . . . . . . . . . . . . . .   3
55	   2.  The IP-TFS Tunnel . . . . . . . . . . . . . . . . . . . . . .   4
56	     2.1.  Tunnel Content  . . . . . . . . . . . . . . . . . . . . .   4
57	     2.2.  Payload Content . . . . . . . . . . . . . . . . . . . . .   5
58	       2.2.1.  Data Blocks . . . . . . . . . . . . . . . . . . . . .   6
59	       2.2.2.  No Implicit End Padding Required  . . . . . . . . . .   6
60	       2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads    6
61	       2.2.4.  Empty Payload . . . . . . . . . . . . . . . . . . . .   8
62	       2.2.5.  IP Header Value Mapping . . . . . . . . . . . . . . .   8
63	       2.2.6.  IP Time-To-Live (TTL) and Tunnel errors . . . . . . .   9
64	       2.2.7.  Effective MTU of the Tunnel . . . . . . . . . . . . .   9
65	     2.3.  Exclusive SA Use  . . . . . . . . . . . . . . . . . . . .   9
66	     2.4.  Modes of Operation  . . . . . . . . . . . . . . . . . . .   9
67	       2.4.1.  Non-Congestion Controlled Mode  . . . . . . . . . . .   9
68	       2.4.2.  Congestion Controlled Mode  . . . . . . . . . . . . .  10
69	   3.  Congestion Information  . . . . . . . . . . . . . . . . . . .  11
70	     3.1.  ECN Support . . . . . . . . . . . . . . . . . . . . . . .  12
71	   4.  Configuration . . . . . . . . . . . . . . . . . . . . . . . .  13
72	     4.1.  Bandwidth . . . . . . . . . . . . . . . . . . . . . . . .  13
73	     4.2.  Fixed Packet Size . . . . . . . . . . . . . . . . . . . .  13
74	     4.3.  Congestion Control  . . . . . . . . . . . . . . . . . . .  13
75	   5.  IKEv2 . . . . . . . . . . . . . . . . . . . . . . . . . . . .  13
76	     5.1.  USE_AGGFRAG Notification Message  . . . . . . . . . . . .  13
77	   6.  Packet and Data Formats . . . . . . . . . . . . . . . . . . .  14
78	     6.1.  AGGFRAG_PAYLOAD Payload . . . . . . . . . . . . . . . . .  14
79	       6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format  15
80	       6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format . .  15
81	       6.1.3.  Data Blocks . . . . . . . . . . . . . . . . . . . . .  17
82	       6.1.4.  IKEv2 USE_AGGFRAG Notification Message  . . . . . . .  19
83	   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  20
84	     7.1.  AGGFRAG_PAYLOAD Sub-Type Registry . . . . . . . . . . . .  20
85	     7.2.  USE_AGGFRAG Notify Message Status Type  . . . . . . . . .  20
86	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  20
87	   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  21
88	     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  21
89	     9.2.  Informative References  . . . . . . . . . . . . . . . . .  21
90	   Appendix A.  Example Of An Encapsulated IP Packet Flow  . . . . .  23
91	   Appendix B.  A Send and Loss Event Rate Calculation . . . . . . .  24
92	   Appendix C.  Comparisons of IP-TFS  . . . . . . . . . . . . . . .  24
93	     C.1.  Comparing Overhead  . . . . . . . . . . . . . . . . . . .  24
94	       C.1.1.  IP-TFS Overhead . . . . . . . . . . . . . . . . . . .  24
95	       C.1.2.  ESP with Padding Overhead . . . . . . . . . . . . . .  25

97	     C.2.  Overhead Comparison . . . . . . . . . . . . . . . . . . .  26
98	     C.3.  Comparing Available Bandwidth . . . . . . . . . . . . . .  26
99	       C.3.1.  Ethernet  . . . . . . . . . . . . . . . . . . . . . .  27
100	   Appendix D.  Acknowledgements . . . . . . . . . . . . . . . . . .  29
101	   Appendix E.  Contributors . . . . . . . . . . . . . . . . . . . .  29
102	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  29

104	1.  Introduction

106	   Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting
107	   information about data being sent through a network.  While one may
108	   directly obscure the data through the use of encryption [RFC4303],
109	   the traffic pattern itself exposes information due to variations in
110	   its shape and timing ([I-D.iab-wire-image], [AppCrypt]).  Hiding the
111	   size and frequency of traffic is referred to as Traffic Flow
112	   Confidentiality (TFC) per [RFC4303].

114	   [RFC4303] provides for TFC by allowing padding to be added to
115	   encrypted IP packets and allowing for transmission of all-pad packets
116	   (indicated using protocol 59).  This method has the major limitation
117	   that it can significantly under-utilize the available bandwidth.

119	   The IP-TFS solution provides for full TFC without the aforementioned
120	   bandwidth limitation.  This is accomplished by using a constant-send-
121	   rate IPsec [RFC4303] tunnel with fixed-sized encapsulating packets;
122	   however, these fixed-sized packets can contain partial, whole or
123	   multiple IP packets to maximize the bandwidth of the tunnel.  A non-
124	   constant send-rate is allowed, but the confidentiality properties of
125	   its use are outside the scope of this document.

127	   For a comparison of the overhead of IP-TFS with the RFC4303
128	   prescribed TFC solution see Appendix C.

130	   Additionally, IP-TFS provides for dealing with network congestion
131	   [RFC2914].  This is important for when the IP-TFS user is not in full
132	   control of the domain through which the IP-TFS tunnel path flows.

134	1.1.  Terminology & Concepts

136	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
137	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
138	   "OPTIONAL" in this document are to be interpreted as described in
139	   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
140	   as shown here.

142	   This document assumes familiarity with IP security concepts described
143	   in [RFC4301].

145	2.  The IP-TFS Tunnel

147	   As mentioned in Section 1, IP-TFS utilizes an IPsec [RFC4303] tunnel
148	   (SA) as its transport.  To provide for full TFC, fixed-sized
149	   encapsulating packets are sent at a constant rate on the tunnel.

151	   The primary input to the tunnel algorithm is the requested bandwidth
152	   used by the tunnel.  Two values are then required to provide for this
153	   bandwidth: the fixed size of the encapsulating packets and the rate
154	   at which to send them.

156	   The fixed packet size MAY either be specified manually or be
157	   determined through other methods such as the Packetization Layer
158	   MTU Discovery (PLMTUD) ([RFC4821], [RFC8899]) or Path MTU discovery
159	   (PMTUD) ([RFC1191], [RFC8201]).  PMTUD is known to have issues so
160	   PLMTUD is considered the more robust option.

162	   Given the encapsulating packet size and the requested tunnel
163	   bandwidth, the corresponding packet send rate can be calculated.  The
164	   packet send rate is the requested bandwidth divided by the size of
165	   the encapsulating packet.
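
	   The rate calculation above can be sketched as follows.  This is a
	   minimal illustration, not part of the specification: it assumes
	   the bandwidth is given in bits per second and the packet size in
	   octets, and the function names are hypothetical.

```python
def packets_per_second(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Packet send rate = requested bandwidth / encapsulating packet size.

    Assumption: bandwidth is in bits/second and size is in octets, so the
    size is converted to bits before dividing.
    """
    if packet_size_octets <= 0:
        raise ValueError("packet size must be positive")
    return bandwidth_bps / (packet_size_octets * 8)


def send_interval_seconds(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Time between consecutive tunnel packets at the constant send rate."""
    return 1.0 / packets_per_second(bandwidth_bps, packet_size_octets)
```

	   For example, a requested bandwidth of 12 Mb/s with 1500-octet
	   encapsulating packets gives 1000 packets per second under these
	   assumptions.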

167	   The egress of the IP-TFS tunnel MUST allow for and expect the ingress
168	   (sending) side of the IP-TFS tunnel to vary the size and rate of sent
169	   encapsulating packets, unless constrained by other policy.

171	2.1.  Tunnel Content

173	   As previously mentioned, one issue with the TFC padding solution in
174	   [RFC4303] is the large amount of wasted bandwidth as only one IP
175	   packet can be sent per encapsulating packet.  In order to maximize
176	   bandwidth IP-TFS breaks this one-to-one association.

178	   IP-TFS aggregates as well as fragments the inner IP traffic flow into
179	   fixed-sized encapsulating IPsec tunnel packets.  Padding is only
180	   added to the tunnel packets if there is no data available to be
181	   sent at the time of tunnel packet transmission, or if fragmentation
182	   has been disabled by the receiver.

184	   This is accomplished using a new Encapsulating Security Payload (ESP,
185	   [RFC4303]) type which is identified by the number AGGFRAG_PAYLOAD
186	   (Section 6.1).

188	   Other non-IP-TFS uses of this aggregation and fragmentation
189	   encapsulation have been identified, such as increased performance
190	   through packet aggregation, as well as handling MTU issues using
191	   fragmentation.  These uses are not defined here, but are also not
192	   restricted by this document.

194	2.2.  Payload Content

196	   The AGGFRAG_PAYLOAD payload content defined in this document is
197	   comprised of a 4 or 24 octet header followed by either a partial, a
198	   full or multiple partial or full data blocks.  The following diagram
199	   illustrates this payload within the ESP packet.  See Section 6.1 for
200	   the exact formats of the AGGFRAG_PAYLOAD payload.

202	    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
203	    . Outer Encapsulating Header ...                                .
204	    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
205	    . ESP Header...                                                 .
206	    +---------------------------------------------------------------+
207	    |   [AGGFRAG subtype/flags]    :           BlockOffset          |
208	    +---------------------------------------------------------------+
209	    :                  [Optional Congestion Info]                   :
210	    +---------------------------------------------------------------+
211	    |       DataBlocks ...                                          ~
212	    ~                                                               ~
213	    ~                                                               |
214	    +---------------------------------------------------------------|
215	    . ESP Trailer...                                                .
216	    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

218	                Figure 1: Layout of an IP-TFS IPsec Packet

220	   The "BlockOffset" value is either zero or some offset into or past
221	   the end of the "DataBlocks" data.

223	   If the "BlockOffset" value is zero it means that the "DataBlocks"
224	   data begins with a new data block.

226	   Conversely, if the "BlockOffset" value is non-zero it points to the
227	   start of the new data block, and the initial "DataBlocks" data
228	   belongs to a previous data block that is still being re-assembled.

230	   The "BlockOffset" can point past the end of the "DataBlocks" data
231	   which indicates that the next data block occurs in a subsequent
232	   encapsulating packet.

234	   Having the "BlockOffset" always point at the next available data
235	   block allows for recovering the next inner packet in the presence of
236	   outer encapsulating packet loss.
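
	   The "BlockOffset" rules above can be sketched as a simple split of
	   the "DataBlocks" data.  This is an illustrative sketch only; the
	   function name and return shape are assumptions, and it works on
	   the "DataBlocks" octets after the header has been removed.

```python
def split_datablocks(block_offset: int, data: bytes) -> tuple[bytes, bytes]:
    """Split DataBlocks data according to BlockOffset.

    Returns (continuation, new_blocks):
      continuation - octets belonging to a data block started in an
                     earlier encapsulating packet (empty if offset is 0)
      new_blocks   - octets starting at the next new data block (empty if
                     the offset points past the end of this payload)
    """
    if block_offset < 0:
        raise ValueError("BlockOffset cannot be negative")
    # Slicing past the end yields the whole payload as continuation and
    # an empty remainder, which matches the "points past the end" case.
    return data[:block_offset], data[block_offset:]
```

	   A receiver that has lost the preceding outer packet can discard
	   the continuation part and resume at the start of the second part.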

238	   An example IP-TFS packet flow can be found in Appendix A.

240	2.2.1.  Data Blocks

242	    +---------------------------------------------------------------+
243	    | Type  | rest of IPv4, IPv6 or pad.
244	    +--------

246	                   Figure 2: Layout of IP-TFS data block

248	   A data block is defined by a 4-bit type code followed by the data
249	   block data.  The type values have been carefully chosen to coincide
250	   with the IPv4/IPv6 version field values so that no per-data block
251	   type overhead is required to encapsulate an IP packet.  Likewise, the
252	   length of the data block is extracted from the encapsulated IPv4 or
253	   IPv6 packet's length field.

255	2.2.2.  No Implicit End Padding Required

257	   It's worth noting that since a data block type is identified by its
258	   first octet there is never a need for an implicit pad at the end of
259	   an encapsulating packet.  Even when the start of a data block occurs
260	   near the end of an encapsulating packet such that there is no room for
261	   the length field of the encapsulated header to be included in the
262	   current encapsulating packet, the fact that the length comes at a
263	   known location and is guaranteed to be present is enough to fetch the
264	   length field from the subsequent encapsulating packet payload.  Only
265	   when there is no data to encapsulate is end padding required, and
266	   then an explicit "Pad Data Block" would be used to identify the
267	   padding.

269	2.2.3.  Fragmentation, Sequence Numbers and All-Pad Payloads

271	   In order for a receiver to be able to reassemble fragmented inner-
272	   packets, the sender MUST send the inner-packet fragments back-to-back
273	   in the logical outer packet stream (i.e., using consecutive ESP
274	   sequence numbers).  However, the sender is allowed to insert "all-
275	   pad" payloads (i.e., payloads with a "BlockOffset" of zero and a
276	   single pad "DataBlock") in between the packets carrying the inner-
277	   packet fragment payloads.  This possible interleaving of all-pad
278	   payloads allows the sender to always be able to send a tunnel packet,
279	   regardless of the encapsulation computational requirements.

281	   When a receiver is reassembling an inner-packet, and it receives an
282	   "all-pad" payload, it increments the expected sequence number that
283	   the next inner-packet fragment is expected to arrive in.

285	   Given the above, the receiver will need to handle out-of-order
286	   arrival of outer ESP packets prior to reassembly processing.  ESP
287	   already provides for optionally detecting replay attacks.  Detecting
288	   replay attacks normally utilizes a window method.  A similar sequence
289	   number based sliding window can be used to correct re-ordering of the
290	   outer packet stream.  Receiving a larger (newer) sequence number
291	   packet advances the window, and received older ESP packets whose
292	   sequence numbers the window has passed by are dropped.  A good choice
293	   for the size of this window depends on the amount of re-ordering the
294	   user may normally experience.

296	   As the amount of reordering that may be present is hard to predict
297	   the window size SHOULD be configurable by the user.  Implementations
298	   MAY also dynamically adjust the reordering window based on actual
299	   reordering seen in arriving packets.  Finally, we note that as IP-TFS
300	   is sending a continuous stream of packets there is no requirement for
301	   timers (although there's no prohibition either) as newly arrived
302	   packets will cause the window to advance and older packets will then
303	   be processed as they leave the window.  Implementations that are
304	   concerned about memory use when packets are delayed (e.g., when an SA
305	   deletion is delayed) can of course use timers to drop packets as
306	   well.
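
	   A sequence number based reordering window of this kind can be
	   sketched as below.  The sketch is illustrative only: the class
	   and method names are hypothetical, sequence number wrap and
	   extended sequence numbers are ignored, and releasing packets as
	   soon as they are contiguous is a design choice of this example
	   rather than a requirement of the text above.

```python
class ReorderWindow:
    """Timer-free reordering buffer driven only by packet arrival."""

    def __init__(self, size: int):
        self.size = size
        self.next_seq = None  # next sequence number to hand to reassembly
        self.pending = {}     # seq -> payload, packets waiting in the window

    def receive(self, seq: int, payload):
        """Accept one outer packet; return [(seq, payload-or-None), ...].

        Entries come out in sequence order.  A payload of None marks a
        packet that left the window without arriving (treated as lost).
        """
        if self.next_seq is None:
            self.next_seq = seq
        if seq < self.next_seq or seq in self.pending:
            return []  # the window has passed it, or it is a duplicate
        self.pending[seq] = payload
        released = []
        # A newer packet advances the window; anything falling out of the
        # window is released, present or not.
        while seq - self.next_seq >= self.size:
            released.append((self.next_seq, self.pending.pop(self.next_seq, None)))
            self.next_seq += 1
        # Release packets that are now in order.
        while self.next_seq in self.pending:
            released.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return released
```

	   Because every arrival can advance the window, the continuous
	   packet stream itself drives processing and no timer is needed in
	   this sketch.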

308	   While ESP guarantees an increasing sequence number with subsequently
309	   sent packets, it does not actually require the sequence numbers to be
310	   generated with no gaps (e.g., sending only even numbered sequence
311	   numbers would be allowed as long as they are always increasing).
312	   Gaps in the sequence numbers will not work for this specification so
313	   the sequence number stream is further restricted to not contain gaps
314	   (i.e., each subsequent outer packet must be sent with the sequence
315	   number incremented by 1).

317	   When using the AGGFRAG_PAYLOAD in conjunction with replay detection,
318	   the window size for both MAY be reduced to share the smaller of the
319	   two window sizes.  This is because packets outside of the smaller
320	   window but inside the larger would still be dropped by the mechanism
321	   with the smaller window size.

323	   Finally, as sequence numbers are reset when switching SAs (e.g., when
324	   re-keying a child SA), an implementation SHOULD NOT send initial
325	   fragments of an inner packet using one SA and subsequent fragments in
326	   a different SA.

328	2.2.3.1.  Optional Extra Padding

330	   When the tunnel bandwidth is not being fully utilized, an
331	   implementation MAY pad-out the current encapsulating packet in order
332	   to deliver an inner packet un-fragmented in the following outer
333	   packet.  The benefit would be to avoid inner-packet fragmentation in
334	   the presence of a bursty offered load (non-bursty traffic will
335	   naturally not fragment).  An implementation MAY also choose to allow
336	   for a minimum fragment size to be configured (e.g., as a percentage
337	   of the AGGFRAG_PAYLOAD payload size) to avoid fragmentation at the
338	   cost of tunnel bandwidth.  The cost with these methods is complexity
339	   and added delay of inner traffic.  The main advantage to avoiding
340	   fragmentation is to minimize inner packet loss in the presence of
341	   outer packet loss.  When this is worthwhile (e.g., how much loss and
342	   what type of loss is required, given different inner traffic shapes
343	   and utilization, for this to make sense), and what values to use for
344	   the allowable/added delay may be worth researching, but is outside
345	   the scope of this document.

347	   While use of padding to avoid fragmentation does not impact
348	   interoperability, used inappropriately it can reduce the effective
349	   throughput of a tunnel.  Implementations implementing either of the
350	   above approaches will need to take care to not reduce the effective
351	   capacity, and overall utility, of the tunnel through the overuse of
352	   padding.

354	2.2.4.  Empty Payload

356	   In order to support reporting of congestion control information
357	   (described later) on a non-AGGFRAG_PAYLOAD enabled SA, IP-TFS allows
358	   for the sending of an AGGFRAG_PAYLOAD payload with no data blocks
359	   (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header
360	   length).  This special payload is called an empty payload.

362	2.2.5.  IP Header Value Mapping

364	   [RFC4301] provides some direction on when and how to map various
365	   values from an inner IP header to the outer encapsulating header,
366	   namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the
367	   Differentiated Services (DS) field [RFC2474] and the Explicit
368	   Congestion Notification (ECN) field [RFC3168].  Unlike [RFC4301], IP-
369	   TFS may and often will be encapsulating more than one IP packet per
370	   ESP packet.  To deal with this, these mappings are restricted
371	   further.  In particular IP-TFS never maps the inner DF bit as it is
372	   unrelated to the IP-TFS tunnel functionality; IP-TFS never IP
373	   fragments the inner packets and the inner packets will not affect the
374	   fragmentation of the outer encapsulation packets.  Likewise, the ECN
375	   value need not be mapped as any congestion related to the constant-
376	   send-rate IP-TFS tunnel is unrelated (by design!) to the inner
377	   traffic flow.  Finally, by default the DS field SHOULD NOT be copied
378	   although an implementation MAY choose to allow for configuration to
379	   override this behavior.  An implementation SHOULD also allow the DS
380	   value to be set by configuration.

382	   It is worth noting that an implementation MAY still set the ECN value
383	   of inner packets based on the normal ECN specification ([RFC3168]).

385	2.2.6.  IP Time-To-Live (TTL) and Tunnel errors

387	   [RFC4301] specifies how to modify the inner packet TTL ([RFC0791]).

389	   Any errors (e.g., ICMP errors arriving back at the tunnel ingress due
390	   to tunnel traffic) should be handled the same as with non IP-TFS
391	   IPsec tunnels.

393	2.2.7.  Effective MTU of the Tunnel

395	   Unlike [RFC4301], there is normally no effective MTU (EMTU) on an IP-
396	   TFS tunnel as all IP packet sizes are properly transmitted without
397	   requiring IP fragmentation prior to tunnel ingress.  That said, an
398	   implementation MAY allow for explicitly configuring an MTU for the
399	   tunnel.

401	   If IP-TFS fragmentation has been disabled, then the tunnel's EMTU and
402	   behaviors are the same as normal IPsec tunnels ([RFC4301]).

404	2.3.  Exclusive SA Use

406	   It is not the intention of this specification to allow for mixed use
407	   of an AGGFRAG_PAYLOAD enabled SA.  In other words, an SA that has
408	   AGGFRAG_PAYLOAD enabled MUST NOT have non-AGGFRAG_PAYLOAD payloads
409	   such as IP (IP protocol 4), TCP transport (IP protocol 6), or ESP pad
410	   packets (protocol 59) intermixed with non-empty AGGFRAG_PAYLOAD
411	   payloads.  Empty AGGFRAG_PAYLOAD payloads (Section 2.2.4) are used to
412	   transmit congestion control information on non-IP-TFS enabled SAs, so
413	   intermixing is allowed in this specific case.  While it's possible to
414	   envision making the algorithm work in the presence of sequence number
415	   skips in the AGGFRAG_PAYLOAD payload stream, the added complexity is
416	   not deemed worthwhile.  Other IPsec uses can configure and use their
417	   own SAs.

419	2.4.  Modes of Operation

421	   Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are
422	   unidirectional.  Bidirectional IP-TFS functionality is achieved by
423	   setting up 2 IP-TFS tunnels, one in each direction.

425	   An IP-TFS tunnel can operate in 2 modes: a non-congestion controlled
426	   mode and a congestion controlled mode.

428	2.4.1.  Non-Congestion Controlled Mode

430	   In the non-congestion controlled mode IP-TFS sends fixed-sized
431	   packets at a constant rate.  The packet send rate is constant and is
432	   not automatically adjusted regardless of any network congestion
433	   (e.g., packet loss).

435	   For similar reasons as given in [RFC7510] the non-congestion
436	   controlled mode should only be used where the user has full
437	   administrative control over the path the tunnel will take.  This is
438	   required so the user can guarantee the bandwidth and also be sure
439	   not to negatively affect network congestion [RFC2914].  In this
440	   case packet loss should be reported to the administrator (e.g., via
441	   syslog, YANG notification, SNMP traps, etc.) so that any failures due
442	   to a lack of bandwidth can be corrected.

444	2.4.2.  Congestion Controlled Mode

446	   With the congestion controlled mode, IP-TFS adapts to network
447	   congestion by lowering the packet send rate to accommodate the
448	   congestion, as well as raising the rate when congestion subsides.
449	   Since overhead is per packet, by allowing for maximal fixed-size
450	   packets and varying the send rate transport overhead is minimized.

452	   The output of the congestion control algorithm will adjust the rate
453	   at which the ingress sends packets.  While this document does not
454	   require a specific congestion control algorithm, best current
455	   practice RECOMMENDS that the algorithm conform to [RFC5348].
456	   Congestion control principles are documented in [RFC2914] as well.
457	   An example of an implementation of the [RFC5348] algorithm which
458	   matches the requirements of IP-TFS (i.e., designed for fixed-size
459	   packet and send rate varied based on congestion) is documented in
460	   [RFC4342].

462	   The required inputs for the TCP friendly rate control algorithm
463	   described in [RFC5348] are the receiver's loss event rate and the
464	   sender's estimated round-trip time (RTT).  These values are provided
465	   by IP-TFS using the congestion information header fields described in
466	   Section 3.  In particular these values are sufficient to implement
467	   the algorithm described in [RFC5348].

469	   At a minimum, the congestion information must be sent, from the
470	   receiver and from the sender, at least once per RTT.  Prior to
471	   establishing an RTT the information SHOULD be sent constantly from
472	   the sender and the receiver so that an RTT estimate can be
473	   established.  The lack of receiving this information over multiple
474	   consecutive RTT intervals should be considered a congestion event
475	   that causes the sender to adjust its sending rate lower.  For
476	   example, [RFC4342] calls this the "no feedback timeout" and it is
477	   equal to 4 RTT intervals.  When a "no feedback timeout" has occurred
478	   [RFC4342] halves the sending rate.
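
	   The "no feedback timeout" behavior described above might be
	   sketched as follows.  This is a simplified illustration: the
	   function name and the minimum-rate floor are hypothetical, and a
	   real implementation would also restart its timer after each
	   reduction.

```python
NO_FEEDBACK_RTTS = 4  # timeout length in RTT intervals, per the text above


def rate_after_feedback_check(rate_pps: float, rtt_s: float,
                              seconds_since_feedback: float,
                              min_rate_pps: float = 1.0) -> float:
    """Halve the send rate if no congestion information arrived in time.

    min_rate_pps is an assumed floor so the tunnel keeps sending and can
    still obtain a fresh RTT estimate.
    """
    if rtt_s <= 0:
        return rate_pps  # no RTT estimate yet, so no timeout can be judged
    if seconds_since_feedback >= NO_FEEDBACK_RTTS * rtt_s:
        return max(rate_pps / 2.0, min_rate_pps)
    return rate_pps
```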

480	   An implementation MAY choose to always include the congestion
481	   information in its IP-TFS payload header if sending on an IP-TFS
482	   enabled SA.  Since IP-TFS normally will operate with a large packet
483	   size, the congestion information should represent a small portion of
484	   the available tunnel bandwidth.  An implementation choosing to always
485	   send the data MAY also choose to only update the "LossEventRate" and
486	   "RTT" header field values it sends every "RTT" though.

488	   When an implementation is choosing a congestion control algorithm (or
489	   a selection of algorithms) one should remember that IP-TFS is not
490	   providing for reliable delivery of IP traffic, and so per packet ACKs
491	   are not required and are not provided.

493	   It's worth noting that the variable send-rate of a congestion
494	   controlled IP-TFS tunnel is not private; however, this send-rate is
495	   being driven by network congestion, and as long as the encapsulated
496	   (inner) traffic flow shape and timing are not directly affecting the
497	   (outer) network congestion, the variations in the tunnel rate will
498	   not weaken the provided inner traffic flow confidentiality.

500	2.4.2.1.  Circuit Breakers

502	   In addition to congestion control, implementations MAY choose to
503	   define and implement circuit breakers [RFC8084] as a recovery method
504	   of last resort.  Enabling circuit breakers is also a reason a user
505	   may wish to enable congestion information reports even when using the
506	   non-congestion controlled mode of operation.  The definition of
507	   circuit breakers is outside the scope of this document.

509	3.  Congestion Information

511	   In order to support the congestion control mode, the sender needs to
512	   know the loss event rate and also be able to approximate the RTT
513	   ([RFC5348]).  In order to obtain these values the receiver sends
514	   congestion control information on its SA back to the sender.  Thus,
515	   in order to support congestion control the receiver must have a
516	   paired SA back to the sender (this is always the case when the tunnel
517	   was created using IKEv2).  If the SA back to the sender is a non-
518	   AGGFRAG_PAYLOAD enabled SA then an AGGFRAG_PAYLOAD empty payload
519	   (i.e., header only) is used to convey the information.

521	   In order to calculate a loss event rate compatible with [RFC5348],
522	   the receiver needs to have a round-trip time estimate.  Thus the
523	   sender communicates this estimate in the "RTT" header field.  On
524	   startup this value will be zero as no RTT estimate is yet known.

526	   In order for the sender to estimate its "RTT" value, the sender
527	   places a timestamp value in the "TVal" header field.  On first
528	   receipt of this "TVal", the receiver records the new "TVal" value
529	   along with the time it arrived locally.  Subsequent receipt of the
530	   same "TVal" MUST NOT update the recorded time.  When the receiver
531	   sends its CC header it places this latest recorded value in the
532	   "TEcho" header field, along with 2 delay values, "Echo Delay" and
533	   "Transmit Delay".  The "Echo Delay" value is the time delta between
534	   the recorded arrival time of "TVal" and the current clock in
535	   microseconds.  The second value, "Transmit Delay", is the receiver's
536	   current transmission delay on the tunnel (i.e., the average time
537	   between sending packets on its half of the IP-TFS tunnel).  When the
538	   sender receives back its "TVal" in the "TEcho" header field it
539	   calculates 2 RTT estimates.  The first is the actual delay found by
540	   subtracting the "TEcho" value from its current clock and then
541	   subtracting "Echo Delay" as well.  The second RTT estimate is found
542	   by adding the received "Transmit Delay" header value to the sender's
543	   own transmission delay (i.e., the average time between sending
544	   packets on its half of the IP-TFS tunnel).  The larger of these 2
545	   RTT estimates SHOULD be used as the "RTT" value.  The two estimates
546	   are required to handle different combinations of faster or slower
547	   tunnel packet paths with faster or slower fixed tunnel rates.
548	   Choosing the larger of the two values guarantees that the "RTT" is
549	   never considered faster than the aggregate transmission delay based
550	   on the IP-TFS tunnel rate (the second estimate), as well as never
551	   being considered faster than the actual RTT along the tunnel packet
552	   path (the first estimate).
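
   The two-estimate RTT calculation above can be sketched in Python as
   follows.  This is an illustration only: it assumes the sender fills
   "TVal" with the low 32 bits of a local microsecond clock (the draft
   treats "TVal" as opaque to the receiver), and the function and
   parameter names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # "TVal" and "TEcho" are 32 bit fields


def estimate_rtt_us(now_us, techo, echo_delay_us,
                    peer_transmit_delay_us, own_transmit_delay_us):
    """Return the larger of the two RTT estimates, in microseconds."""
    # First estimate: time since the echoed TVal was sent, minus the
    # time the receiver held it ("Echo Delay").  The subtraction is
    # done modulo 2^32 so a wrapped clock still gives a sane result
    # (assumption: TVal is a truncated microsecond timestamp).
    elapsed = (now_us - techo) & MASK32
    measured = max(elapsed - echo_delay_us, 0)
    # Second estimate: sum of both directions' transmission delays.
    rate_based = peer_transmit_delay_us + own_transmit_delay_us
    return max(measured, rate_based)
```

   With a measured path delay of 70 ms and 10 ms send intervals in each
   direction the first estimate is used; with slow fixed tunnel rates
   the second estimate dominates instead.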

554	   The receiver also calculates, and communicates in the "LossEventRate"
555	   header field, the loss event rate for use by the sender.  This is
556	   slightly different from [RFC4342] which periodically sends all the
557	   loss interval data back to the sender so that it can do the
558	   calculation.  See Appendix B for a suggested way to calculate the
559	   loss event rate value.  Initially this value will be zero (indicating
560	   no loss) until enough data has been collected by the receiver to
561	   update it.

563	3.1.  ECN Support

565	   In addition to normal packet loss information, IP-TFS supports use
566	   of the ECN bits in the encapsulating IP header [RFC3168] for
567	   identifying congestion.  If ECN use is enabled and a packet arrives
568	   at the egress endpoint with the Congestion Experienced (CE) value
569	   set, then the receiver considers that packet as being dropped,
570	   although it does not drop it.  The receiver MUST set the E bit in any
571	   AGGFRAG_PAYLOAD payload header containing a "LossEventRate" value
572	   derived from a CE value being considered.

574	   As noted in [RFC3168] the ECN bits are not protected by IPsec and
575	   thus may constitute a covert channel.  For this reason ECN use SHOULD
576	   NOT be enabled by default.

578	4.  Configuration

580	   IP-TFS is meant to be deployable with a minimal amount of
581	   configuration.  All IP-TFS specific configuration should be able to
582	   be specified at the unidirectional tunnel ingress (sending) side.  It
583	   is intended that non-IKEv2 operation is supported, at least, with
584	   local static configuration.

586	4.1.  Bandwidth

588	   Bandwidth is a local configuration option.  For non-congestion
589	   controlled mode the bandwidth SHOULD be configured.  For congestion
590	   controlled mode one can configure the bandwidth or have no
591	   configuration and let congestion control discover the maximum
592	   bandwidth available.  No standardized configuration method is
593	   required.

595	4.2.  Fixed Packet Size

597	   The fixed packet size to be used for the tunnel encapsulation packets
598	   MAY be configured manually or can be automatically determined using
599	   other methods such as PLMTUD ([RFC4821], [RFC8899]) or PMTUD
600	   ([RFC1191], [RFC8201]).  As PMTUD is known to have issues, PLMTUD is
601	   considered the more robust option.  No standardized configuration
602	   method is required.

604	4.3.  Congestion Control

606	   Congestion control is a local configuration option.  No standardized
607	   configuration method is required.

609	5.  IKEv2

611	5.1.  USE_AGGFRAG Notification Message

613	   As mentioned previously IP-TFS tunnels utilize ESP payloads of type
614	   AGGFRAG_PAYLOAD.

616	   When using IKEv2, a new "USE_AGGFRAG" Notification Message is used to
617	   enable use of the AGGFRAG_PAYLOAD payload on a child SA pair.  The
618	   method used is similar to how USE_TRANSPORT_MODE is negotiated, as
619	   described in [RFC7296].

621	   To request using the AGGFRAG_PAYLOAD payload on the Child SA pair,
622	   the initiator includes the USE_AGGFRAG notification in an SA payload
623	   requesting a new Child SA (either during the initial IKE_AUTH or
624	   during non-rekeying CREATE_CHILD_SA exchanges).  If the request is
625	   accepted then the response MUST also include a notification of type
626	   USE_AGGFRAG.  If the responder declines the request the child SA will
627	   be established without AGGFRAG_PAYLOAD payload use enabled.  If this
628	   is unacceptable to the initiator, the initiator MUST delete the child
629	   SA.

631	   The USE_AGGFRAG notification MUST NOT be sent, and MUST be ignored,
632	   during a CREATE_CHILD_SA rekeying exchange as it is not allowed to
633	   change use of the AGGFRAG_PAYLOAD payload type during rekeying.  A
634	   new child SA due to re-keying inherits the use of AGGFRAG_PAYLOAD
635	   from the re-keyed child SA.

637	   The USE_AGGFRAG notification contains a 1 octet payload of flags that
638	   specify any requirements from the sender of the message.  If any
639	   requirement flags are not understood or cannot be supported by the
640	   receiver then the receiver should not enable use of AGGFRAG_PAYLOAD
641	   payload type (either by not responding with the USE_AGGFRAG
642	   notification, or in the case of the initiator, by deleting the child
643	   SA if the now established non-AGGFRAG_PAYLOAD using SA is
644	   unacceptable).

646	   The notification type and payload flag values are defined in
647	   Section 6.1.4.

649	6.  Packet and Data Formats

651	6.1.  AGGFRAG_PAYLOAD Payload

653	   ESP Payload Type: 0x5

655	   An IP-TFS payload is identified by the ESP payload type
656	   AGGFRAG_PAYLOAD which has the value 0x5.  The first octet of this
657	   payload indicates the format of the remaining payload data.

659	     0 1 2 3 4 5 6 7
660	    +-+-+-+-+-+-+-+-+-+-+-
661	    |   Sub-type    | ...
662	    +-+-+-+-+-+-+-+-+-+-+-

664	   Sub-type:
665	      An 8 bit value indicating the payload format.

667	   This specification defines 2 payload sub-types.  These payload
668	   formats are defined in the following sections.

670	6.1.1.  Non-Congestion Control AGGFRAG_PAYLOAD Payload Format

672	   The non-congestion control AGGFRAG_PAYLOAD payload is comprised of a
673	   4 octet header followed by a variable amount of "DataBlocks" data as
674	   shown below.

676	                         1                   2                   3
677	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
678	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
679	    |  Sub-Type (0) |   Reserved    |          BlockOffset          |
680	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
681	    |       DataBlocks ...
682	    +-+-+-+-+-+-+-+-+-+-+-

684	   Sub-type:
685	      An octet indicating the payload format.  For this non-congestion
686	      control format, the value is 0.

688	   Reserved:
689	      An octet set to 0 on generation, and ignored on receipt.

691	   BlockOffset:
692	      A 16 bit unsigned integer counting the number of octets of
693	      "DataBlocks" data before the start of a new data block.
694	      "BlockOffset" can count past the end of the "DataBlocks" data in
695	      which case all the "DataBlocks" data belongs to the previous data
696	      block being re-assembled.  If the "BlockOffset" extends into
697	      subsequent packets it continues to only count subsequent
698	      "DataBlocks" data (i.e., it does not count subsequent packets'
699	      non-"DataBlocks" octets).

701	   DataBlocks:
702	      Variable number of octets that begins with the start of a data
703	      block, or the continuation of a previous data block, followed by
704	      zero or more additional data blocks.
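
   As an illustration of the 4 octet header above, the following Python
   sketch packs and parses it.  It assumes conventional network byte
   order (big-endian) for "BlockOffset"; the function names are
   hypothetical and not part of this specification.

```python
import struct

NON_CC_FORMAT = "!BBH"  # Sub-type, Reserved, BlockOffset (big-endian)


def pack_non_cc_header(block_offset):
    """Build the non-congestion control header (Sub-type 0)."""
    # Reserved is set to 0 on generation.
    return struct.pack(NON_CC_FORMAT, 0, 0, block_offset)


def unpack_non_cc_header(payload):
    """Return (block_offset, data_blocks) for a Sub-type 0 payload."""
    sub_type, _reserved, block_offset = struct.unpack_from(
        NON_CC_FORMAT, payload)
    if sub_type != 0:
        raise ValueError("not a non-congestion control payload")
    # Reserved is ignored on receipt; DataBlocks follow the header.
    return block_offset, payload[4:]
```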

706	6.1.2.  Congestion Control AGGFRAG_PAYLOAD Payload Format

708	   The congestion control AGGFRAG_PAYLOAD payload is comprised of a 24
709	   octet header followed by a variable amount of "DataBlocks" data as
710	   shown below.

712	                         1                   2                   3
713	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
714	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
715	    |  Sub-type (1) |  Reserved   |E|          BlockOffset          |
716	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
717	    |                          LossEventRate                        |
718	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
719	    |                      RTT                  |   Echo Delay ...
720	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
721	         ... Echo Delay   |           Transmit Delay                |
722	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
723	    |                              TVal                             |
724	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
725	    |                             TEcho                             |
726	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
727	    |       DataBlocks ...
728	    +-+-+-+-+-+-+-+-+-+-+-

730	   Sub-type:
731	      An octet indicating the payload format.  For this congestion
732	      control format, the value is 1.

734	   Reserved:
735	      A 7 bit field set to 0 on generation, and ignored on receipt.

737	   E:
738	      A 1 bit value if set indicates that Congestion Experienced (CE)
739	      ECN bits were received and used in deriving the reported
740	      "LossEventRate".

742	   BlockOffset:
743	      The same value as the non-congestion controlled payload format
744	      value.

746	   LossEventRate:
747	      A 32 bit value specifying the inverse of the current loss event
748	      rate as calculated by the receiver.  A value of zero indicates no
749	      loss.  Otherwise the loss event rate is "1/LossEventRate".

751	   RTT:
752	      A 22 bit value specifying the sender's current round-trip time
753	      estimate in microseconds.  The value MAY be zero prior to the
754	      sender having calculated a round-trip time estimate.  The value
755	      SHOULD be set to zero on non-AGGFRAG_PAYLOAD enabled SAs.  If the
756	      value is equal to or larger than "0x3FFFFF" it MUST be set to
757	      "0x3FFFFF".

759	   Echo Delay:

761	      A 21 bit value specifying the delay in microseconds incurred
762	      between the receiver first receiving the "TVal" value and the
763	      receiver sending it back in "TEcho".  If the value is equal to
764	      or larger than "0x1FFFFF" it MUST be set to "0x1FFFFF".

766	   Transmit Delay:
767	      A 21 bit value specifying the transmission delay in microseconds.
768	      This is the fixed (or average) delay on the receiver between it
769	      sending packets on the IP-TFS tunnel.  If the value is equal to or
770	      larger than "0x1FFFFF" it MUST be set to "0x1FFFFF".

772	   TVal:
773	      An opaque 32 bit value that will be echoed back by the receiver in
774	      later packets in the "TEcho" field, along with an "Echo Delay"
775	      value of how long that echo took.

777	   TEcho:
778	      The opaque 32 bit value from a received packet's "TVal" field.
779	      The received "TVal" is placed in "TEcho" along with an "Echo
780	      Delay" value indicating how long it has been since receiving the
781	      "TVal" value.

783	   DataBlocks:
784	      Variable number of octets that begins with the start of a data
785	      block, or the continuation of a previous data block, followed by
786	      zero or more additional data blocks.  For the special case of
787	      sending congestion control information on a non-IP-TFS enabled SA
788	      this value MUST be empty (i.e., be zero octets long).
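
   The 24 octet header above packs three timing fields (22 + 21 + 21
   bits) into 64 bits.  The following Python sketch shows one way to
   encode and decode it, including the required saturation of "RTT",
   "Echo Delay" and "Transmit Delay".  Network byte order is assumed,
   and all function and key names are hypothetical.

```python
import struct

CC_FORMAT = "!BBHIQII"  # 1+1+2+4+8+4+4 = 24 octets
RTT_MAX = 0x3FFFFF      # 22 bit field
DELAY_MAX = 0x1FFFFF    # 21 bit fields


def pack_cc_header(block_offset, loss_event_rate, rtt_us, echo_delay_us,
                   transmit_delay_us, tval, techo, ecn=False):
    """Build the congestion control header (Sub-type 1)."""
    # Values too large for their field are clamped to the maximum.
    timing = ((min(rtt_us, RTT_MAX) << 42)
              | (min(echo_delay_us, DELAY_MAX) << 21)
              | min(transmit_delay_us, DELAY_MAX))
    flags = 0x01 if ecn else 0x00  # E is the last bit of octet 2
    return struct.pack(CC_FORMAT, 1, flags, block_offset,
                       loss_event_rate, timing, tval, techo)


def unpack_cc_header(payload):
    """Parse a Sub-type 1 header into a dict of its fields."""
    (sub_type, flags, block_offset, loss_event_rate,
     timing, tval, techo) = struct.unpack_from(CC_FORMAT, payload)
    if sub_type != 1:
        raise ValueError("not a congestion control payload")
    return {
        "e": bool(flags & 0x01),  # the 7 reserved bits are ignored
        "block_offset": block_offset,
        "loss_event_rate": loss_event_rate,
        "rtt_us": timing >> 42,
        "echo_delay_us": (timing >> 21) & DELAY_MAX,
        "transmit_delay_us": timing & DELAY_MAX,
        "tval": tval,
        "techo": techo,
    }
```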

790	6.1.3.  Data Blocks

792	                         1                   2                   3
793	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
794	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
795	    | Type  | IPv4, IPv6 or pad...
796	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

798	   Type:
799	      A 4 bit field where 0x0 identifies a pad data block, 0x4 indicates
800	      an IPv4 data block, and 0x6 indicates an IPv6 data block.

802	6.1.3.1.  IPv4 Data Block
803	                         1                   2                   3
804	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
805	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
806	    |  0x4  |  IHL  |  TypeOfService  |         TotalLength         |
807	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
808	    | Rest of the inner packet ...
809	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

811	   These values are the actual values within the encapsulated IPv4
812	   header.  In other words, the start of this data block is the start of
813	   the encapsulated IP packet.

815	   Type:
816	      A 4 bit value of 0x4 indicating IPv4 (i.e., first nibble of the
817	      IPv4 packet).

819	   TotalLength:
820	      The 16 bit unsigned integer "Total Length" field of the IPv4 inner
821	      packet.

823	6.1.3.2.  IPv6 Data Block

825	                         1                   2                   3
826	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
827	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
828	    |  0x6  | TrafficClass  |               FlowLabel               |
829	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
830	    |         PayloadLength         | Rest of the inner packet ...
831	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

833	   These values are the actual values within the encapsulated IPv6
834	   header.  In other words, the start of this data block is the start of
835	   the encapsulated IP packet.

837	   Type:
838	      A 4 bit value of 0x6 indicating IPv6 (i.e., first nibble of the
839	      IPv6 packet).

841	   PayloadLength:
842	      The 16 bit unsigned integer "Payload Length" field of the inner
843	      IPv6 packet.

845	6.1.3.3.  Pad Data Block
846	                         1                   2                   3
847	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
848	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
849	    |  0x0  | Padding ...
850	    +-+-+-+-+-+-+-+-+-+-+-

852	   Type:
853	      A 4 bit value of 0x0 indicating a padding data block.

855	   Padding:
856	      extends to end of the encapsulating packet.
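
   The three data block types can be told apart by their first nibble,
   and the IPv4 and IPv6 length fields give the block length.  The
   Python sketch below walks a "DataBlocks" buffer under these
   assumptions: the buffer starts at a data block boundary (i.e., any
   "BlockOffset" has already been skipped), and the IPv6 block length
   is the "Payload Length" plus the 40 octet fixed IPv6 header.  The
   function name and the returned tuple layout are hypothetical.

```python
IPV6_FIXED_HEADER = 40  # octets not counted by "Payload Length"


def split_data_blocks(data):
    """Return a list of (kind, octets, complete) tuples."""
    blocks = []
    pos = 0
    while pos < len(data):
        kind = data[pos] >> 4  # the 4 bit Type field
        remaining = len(data) - pos
        if kind == 0x0:
            # Padding extends to the end of the encapsulating packet.
            blocks.append(("pad", data[pos:], True))
            break
        if kind == 0x4:
            name, need = "ipv4", 4
        elif kind == 0x6:
            name, need = "ipv6", 6
        else:
            raise ValueError("unknown data block type")
        if remaining < need:
            # Length field itself continues in the next packet.
            blocks.append((name, data[pos:], False))
            break
        if kind == 0x4:
            length = int.from_bytes(data[pos + 2:pos + 4], "big")
            if length < 20:
                raise ValueError("invalid IPv4 Total Length")
        else:
            length = IPV6_FIXED_HEADER + int.from_bytes(
                data[pos + 4:pos + 6], "big")
        chunk = data[pos:pos + length]
        # complete is False when the block continues in a later packet.
        blocks.append((name, chunk, len(chunk) == length))
        pos += length
    return blocks
```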

858	6.1.4.  IKEv2 USE_AGGFRAG Notification Message

860	   As discussed in Section 5.1 a notification message USE_AGGFRAG is
861	   used to negotiate use of the ESP AGGFRAG_PAYLOAD payload type.

863	   The USE_AGGFRAG Notification Message Status Type is (TBD2).

865	   The notification payload contains 1 octet of requirement flags.
866	   There are currently 2 requirement flags defined.  This may be revised
867	   by later specifications.

869	    +-+-+-+-+-+-+-+-+
870	    |0|0|0|0|0|0|C|D|
871	    +-+-+-+-+-+-+-+-+

873	   0:
874	      6 bits - reserved, MUST be zero on send, unless defined by later
875	      specifications.

877	   C:
878	      Congestion Control bit.  If set, then the sender is requiring that
879	      congestion control information MUST be returned to it periodically
880	      as defined in Section 3.

882	   D:
883	      Don't Fragment bit, if set indicates the sender of the notify
884	      message does not support receiving packet fragments (i.e., inner
885	      packets MUST be sent using a single "Data Block").  This value
886	      only applies to what the sender is capable of receiving; the
887	      sender MAY still send packet fragments unless similarly restricted
888	      by the receiver in its USE_AGGFRAG notification.
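
   A small Python sketch of the flags octet follows.  Reading the
   diagram with bit 0 as the most significant bit, D is the least
   significant bit and C the next one.  The constant and function names
   are hypothetical.

```python
FLAG_D = 0x01  # Don't Fragment
FLAG_C = 0x02  # Congestion Control
KNOWN_FLAGS = FLAG_C | FLAG_D


def encode_flags(congestion_control, dont_fragment):
    """Build the 1 octet USE_AGGFRAG requirement flags value."""
    return ((FLAG_C if congestion_control else 0)
            | (FLAG_D if dont_fragment else 0))


def decode_flags(octet):
    """Return (congestion_control, dont_fragment, understood)."""
    # If any unknown requirement bit is set, the receiver should not
    # enable use of AGGFRAG_PAYLOAD, so report understood=False.
    understood = (octet & ~KNOWN_FLAGS & 0xFF) == 0
    return bool(octet & FLAG_C), bool(octet & FLAG_D), understood
```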

890	7.  IANA Considerations

892	7.1.  AGGFRAG_PAYLOAD Sub-Type Registry

894	   This document requests IANA create a registry called "AGGFRAG_PAYLOAD
895	   Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD
896	   Parameters".  The registration policy for this registry is "Standards
897	   Action" ([RFC8126] and [RFC7120]).

899	   Name:
900	      AGGFRAG_PAYLOAD Sub-Type Registry

902	   Description:
903	      AGGFRAG_PAYLOAD Payload Formats.

905	   Reference:
906	      This document

908	   The initial content for this registry is as follows:

910	    Sub-Type  Name                           Reference
911	   --------------------------------------------------------
912	           0  Non-Congestion Control Format  This document
913	           1  Congestion Control Format      This document
914	       2-255  Reserved

916	7.2.  USE_AGGFRAG Notify Message Status Type

918	   This document requests a status type USE_AGGFRAG be allocated from
919	   the "IKEv2 Notify Message Types - Status Types" registry.

921	   Value:
922	      TBD2

924	   Name:
925	      USE_AGGFRAG

927	   Reference:
928	      This document

930	8.  Security Considerations

932	   This document describes a mechanism to add Traffic Flow
933	   Confidentiality to IP traffic.  Use of this mechanism is expected to
934	   increase the security of the traffic being transported.  Other than
935	   the additional security afforded by using this mechanism, IP-TFS
936	   utilizes the security protocols [RFC4303] and [RFC7296] and so their
937	   security considerations apply to IP-TFS as well.

939	   As noted previously in Section 2.4.2, for TFC to be fully maintained
940	   the encapsulated traffic flow should not be affecting network
941	   congestion in a predictable way; if it would be, then use of the
942	   non-congestion controlled mode should be considered instead.

944	9.  References

946	9.1.  Normative References

948	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
949	              Requirement Levels", BCP 14, RFC 2119,
950	              DOI 10.17487/RFC2119, March 1997,
951	              <https://www.rfc-editor.org/info/rfc2119>.

953	   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
954	              RFC 4303, DOI 10.17487/RFC4303, December 2005,
955	              <https://www.rfc-editor.org/info/rfc4303>.

957	   [RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
958	              Kivinen, "Internet Key Exchange Protocol Version 2
959	              (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October
960	              2014, <https://www.rfc-editor.org/info/rfc7296>.

962	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
963	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
964	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

966	9.2.  Informative References

968	   [AppCrypt]
969	              Schneier, B., "Applied Cryptography: Protocols,
970	              Algorithms, and Source Code in C", 11 2017.

972	   [I-D.iab-wire-image]
973	              Trammell, B. and M. Kuehlewind, "The Wire Image of a
974	              Network Protocol", draft-iab-wire-image-01 (work in
975	              progress), November 2018.

977	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
978	              DOI 10.17487/RFC0791, September 1981,
979	              <https://www.rfc-editor.org/info/rfc791>.

981	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
982	              DOI 10.17487/RFC1191, November 1990,
983	              <https://www.rfc-editor.org/info/rfc1191>.

985	   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
986	              "Definition of the Differentiated Services Field (DS
987	              Field) in the IPv4 and IPv6 Headers", RFC 2474,
988	              DOI 10.17487/RFC2474, December 1998,
989	              <https://www.rfc-editor.org/info/rfc2474>.

991	   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
992	              RFC 2914, DOI 10.17487/RFC2914, September 2000,
993	              <https://www.rfc-editor.org/info/rfc2914>.

995	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
996	              of Explicit Congestion Notification (ECN) to IP",
997	              RFC 3168, DOI 10.17487/RFC3168, September 2001,
998	              <https://www.rfc-editor.org/info/rfc3168>.

1000	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1001	              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
1002	              December 2005, <https://www.rfc-editor.org/info/rfc4301>.

1004	   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
1005	              Datagram Congestion Control Protocol (DCCP) Congestion
1006	              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
1007	              DOI 10.17487/RFC4342, March 2006,
1008	              <https://www.rfc-editor.org/info/rfc4342>.

1010	   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
1011	              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
1012	              <https://www.rfc-editor.org/info/rfc4821>.

1014	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
1015	              Friendly Rate Control (TFRC): Protocol Specification",
1016	              RFC 5348, DOI 10.17487/RFC5348, September 2008,
1017	              <https://www.rfc-editor.org/info/rfc5348>.

1019	   [RFC7120]  Cotton, M., "Early IANA Allocation of Standards Track Code
1020	              Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January
1021	              2014, <https://www.rfc-editor.org/info/rfc7120>.

1023	   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
1024	              "Encapsulating MPLS in UDP", RFC 7510,
1025	              DOI 10.17487/RFC7510, April 2015,
1026	              <https://www.rfc-editor.org/info/rfc7510>.

1028	   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
1029	              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
1030	              <https://www.rfc-editor.org/info/rfc8084>.

1032	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
1033	              Writing an IANA Considerations Section in RFCs", BCP 26,
1034	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
1035	              <https://www.rfc-editor.org/info/rfc8126>.

1037	   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1038	              (IPv6) Specification", STD 86, RFC 8200,
1039	              DOI 10.17487/RFC8200, July 2017,
1040	              <https://www.rfc-editor.org/info/rfc8200>.

1042	   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
1043	              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
1044	              DOI 10.17487/RFC8201, July 2017,
1045	              <https://www.rfc-editor.org/info/rfc8201>.

1047	   [RFC8899]  Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and
1048	              T. Voelker, "Packetization Layer Path MTU Discovery for
1049	              Datagram Transports", RFC 8899, DOI 10.17487/RFC8899,
1050	              September 2020, <https://www.rfc-editor.org/info/rfc8899>.

1052	Appendix A.  Example Of An Encapsulated IP Packet Flow

1054	   Below, an example inner IP packet flow within the encapsulating
1055	   tunnel packet stream is shown.  Notice how encapsulated IP packets
1056	   can start and end anywhere, and more than one or less than one may
1057	   occur in a single encapsulating packet.

1059	     Offset: 0        Offset: 100    Offset: 2900    Offset: 1400
1060	    [ ESP1  (1500) ][ ESP2  (1500) ][ ESP3  (1500) ][ ESP4  (1500) ]
1061	    [--800--][--800--][60][-240-][--4000----------------------][pad]

1063	                   Figure 3: Inner and Outer Packet Flow

1065	   The encapsulated IP packet flow (lengths include IP header and
1066	   payload) is as follows: an 800 octet packet, an 800 octet packet, a
1067	   60 octet packet, a 240 octet packet, a 4000 octet packet.

1069	   The "BlockOffset" values in the 4 IP-TFS payload headers for this
1070	   packet flow would thus be: 0, 100, 2900, 1400 respectively.  The
1071	   first encapsulating packet ESP1 has a zero "BlockOffset" which points
1072	   at the IP data block immediately following the IP-TFS header.  The
1073	   following packet ESP2's "BlockOffset" points inward 100 octets to the
1074	   start of the 60 octet data block.  The third encapsulating packet
1075	   ESP3 contains the middle portion of the 4000 octet data block so the
1076	   offset points past its end and into the fourth encapsulating packet.
1077	   The fourth packet ESP4's offset is 1400 pointing at the padding which
1078	   follows the completion of the continued 4000 octet packet.
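
   The "BlockOffset" values in this example can be reproduced with a
   short simulation.  The Python sketch below assumes each outer packet
   carries exactly 1500 octets of "DataBlocks" data (as in Figure 3)
   and that a new data block is started whenever any room remains; the
   function name is hypothetical.

```python
def block_offsets(inner_sizes, capacity, outer_count):
    """Return the BlockOffset for each of outer_count outer packets."""
    offsets = []
    queue = list(inner_sizes)
    carry = 0  # octets of the in-progress inner packet still unsent
    for _ in range(outer_count):
        # BlockOffset counts the octets before the next new data block.
        offsets.append(carry)
        room = capacity
        used = min(carry, room)
        carry -= used
        room -= used
        while room > 0 and queue:
            size = queue.pop(0)
            used = min(size, room)
            carry = size - used  # remainder continues in next packet
            room -= used
        # Any room left here would be filled with a pad data block.
    return offsets
```

   For the flow above, block_offsets([800, 800, 60, 240, 4000], 1500, 4)
   gives the four offsets listed in the text.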

1080	Appendix B.  A Send and Loss Event Rate Calculation

1082	   The current best practice indicates that congestion control SHOULD be
1083	   done in a TCP friendly way.  A TCP friendly congestion control
1084	   algorithm is described in [RFC5348].  For this IP-TFS use case (as
1085	   with [RFC4342]) the (fixed) packet size is used as the segment size
1086	   for the algorithm.  The main formula in the algorithm for the send
1087	   rate is then as follows:

1089	                                 1
1090	      X = -----------------------------------------------
1091	          R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))

1093	   Where "X" is the send rate in packets per second, "R" is the round
1094	   trip time estimate and "p" is the loss event rate (the inverse of
1095	   which is provided by the receiver).

1097	   In addition the algorithm in [RFC5348] also uses an "X_recv" value
1098	   (the receiver's receive rate).  For IP-TFS one MAY set this value
1099	   according to the sender's current tunnel send-rate ("X").

1101	   The IP-TFS receiver, having the RTT estimate from the sender, can use
1102	   the same method as described in [RFC5348] and [RFC4342] to collect
1103	   the loss intervals and calculate the loss event rate value using the
1104	   weighted average as indicated.  The receiver communicates the inverse
1105	   of this value back to the sender in the AGGFRAG_PAYLOAD payload
1106	   header field "LossEventRate".

1108	   The IP-TFS sender now has both the "R" and "p" values and can
1109	   calculate the correct sending rate.  If following [RFC5348] the
1110	   sender SHOULD also use the slow start mechanism described therein
1111	   when the IP-TFS SA is first established.
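
   The send rate formula above can be evaluated directly from the
   values carried in the header.  The Python sketch below is an
   illustration only: it takes "R" in seconds and the received
   "LossEventRate" field (the inverse of "p"), and returns None when
   the formula does not apply (no loss reported yet, or no RTT estimate
   yet).  The function name and the use of None are assumptions made
   for this example; slow start and "X_recv" handling from [RFC5348]
   are not shown.

```python
import math


def tfrc_send_rate(rtt_seconds, loss_event_rate_field):
    """Return X in packets per second, or None if undefined."""
    if loss_event_rate_field == 0 or rtt_seconds <= 0:
        # Zero means "no loss" (p = 0); a zero RTT means no estimate.
        return None
    p = 1.0 / loss_event_rate_field
    denominator = rtt_seconds * (
        math.sqrt(2 * p / 3)
        + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2))
    return 1.0 / denominator
```

   For example, with a 100 ms RTT and a "LossEventRate" of 100 (p =
   0.01) this yields a rate of a little over 112 packets per second.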

1113	Appendix C.  Comparisons of IP-TFS

1115	C.1.  Comparing Overhead

1117	C.1.1.  IP-TFS Overhead

1119	   The overhead of IP-TFS is 40 octets per outer packet.  Therefore the
1120	   octet overhead per inner packet is 40 times the number of outer
1121	   packets required (fractional allowed).  The overhead as a percentage
1122	   of inner packet size is a constant based on the Outer MTU size.

1124	      OH = 40 * Inner Packet Size / Outer Payload Size
1125	      OH % of Inner Packet Size = 100 * OH / Inner Packet Size
1126	      OH % of Inner Packet Size = 4000 / Outer Payload Size
1127	                        Type  IP-TFS  IP-TFS  IP-TFS
1128	                         MTU     576    1500    9000
1129	                       PSize     536    1460    8960
1130	                      -------------------------------
1131	                          40   7.46%   2.74%   0.45%
1132	                         576   7.46%   2.74%   0.45%
1133	                        1500   7.46%   2.74%   0.45%
1134	                        9000   7.46%   2.74%   0.45%

1136	       Figure 4: IP-TFS Overhead as Percentage of Inner Packet Size
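
   The values in Figure 4 follow from the formulas above.  A minimal
   Python sketch, using hypothetical function names and the 40 octet
   per-outer-packet overhead stated in this section:

```python
IPTFS_OVERHEAD = 40  # octets per outer packet


def iptfs_overhead_octets(inner_packet_size, outer_payload_size):
    """Overhead in octets attributed to one inner packet."""
    # An inner packet uses a (possibly fractional) number of outer
    # packets: inner_packet_size / outer_payload_size.
    return IPTFS_OVERHEAD * inner_packet_size / outer_payload_size


def iptfs_overhead_percent(outer_payload_size):
    """Overhead as a percentage of inner packet size (a constant)."""
    return 100 * IPTFS_OVERHEAD / outer_payload_size
```

   Rounded to two places, iptfs_overhead_percent gives 7.46, 2.74 and
   0.45 for outer payload sizes of 536, 1460 and 8960 octets.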

1138	C.1.2.  ESP with Padding Overhead

1140	   The overhead per inner packet for constant-send-rate padded ESP
1141	   (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless
1142	   fragmentation is required.

1144	   When fragmentation of the inner packet is required to fit in the
1145	   outer IPsec packet, overhead is the number of outer packets required
1146	   to carry the fragmented inner packet times both the inner IP overhead
1147	   (20) and the outer packet overhead (36) minus the initial inner IP
1148	   overhead plus any required tail padding in the last encapsulation
1149	   packet.  The required tail padding is the number of required packets
1150	   times the difference of the Outer Payload Size and the IP Overhead
1151	   minus the Inner Payload Size.  So:

1153	     Inner Payload Size = IP Packet Size - IP Overhead
1154	     Outer Payload Size = MTU - IPsec Overhead

1156	                   Inner Payload Size
1157	     NF0 = ----------------------------------
1158	            Outer Payload Size - IP Overhead

1160	     NF = CEILING(NF0)

1162	     OH = NF * (IP Overhead + IPsec Overhead)
1163	          - IP Overhead
1164	          + NF * (Outer Payload Size - IP Overhead)
1165	          - Inner Payload Size

1167	     OH = NF * (IPsec Overhead + Outer Payload Size)
1168	          - (IP Overhead + Inner Payload Size)

1170	     OH = NF * (IPsec Overhead + Outer Payload Size)
1171	          - Inner Packet Size
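
   The final form of the formula above can be written as a short Python
   function.  This sketch uses the 20 octet IP overhead and 36 octet
   IPsec overhead given in this section as defaults; the function and
   parameter names are hypothetical.

```python
def esp_pad_overhead(inner_packet_size, mtu,
                     ip_overhead=20, ipsec_overhead=36):
    """Overhead in octets per the fragmentation formula above."""
    inner_payload = inner_packet_size - ip_overhead
    outer_payload = mtu - ipsec_overhead
    per_fragment = outer_payload - ip_overhead
    # NF = CEILING(NF0), computed with integer arithmetic.
    nf = -(-inner_payload // per_fragment)
    return nf * (ipsec_overhead + outer_payload) - inner_packet_size
```

   For instance, a 1500 octet inner packet over a 576 octet MTU needs 3
   outer packets, giving 3 * 576 - 1500 = 228 octets of overhead.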

1173	C.2.  Overhead Comparison

1175	   The following tables collect the overhead values for some common L3
1176	   MTU sizes in order to compare them.  The first table is the number of
1177	   octets of overhead for a given L3 MTU sized packet.  The second table
1178	   is the percentage of overhead in the same MTU sized packet.

1180	           Type  ESP+Pad  ESP+Pad  ESP+Pad  IP-TFS  IP-TFS  IP-TFS
1181	         L3 MTU      576     1500     9000     576    1500    9000
1182	          PSize      540     1464     8964     536    1460    8960
1183	        -----------------------------------------------------------
1184	             40      500     1424     8924     3.0     1.1     0.2
1185	            128      412     1336     8836     9.6     3.5     0.6
1186	            256      284     1208     8708    19.1     7.0     1.1
1187	            536        4      928     8428    40.0    14.7     2.4
1188	            576      576      888     8388    43.0    15.8     2.6
1189	           1460      268        4     7504   109.0    40.0     6.5
1190	           1500      228     1500     7464   111.9    41.1     6.7
1191	           8960     1408     1540        4   668.7   245.5    40.0
1192	           9000     1368     1500     9000   671.6   246.6    40.2

1194	                  Figure 5: Overhead comparison in octets

1196	          Type  ESP+Pad  ESP+Pad   ESP+Pad  IP-TFS  IP-TFS  IP-TFS
1197	           MTU      576     1500      9000     576    1500    9000
1198	         PSize      540     1464      8964     536    1460    8960
1199	        -----------------------------------------------------------
1200	            40  1250.0%  3560.0%  22310.0%   7.46%   2.74%   0.45%
1201	           128   321.9%  1043.8%   6903.1%   7.46%   2.74%   0.45%
1202	           256   110.9%   471.9%   3401.6%   7.46%   2.74%   0.45%
1203	           536     0.7%   173.1%   1572.4%   7.46%   2.74%   0.45%
1204	           576   100.0%   154.2%   1456.2%   7.46%   2.74%   0.45%
1205	          1460    18.4%     0.3%    514.0%   7.46%   2.74%   0.45%
1206	          1500    15.2%   100.0%    497.6%   7.46%   2.74%   0.45%
1207	          8960    15.7%    17.2%      0.0%   7.46%   2.74%   0.45%
1208	          9000    15.2%    16.7%    100.0%   7.46%   2.74%   0.45%

1210	           Figure 6: Overhead as Percentage of Inner Packet Size
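
   The IP-TFS columns of Figures 5 and 6 appear to follow from a fixed
   per-outer-packet overhead.  The Python sketch below assumes that
   overhead is 40 octets, which is inferred from the difference
   between the L3 MTU and PSize rows (e.g., 576 - 536) and is not
   stated in this section.  The function names are hypothetical.

```python
IPTFS_OVERHEAD = 40  # assumption: L3 MTU minus PSize in the IP-TFS columns


def iptfs_overhead_octets(inner_size: int, l3_mtu: int) -> float:
    """Share of the per-outer-packet overhead attributed to one inner packet."""
    psize = l3_mtu - IPTFS_OVERHEAD
    return inner_size * IPTFS_OVERHEAD / psize


def overhead_percent(overhead_octets: float, inner_size: int) -> float:
    """Overhead as a percentage of the inner packet size."""
    return 100.0 * overhead_octets / inner_size


octets = iptfs_overhead_octets(1500, 576)
print(round(octets, 1), round(overhead_percent(octets, 1500), 2))
```

   Because the IP-TFS overhead scales with the inner packet size, its
   percentage is the same for every row of a given MTU column.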

1212	C.3.  Comparing Available Bandwidth

1214	   Another way to compare the two solutions is to look at the amount of
1215	   available bandwidth each solution provides.  The following sections
1216	   consider and compare the percentage of available bandwidth.  For the
1217	   sake of providing a well-understood baseline, normal (unencrypted)
1218	   Ethernet as well as normal ESP values are included.

1220	C.3.1.  Ethernet

1222	   In order to calculate the available bandwidth the per packet overhead
1223	   is calculated first.  The total overhead of Ethernet is 14+4 octets
1224	   of header and CRC plus an additional 20 octets of framing (preamble,
1225	   start, and inter-packet gap) for a total of 38 octets.  Additionally,
1226	   the minimum payload is 46 octets.

1228	         Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS  Enet   ESP
1229	          MTU    590   1514   9014    590   1514   9014   any   any
1230	           OH     74     74     74     78     78     78    38    74
1231	        ------------------------------------------------------------
1232	           40    614   1538   9038     45     42     40    84   114
1233	          128    614   1538   9038    146    134    129   166   202
1234	          256    614   1538   9038    293    269    258   294   330
1235	          536    614   1538   9038    614    564    540   574   610
1236	          576   1228   1538   9038    659    606    581   614   650
1237	         1460   1842   1538   9038   1672   1538   1472  1498  1534
1238	         1500   1842   3076   9038   1718   1580   1513  1538  1574
1239	         8960  11052  10766   9038  10263   9438   9038  8998  9034
1240	         9000  11052  10766  18076  10309   9480   9078  9038  9074

1242	                      Figure 7: L2 Octets Per Packet
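
   The Enet and ESP columns of Figure 7 can be reproduced with the
   following Python sketch.  The function names are hypothetical.  The
   ESP helper simply adds the 74 octets from the OH row and assumes
   the result always exceeds the Ethernet minimum payload, which holds
   for every size in the table.

```python
ENET_OVERHEAD = 38     # header, CRC, and framing octets
ENET_MIN_PAYLOAD = 46  # minimum Ethernet payload octets
ESP_OVERHEAD = 74      # "OH" row of the ESP column


def enet_l2_octets(size: int) -> int:
    """L2 octets for one packet sent over plain Ethernet."""
    # Payloads shorter than the minimum are padded up to it.
    return max(size, ENET_MIN_PAYLOAD) + ENET_OVERHEAD


def esp_l2_octets(size: int) -> int:
    """L2 octets for one packet sent with normal (unpadded) ESP."""
    return size + ESP_OVERHEAD


print(enet_l2_octets(40), esp_l2_octets(40))  # prints 84 114
```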

1244	        Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS  Enet   ESP
1245	         MTU  590    1514   9014   590    1514   9014   any    any
1246	          OH  74     74     74     78     78     78     38     74
1247	       --------------------------------------------------------------
1248	          40  2.0M   0.8M   0.1M   27.3M  29.7M  31.0M  14.9M  11.0M
1249	         128  2.0M   0.8M   0.1M   8.5M   9.3M   9.7M   7.5M   6.2M
1250	         256  2.0M   0.8M   0.1M   4.3M   4.6M   4.8M   4.3M   3.8M
1251	         536  2.0M   0.8M   0.1M   2.0M   2.2M   2.3M   2.2M   2.0M
1252	         576  1.0M   0.8M   0.1M   1.9M   2.1M   2.2M   2.0M   1.9M
1253	        1460  678K   812K   138K   747K   812K   848K   834K   814K
1254	        1500  678K   406K   138K   727K   791K   826K   812K   794K
1255	        8960  113K   116K   138K   121K   132K   138K   138K   138K
1256	        9000  113K   116K   69K    121K   131K   137K   138K   137K

1258	               Figure 8: Packets Per Second on 10G Ethernet
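
   The packet rates in Figure 8 are consistent with dividing the link
   rate by the L2 bits per packet.  A minimal Python sketch, assuming
   a 10 Gbit/s link and using a hypothetical function name:

```python
LINK_BPS = 10_000_000_000  # 10G Ethernet link rate in bits per second


def packets_per_second(l2_octets: float) -> float:
    """Packets per second given the L2 octets consumed per packet."""
    return LINK_BPS / (8 * l2_octets)


# 84 and 114 are the Enet and ESP L2 octets for a 40 octet packet.
print(round(packets_per_second(84) / 1e6, 1))
print(round(packets_per_second(114) / 1e6, 1))
```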

1260	   Size   E + P   E + P   E + P   IPTFS   IPTFS   IPTFS    Enet     ESP
1261	    MTU     590    1514    9014     590    1514    9014     any     any
1262	     OH      74      74      74      78      78      78      38      74
1263	  ----------------------------------------------------------------------
1264	     40   6.51%   2.60%   0.44%  87.30%  94.93%  99.14%  47.62%  35.09%
1265	    128  20.85%   8.32%   1.42%  87.30%  94.93%  99.14%  77.11%  63.37%
1266	    256  41.69%  16.64%   2.83%  87.30%  94.93%  99.14%  87.07%  77.58%
1267	    536  87.30%  34.85%   5.93%  87.30%  94.93%  99.14%  93.38%  87.87%
1268	    576  46.91%  37.45%   6.37%  87.30%  94.93%  99.14%  93.81%  88.62%
1269	   1460  79.26%  94.93%  16.15%  87.30%  94.93%  99.14%  97.46%  95.18%
1270	   1500  81.43%  48.76%  16.60%  87.30%  94.93%  99.14%  97.53%  95.30%
1271	   8960  81.07%  83.22%  99.14%  87.30%  94.93%  99.14%  99.58%  99.18%
1272	   9000  81.43%  83.60%  49.79%  87.30%  94.93%  99.14%  99.58%  99.18%

1274	             Figure 9: Percentage of Bandwidth on 10G Ethernet
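
   The percentages in Figure 9 are consistent with the ratio of the
   inner packet size to the L2 octets used to carry it.  A minimal
   Python sketch, with a hypothetical function name:

```python
def bandwidth_percent(size: int, l2_octets: float) -> float:
    """Share of link bandwidth carrying inner packet octets."""
    return 100.0 * size / l2_octets


# A 40 octet packet uses 84 L2 octets on Ethernet and 114 with ESP.
print(round(bandwidth_percent(40, 84), 2))
print(round(bandwidth_percent(40, 114), 2))
# An IP-TFS outer packet with a 536 octet payload uses 614 L2 octets.
print(round(bandwidth_percent(536, 614), 2))
```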

1276	   A sometimes unexpected result of using IP-TFS (or any packet
1277	   aggregating tunnel) is that, for small to medium sized packets, the
1278	   available bandwidth is actually greater than native Ethernet.  This
1279	   is due to the reduction in Ethernet framing overhead.  This increased
1280	   bandwidth is paid for with an increase in latency.  This latency is
1281	   the time to send the unrelated octets in the outer tunnel frame.  The
1282	   following table illustrates the latency for some common values on a
1283	   10G Ethernet link.  The table also includes latency introduced by
1284	   padding if using ESP with padding.

1286	                  Type  ESP+Pad  ESP+Pad  IP-TFS   IP-TFS
1287	                L3 MTU  1500     9000     1500     9000
1289	                 ------------------------------------------
1290	                    40  1.14 us  7.14 us  1.17 us  7.17 us
1291	                   128  1.07 us  7.07 us  1.10 us  7.10 us
1292	                   256  0.97 us  6.97 us  1.00 us  7.00 us
1293	                   536  0.74 us  6.74 us  0.77 us  6.77 us
1294	                   576  0.71 us  6.71 us  0.74 us  6.74 us
1295	                  1460  0.00 us  6.00 us  0.04 us  6.04 us
1296	                  1500  1.20 us  5.97 us  0.00 us  6.00 us

1298	                         Figure 10: Added Latency
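
   The added latency is the time needed to send the unrelated octets
   at the link rate.  The Python sketch below is an illustration under
   the assumption of a 10 Gbit/s link; the function name is
   hypothetical, and the count of unrelated octets is taken as an
   input rather than derived.

```python
LINK_BPS = 10_000_000_000  # 10G Ethernet link rate in bits per second


def added_latency_us(unrelated_octets: int) -> float:
    """Microseconds needed to transmit the given octets on the link."""
    return unrelated_octets * 8 / LINK_BPS * 1e6


# 1424 is the ESP+Pad overhead for a 40 octet packet at L3 MTU 1500.
print(round(added_latency_us(1424), 2))  # prints 1.14
```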

1300	   Notice that the latency values are very similar between the two
1301	   solutions; however, whereas IP-TFS provides for constant high
1302	   bandwidth, in some cases even exceeding native Ethernet, ESP with
1303	   padding often greatly reduces available bandwidth.

1305	Appendix D.  Acknowledgements

1307	   We would like to thank Don Fedyk for help in reviewing and editing
1308	   this work.  We would also like to thank Valery Smyslov for reviews
1309	   and suggestions for improvements as well as Joseph Touch for the
1310	   transport area review and suggested improvements.

1312	Appendix E.  Contributors

1314	   The following people made significant contributions to this document.

1316	      Lou Berger
1317	      LabN Consulting, L.L.C.

1319	      Email: lberger@labn.net

1321	Author's Address

1323	   Christian Hopps
1324	   LabN Consulting, L.L.C.

1326	   Email: chopps@chopps.org