idnits 2.17.1 

draft-welzl-loops-gen-info-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references
     ([I-D.li-tsvwg-loops-problem-opportunities], [I-D.ietf-nvo3-geneve],
     [I-D.ietf-intarea-gue]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 841 has weird spacing: '...packets    for...'

  -- The document date (September 03, 2019) is 1696 days in the past.  Is
     this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-rack-05

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-13

  == Outdated reference: A later version (-09) exists of
     draft-ietf-intarea-gue-07

  == Outdated reference: A later version (-06) exists of
     draft-li-tsvwg-loops-problem-opportunities-03


     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TSVWG                                                           M. Welzl
3	Internet-Draft                                        University of Oslo
4	Intended status: Standards Track                         C. Bormann, Ed.
5	Expires: March 6, 2020                           Universitaet Bremen TZI
6	                                                      September 03, 2019

8	                     LOOPS Generic Information Set
9	                     draft-welzl-loops-gen-info-01

11	Abstract

13	   LOOPS (Local Optimizations on Path Segments) aims to provide local
14	   (not end-to-end but in-network) recovery of lost packets to achieve
15	   better data delivery in the presence of losses.
16	   [I-D.li-tsvwg-loops-problem-opportunities] provides an overview of
17	   the problems and optimization opportunities that LOOPS could address.

19	   The present document is a strawman for the set of information that
20	   would be interchanged in a LOOPS protocol, without already defining a
21	   specific data packet format.

23	   The generic information set needs to be mapped to a specific
24	   encapsulation protocol to actually run the LOOPS optimizations.  The
25	   current version of this document contains sketches of bindings to GUE
26	   [I-D.ietf-intarea-gue] and Geneve [I-D.ietf-nvo3-geneve].

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at https://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on March 6, 2020.

45	Copyright Notice

47	   Copyright (c) 2019 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (https://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document.  Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document.  Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	Table of Contents

62	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
63	     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   5
64	   2.  Challenges  . . . . . . . . . . . . . . . . . . . . . . . . .   6
65	     2.1.  No Access to End-to-End Transport Information . . . . . .   6
66	     2.2.  Path Asymmetry  . . . . . . . . . . . . . . . . . . . . .   6
67	     2.3.  Reordering vs. Spurious Retransmission  . . . . . . . . .   6
68	     2.4.  Informing the End-to-End Transport  . . . . . . . . . . .   7
69	     2.5.  Congestion Detection  . . . . . . . . . . . . . . . . . .   8
70	   3.  Simplifying assumptions . . . . . . . . . . . . . . . . . . .   8
71	   4.  LOOPS Generic Information Set . . . . . . . . . . . . . . . .   9
72	     4.1.  Setup Information . . . . . . . . . . . . . . . . . . . .   9
73	     4.2.  Forward Information . . . . . . . . . . . . . . . . . . .   9
74	     4.3.  Reverse Information . . . . . . . . . . . . . . . . . . .  10
75	   5.  LOOPS General Operation . . . . . . . . . . . . . . . . . . .  11
76	     5.1.  Initial Packet Sequence Number  . . . . . . . . . . . . .  11
77	     5.2.  Acknowledgement Generation  . . . . . . . . . . . . . . .  11
78	     5.3.  Measurement . . . . . . . . . . . . . . . . . . . . . . .  12
79	     5.4.  Loss detection and Recovery . . . . . . . . . . . . . . .  12
80	       5.4.1.  Local Retransmission  . . . . . . . . . . . . . . . .  12
81	       5.4.2.  FEC . . . . . . . . . . . . . . . . . . . . . . . . .  13
82	     5.5.  Discussion  . . . . . . . . . . . . . . . . . . . . . . .  13
83	   6.  Sketches of Bindings to Tunnel Protocols  . . . . . . . . . .  13
84	     6.1.  Embedding LOOPS in Geneve . . . . . . . . . . . . . . . .  14
85	     6.2.  Embedding LOOPS in GUE  . . . . . . . . . . . . . . . . .  14
86	   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  15
87	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  15
88	   9.  Informative References  . . . . . . . . . . . . . . . . . . .  15
89	   Appendix A.  Protocol used in Prototype Implementation  . . . . .  16
90	     A.1.  Block Code FEC  . . . . . . . . . . . . . . . . . . . . .  17
91	   Appendix B.  Transparent mode . . . . . . . . . . . . . . . . . .  18
92	     B.1.  Packet identification . . . . . . . . . . . . . . . . . .  19
93	     B.2.  Generic information and protocol operation  . . . . . . .  20
94	     B.3.  A hybrid mode . . . . . . . . . . . . . . . . . . . . . .  20
95	   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  22
96	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  22

98	1.  Introduction

100	   Today's networks exhibit a wide variety of data rates and, relative
101	   to those, processing power and memory capacities of nodes acting as
102	   routers.  For instance, networks that employ tunneling to build
103	   overlay networks may position powerful virtual router nodes in the
104	   network to act as tunnel endpoints.  The capabilities available in
105	   the more powerful cases provide new opportunities for optimizations.

107	   LOOPS (Local Optimizations on Path Segments) aims to provide local
108	   (not end-to-end but in-network) recovery of lost packets to achieve
109	   better data delivery.  [I-D.li-tsvwg-loops-problem-opportunities]
110	   provides an overview of the problems and optimization opportunities
111	   that LOOPS could address.  One simplifying assumption (Section 3) in
112	   the present document is that LOOPS segments operate independently
113	   of each other, each as a pair of a LOOPS Ingress and a LOOPS Egress
114	   node.

116	   The present document is a strawman for the set of information that
117	   would be interchanged in a LOOPS protocol between these nodes,
118	   without already defining a specific data packet format.  The main
119	   body of the document defines a mode of the LOOPS protocol that is
120	   based on traditional tunneling, the "tunnel mode".  Appendix B is an
121	   even rougher strawman of a radically different, alternative mode that
122	   we call "transparent mode", as well as a slightly more conventional
123	   "hybrid mode" (Appendix B.3).  These different modes may be
124	   applicable to different usage scenarios and will be developed in
125	   parallel, with a view to ultimately standardizing one or more of
126	   them.

128	   For tunnel mode, the generic information set needs to be mapped to a
129	   specific encapsulation protocol to actually run the LOOPS
130	   optimizations.  LOOPS is not tied to any specific overlay protocol,
131	   but is meant to run embedded into a variety of tunnel protocols.
132	   LOOPS information is added as part of a tunnel protocol header at the
133	   LOOPS ingress as shown in Figure 1.  The current version of this
134	   document contains sketches of bindings to GUE [I-D.ietf-intarea-gue]
135	   and Geneve [I-D.ietf-nvo3-geneve].

137	             +------------------------------------+
138	             |           Outer header             |
139	             +------------------------------------+
140	           / |         Tunnel Base Header         |
141	         /   +------------------------------------+\
142	    Tunnel   |    +-------------------------+     | \
143	    Header   ~    |    LOOPS Information    |     ~  Tunnel Header
144	         \   |    +-------------------------+     |  Extensions
145	           \ +------------------------------------+ /
146	             |           Data packet              |
147	             +------------------------------------+

149	             Figure 1: Packet in Tunnel with LOOPS Information

151	   Figure 2 is extracted from the LOOPS problems and opportunities
152	   document [I-D.li-tsvwg-loops-problem-opportunities].  It illustrates
153	   the basic architecture and terms of the applicable scenario of LOOPS.
154	   Not all of the concepts introduced in the problems and opportunities
155	   document are actually used in the current strawman specification;
156	   Section 3 lays out some simplifying assumptions that the present
157	   proposal makes.

159	                                                      ON=overlay node
160	                                                      UN=underlay node

162	   +---------+                                               +---------+
163	   |   App   | <---------------- end-to-end ---------------> |   App   |
164	   +---------+                                               +---------+
165	   |Transport| <---------------- end-to-end ---------------> |Transport|
166	   +---------+                                               +---------+
167	   |         |                                               |         |
168	   |         |        +--+  path  +--+  path segment2  +--+  |         |
169	   |         |        |  |<-seg1->|  |<--------------> |  |  |         |
170	   | Network |  +--+  |ON|  +--+  |ON|  +--+   +----+  |ON|  | Network |
171	   |         |--|UN|--|  |--|UN|--|  |--|UN|---| UN |--|  |--|         |
172	   +---------+  +--+  +--+  +--+  +--+  +--+   +----+  +--+  +---------+
173	     End Host                                                  End Host
174	                       <--------------------------------->
175	                        LOOPS domain: path segments enabling
176	                        optimization for local in-network recovery

178	                      Figure 2: LOOPS Usage Scenario

180	1.1.  Terminology

182	   This document makes use of the terminology defined in
183	   [I-D.li-tsvwg-loops-problem-opportunities].  This section defines
184	   additional terminology used by this document.

186	   Data packets:  The payload packets that enter and exit a LOOPS
187	      segment.

189	   LOOPS Segment:  A part of an end-to-end path covered by a single
190	      instance of the LOOPS protocol, the sub-path between the LOOPS
191	      Ingress and the LOOPS Egress.

193	   LOOPS Ingress:  The node that forwards data packets and forward
194	      information into the LOOPS segment, potentially performing
195	      retransmission and forward error correction based on
196	      acknowledgements and measurements received from the LOOPS Egress.

198	   LOOPS Egress:  The node that receives the data packets and forward
199	      information from the LOOPS ingress, sends acknowledgements and
200	      measurements back to the LOOPS ingress (reverse information), and
201	      potentially recovers data packets from forward error correction
202	      information received.

204	   LOOPS Nodes:  Collective term for LOOPS Ingress and LOOPS Egress in a
205	      LOOPS Segment.

207	   Forward Information:  Information that is added to the stream of data
208	      packets in the forward direction by the LOOPS Ingress.

210	   Reverse Information:  Information that flows in the reverse
211	      direction, from the LOOPS Egress back to the LOOPS Ingress.

213	   Setup Information:  Information that is not transferred as part of
214	      the Forward or Reverse Information, but is part of the setup of
215	      the LOOPS Nodes.

217	   PSN:  Packet Sequence Number, a sequence number identifying a data
218	      packet.

220	   Sender:  Original sender of a packet on an end-to-end path that
221	      includes one or more LOOPS segment(s).

223	   Receiver:  Ultimate receiver of a packet on an end-to-end path that
224	      includes one or more LOOPS segment(s).

226	2.  Challenges

228	   LOOPS has to perform well in the presence of some challenges, which
229	   are discussed in this section.

231	2.1.  No Access to End-to-End Transport Information

233	   LOOPS is defined to be independent of the content of the packets
234	   being forwarded: there is no dependency on transport-layer or higher
235	   information.  The intention is to keep LOOPS useful with a traffic
236	   mix that may contain encrypted transport protocols such as QUIC as
237	   well as encrypted VPN traffic.

239	2.2.  Path Asymmetry

241	   A LOOPS segment is defined as a unidirectional forwarding path.  The
242	   tunnel might be shared with a LOOPS segment in the inverse direction;
243	   this then allows piggybacking Reverse Information on encapsulated
244	   packets on that segment.  But there is no guarantee that the inverse
245	   direction of any end-to-end path crosses that segment, so the LOOPS
246	   optimizations have to be useful on their own in each direction.

248	2.3.  Reordering vs. Spurious Retransmission

250	   The end-to-end transport layer protocol may have its own
251	   retransmission mechanism to recover lost packets.  When LOOPS
252	   recovers a loss, ideally this local recovery would avoid the
253	   triggering of a retransmission at the end-to-end sender.

255	   Whether this is possible depends on the specific end-to-end mechanism
256	   used for triggering retransmission.  When end-to-end retransmission
257	   is triggered by receiving a sequence of duplicate acknowledgements
258	   (DUPACKs), and with more than a few packets in flight, the recovered
259	   packet is likely to be too late to fill the hole in the sequence
260	   number space that triggers the DUPACK detection.

262	   (Given a reasonable setting of parameters, the local retransmission
263	   will still arrive earlier than the end-to-end retransmission and will
264	   possibly unblock application processing earlier; with spurious
265	   retransmission detection, there also will be little long-term effect
266	   on the send rate.)

268	   The waste of bandwidth caused by a DUPACK-based end-to-end
269	   retransmission can be avoided when the end-to-end loss detection is
270	   based on time instead of sequence numbers, e.g., with RACK
271	   [I-D.ietf-tcpm-rack].  This requires a limit on the additional
272	   latency that LOOPS will incur in its attempt to recover the loss
273	   locally.  In the present version of this document, the opportunity to set
274	   such a limit is provided in the Setup Information.  The limit can be
275	   used to compute a deadline for retransmission, but also can be used
276	   to choose FEC parameters that keep extra latency low.

278	2.4.  Informing the End-to-End Transport

280	   Congestion control at the end-to-end sender is used to adapt its
281	   sending rate to the network congestion status.  In typical TCP
282	   senders, packet loss implies congestion and leads to a reduction in
283	   sending rate.  With LOOPS operating, packet loss can be masked from
284	   the sender as the loss may have been locally recovered.  In this
285	   case, rate reduction may not be invoked at the sender.  This is a
286	   desirable performance improvement if the loss was a random loss.

288	   If LOOPS successfully conceals congestion losses from the end-to-end
289	   transport protocol, that might increase the rate to a level that
290	   congests the LOOPS segment, or that causes excessive queueing at the
291	   LOOPS ingress.  What LOOPS should be able to achieve is to let the
292	   end host sender invoke the rate reduction mechanism when there is a
293	   congestion loss, regardless of whether the lost packet was recovered locally.

295	   As with any tunneling protocol, information about congestion events
296	   inside the tunnel needs to be exported to the end-to-end path the
297	   tunnel is part of.  See, e.g., [RFC6040] for a discussion of how to do
298	   this in the presence of ECN.  A more recent draft,
299	   [I-D.ietf-tsvwg-tunnel-congestion-feedback], proposes to activate ECN
300	   for the tunnel regardless of whether the end-to-end protocol signals
301	   the use of an ECN-capable transport (ECT), which requires more
302	   complicated action at the tunnel egress.

304	   A sender that interprets reordering as a signal of packet loss
305	   (DUPACKs) initiates a retransmission and reduces the sending rate.
306	   When spurious retransmission detection (e.g., via F-RTO [RFC5682] or
307	   DSACK [RFC3708]) is enabled by the TCP sender, it will often be able
308	   to undo the unnecessary window reduction.  As LOOPS recovers lost
309	   packets locally, in most cases the end host sender will eventually
310	   find out that its reordering-based retransmission (if any) was spurious.
311	   This is an appropriate performance improvement if the loss was a
312	   random loss.  For congestion losses, a congestion event needs to be
313	   signaled to the end-to-end transport.

315	   If the end-to-end transport is ECN-capable (which is visible at the
316	   IP level), congestion loss events can easily be signaled to it by
317	   setting the CE (congestion experienced) mark.  If LOOPS detects a
318	   congestion loss for a non-ECT packet, it needs to signal a congestion
319	   loss event by introducing a packet loss.  This can be done by
320	   choosing not to retransmit or repair the packet loss locally in this
321	   case.  Note that one congestion loss per end-to-end RTT is sufficient
322	   to provide the rate reduction, so LOOPS may still be able to recover
323	   most packets, in particular for burst losses.  (As LOOPS does not
324	   interact with the end-to-end transport, it does not know the end-to-
325	   end RTT.  Some lower bound derived from configuration and
326	   measurements could be used instead.)
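   The loss-handling policy described above can be sketched as follows.
   This is a minimal Python illustration, not part of the LOOPS
   specification.  It assumes that the congestion/random classification
   of each loss is supplied by the detection mechanism discussed in
   Section 2.5, that the end-to-end RTT lower bound is a configured
   value, and that an ECT packet is recovered with CE set instead of
   being dropped.  All names (Action, CongestionSignalingPolicy,
   rtt_lower_bound) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    RECOVER = auto()          # repair the loss locally, hiding it
    RECOVER_WITH_CE = auto()  # repair it, but set CE on the recovered packet
    LET_LOSS_STAND = auto()   # deliberately leave the loss unrepaired


@dataclass
class CongestionSignalingPolicy:
    # Hypothetical lower bound on the end-to-end RTT in seconds; LOOPS cannot
    # observe the real value, so it comes from configuration/measurements.
    rtt_lower_bound: float
    last_signaled: Optional[float] = None  # time of last unrepaired loss

    def decide(self, now: float, congestion_loss: bool, ect: bool) -> Action:
        if not congestion_loss:
            return Action.RECOVER            # random loss: hide it
        if ect:
            return Action.RECOVER_WITH_CE    # ECN-capable: mark, do not drop
        # Non-ECT congestion loss: one unrepaired loss per RTT is sufficient
        # to trigger the rate reduction, so recover the others.
        if (self.last_signaled is None
                or now - self.last_signaled >= self.rtt_lower_bound):
            self.last_signaled = now
            return Action.LET_LOSS_STAND
        return Action.RECOVER
```

   LOOPS does not look at transport headers, so the sketch keeps a
   single budget per LOOPS segment and not one per end-to-end flow.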

328	2.5.  Congestion Detection

330	   Properly informing the end-to-end transport protocol about congestion
331	   loss events requires distinguishing these from random losses.  In
332	   some special cases, distinguishing information may be available from
333	   a link layer (e.g., see Section 3 of
334	   [I-D.li-tsvwg-loops-problem-opportunities]).  By enabling ECN inside
335	   the tunnel, congestion events experienced at ECN-capable routers will
336	   usually be identified by the CE mark, which clearly rules out a
337	   random loss.

339	   In the general case, the segment may be composed of hops without such
340	   special indications.  In these cases, some detection mechanism is
341	   required to provide this distinguishing information.  The specific
342	   mechanism used by an implementation is out of scope for LOOPS, but
343	   LOOPS will need to provide measurement information for this
344	   mechanism.  For instance, congestion detection might be based on path
345	   segment latency information, the proper measurement of which
346	   therefore requires special attention in LOOPS.

348	3.  Simplifying assumptions

350	   The above notwithstanding, implementations may want to make use of
351	   indicators such as transport layer port numbers to partition a tunnel
352	   flow into separate application flows, e.g., for active queue
353	   management (AQM).  Any such functionality is orthogonal to the LOOPS
354	   protocol itself and thus out of scope for the present document.

356	   One observation that simplifies the design of LOOPS in comparison to
357	   that of a reliable transport protocol is that LOOPS does not _have_
358	   to recover every packet loss.  Therefore, probabilistic approaches,
359	   and simply giving up after some time has elapsed, can simplify the
360	   protocol significantly.

362	   For now, we assume that LOOPS segments that may line up on an end-to-
363	   end path operate independently of each other.  Since the objective of
364	   LOOPS ultimately is to assist the end-to-end protocol, it is likely
365	   that some cooperation between them would be beneficial, e.g., to
366	   obtain some measurements that cover a larger part of the end-to-end
367	   path.  For instance, cooperating LOOPS segments could try to divide
368	   up permissible increases to end-to-end latency between them.  This is
369	   out of scope for the present version.

371	   Another simplifying assumption is that LOOPS nodes have reasonably
372	   precise absolute time available to them, so there is no need to
373	   burden the LOOPS protocol with time synchronization.  How this is
374	   achieved is out of scope.

376	   LOOPS nodes are created and set up (information about their peers,
377	   parameters) by some control plane mechanism that is out of scope for
378	   this specification.  This means there is no need in the LOOPS
379	   protocol itself to manage setup information.

381	4.  LOOPS Generic Information Set

383	   This section sketches a generic information set for the LOOPS
384	   protocol.  Entries marked with (*) are items that may not be
385	   necessary and probably should be left out of an initial
386	   specification.

388	4.1.  Setup Information

390	   Setup Information might include:

392	   o  encapsulation protocol in use, and its vital parameters

394	   o  identity of LOOPS ingress and LOOPS egress; information relevant
395	      for running the encapsulation protocol such as port numbers

397	   o  target maximum latency increase caused by the operation of LOOPS
398	      on this segment

400	   o  maximum retransmission count (*)

402	   In the data plane, we have forward information (information added to
403	   each data packet) and reverse information.  The latter can be sent in
404	   separate packets (e.g., Geneve control-only packets
405	   [I-D.ietf-nvo3-geneve]) and/or piggybacked like the forward
406	   information.

408	4.2.  Forward Information

410	   In the forward information, we have identified:

412	   o  tunnel type (a few bits, meaning agreed between Ingress and
413	      Egress)

415	   o  packet sequence number PSN (20+ bits), counting the LOOPS packets
416	      transmitted by the LOOPS ingress (i.e., retransmissions receive a
417	      new PSN)

419	   o  an "ACK desirable" flag (one bit, usually set for a certain
420	      percentage of the data packets only)

422	   o  anything that the FEC scheme needs.

424	   The first four together (say, 3+24+4+1) might even fit into 32 bits,
425	   but probably need up to 48 bits total.  FEC info of course often
426	   needs more space.
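   To illustrate the 32-bit case, the following Python sketch packs the
   fields into one word in network byte order.  The bit order is an
   assumption.  The list above does not say what the 4-bit field in
   "3+24+4+1" carries, so the sketch treats it as reserved.  The
   function names are hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical layout, most significant bits first: 3-bit tunnel type,
# 24-bit PSN, a 4-bit field left open here ("reserved"), and the 1-bit
# "ACK desirable" flag: 3 + 24 + 4 + 1 = 32 bits.
TYPE_BITS, PSN_BITS, RSV_BITS, ACK_BITS = 3, 24, 4, 1


def pack_forward_info(tunnel_type: int, psn: int, ack_desirable: bool,
                      reserved: int = 0) -> bytes:
    if not 0 <= tunnel_type < (1 << TYPE_BITS):
        raise ValueError("tunnel type out of range")
    if not 0 <= reserved < (1 << RSV_BITS):
        raise ValueError("reserved field out of range")
    psn &= (1 << PSN_BITS) - 1                 # PSN wraps modulo 2^24
    word = tunnel_type
    word = (word << PSN_BITS) | psn
    word = (word << RSV_BITS) | reserved
    word = (word << ACK_BITS) | int(ack_desirable)
    return struct.pack("!I", word)             # network byte order


def unpack_forward_info(data: bytes) -> Tuple[int, int, bool, int]:
    (word,) = struct.unpack("!I", data[:4])
    ack = bool(word & 1)
    reserved = (word >> ACK_BITS) & ((1 << RSV_BITS) - 1)
    psn = (word >> (ACK_BITS + RSV_BITS)) & ((1 << PSN_BITS) - 1)
    tunnel_type = word >> (ACK_BITS + RSV_BITS + PSN_BITS)
    return tunnel_type, psn, ack, reserved
```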

428	   (Note that in this proposal there is no timestamp in the forward
429	   information; see Section 5.3.)

431	   24 bits of PSN, minus one bit for sequence number arithmetic, gives 8
432	   million packets (or 2.4 GB at typical packet sizes) per worst-case
433	   RTT.  So if that is, say, 30 seconds, this would be enough to fill
434	   640 Mbit/s.
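   The arithmetic in the preceding paragraph can be checked with a few
   lines of Python.  The 300-byte average packet size is an assumption.
   It was chosen because it reproduces the 2.4 GB figure.  The text
   itself only says "typical packet sizes" and rounds 2^23 down to
   8 million.

```python
PSN_BITS = 24
AVG_PACKET_BYTES = 300   # assumption that reproduces the "2.4 GB" figure
WORST_CASE_RTT_S = 30

usable_psns = 2 ** (PSN_BITS - 1)               # one bit lost to serial arithmetic
bytes_in_flight = 8_000_000 * AVG_PACKET_BYTES  # text rounds 2^23 to 8 million
rate_mbit_s = bytes_in_flight * 8 / WORST_CASE_RTT_S / 1e6

print(usable_psns)      # 8388608
print(bytes_in_flight)  # 2400000000
print(rate_mbit_s)      # 640.0
```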

436	4.3.  Reverse Information

438	   For the reverse information, we have identified:

440	   o  one optional block 1, possibly repeated:

442	      *  PSN being acknowledged

444	      *  absolute time of reception for the packet acknowledged (PSN)

446	   o  one optional block 2, possibly repeated:

448	      *  an ACK bitmap (based on PSN), always starting at a multiple of 8

450	      *  a delta indicating the end PSN of the bitmap (actually the first
451	         PSN that is beyond it), using (Acked-PSN & ~7) + 8*(delta+1) as
452	         the end of the bitmap.  Acked-PSN in that formula is the previous
453	         block 1 PSN seen in this packet, or 0 if none so far.
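   The following Python sketch shows how a LOOPS ingress might decode a
   block 2 bitmap using the end-of-bitmap formula above.  Several
   details are assumptions that this strawman does not fix:

   o  a 24-bit PSN space;

   o  a bitmap that covers the 8 * len(bitmap) PSNs immediately before
      the end PSN, so that it starts at a multiple of 8 as required;

   o  most-significant-bit-first order;

   o  a set bit meaning "received".

   The caller passes 0 as acked_psn when no block 1 precedes the bitmap
   in the packet.

```python
from typing import List, Tuple

PSN_MOD = 1 << 24  # assumption: 24-bit PSNs


def bitmap_end(acked_psn: int, delta: int) -> int:
    """First PSN beyond the bitmap: (Acked-PSN & ~7) + 8*(delta+1)."""
    return ((acked_psn & ~7) + 8 * (delta + 1)) % PSN_MOD


def decode_block2(acked_psn: int, delta: int,
                  bitmap: bytes) -> Tuple[List[int], List[int]]:
    """Return (received, missing) PSNs covered by one block 2."""
    end = bitmap_end(acked_psn, delta)
    nbits = 8 * len(bitmap)
    start = (end - nbits) % PSN_MOD              # a multiple of 8, like end
    received: List[int] = []
    missing: List[int] = []
    for i in range(nbits):
        psn = (start + i) % PSN_MOD
        bit = (bitmap[i // 8] >> (7 - i % 8)) & 1  # most significant bit first
        (received if bit else missing).append(psn)
    return received, missing
```

   For example, suppose the preceding block 1 PSN is 21, delta is 0 and
   the single bitmap byte is 0b11110111.  The bitmap then ends at 24
   and reports PSN 20 as missing.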

455	   Block 1 and Block 2 can be interspersed and repeated.  They can be
456	   piggybacked on a reverse direction data packet or sent separately if
457	   none occurs within some timeout.  They will usually be aggregated in
458	   some useful form.  Block 1 information sets are only returned for
459	   packets that have "ACK desirable" set.  Block 2 information is sent
460	   by the LOOPS egress based on some saturation scheme (e.g., at least three
461	   copies for each PSN span over time).  Still, it might be possible to
462	   go down to 1 or 2 amortized bytes per forward packet spent for all
463	   this.

465	   The latency calculation is done by the LOOPS ingress, which occasionally sets
466	   "ACK desirable", and notes down the absolute time of transmission for
467	   this data packet (the timekeeping can be done quite efficiently as
468	   deltas).  Upon reception of a block 1 ACK, it can then subtract that
469	   from the absolute time of reception indicated.  This assumes time
470	   synchronization between the nodes is at least as good as the
471	   precision of latency measurement needed, which should be no problem
472	   with IEEE 1588 PTP synchronization (but could be if using NTP-based
473	   synchronization only).  The LOOPS ingress can freely garbage collect noted
474	   down transmission time information; doing this too early just means
475	   that fewer latency and RTT samples can be taken.
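   The ingress-side bookkeeping described above might look like the
   following Python sketch.  It assumes synchronized clocks expressed in
   seconds.  A hypothetical max_entries bound drives the garbage
   collection that the text permits.  The class and method names are
   illustrative only.

```python
from typing import Dict, Optional


class LatencySampler:
    """Hypothetical ingress bookkeeping for one-way latency samples."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self._tx_time: Dict[int, float] = {}  # PSN -> absolute send time (s)

    def on_send(self, psn: int, now: float, ack_desirable: bool) -> None:
        if not ack_desirable:
            return                  # only "ACK desirable" packets get block 1
        self._tx_time[psn] = now
        if len(self._tx_time) > self.max_entries:
            # Garbage-collect the oldest entry (dicts keep insertion order).
            del self._tx_time[next(iter(self._tx_time))]

    def on_block1(self, psn: int, rx_time: float) -> Optional[float]:
        """One-way latency for an acknowledged PSN, or None if not kept."""
        tx = self._tx_time.pop(psn, None)
        return None if tx is None else rx_time - tx
```

   Removing an entry once its ACK arrives keeps the table small.  An
   entry that is evicted early simply yields no sample, which matches
   the trade-off noted above.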

477	5.  LOOPS General Operation

479	   In the Tunnel Mode described in the main body of this document, LOOPS
480	   information is carried by some tunnel encapsulation.

482	5.1.  Initial Packet Sequence Number

484	   There is no connection establishment procedure in LOOPS.  The initial
485	   PSN is assigned unilaterally by the LOOPS Ingress.

487	   Because the maximum latency increase is usually set to a short time,
488	   there is little damage from a collision of PSNs with
489	   packets still in flight from previous instances of LOOPS.

491	   Collisions can be minimized by assigning initial PSNs randomly, or
492	   using stable storage.  Random assignment is more useful for longer
493	   PSNs, where the likelihood of overlap will be low.  The specific way
494	   a LOOPS ingress uses stable storage is a local matter and thus out of
495	   scope.  (Implementation note: this can be made to work similarly to
496	   secure nonce generation with write attenuation: say, every 10000
497	   packets, the LOOPS ingress writes the PSN to stable storage.  After a
498	   reboot, it reloads the PSN and adds 10000 in sequence number
499	   arithmetic [RFC1982], plus maybe another 10000 so the LOOPS ingress does not
500	   have to wait for the store operation to succeed before sending more
501	   packets.)
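   The implementation note can be sketched in Python as follows.  The
   following details are assumptions for illustration:

   o  file-based storage;

   o  the class name;

   o  a random 24-bit starting point when no saved value exists.

   Only the save interval and the two added intervals come from the
   note above.

```python
import os
import tempfile

PSN_MOD = 1 << 24       # assumption: 24-bit PSNs
SAVE_INTERVAL = 10000   # "every 10000 packets", as in the note above


def serial_add(psn: int, n: int) -> int:
    """RFC 1982 serial number addition (n at most 2^23 - 1 for 24 bits)."""
    if not 0 <= n < PSN_MOD // 2:
        raise ValueError("increment out of range for serial arithmetic")
    return (psn + n) % PSN_MOD


class PsnAllocator:
    """Hypothetical ingress PSN allocator with write-attenuated persistence."""

    def __init__(self, path: str) -> None:
        self.path = path
        try:
            with open(path) as f:
                saved = int(f.read())
            # One interval covers PSNs used since the last save; the second
            # covers a save that might not have completed before the reboot.
            self.next_psn = serial_add(saved, 2 * SAVE_INTERVAL)
        except (FileNotFoundError, ValueError):
            self.next_psn = int.from_bytes(os.urandom(3), "big")  # random start
        self._since_save = 0
        self._save()

    def _save(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.next_psn))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)  # atomic rename on POSIX file systems

    def allocate(self) -> int:
        psn = self.next_psn
        self.next_psn = serial_add(psn, 1)
        self._since_save += 1
        if self._since_save >= SAVE_INTERVAL:
            self._save()
            self._since_save = 0
        return psn


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "loops-psn")
    a = PsnAllocator(path)
    for _ in range(SAVE_INTERVAL + 5):
        last = a.allocate()
    b = PsnAllocator(path)                   # simulated reboot
    print((b.next_psn - last) % PSN_MOD)     # 19996, below 2 * SAVE_INTERVAL
```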

503	5.2.  Acknowledgement Generation

505	   A data packet forwarded by the LOOPS ingress always carries PSN
506	   information.  The LOOPS egress uses the largest newly received PSN
507	   with the "ACK desirable" bit set as the ACK number in the block 1 part of
508	   the acknowledgement.  This means that the LOOPS ingress gets to
509	   modulate the number of acknowledgements sent by the LOOPS egress.
510	   However, whenever an out-of-order packet arrives while there still
511	   are "holes" in the PSNs received, the LOOPS egress should immediately
512	   generate a block 2 acknowledgement that the LOOPS ingress can use
513	   as a NACK list.
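   A possible shape for the egress side of this is sketched below in
   Python.  The following assumptions are not taken from the text:

   o  24-bit PSNs, compared using RFC 1982 serial arithmetic;

   o  a hypothetical window bounding how long holes are remembered;

   o  "out-of-order" meaning any PSN other than the one following the
      highest PSN seen so far.

   on_packet() returns True when a block 2 should be sent immediately.
   take_block1() yields the pending block 1 ACK number.

```python
from typing import Optional, Set

PSN_MOD = 1 << 24   # assumption: 24-bit PSNs
HALF = PSN_MOD // 2


def serial_gt(a: int, b: int) -> bool:
    """RFC 1982 comparison: True if PSN a is newer than PSN b."""
    return a != b and (a - b) % PSN_MOD < HALF


class EgressAckState:
    """Hypothetical egress bookkeeping for block 1 and block 2 generation."""

    def __init__(self, window: int = 1024) -> None:
        self.window = window                     # how far back holes are kept
        self.highest: Optional[int] = None       # highest PSN received so far
        self.missing: Set[int] = set()           # holes below self.highest
        self.block1_psn: Optional[int] = None    # pending block 1 ACK number

    def on_packet(self, psn: int, ack_desirable: bool) -> bool:
        """Record a data packet; return True if a block 2 should go out now."""
        in_order = (self.highest is None
                    or psn == (self.highest + 1) % PSN_MOD)
        if self.highest is None:
            self.highest = psn
        elif serial_gt(psn, self.highest):
            gap = (psn - self.highest) % PSN_MOD
            for k in range(max(1, gap - self.window), gap):  # skipped PSNs
                self.missing.add((self.highest + k) % PSN_MOD)
            self.highest = psn
        else:
            self.missing.discard(psn)            # a reordered, late packet
        # Retransmissions carry new PSNs, so old holes never fill; forget them.
        self.missing = {m for m in self.missing
                        if (self.highest - m) % PSN_MOD <= self.window}
        if ack_desirable and (self.block1_psn is None
                              or serial_gt(psn, self.block1_psn)):
            self.block1_psn = psn                # largest newly received
        return not in_order and bool(self.missing)

    def take_block1(self) -> Optional[int]:
        """Return and clear the pending block 1 ACK number."""
        psn, self.block1_psn = self.block1_psn, None
        return psn
```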

515	   Reverse information can be piggybacked in a reverse direction data
516	   packet.  When the reverse direction has no user data to be sent, a
517	   pure reverse information packet needs to be generated.  This may be
518	   based on a short delay during which the LOOPS egress waits for a data
519	   packet to piggyback on.  (To avoid exceeding the MTU, the egress
520	   could restrict piggybacking to less-than-full data packets.)

522	5.3.  Measurement

524	   When sending a block 1 acknowledgement, the LOOPS egress indicates
525	   the absolute time of reception of the packet.  The LOOPS ingress can
526	   subtract the absolute time of transmission that it still has
527	   available, resulting in one high quality latency sample.  (In an
528	   alternative design, the forward information could include the
529	   absolute time of transmission as well, and block 1 information would
530	   echo it back.  This trades memory management at the ingress for
531	   increased bandwidth and MTU reduction.)

533	   The LOOPS ingress can also use the time of reception of the block 1
534	   acknowledgement to obtain a segment RTT sample.  Note that this will
535	   include any wait time the LOOPS egress incurs while waiting for a
536	   piggybacking opportunity; this is appropriate, as the RTT is only
537	   used for maintaining a retransmission timeout.

539	   To maintain quality of information during idle times, the LOOPS
540	   ingress may send keepalive packets, which are discarded at the LOOPS
541	   egress after sending acknowledgements.  The indication that a packet
542	   is a keepalive packet is dependent on the encapsulation protocol.

544	5.4.  Loss detection and Recovery

546	   There are two ways to perform LOOPS local recovery: retransmission and FEC.

548	5.4.1.  Local Retransmission

550	   When retransmission is used as recovery mechanism, the LOOPS ingress
551	   detects a packet loss by receiving a NACK or by local timeout (using
552	   an RTO value that might be calculated as in [RFC6298]).  It might
553	   employ a DUPACK-like or a RACK-like mechanism for delayed reaction to
554	   a NACK.
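   A timer that follows the RFC 6298 rules can be fed with the segment
   RTT samples obtained from block 1 acknowledgements (Section 5.3).
   The Python sketch below uses the RFC 6298 constants (alpha = 1/8,
   beta = 1/4, K = 4).  The smaller default minimum RTO and the
   parameter names are assumptions, because the one-second floor in
   RFC 6298 targets end-to-end TCP.

```python
from typing import Optional


class RtoEstimator:
    """Retransmission timeout following the RFC 6298 update rules."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto: float = 0.01, granularity: float = 0.001,
                 initial_rto: float = 1.0, max_rto: float = 60.0) -> None:
        # min_rto is a hypothetical per-segment floor, not RFC 6298's 1 s.
        self.min_rto, self.g, self.max_rto = min_rto, granularity, max_rto
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = initial_rto

    def on_sample(self, r: float) -> float:
        """Feed one segment RTT sample (seconds); return the new RTO."""
        if self.srtt is None or self.rttvar is None:   # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:                                          # RTTVAR before SRTT
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self) -> float:
        """Back off after the timer expires (doubling, capped at max_rto)."""
        self.rto = min(self.max_rto, 2 * self.rto)
        return self.rto
```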

556	   When a retransmission is desired (see Section 2.4 for why it might
557	   not be), the LOOPS ingress performs the local in-network recovery by
558	   retransmitting the packet.  Further retransmissions may be desirable
559	   if the NACK is persistent beyond an RTO, as long as the maximum
560	   latency increase is not reached.
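   The bound on further retransmissions can be sketched as below.  This
   Python illustration makes the following choices:

   o  The extra delay that a retransmission adds is approximated as the
      time elapsed since the original transmission.

   o  Each retransmission gets a fresh PSN (Section 4.2).

   o  The optional limit stands in for the "maximum retransmission
      count (*)" Setup Information item.

   The class, its methods and the simple counter used as a PSN source
   are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional

PSN_MOD = 1 << 24   # assumption: 24-bit PSNs


@dataclass
class Pending:
    payload: bytes
    first_tx: float          # time of the original transmission
    retransmissions: int = 0


class RetransmitBuffer:
    """Hypothetical ingress state for bounded local retransmission."""

    def __init__(self, max_latency_increase: float,
                 max_retx: Optional[int] = None) -> None:
        self.max_latency_increase = max_latency_increase
        self.max_retx = max_retx
        self._psns = itertools.count()         # stand-in PSN source
        self._by_psn: Dict[int, Pending] = {}  # PSN -> packet awaiting ACK

    def send(self, payload: bytes, now: float) -> int:
        psn = next(self._psns) % PSN_MOD
        self._by_psn[psn] = Pending(payload, now)
        return psn

    def on_ack(self, psn: int) -> None:
        self._by_psn.pop(psn, None)

    def on_loss(self, psn: int, now: float) -> Optional[int]:
        """Handle a NACK or timeout; return the new PSN if retransmitted."""
        entry = self._by_psn.pop(psn, None)
        if entry is None:
            return None                        # already ACKed or given up
        if now - entry.first_tx > self.max_latency_increase:
            return None                        # deadline passed: give up
        if self.max_retx is not None and entry.retransmissions >= self.max_retx:
            return None                        # retransmission budget spent
        entry.retransmissions += 1
        new_psn = next(self._psns) % PSN_MOD   # retransmissions get a new PSN
        self._by_psn[new_psn] = entry
        return new_psn
```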

562	5.4.2.  FEC

564	   FEC is another way to perform local recovery.  When FEC is in use, an
565	   FEC header is sent with data packets as well as with special repair
566	   packets added to the flow.  The specific FEC scheme used could be
567	   defined in the Setup Information, using a mechanism like [RFC5052].
568	   The FEC rate (amount of redundancy added) and possibly the FEC scheme
569	   could be unilaterally adjusted by the LOOPS ingress in an adaptive
570	   mechanism based on the measurement information.

572	5.5.  Discussion

574	   Without progress in the way that end-host transport protocols handle
575	   reordering, LOOPS will be unable to prevent end-to-end
576	   retransmissions that duplicate the effort already spent on local
577	   retransmissions.  Whether this wasted effort is significant depends
578	   on the parameters of the path segment.

580	   One remedy against this waste could be the introduction of
581	   resequencing at the LOOPS Egress node.  This increases overall mean
582	   packet latency, but does not always increase actual end-to-end data
583	   stream latency if a head-of-line blocking transport such as TCP is in
584	   use.  For applications with a large percentage of legacy TCP end-
585	   hosts and sufficient processing capabilities at the LOOPS Egress
586	   node, resequencing may be a viable choice.  Note that resequencing
587	   could be switched off and on depending on some measurement
588	   information.

590	   To enable resequencing at the LOOPS Egress, a packet numbering scheme
591	   is needed that allows the LOOPS Egress to reconstruct the packet
592	   order as seen at the LOOPS ingress.  This could be done by reverting
593	   to a traditional packet sequence number counting incoming data
594	   packets, possibly combined with a "retransmission" bit that
595	   indicates that the specific LOOPS packet is a retransmission and not
596	   the original transmission.  (The acknowledgement/measurement
597	   ambiguity could be further reduced by adding a transmission counter
598	   (TC) that counts the transmissions of this PSN; a few bits should be
599	   enough for the limited retransmission envisaged.)
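
   Resequencing at the LOOPS Egress could be sketched as follows.  The
   sketch assumes the traditional PSN described above (one number per
   original data packet, reused by its retransmissions), a hold-time
   limit as the rule for giving up on a missing PSN, and no PSN
   wraparound.  The class, its methods and the max_hold parameter are
   hypothetical.

```python
import heapq


class Resequencer:
    """Hypothetical LOOPS egress resequencing buffer (sketch)."""

    def __init__(self, first_psn=0, max_hold=0.05):
        self.next_psn = first_psn   # next PSN to pass on in order
        self.max_hold = max_hold    # longest a packet waits for a gap
        self._heap = []             # (psn, arrival_time, packet)
        self._buffered = set()      # PSNs currently held in the heap

    def receive(self, psn, packet, now):
        """Accept one packet; return the packets to forward now."""
        if psn < self.next_psn:
            # Its slot was already given up (or it is a duplicate of a
            # released packet; this sketch does not remember those):
            # forward immediately and leave the rest to the transport.
            return [packet] + self.poll(now)
        if psn not in self._buffered:   # ignore duplicate copies
            self._buffered.add(psn)
            heapq.heappush(self._heap, (psn, now, packet))
        return self.poll(now)

    def poll(self, now):
        """Release in-order packets; skip gaps that waited too long."""
        out = []
        while self._heap:
            psn, arrived, packet = self._heap[0]
            if psn != self.next_psn and now - arrived < self.max_hold:
                break               # still waiting for a missing PSN
            heapq.heappop(self._heap)
            self._buffered.discard(psn)
            out.append(packet)
            self.next_psn = psn + 1
        return out
```

   A real egress would call poll() from a timer, not only on packet
   arrival.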

601	6.  Sketches of Bindings to Tunnel Protocols

603	   The LOOPS information defined above in a generic way can be mapped to
604	   specific tunnel encapsulation protocols.  Sketches for two tunnel
605	   protocols are given below: Geneve (Section 6.1) and GUE
606	   (Section 6.2).  The actual encapsulation can be designed in a
607	   "native" way by putting each of the various elements into the TLV
608	   format of the encapsulation protocol, or it can be achieved by
609	   providing single TLVs for forward and reverse information and using
610	   some generic encoding of both kinds of information as shown in
611	   Appendix B.3.

613	6.1.  Embedding LOOPS in Geneve

615	   Geneve [I-D.ietf-nvo3-geneve] is an extensible overlay protocol that
616	   can embed LOOPS functions.  Geneve uses TLVs to carry optional
617	   information between Network Virtualization Edges (NVEs).  An NVE is
618	   logically the same entity as a LOOPS node.

620	   For Geneve, a new LOOPS TLV needs to be defined, and its format needs
621	   to be consistent with the LOOPS generic information in Section 4.
622	   When the Geneve LOOPS TLV carries forward information, NVEs should
623	   be able to process it.  Any settings needed can be provided in the
624	   Setup Information.
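
   As an illustration, a LOOPS forward-information TLV could be
   encoded in the Geneve option format: Option Class (16 bits), Type
   (8 bits), 3 reserved bits, and a 5-bit Length that counts the option
   data in 4-byte multiples.  The Python sketch below assumes the PSN
   and Timestamp fields of Appendix A as the option data.  The option
   class and type values are placeholders; none has been allocated for
   LOOPS.

```python
import struct

# Placeholder code points: no Geneve option class or type is
# allocated for LOOPS; these values are purely illustrative.
LOOPS_OPT_CLASS = 0xFF00
LOOPS_OPT_TYPE_FWD = 0x01


def geneve_loops_option(psn, timestamp):
    """Encode a hypothetical LOOPS forward-information Geneve option.

    Header: Option Class (16), Type (8), R (3 bits, zero) and
    Length (5 bits, option data length in 4-byte words).
    Data: 32-bit PSN and 32-bit Timestamp, network byte order.
    """
    data = struct.pack("!II", psn & 0xFFFFFFFF, timestamp & 0xFFFFFFFF)
    words = len(data) // 4
    header = struct.pack("!HBB", LOOPS_OPT_CLASS, LOOPS_OPT_TYPE_FWD,
                         words & 0x1F)
    return header + data
```

   The Opt Len field of the Geneve base header would then have to
   account for the additional three 4-byte words.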

626	   In the reverse direction, when no data packets are available for
627	   piggybacking, a control-only packet will be used to carry the LOOPS
628	   reverse information.  Such a control-only packet sets the 'O' bit in
629	   the Geneve header and carries no user data.

631	   The VNI is a mandatory field in the Geneve base header.  The LOOPS
632	   TLV should function on the tunnel between two NVEs without looking
633	   at the VNI value; the LOOPS PSN space is local to the overlay tunnel
634	   regardless of the VNI inside.  At the ingress NVE, whether a packet
635	   goes into a LOOPS-enabled tunnel can be decided in different ways,
636	   e.g., by protocol and port (certain TCP/UDP ports) or by VNI.

638	6.2.  Embedding LOOPS in GUE

640	   GUE [I-D.ietf-intarea-gue] is an extensible overlay protocol that
641	   can embed LOOPS functions.  GUE uses flags to indicate the presence
642	   of fixed-length header extensions.  It also allows variable-length
643	   extensions to be put in the "Private data" field.  A new LOOPS data
644	   block in the "Private data" field needs to be defined based on the
645	   LOOPS generic information in Section 4.

647	   In the reverse direction, when no data packets are available for
648	   piggybacking, LOOPS reverse information is carried in a control
649	   message with the C bit set in the GUE header.  When the C bit is set,
650	   the Proto/ctype field contains a control message type; hence, a new
651	   control message type should be defined for such LOOPS reverse
652	   information.
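
   The first 32-bit word of such a GUE control message might be built
   as in the Python sketch below.  It assumes the GUE variant 0 layout
   (Ver 2 bits, C 1 bit, Hlen 5 bits, Proto/ctype 8 bits, Flags 16
   bits), with Hlen counting the extension fields in 32-bit words,
   excluding the first word.  The LOOPS control message type value is
   a placeholder, as no type has been defined.

```python
import struct

# Placeholder: no GUE control message type is allocated for LOOPS.
LOOPS_CTYPE = 0x7F


def gue_control_header(ctype, ext_len=0, flags=0):
    """Pack the first word of a GUE variant 0 header with C set.

    ext_len is the length in bytes of the extension fields (including
    any private data) that follow this first word.
    """
    if ext_len % 4:
        raise ValueError("extension fields must be a multiple of 4 bytes")
    hlen = ext_len // 4
    if hlen > 0x1F:
        raise ValueError("Hlen is a 5-bit field")
    first_byte = (0 << 6) | (1 << 5) | hlen   # Ver = 0, C = 1, Hlen
    return struct.pack("!BBH", first_byte, ctype, flags)
```

   The LOOPS reverse information itself would then follow as the
   control message payload.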

654	7.  IANA Considerations

656	   No IANA action is required at this stage.  When a LOOPS
657	   representation is designed for a specific tunneling protocol, new
658	   codepoints will be required in the registries that pertain to that
659	   protocol.

661	8.  Security Considerations

663	   To be defined.

665	9.  Informative References

667	   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
668	              DOI 10.17487/RFC1982, August 1996,
669	              <https://www.rfc-editor.org/info/rfc1982>.

671	   [RFC3708]  Blanton, E. and M. Allman, "Using TCP Duplicate Selective
672	              Acknowledgement (DSACKs) and Stream Control Transmission
673	              Protocol (SCTP) Duplicate Transmission Sequence Numbers
674	              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
675	              DOI 10.17487/RFC3708, February 2004,
676	              <https://www.rfc-editor.org/info/rfc3708>.

678	   [RFC5052]  Watson, M., Luby, M., and L. Vicisano, "Forward Error
679	              Correction (FEC) Building Block", RFC 5052,
680	              DOI 10.17487/RFC5052, August 2007,
681	              <https://www.rfc-editor.org/info/rfc5052>.

683	   [RFC5862]  Yasukawa, S. and A. Farrel, "Path Computation Clients
684	              (PCC) - Path Computation Element (PCE) Requirements for
685	              Point-to-Multipoint MPLS-TE", RFC 5862,
686	              DOI 10.17487/RFC5862, June 2010,
687	              <https://www.rfc-editor.org/info/rfc5862>.

689	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
690	              Notification", RFC 6040, DOI 10.17487/RFC6040, November
691	              2010, <https://www.rfc-editor.org/info/rfc6040>.

693	   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
694	              "Computing TCP's Retransmission Timer", RFC 6298,
695	              DOI 10.17487/RFC6298, June 2011,
696	              <https://www.rfc-editor.org/info/rfc6298>.

698	   [RFC6330]  Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T.,
699	              and L. Minder, "RaptorQ Forward Error Correction Scheme
700	              for Object Delivery", RFC 6330, DOI 10.17487/RFC6330,
701	              August 2011, <https://www.rfc-editor.org/info/rfc6330>.

703	   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
704	              Definition Language (CDDL): A Notational Convention to
705	              Express Concise Binary Object Representation (CBOR) and
706	              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
707	              June 2019, <https://www.rfc-editor.org/info/rfc8610>.

709	   [I-D.ietf-tcpm-rack]
710	              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
711	              a time-based fast loss detection algorithm for TCP",
712	              draft-ietf-tcpm-rack-05 (work in progress), April 2019.

714	   [I-D.ietf-nvo3-geneve]
715	              Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic
716	              Network Virtualization Encapsulation", draft-ietf-
717	              nvo3-geneve-13 (work in progress), March 2019.

719	   [I-D.ietf-intarea-gue]
720	              Herbert, T., Yong, L., and O. Zia, "Generic UDP
721	              Encapsulation", draft-ietf-intarea-gue-07 (work in
722	              progress), March 2019.

724	   [I-D.li-tsvwg-loops-problem-opportunities]
725	              Yizhou, L., Zhou, X., Boucadair, M., and J. Wang, "LOOPS
726	              (Localized Optimizations on Path Segments) Problem
727	              Statement and Opportunities for Network-Assisted
728	              Performance Enhancement", draft-li-tsvwg-loops-problem-
729	              opportunities-03 (work in progress), July 2019.

731	   [I-D.ietf-tsvwg-tunnel-congestion-feedback]
732	              Wei, X., Yizhou, L., Boutros, S., and L. Geng, "Tunnel
733	              Congestion Feedback", draft-ietf-tsvwg-tunnel-congestion-
734	              feedback-07 (work in progress), May 2019.

736	Appendix A.  Protocol used in Prototype Implementation

738	   This appendix describes, in a somewhat abstracted form, the protocol
739	   as used in a prototype implementation, as described by Yizhou Li and
740	   Xingwang Zhou.

742	   The prototype protocol can be run in one of two modes (defined by
743	   preconfiguration):

745	   o  Retransmission mode

747	   o  Forward Error Correction (FEC) mode

749	   Forward information is piggybacked in data packets.

751	   Reverse information can be carried in a pure acknowledgement packet
752	   or piggybacked on data packets traveling in the reverse direction.

754	   The forward information includes:

756	   o  Packet Sequence Number (PSN) (32 bits): This identifies a packet
757	      over a specific overlay segment from a specific LOOPS Ingress.  If
758	      a packet is retransmitted by LOOPS, the retransmission uses the
759	      original PSN.

761	   o  Timestamp (32 bits): Information, in a format local to the LOOPS
762	      ingress, that provides the time when the packet was sent.  In the
763	      current implementation, this is a 32-bit unsigned value giving the
764	      time elapsed, in some granularity, from the epoch to the sending
765	      time of the packet carrying this timestamp.  The granularity can
766	      be from 1 ms to 1 second.  The epoch follows current TCP
767	      practice, i.e., 1 January 1970 00:00:00 UTC.  Note that a
768	      retransmitted packet uses its own Timestamp.

770	   o  FEC Info for Block Code (56 bits): This header is used in FEC
771	      mode.  It currently only provides for a block code FEC scheme.  It
772	      includes the Source Block Number (SBN), Encoding Symbol ID (ESI),
773	      number of symbols in a single source block and symbol size.
774	      Appendix A.1 gives more details on FEC.
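
   The Timestamp and the fixed part of the forward information could
   be produced as in the Python sketch below.  The function names, the
   field order (PSN, then Timestamp) and the network byte order are
   assumptions.  At a 1 ms granularity, a 32-bit value wraps after
   about 49.7 days, so the sketch reduces it modulo 2^32.

```python
import struct
import time


def loops_timestamp(granularity_ms=1, now_ns=None):
    """Time since 1970-01-01 00:00:00 UTC in units of granularity_ms,
    truncated to 32 bits (the value wraps)."""
    if not 1 <= granularity_ms <= 1000:
        raise ValueError("granularity must be between 1 ms and 1 s")
    if now_ns is None:
        now_ns = time.time_ns()   # integer nanoseconds, no float rounding
    return (now_ns // (granularity_ms * 1_000_000)) & 0xFFFFFFFF


def pack_forward_info(psn, timestamp):
    """PSN and Timestamp, 32 bits each; in FEC mode the 56-bit
    FEC Info would follow."""
    return struct.pack("!II", psn & 0xFFFFFFFF, timestamp & 0xFFFFFFFF)
```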

776	   The reverse information includes:

778	   o  ACK Number (32 bits): The largest (in sequence number arithmetic
779	      [RFC1982]) PSN received so far.

781	   o  NACK List (variable): An array of PSNs that describes the PSN
782	      "holes" preceding the ACK number.  Conceptually, it lists the PSNs
783	      of every packet perceived as lost by the LOOPS egress; in actual
784	      use, it is truncated.

786	   o  Echoed Timestamp (32 bits): The timestamp received with the packet
787	      being acknowledged.
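
   The ACK Number and a truncated NACK List could be derived at the
   egress as in the Python sketch below.  It uses the [RFC1982]
   comparison for 32-bit serial numbers.  The state class, the
   max_nacks truncation limit and the "nearest hole first" order are
   assumptions.  For simplicity, the set of received PSNs is never
   pruned.

```python
MOD = 1 << 32        # PSNs are 32-bit serial numbers
HALF = 1 << 31


def serial_gt(a, b):
    """RFC 1982: True if serial number a is 'greater than' b."""
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


class ReverseInfoState:
    """Hypothetical egress state behind ACK Number and NACK List."""

    def __init__(self, max_nacks=8):
        self.max_nacks = max_nacks  # truncation limit for the NACK list
        self.base = None            # first PSN seen (start of history)
        self.ack = None             # largest PSN received so far
        self.received = set()       # a real egress would prune this

    def on_packet(self, psn):
        if self.base is None:
            self.base = self.ack = psn
        elif serial_gt(psn, self.ack):
            self.ack = psn
        self.received.add(psn)

    def nack_list(self):
        """Holes preceding the ACK number, nearest first, truncated."""
        holes = []
        psn = self.ack
        while (psn is not None and psn != self.base
               and len(holes) < self.max_nacks):
            psn = (psn - 1) % MOD
            if psn not in self.received:
                holes.append(psn)
        return holes
```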

789	A.1.  Block Code FEC

791	   The prototype currently uses a block code FEC scheme (RaptorQ
792	   [RFC6330]).  The fields in the FEC Info forward information are:

794	   o  Source Block Number (SBN): 16 bits.  An integer identifier for the
795	      source block that the encoding symbols within the packet relate
796	      to.

798	   o  Encoding Symbol ID (ESI): 16 bits.  An integer identifier for the
799	      encoding symbols within the packet.

801	   o  K: 8 bits.  Number of symbols in a single source block.

803	   o  T: 16 bits.  Symbol size in bytes.
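
   The four fields add up to the 56 bits of FEC Info given above.  A
   Python sketch of the packing follows.  It assumes the fields appear
   in the listed order, in network byte order, without padding.  The
   8-bit K limits a source block to 255 symbols, and struct rejects
   larger values.

```python
import struct

# SBN (16), ESI (16), K (8), T (16): 7 bytes, "!" means no padding.
FEC_INFO = struct.Struct("!HHBH")


def pack_fec_info(sbn, esi, k, t):
    return FEC_INFO.pack(sbn, esi, k, t)


def unpack_fec_info(data):
    """Return (sbn, esi, k, t) from the first 7 bytes of data."""
    return FEC_INFO.unpack(data[:FEC_INFO.size])
```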

805	   The LOOPS Ingress uses the data packet in Figure 1 to generate the
806	   encoding packet.  Both source packets and repair packets carry the
807	   FEC header information; the LOOPS Egress reconstructs the data
808	   packets from both kinds of packets.  The LOOPS Egress currently
809	   resequences the forwarded and reconstructed packets, so they are
810	   passed on in order when the lost packets are recoverable within the
811	   source block.

813	   The LOOPS Nodes need to agree on the use of FEC block mode and on the
814	   specific FEC Encoding ID to use; this is currently done by
815	   configuration.

817	Appendix B.  Transparent mode

819	   This appendix defines a very different way to provide the LOOPS
820	   services, "transparent mode".  (We call the protocol described in the
821	   main body of the document "encapsulated mode".)

823	   In transparent mode, the idea is that LOOPS does not meddle with the
824	   forward transmission of data packets, but runs on the side exchanging
825	   additional information.

827	   An implementation could be based on conventional forwarding switches
828	   that just provide a copy of the ingress and egress packet streams to
829	   the LOOPS implementations.  The LOOPS process would occasionally
830	   inject recovered packets back into the LOOPS egress node's forwarding
831	   switch (see Figure 3).

833	              |
834	      +-------+-------------------------------------------+
835	      |       |                                           |
836	      |  +----+--------+   +-------------------+          |
837	      |  |    | copy   |   |                   |          |
838	      |  |    |----------------> LOOPS ingress |          |
839	      |  |    |        |   |     |     ^       |          |
840	      |  +----+--------+   +-----|-----|-------+          |
841	      |   data|packets    forward|     |reverse           |
842	      |       |              info|     |info              |
843	      +-------+------------------|-----|------------------+
844	              |                  |     |
845	      +-------+------------------|-----|------------------+
846	      |       |                  |     |                  |
847	      |  +----+---------+   +----|-----|----------+       |
848	      |  |    | copy    |   |    v     |          |       |
849	      |  |    |---------|---|---> LOOPS egress    |       |
850	      |  |    |         |   |                     |       |
851	      |  |    |<--------|---|---- inject          |       |
852	      |  +----+---------+   +---------------------+       |
853	      |       |                                           |
854	      +-------+-------------------------------------------+
855	              |
856	              v

858	                     Figure 3: LOOPS Transparent Mode

860	   The obvious advantage of transparent mode is that no encapsulation is
861	   needed, reducing processing requirements and keeping the MTU
862	   unchanged.  The obvious disadvantage is that no forward information
863	   can be provided with each data packet, so a replacement needs to be
864	   found for the PSN (packet sequence number) employed in encapsulated
865	   mode.  Any forward information beyond the data packets is sent in
866	   separate packets exchanged directly between the LOOPS nodes.

868	B.1.  Packet identification

870	   Retransmission mode and FEC mode differ in their needs for packet
871	   identification.  For retransmission mode, a somewhat probabilistic
872	   accuracy of the packet identification is sufficient; for FEC mode,
873	   packet identification should not make mistakes (as these would lead
874	   to incorrectly reconstructed packets).

876	   In Retransmission mode, misidentification of a packet could lead to
877	   measurement errors as well as missed retransmission opportunities.
878	   The latter will be fixed end-to-end.  The tolerance for measurement
879	   errors would influence the degree of accuracy that is aimed for.

881	   Packet identification can be based on a cryptographic hash of the
882	   packet, computed in LOOPS ingress and egress using the same algorithm
883	   (excluding fields that can change in transit, such as TTL/hop limit).
884	   The hash can directly be used as a packet number, or it can be sent
885	   in the forward information together with a packet sequence number,
886	   establishing a mapping.

888	   For probabilistic packet identification, it is almost always
889	   sufficient to hash the first few (say, 64) bytes of the packet; all
890	   known transport protocols keep sufficient identifying information in
891	   that part (and, for encrypted protocols, the entropy will be
892	   sufficient).  Any collisions of the hash could be used to disqualify
893	   the packet for measurement purposes, minimizing the measurement
894	   errors; this could allow rather short packet identifiers in
895	   retransmission mode.
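
   A probabilistic packet identifier of this kind might be computed as
   in the Python sketch below.  It hashes the first 64 bytes with
   BLAKE2b and truncates the result to 8 bytes.  Besides TTL/hop limit,
   the sketch also masks the IPv4 header checksum (which changes with
   the TTL) and the DSCP/ECN bits, on the assumption that these can
   also be re-marked in transit.  The hash choice and all names are
   illustrative.

```python
import hashlib


def packet_id(packet: bytes, prefix_len=64, id_bytes=8) -> bytes:
    """Hypothetical transparent-mode packet identifier.

    Hashes the first prefix_len bytes with fields that may change in
    transit zeroed, so ingress and egress compute the same value.
    """
    head = bytearray(packet[:prefix_len])
    version = head[0] >> 4 if head else 0
    if version == 4 and len(head) >= 12:
        head[1] = 0            # DSCP/ECN
        head[8] = 0            # TTL
        head[10:12] = b"\0\0"  # header checksum (changes with TTL)
    elif version == 6 and len(head) >= 8:
        head[0] &= 0xF0        # traffic class, high 4 bits
        head[1] &= 0x0F        # traffic class, low 4 bits
        head[7] = 0            # hop limit
    return hashlib.blake2b(bytes(head), digest_size=id_bytes).digest()
```

   Per the text above, a colliding identifier at the ingress would
   simply exclude that packet from measurement.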

897	   For FEC mode, the packet identification together with the per-packet
898	   FEC information needs to be sent in the (separate) forward
899	   information, so that a systematic code can be reconstructed.  For
900	   retransmission mode, most packets need no forward information at
901	   all; alternatively, a mapping from packet identifiers to packet
902	   sequence numbers could be sent in the forward information (probably
903	   in some aggregated form).  The latter would allow keeping the
904	   acknowledgement form described in the main body (with aggregate
905	   acknowledgement); otherwise, packet identifiers need to be
906	   acknowledged.  With this change, the LOOPS egress will send reverse
907	   information as in encapsulated mode.

909	B.2.  Generic information and protocol operation

911	   With the changes outlined above, transparent mode operates just like
912	   encapsulated mode.  If packet sequence numbers are not used, there is
913	   no use for block2 reverse information; if they are used, a new block3
914	   needs to be defined that provides the mapping from packet identifiers
915	   to packet sequence numbers in the forward information.  To avoid MTU
916	   reduction, some mechanism will be needed to encapsulate the actual
917	   FEC information (additional packets) in the forward information.

919	B.3.  A hybrid mode

921	   Figure 3 can be modified by including a GRE encapsulator in the top
922	   left corner and a GRE decapsulator in the bottom left corner.  This
923	   provides more clearly defined ingress and egress points, and it also
924	   provides an opportunity to add a packet sequence number at the
925	   ingress.  The copies to the top right and bottom right corners are
926	   the encapsulated form, i.e., they include the sequence number.

928	   The GRE packet header then has the form:

930	      0                   1                   2                   3
931	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
932	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
933	     |0|0|0|1|    000000000    | 000 |         Protocol Type         |
934	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
935	     |                         Sequence Number                       |
936	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

938	   The forward and reverse information can be designed closer to the
939	   approach in the main body of the document and exchanged in UDP
940	   packets between the LOOPS ingress (top right) and the LOOPS egress
941	   (bottom right), using a port number allocated for this purpose.

943	   Rough ideas for both directions are given below in CDDL [RFC8610].
944	   This information set could be encoded in CBOR or in a bespoke
945	   encoding; details such as this can be defined later.

947	   forward-information = [
948	     * ([rel-psn, ack-desired, ? fec-info] /
949	        fec-repair-data)
950	   ]

952	   rel-psn = uint; relative packet sequence number
953	   ; always given as a delta from the previous one in the array
954	   ; starting out with a "previous value" of 0

956	   ack-desired = bool

958	   fec-info = [
959	       sbn: uint, ; Source Block Number
960	       esi: uint, ; Encoding Symbol ID
961	       ? (
962	         nsssb: uint; number of symbols in a single source block
963	         ss: uint; symbol size
964	       )
965	   ]

967	   fec-repair-data = [
968	       repair-data: bytes
969	       ? (
970	         sbn: uint, ; Source Block Number
971	         esi: uint, ; Encoding Symbol ID
972	       )
973	   ]

975	   If fec-info is left out for a sequence number, it is constructed by
976	   adding one to the previous one.  A fec-repair-data entry contains
977	   repair symbols for the sbn/esi given (which, again, are reconstructed
978	   from context if not given).
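
   The rel-psn rule in the CDDL above (each value is a delta from the
   previous one in the array, starting from 0) amounts to the following
   Python sketch.  Reducing the deltas modulo 2^32 is an assumption,
   borrowed from the 32-bit PSN of the prototype.  It keeps every delta
   a non-negative uint even across a wraparound.  Note that the first
   entry carries the absolute PSN.

```python
PSN_MOD = 1 << 32  # assumption: 32-bit PSNs as in the prototype


def encode_rel_psns(psns):
    """Absolute PSNs -> rel-psn values (delta from the previous one)."""
    prev, out = 0, []
    for psn in psns:
        out.append((psn - prev) % PSN_MOD)
        prev = psn
    return out


def decode_rel_psns(rels):
    """rel-psn values -> absolute PSNs."""
    prev, out = 0, []
    for rel in rels:
        prev = (prev + rel) % PSN_MOD
        out.append(prev)
    return out
```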

980	   reverse-information = [
981	       * (block1 / block2)
982	   ]

984	   block1 = [rel-psn, timestamp: uint]
985	   block2 = [end-psn-delta: uint, acked-bits: bytes]

987	   The acked-bits field of a block2 is a bitmap that acknowledges
988	   received data packets.  The bitmap always comes as a multiple of 8
989	   bits (all bytes are filled in with 8 bits, each identifying a PSN).
990	   The end PSN of the bitmap (actually the first PSN that would be
991	   beyond it) is computed from the current PSN as set by rel-psn,
992	   rounded down to a multiple of 8, and adding 8*(end-psn-delta+1) to
993	   that value.
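
   As a worked example of this rule: with a current PSN of 21 and an
   end-psn-delta of 0, the end PSN is 16 + 8 * 1 = 24, so a one-byte
   bitmap covers PSNs 16 to 23.  The Python sketch below computes this.
   It assumes that the bitmap ends just before the end PSN and that the
   most significant bit of each byte stands for the lowest PSN in that
   byte.  The document does not specify the bit order.  The sketch
   also ignores PSN wraparound.

```python
def block2_end_psn(current_psn, end_psn_delta):
    """First PSN beyond the block2 bitmap, per the rule above."""
    return (current_psn // 8) * 8 + 8 * (end_psn_delta + 1)


def block2_acked_psns(current_psn, end_psn_delta, acked_bits):
    """PSNs acknowledged by a block2 (bit order is an assumption)."""
    end = block2_end_psn(current_psn, end_psn_delta)
    start = end - 8 * len(acked_bits)
    acked = []
    for i, byte in enumerate(acked_bits):
        for bit in range(8):
            if byte & (0x80 >> bit):   # MSB = lowest PSN of the byte
                acked.append(start + 8 * i + bit)
    return acked
```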

995	Acknowledgements

997	   Sami Boutros helped with sketching the use of Geneve (Section 6.1),
998	   and Tom Herbert helped with sketching the use of GUE (Section 6.2).

1000	   Michael Welzl has been supported by the Research Council of Norway
1001	   under its "Toppforsk" programme through the "OCARINA" project.

1003	Authors' Addresses

1005	   Michael Welzl
1006	   University of Oslo
1007	   PO Box 1080 Blindern
1008	   Oslo  N-0316
1009	   Norway

1011	   Phone: +47 22 85 24 20
1012	   Email: michawe@ifi.uio.no

1014	   Carsten Bormann (editor)
1015	   Universitaet Bremen TZI
1016	   Postfach 330440
1017	   Bremen  D-28359
1018	   Germany

1020	   Phone: +49-421-218-63921
1021	   Email: cabo@tzi.org