TSVWG                                                           M. Welzl
Internet-Draft                                        University of Oslo
Intended status: Standards Track                         C. Bormann, Ed.
Expires: March 6, 2020                           Universitaet Bremen TZI
                                                      September 03, 2019


                     LOOPS Generic Information Set
                     draft-welzl-loops-gen-info-01

Abstract

   LOOPS (Local Optimizations on Path Segments) aims to provide local
   (not end-to-end but in-network) recovery of lost packets to achieve
   better data delivery in the presence of losses.
   [I-D.li-tsvwg-loops-problem-opportunities] provides an overview over
   the problems and optimization opportunities that LOOPS could address.

   The present document is a strawman for the set of information that
   would be interchanged in a LOOPS protocol, without already defining a
   specific data packet format.

   The generic information set needs to be mapped to a specific
   encapsulation protocol to actually run the LOOPS optimizations. The
   current version of this document contains sketches of bindings to GUE
   [I-D.ietf-intarea-gue] and Geneve [I-D.ietf-nvo3-geneve].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 6, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Challenges
     2.1.  No Access to End-to-End Transport Information
     2.2.  Path Asymmetry
     2.3.  Reordering vs. Spurious Retransmission
     2.4.  Informing the End-to-End Transport
     2.5.  Congestion Detection
   3.  Simplifying assumptions
   4.  LOOPS Generic Information Set
     4.1.  Setup Information
     4.2.  Forward Information
     4.3.  Reverse Information
   5.  LOOPS General Operation
     5.1.  Initial Packet Sequence Number
     5.2.  Acknowledgement Generation
     5.3.  Measurement
     5.4.  Loss detection and Recovery
       5.4.1.  Local Retransmission
       5.4.2.  FEC
     5.5.  Discussion
   6.  Sketches of Bindings to Tunnel Protocols
     6.1.  Embedding LOOPS in Geneve
     6.2.  Embedding LOOPS in GUE
   7.  IANA Considerations
   8.  Security Considerations
   9.  Informative References
   Appendix A.  Protocol used in Prototype Implementation
     A.1.  Block Code FEC
   Appendix B.  Transparent mode
     B.1.  Packet identification
     B.2.  Generic information and protocol operation
     B.3.  A hybrid mode
   Acknowledgements
   Authors' Addresses

1.  Introduction

   Today's networks exhibit a wide variety of data rates and, relative
   to those, processing power and memory capacities of nodes acting as
   routers. For instance, networks that employ tunneling to build
   overlay networks may position powerful virtual router nodes in the
   network to act as tunnel endpoints. The capabilities available in
   the more powerful cases provide new opportunities for optimizations.

   LOOPS (Local Optimizations on Path Segments) aims to provide local
   (not end-to-end but in-network) recovery of lost packets to achieve
   better data delivery. [I-D.li-tsvwg-loops-problem-opportunities]
   provides an overview over the problems and optimization opportunities
   that LOOPS could address. One simplifying assumption (Section 3) in
   the present document is that LOOPS segments operate independently
   from each other, each as a pair of a LOOPS Ingress and a LOOPS Egress
   node.

   The present document is a strawman for the set of information that
   would be interchanged in a LOOPS protocol between these nodes,
   without already defining a specific data packet format. The main
   body of the document defines a mode of the LOOPS protocol that is
   based on traditional tunneling, the "tunnel mode". Appendix B is an
   even rougher strawman of a radically different, alternative mode that
   we call "transparent mode", as well as a slightly more conventional
   "hybrid mode" (Appendix B.3). These different modes may be
   applicable to different usage scenarios and will be developed in
   parallel, with a view to ultimately standardizing one or more of
   them.

   For tunnel mode, the generic information set needs to be mapped to a
   specific encapsulation protocol to actually run the LOOPS
   optimizations. LOOPS is not tied to any specific overlay protocol,
   but is meant to run embedded into a variety of tunnel protocols.
   LOOPS information is added as part of a tunnel protocol header at the
   LOOPS ingress as shown in Figure 1. The current version of this
   document contains sketches of bindings to GUE [I-D.ietf-intarea-gue]
   and Geneve [I-D.ietf-nvo3-geneve].

               +------------------------------------+
               |            Outer header            |
               +------------------------------------+
             / |         Tunnel Base Header         |
            /  +------------------------------------+\
      Tunnel   | +-------------------------+        | \
      Header   ~ |    LOOPS Information    |        ~  Tunnel Header
            \  | +-------------------------+        |  Extensions
             \ +------------------------------------+ /
               |            Data packet             |
               +------------------------------------+

             Figure 1: Packet in Tunnel with LOOPS Information

   Figure 2 is extracted from the LOOPS problems and opportunities
   document [I-D.li-tsvwg-loops-problem-opportunities]. It illustrates
   the basic architecture and terms of the applicable scenario of LOOPS.
   Not all of the concepts introduced in the problems and opportunities
   document are actually used in the current strawman specification;
   Section 3 lays out some simplifying assumptions that the present
   proposal makes.

                   ON=overlay node
                   UN=underlay node

   +---------+                                               +---------+
   |   App   | <---------------- end-to-end ---------------> |   App   |
   +---------+                                               +---------+
   |Transport| <---------------- end-to-end ---------------> |Transport|
   +---------+                                               +---------+
   |         |                                               |         |
   |         |        +--+  path  +--+ path segment2   +--+  |         |
   |         |        |  |<-seg1->|  |<--------------> |  |  |         |
   | Network |  +--+  |ON|  +--+  |ON|  +--+   +----+  |ON|  | Network |
   |         |--|UN|--|  |--|UN|--|  |--|UN|---| UN |--|  |--|         |
   +---------+  +--+  +--+  +--+  +--+  +--+   +----+  +--+  +---------+
    End Host                                                  End Host
                       <--------------------------------->
                      LOOPS domain: path segments enabling
                   optimization for local in-network recovery

                      Figure 2: LOOPS Usage Scenario

1.1.  Terminology

   This document makes use of the terminology defined in
   [I-D.li-tsvwg-loops-problem-opportunities]. This section defines
   additional terminology used by this document.

   Data packets: The payload packets that enter and exit a LOOPS
      segment.

   LOOPS Segment: A part of an end-to-end path covered by a single
      instance of the LOOPS protocol, the sub-path between the LOOPS
      Ingress and the LOOPS Egress.

   LOOPS Ingress: The node that forwards data packets and forward
      information into the LOOPS segment, potentially performing
      retransmission and forward error correction based on
      acknowledgements and measurements received from the LOOPS Egress.

   LOOPS Egress: The node that receives the data packets and forward
      information from the LOOPS ingress, sends acknowledgements and
      measurements back to the LOOPS ingress (reverse information), and
      potentially recovers data packets from forward error correction
      information received.

   LOOPS Nodes: Collective term for LOOPS Ingress and LOOPS Egress in a
      LOOPS Segment.

   Forward Information: Information that is added to the stream of data
      packets in the forward direction by the LOOPS Ingress.

   Reverse Information: Information that flows in the reverse
      direction, from the LOOPS Egress back to the LOOPS Ingress.

   Setup Information: Information that is not transferred as part of
      the Forward or Reverse Information, but is part of the setup of
      the LOOPS Nodes.

   PSN: Packet Sequence Number, a sequence number identifying a data
      packet.

   Sender: Original sender of a packet on an end-to-end path that
      includes one or more LOOPS segment(s).

   Receiver: Ultimate receiver of a packet on an end-to-end path that
      includes one or more LOOPS segment(s).

2.  Challenges

   LOOPS has to perform well in the presence of some challenges, which
   are discussed in this section.

2.1.  No Access to End-to-End Transport Information

   LOOPS is defined to be independent of the content of the packets
   being forwarded: there is no dependency on transport-layer or higher
   information. The intention is to keep LOOPS useful with a traffic
   mix that may contain encrypted transport protocols such as QUIC as
   well as encrypted VPN traffic.

2.2.  Path Asymmetry

   A LOOPS segment is defined as a unidirectional forwarding path. The
   tunnel might be shared with a LOOPS segment in the inverse direction;
   this then allows Reverse Information to be piggybacked on
   encapsulated packets on that segment. But there is no guarantee that
   the inverse direction of any end-to-end path crosses that segment, so
   the LOOPS optimizations have to be useful on their own in each
   direction.

2.3.  Reordering vs. Spurious Retransmission

   The end-to-end transport layer protocol may have its own
   retransmission mechanism to recover lost packets.
   When LOOPS recovers a loss, ideally this local recovery would avoid
   the triggering of a retransmission at the end-to-end sender.

   Whether this is possible depends on the specific end-to-end mechanism
   used for triggering retransmission. When end-to-end retransmission
   is triggered by receiving a sequence of duplicate acknowledgements
   (DUPACKs), and with more than a few packets in flight, the recovered
   packet is likely to be too late to fill the hole in the sequence
   number space that triggers the DUPACK detection.

   (Given a reasonable setting of parameters, the local retransmission
   will still arrive earlier than the end-to-end retransmission and will
   possibly unblock application processing earlier; with spurious
   retransmission detection, there also will be little long-term effect
   on the send rate.)

   The waste of bandwidth caused by a DUPACK-based end-to-end
   retransmission can be avoided when the end-to-end loss detection is
   based on time instead of sequence numbers, e.g., with RACK
   [I-D.ietf-tcpm-rack]. This requires a limit on the additional
   latency that LOOPS will incur in its attempt to recover the loss
   locally. In the present version of this document, an opportunity to
   set such a limit is provided in the Setup Information. The limit can
   be used to compute a deadline for retransmission, but also can be
   used to choose FEC parameters that keep extra latency low.

2.4.  Informing the End-to-End Transport

   Congestion control at the end-to-end sender is used to adapt its
   sending rate to the network congestion status. In typical TCP
   senders, packet loss implies congestion and leads to a reduction in
   sending rate. With LOOPS operating, packet loss can be masked from
   the sender as the loss may have been locally recovered. In this
   case, rate reduction may not be invoked at the sender. This is a
   desirable performance improvement if the loss was a random loss.

   If LOOPS successfully conceals congestion losses from the end-to-end
   transport protocol, that might increase the rate to a level that
   congests the LOOPS segment, or that causes excessive queueing at the
   LOOPS ingress. What LOOPS should be able to achieve is to let the
   end host sender invoke the rate reduction mechanism when there is a
   congestion loss, regardless of whether the lost packet was recovered
   locally.

   As with any tunneling protocol, information about congestion events
   inside the tunnel needs to be exported to the end-to-end path the
   tunnel is part of. See, e.g., [RFC6040] for a discussion of how to
   do this in the presence of ECN. A more recent draft,
   [I-D.ietf-tsvwg-tunnel-congestion-feedback], proposes to activate ECN
   for the tunnel regardless of whether the end-to-end protocol signals
   the use of an ECN-capable transport (ECT), which requires more
   complicated action at the tunnel egress.

   A sender that interprets reordering as a signal of packet loss
   (DUPACKs) initiates a retransmission and reduces the sending rate.
   When spurious retransmission detection (e.g., via F-RTO [RFC5862] or
   DSACK [RFC3708]) is enabled by the TCP sender, it will often be able
   to undo the unnecessary window reduction. As LOOPS recovers lost
   packets locally, in most cases the end host sender will eventually
   find out its reordering-based retransmission (if any) is spurious.
   This is an appropriate performance improvement if the loss was a
   random loss. For congestion losses, a congestion event needs to be
   signaled to the end-to-end transport.

   If the end-to-end transport is ECN-capable (which is visible at the
   IP level), congestion loss events can easily be signaled to it by
   setting the CE (congestion experienced) mark. If LOOPS detects a
   congestion loss for a non-ECT packet, it needs to signal a congestion
   loss event by introducing a packet loss.
   This can be done by choosing not to retransmit or repair the packet
   loss locally in this case. Note that one congestion loss per
   end-to-end RTT is sufficient to provide the rate reduction, so LOOPS
   may still be able to recover most packets, in particular for burst
   losses. (As LOOPS does not interact with the end-to-end transport,
   it does not know the end-to-end RTT. Some lower bound derived from
   configuration and measurements could be used instead.)

2.5.  Congestion Detection

   Properly informing the end-to-end transport protocol about congestion
   loss events requires distinguishing these from random losses. In
   some special cases, distinguishing information may be available from
   a link layer (e.g., see Section 3 of
   [I-D.li-tsvwg-loops-problem-opportunities]). By enabling ECN inside
   the tunnel, congestion events experienced at ECN-capable routers will
   usually be identified by the CE mark, which clearly rules out a
   random loss.

   In the general case, the segment may be composed of hops without such
   special indications. In these cases, some detection mechanism is
   required to provide this distinguishing information. The specific
   mechanism used by an implementation is out of scope of LOOPS, but
   LOOPS will need to provide measurement information for this
   mechanism. For instance, congestion detection might be based on path
   segment latency information, the proper measurement of which
   therefore requires special attention in LOOPS.

3.  Simplifying assumptions

   The above notwithstanding, implementations may want to make use of
   indicators such as transport layer port numbers to partition a tunnel
   flow into separate application flows, e.g., for active queue
   management (AQM). Any such functionality is orthogonal to the LOOPS
   protocol itself and thus out of scope for the present document.

   One observation that simplifies the design of LOOPS in comparison to
   that of a reliable transport protocol is that LOOPS does not _have_
   to recover every packet loss. Therefore, probabilistic approaches,
   and simply giving up after some time has elapsed, can simplify the
   protocol significantly.

   For now, we assume that LOOPS segments that may line up on an
   end-to-end path operate independently of each other. Since the
   objective of LOOPS ultimately is to assist the end-to-end protocol,
   it is likely that some cooperation between them would be beneficial,
   e.g., to obtain some measurements that cover a larger part of the
   end-to-end path. For instance, cooperating LOOPS segments could try
   to divide up permissible increases to end-to-end latency between
   them. This is out of scope for the present version.

   Another simplifying assumption is that LOOPS nodes have reasonably
   precise absolute time available to them, so there is no need to
   burden the LOOPS protocol with time synchronization. How this is
   achieved is out of scope.

   LOOPS nodes are created and set up (information about their peers,
   parameters) by some control plane mechanism that is out of scope for
   this specification. This means there is no need in the LOOPS
   protocol itself to manage setup information.

4.  LOOPS Generic Information Set

   This section sketches a generic information set for the LOOPS
   protocol. Entries marked with (*) are items that may not be
   necessary and probably should be left out of an initial
   specification.

4.1.  Setup Information

   Setup Information might include:

   o  encapsulation protocol in use, and its vital parameters

   o  identity of LOOPS ingress and LOOPS egress; information relevant
      for running the encapsulation protocol such as port numbers

   o  target maximum latency increase caused by the operation of LOOPS
      on this segment

   o  maximum retransmission count (*)

   In the data plane, we have forward information (information added to
   each data packet) and reverse information. The latter can be sent in
   separate packets (e.g., Geneve control-only packets
   [I-D.ietf-nvo3-geneve]) and/or piggybacked like the forward
   information.

4.2.  Forward Information

   In the forward information, we have identified:

   o  tunnel type (a few bits, meaning agreed between Ingress and
      Egress)

   o  packet sequence number PSN (20+ bits), counting the LOOPS packets
      transmitted by the LOOPS ingress (i.e., retransmissions receive a
      new PSN)

   o  an "ACK desirable" flag (one bit, usually set for a certain
      percentage of the data packets only)

   o  anything that the FEC scheme needs.

   The first four together (say, 3+24+4+1) might even fit into 32 bits,
   but probably need up to 48 bits total. FEC info of course often
   needs more space.

   (Note that in this proposal there is no timestamp in the forward
   information; see Section 5.3.)

   24 bits of PSN, minus one bit for sequence number arithmetic, gives 8
   million packets (or 2.4 GB at typical packet sizes) per worst-case
   RTT. So if that is, say, 30 seconds, this would be enough to fill
   640 Mbit/s.

4.3.  Reverse Information

   For the reverse information, we have identified:

   o  one optional block 1, possibly repeated:

      o  PSN being acknowledged

      o  absolute time of reception for the packet acknowledged (PSN)

   o  one optional block 2, possibly repeated:

      o  an ACK bitmap (based on PSN), always starting at a multiple of
         8

      o  a delta indicating the end PSN of the bitmap (actually the
         first PSN that is beyond it), using
         (Acked-PSN & ~7) + 8*(delta+1) as the end of the bitmap.
         Acked-PSN in that formula is the previous block 1 PSN seen in
         this packet, or 0 if none so far.

   Block 1 and Block 2 can be interspersed and repeated. They can be
   piggybacked on a reverse direction data packet or sent separately if
   none occurs within some timeout. They will usually be aggregated in
   some useful form. Block 1 information sets are only returned for
   packets that have "ACK desirable" set. Block 2 information is sent
   by the receiver based on some saturation scheme (e.g., at least three
   copies for each PSN span over time). Still, it might be possible to
   go down to 1 or 2 amortized bytes per forward packet spent for all
   this.

   The latency calculation is done by the sender, who occasionally sets
   "ACK desirable", and notes down the absolute time of transmission for
   this data packet (the timekeeping can be done quite efficiently as
   deltas). Upon reception of a block 1 ACK, it can then subtract that
   from the absolute time of reception indicated. This assumes time
   synchronization between the nodes is at least as good as the
   precision of latency measurement needed, which should be no problem
   with IEEE 1588 PTP synchronization (but could be if using NTP-based
   synchronization only). A sender can freely garbage collect noted
   down transmission time information; doing this too early just means
   that the quality of the RTT sampling will degrade.

5.  LOOPS General Operation

   In the Tunnel Mode described in the main body of this document, LOOPS
   information is carried by some tunnel encapsulation.

5.1.  Initial Packet Sequence Number

   There is no connection establishment procedure in LOOPS. The initial
   PSN is assigned unilaterally by the LOOPS Ingress.

   Because of the short time that is usually set in the maximum latency
   increase, there is little damage from a collision of PSNs with
   packets still in flight from previous instances of LOOPS.

   Collisions can be minimized by assigning initial PSNs randomly, or
   using stable storage. Random assignment is more useful for longer
   PSNs, where the likelihood of overlap will be low. The specific way
   a LOOPS ingress uses stable storage is a local matter and thus out of
   scope. (Implementation note: this can be made to work similarly to
   secure nonce generation with write attenuation: Say, every 10000
   packets, the sender notes down the PSN into stable storage. After a
   reboot, it reloads the PSN and adds 10000 in sequence number
   arithmetic [RFC1982], plus maybe another 10000 so the sender does not
   have to wait for the store operation to succeed before sending more
   packets.)

5.2.  Acknowledgement Generation

   A data packet forwarded by the LOOPS ingress always carries PSN
   information. The LOOPS egress uses the largest newly received PSN
   with the "ACK desirable" bit as the ACK number in the block 1 part of
   the acknowledgement. This means that the LOOPS ingress gets to
   modulate the number of acknowledgements sent by the LOOPS egress.
   However, whenever an out-of-order packet arrives while there still
   are "holes" in the PSNs received, the LOOPS receiver should generate
   a block 2 acknowledgement immediately that the LOOPS sender can use
   as a NACK list.

   Reverse information can be piggybacked in a reverse direction data
   packet.
   When the reverse direction has no user data to be sent, a pure
   reverse information packet needs to be generated. This may be based
   on a short delay during which the LOOPS egress waits for a data
   packet to piggyback on. (To reduce MTU considerations, the egress
   could wait for less-than-full data packets.)

5.3.  Measurement

   When sending a block 1 acknowledgement, the LOOPS egress indicates
   the absolute time of reception of the packet. The LOOPS ingress can
   subtract the absolute time of transmission that it still has
   available, resulting in one high quality latency sample. (In an
   alternative design, the forward information could include the
   absolute time of transmission as well, and block 1 information would
   echo it back. This trades memory management at the ingress for
   increased bandwidth and MTU reduction.)

   The LOOPS ingress can also use the time of reception of the block 1
   acknowledgement to obtain a segment RTT sample. Note that this will
   include any wait time the LOOPS egress incurs while waiting for a
   piggybacking opportunity -- this is appropriate, as all uses of an
   RTT will be for keeping a retransmission timeout.

   To maintain quality of information during idle times, the LOOPS
   ingress may send keepalive packets, which are discarded at the LOOPS
   egress after sending acknowledgements. The indication that a packet
   is a keepalive packet is dependent on the encapsulation protocol.

5.4.  Loss detection and Recovery

   There are two ways for LOOPS to perform local recovery:
   retransmission and FEC.

5.4.1.  Local Retransmission

   When retransmission is used as the recovery mechanism, the LOOPS
   ingress detects a packet loss by receiving a NACK or by local timeout
   (using an RTO value that might be calculated as in [RFC6298]). It
   might employ a DUPACK-like or a RACK-like mechanism for delayed
   reaction to a NACK.

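   The local timeout mentioned above could be derived from the segment
   RTT samples of Section 5.3 roughly as sketched below. This is only
   an illustration, assuming that the smoothing rules of [RFC6298] are
   applied unchanged; the names RtoEstimator, on_rtt_sample and
   has_timed_out are hypothetical and not part of LOOPS. The sketch
   also leaves out the one-second minimum RTO that [RFC6298] recommends
   for TCP, on the assumption that a LOOPS segment needs shorter
   timeouts to stay within its maximum latency increase.

```python
class RtoEstimator:
    """Smoothed RTT / RTO estimator using the update rules of RFC 6298.

    Hypothetical helper for a LOOPS ingress; all times are in seconds.
    """

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # variation multiplier

    def __init__(self, granularity=0.001, initial_rto=1.0):
        self.granularity = granularity  # clock granularity G
        self.srtt = None                # smoothed RTT, unset until first sample
        self.rttvar = None              # RTT variation
        self.rto = initial_rto          # used before any sample exists

    def on_rtt_sample(self, r):
        """Feed one segment RTT sample and return the updated RTO."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        # Assumption: no 1-second floor is applied here (see lead-in).
        self.rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        return self.rto


def has_timed_out(sent_at, now, rto):
    """Return True if a packet sent at `sent_at` is overdue at `now`."""
    return now - sent_at >= rto
```

   An ingress would call on_rtt_sample() for each block 1
   acknowledgement that yields an RTT sample and check has_timed_out()
   for unacknowledged PSNs; whether a timeout then leads to a
   retransmission still depends on the considerations in Section 2.4
   and on the maximum latency increase.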
   When a retransmission is desired (see Section 2.4 for why it might
   not be), the LOOPS ingress performs the local in-network recovery by
   retransmitting the packet. Further retransmissions may be desirable
   if the NACK is persistent beyond an RTO, as long as the maximum
   latency increase is not reached.

5.4.2.  FEC

   FEC is another way to perform local recovery. When FEC is in use, a
   FEC header is sent with data packets as well as with special repair
   packets added to the flow. The specific FEC scheme used could be
   defined in the Setup Information, using a mechanism like [RFC5052].
   The FEC rate (amount of redundancy added) and possibly the FEC scheme
   could be unilaterally adjusted by the LOOPS ingress in an adaptive
   mechanism based on the measurement information.

5.5.  Discussion

   Without progress in the way that end-host transport protocols handle
   reordering, LOOPS will be unable to prevent end-to-end
   retransmissions that duplicate effort that is spent in local
   retransmissions. It depends on parameters of the path segment
   whether this wasted effort is significant or not.

   One remedy against this waste could be the introduction of
   resequencing at the LOOPS Egress node. This increases overall mean
   packet latency, but does not always increase actual end-to-end data
   stream latency if a head-of-line blocking transport such as TCP is in
   use. For applications with a large percentage of legacy TCP
   end-hosts and sufficient processing capabilities at the LOOPS Egress
   node, resequencing may be a viable choice. Note that resequencing
   could be switched off and on depending on some measurement
   information.

   To enable resequencing at the LOOPS Egress, a packet numbering scheme
   is needed that allows the LOOPS Egress to reconstruct the sequence at
   the LOOPS ingress.
   This could be done by reverting to a traditional packet sequence
   number counting incoming data packets, possibly combined with a
   "retransmission" bit that indicates that the specific LOOPS packet is
   a retransmission and not the original transmission. (The
   acknowledgement/measurement ambiguity could be further reduced by
   adding a transmission counter (TC) that counts
   transmission/retransmission for this PSN; a few bits should be enough
   for the limited retransmission envisaged.)

6.  Sketches of Bindings to Tunnel Protocols

   The LOOPS information defined above in a generic way can be mapped to
   specific tunnel encapsulation protocols. Sketches for two tunnel
   protocols are given below: Geneve (Section 6.1), and GUE
   (Section 6.2). The actual encapsulation can be designed in a
   "native" way by putting each of the various elements into the TLV
   format of the encapsulation protocol, or it can be achieved by
   providing single TLVs for forward and reverse information and using
   some generic encoding of both kinds of information as shown in
   Appendix B.3.

6.1.  Embedding LOOPS in Geneve

   Geneve [I-D.ietf-nvo3-geneve] is an extensible overlay protocol which
   can embed LOOPS functions. Geneve uses TLVs to carry optional
   information between NVEs. An NVE is logically the same entity as the
   LOOPS node.

   For Geneve, a new LOOPS TLV needs to be defined and its format needs
   to be consistent with LOOPS generic information in Section 4. When
   the Geneve LOOPS TLV is put in forward information, NVEs should be
   able to process it. Any settings needed can be provided in the Setup
   Information.

   In the reverse direction, when no data packets are available for
   piggybacking, a control-only packet will be used to carry the LOOPS
   reverse information. Such a control-only packet sets the 'O' bit in
   the Geneve header and has no real user data.

   VNI is a mandatory field in the Geneve base header.
   The LOOPS TLV should function on the tunnel between two NVEs without
   looking at the VNI value. The LOOPS PSN number space is local to the
   overlay tunnel regardless of the VNI inside. At the ingress NVE,
   there are different ways to decide whether a packet should go to a
   LOOPS-enabled tunnel, e.g., by protocol and port number (certain
   TCP/UDP ports) or by VNI.

6.2. Embedding LOOPS in GUE

   GUE [I-D.ietf-intarea-gue] is an extensible overlay protocol which
   can embed LOOPS functions. GUE uses flags to indicate the presence
   of fixed-length header extensions. It also allows variable-length
   extensions to be put in the "Private data" field. A new LOOPS data
   block in the "Private data" field needs to be defined based on the
   LOOPS generic information in Section 4.

   In the reverse direction, when no data packets are available for
   piggybacking, LOOPS reverse information is carried in a control
   message with the C-bit set in the GUE header. The Proto/ctype field
   contains a control message type when the C-bit is set. Hence a new
   control message type should be defined for such LOOPS reverse
   information.

7. IANA Considerations

   No IANA action is required at this stage. When a LOOPS
   representation is designed for a specific tunneling protocol, new
   codepoints will be required in the registries that pertain to that
   protocol.

8. Security Considerations

   To be defined.

9. Informative References

   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
              DOI 10.17487/RFC1982, August 1996.

   [RFC3708]  Blanton, E. and M. Allman, "Using TCP Duplicate Selective
              Acknowledgement (DSACKs) and Stream Control Transmission
              Protocol (SCTP) Duplicate Transmission Sequence Numbers
              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
              DOI 10.17487/RFC3708, February 2004.

   [RFC5052]  Watson, M., Luby, M., and L.
              Vicisano, "Forward Error Correction (FEC) Building Block",
              RFC 5052, DOI 10.17487/RFC5052, August 2007.

   [RFC5862]  Yasukawa, S. and A. Farrel, "Path Computation Clients
              (PCC) - Path Computation Element (PCE) Requirements for
              Point-to-Multipoint MPLS-TE", RFC 5862,
              DOI 10.17487/RFC5862, June 2010.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6330]  Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T.,
              and L. Minder, "RaptorQ Forward Error Correction Scheme
              for Object Delivery", RFC 6330, DOI 10.17487/RFC6330,
              August 2011.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

   [I-D.ietf-tcpm-rack]
              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
              a time-based fast loss detection algorithm for TCP",
              draft-ietf-tcpm-rack-05 (work in progress), April 2019.

   [I-D.ietf-nvo3-geneve]
              Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic
              Network Virtualization Encapsulation",
              draft-ietf-nvo3-geneve-13 (work in progress), March 2019.

   [I-D.ietf-intarea-gue]
              Herbert, T., Yong, L., and O. Zia, "Generic UDP
              Encapsulation", draft-ietf-intarea-gue-07 (work in
              progress), March 2019.

   [I-D.li-tsvwg-loops-problem-opportunities]
              Yizhou, L., Zhou, X., Boucadair, M., and J. Wang, "LOOPS
              (Localized Optimizations on Path Segments) Problem
              Statement and Opportunities for Network-Assisted
              Performance Enhancement",
              draft-li-tsvwg-loops-problem-opportunities-03 (work in
              progress), July 2019.

   [I-D.ietf-tsvwg-tunnel-congestion-feedback]
              Wei, X., Yizhou, L., Boutros, S., and L. Geng, "Tunnel
              Congestion Feedback",
              draft-ietf-tsvwg-tunnel-congestion-feedback-07 (work in
              progress), May 2019.

Appendix A. Protocol used in Prototype Implementation

   This appendix describes, in a somewhat abstracted form, the protocol
   as used in a prototype implementation, as described by Yizhou Li and
   Xingwang Zhou.

   The prototype protocol can be run in one of two modes (defined by
   preconfiguration):

   o  Retransmission mode

   o  Forward Error Correction (FEC) mode

   Forward information is piggybacked in data packets.

   Reverse information can be carried in a pure acknowledgement packet
   or piggybacked when carrying packets for the inverse direction.

   The forward information includes:

   o  Packet Sequence Number (PSN) (32 bits): This identifies a packet
      over a specific overlay segment from a specific LOOPS Ingress. If
      a packet is retransmitted by LOOPS, the retransmission uses the
      original PSN.

   o  Timestamp (32 bits): Information, in a format local to the LOOPS
      ingress, that provides the time when the packet was sent. In the
      current implementation, this is a 32-bit unsigned value specifying
      the time delta in some granularity from the epoch time to the
      sending time of the packet carrying this timestamp. The
      granularity can be from 1 ms to 1 second. The epoch time follows
      the current TCP practice, which is 1 January 1970 00:00:00 UTC.
      Note that a retransmitted packet uses its own Timestamp.

   o  FEC Info for Block Code (56 bits): This header is used in FEC
      mode. It currently only provides for a block code FEC scheme. It
      includes the Source Block Number (SBN), Encoding Symbol ID (ESI),
      number of symbols in a single source block, and symbol size.
      Appendix A.1 gives more details on FEC.
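
   The forward information fields listed above can be illustrated with
   a minimal serialization sketch. The field widths come from this
   appendix and Appendix A.1; everything else is an assumption made
   for illustration only: network byte order, the field order, and the
   hypothetical helper names pack_forward and unpack_forward. The
   prototype's actual wire format is not specified here.

```python
import struct

# Assumed layout (not defined by the draft): big-endian, fields in listed order.
FORWARD_FMT = "!II"     # PSN (32 bits), Timestamp (32 bits)
FEC_INFO_FMT = "!HHBH"  # SBN (16), ESI (16), K (8), T (16) = 56 bits


def pack_forward(psn, timestamp, fec_info=None):
    """Serialize forward information; fec_info is (sbn, esi, k, t) or None."""
    # Mask to 32 bits so that counters wrap instead of raising struct.error.
    out = struct.pack(FORWARD_FMT, psn & 0xFFFFFFFF, timestamp & 0xFFFFFFFF)
    if fec_info is not None:
        out += struct.pack(FEC_INFO_FMT, *fec_info)
    return out


def unpack_forward(data, fec_mode):
    """Parse forward information; the FEC Info is only read in FEC mode."""
    psn, timestamp = struct.unpack_from(FORWARD_FMT, data, 0)
    fec_info = None
    if fec_mode:
        offset = struct.calcsize(FORWARD_FMT)
        fec_info = struct.unpack_from(FEC_INFO_FMT, data, offset)
    return psn, timestamp, fec_info
```

   In this sketch, the mode (retransmission or FEC) is passed in by the
   caller, mirroring the preconfiguration described above.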

   The reverse information includes:

   o  ACK Number (32 bits): The largest (in sequence number arithmetic
      [RFC1982]) PSN received so far.

   o  NACK List (variable): This indicates an array of PSN numbers to
      describe the PSN "holes" preceding the ACK number. It
      conceptually lists the PSNs of every packet perceived as lost by
      the LOOPS egress. In actual use, it is truncated.

   o  Echoed Timestamp (32 bits): The timestamp received with the packet
      being acknowledged.

A.1. Block Code FEC

   The prototype currently uses a block code FEC scheme (RaptorQ
   [RFC6330]). The fields in the FEC Info forward information are:

   o  Source Block Number (SBN): 16 bits. An integer identifier for the
      source block that the encoding symbols within the packet relate
      to.

   o  Encoding Symbol ID (ESI): 16 bits. An integer identifier for the
      encoding symbols within the packet.

   o  K: 8 bits. Number of symbols in a single source block.

   o  T: 16 bits. Symbol size in bytes.

   The LOOPS Ingress uses the data packet in Figure 1 to generate the
   encoding packet. Both source packets and repair packets carry the
   FEC header information; the LOOPS Egress reconstructs the data
   packets from both kinds of packets. The LOOPS Egress currently
   resequences the forwarded and reconstructed packets, so they are
   passed on in-order when the lost packets are recoverable within the
   source block.

   The LOOPS Nodes need to agree on the use of FEC block mode and on the
   specific FEC Encoding ID to use; this is currently done by
   configuration.

Appendix B. Transparent mode

   This appendix defines a very different way to provide the LOOPS
   services, "transparent mode". (We call the protocol described in the
   main body of the document "encapsulated mode".)

   In transparent mode, the idea is that LOOPS does not meddle with the
   forward transmission of data packets, but runs on the side exchanging
   additional information.

   An implementation could be based on conventional forwarding switches
   that just provide a copy of the ingress and egress packet stream to
   the LOOPS implementations. The LOOPS process would occasionally
   inject recovered packets back into the LOOPS egress node's forwarding
   switch, see Figure 3.

           |
   +-------+-------------------------------------------+
   |       |                                           |
   |  +----+--------+   +-------------------+          |
   |  |    |  copy  |   |                   |          |
   |  |    |----------------> LOOPS ingress |          |
   |  |    |        |   |     |     ^       |          |
   |  +----+--------+   +-----|-----|-------+          |
   |   data|packets    forward|     |reverse           |
   |       |              info|     |info              |
   +-------+------------------|-----|------------------+
           |                  |     |
   +-------+------------------|-----|------------------+
   |       |                  |     |                  |
   |  +----+---------+   +----|-----|----------+       |
   |  |    |  copy   |   |    v     |          |       |
   |  |    |---------|---|---> LOOPS egress    |       |
   |  |    |         |   |                     |       |
   |  |    |<--------|---|---- inject          |       |
   |  +----+---------+   +---------------------+       |
   |       |                                           |
   +-------+-------------------------------------------+
           |
           v

                  Figure 3: LOOPS Transparent Mode

   The obvious advantage of transparent mode is that no encapsulation is
   needed, reducing processing requirements and keeping the MTU
   unchanged. The obvious disadvantage is that no forward information
   can be provided with each data packet, so a replacement needs to be
   found for the PSN (packet sequence number) employed in encapsulated
   mode. Any forward information beyond the data packets is sent in
   separate packets exchanged directly between the LOOPS nodes.

B.1. Packet identification

   Retransmission mode and FEC mode differ in their needs for packet
   identification.
   For retransmission mode, a somewhat probabilistic accuracy of the
   packet identification is sufficient; for FEC mode, packet
   identification should not make mistakes (as these would lead to
   faultily reconstructed packets).

   In retransmission mode, misidentification of a packet could lead to
   measurement errors as well as missed retransmission opportunities.
   The latter will be fixed end-to-end. The tolerance for measurement
   errors would influence the degree of accuracy that is aimed for.

   Packet identification can be based on a cryptographic hash of the
   packet, computed at the LOOPS ingress and egress using the same
   algorithm (excluding fields that can change in transit, such as
   TTL/hop limit). The hash can directly be used as a packet number, or
   it can be sent in the forward information together with a packet
   sequence number, establishing a mapping.

   For probabilistic packet identification, it is almost always
   sufficient to hash the first few (say, 64) bytes of the packet; all
   known transport protocols keep sufficient identifying information in
   that part (and, for encrypted protocols, the entropy will be
   sufficient). Any collisions of the hash could be used to disqualify
   the packet for measurement purposes, minimizing the measurement
   errors; this could allow rather short packet identifiers in
   retransmission mode.

   For FEC mode, the packet identification together with the per-packet
   FEC information needs to be sent in the (separate) forward
   information, so that a systematic code can be reconstructed. For
   retransmission mode, there is no need to send any forward information
   for most packets, or a mapping from packet identifiers to packet
   sequence numbers could be sent in the forward information (probably
   in some aggregated form).
   The latter would allow keeping the acknowledgement form described in
   the main body (with aggregate acknowledgement); otherwise, packet
   identifiers need to be acknowledged. With this change, the LOOPS
   egress will send reverse information as in the encapsulating LOOPS
   protocol.

B.2. Generic information and protocol operation

   With the changes outlined above, transparent mode operates just as
   encapsulated mode. If packet sequence numbers are not used, there is
   no use for block2 reverse information; if they are used, a new block3
   needs to be defined that provides the mapping from packet identifiers
   to packet sequence numbers in the forward information. To avoid MTU
   reduction, some mechanism will be needed to encapsulate the actual
   FEC information (additional packets) in the forward information.

B.3. A hybrid mode

   Figure 3 can be modified by including a GRE encapsulator into the top
   left corner and a GRE decapsulator in the bottom left corner. This
   provides more defined ingress and egress points, but it also provides
   an opportunity to add a packet sequence number at the ingress. The
   copies to the top right and bottom right corners are the encapsulated
   form, i.e., include the sequence number.

   The GRE packet header then has the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|1|   000000000     | 000 |         Protocol Type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The forward and reverse information can be designed closer to the
   approach in the main body of the document, to be exchanged using UDP
   packets between top right ingress and bottom right egress using a
   port number allocated for this purpose.
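
   As an illustration of the GRE header form given earlier in this
   section, the following minimal sketch builds and parses the 8-byte
   header: a first 16-bit word with only the fourth bit (Sequence
   Number Present) set, the Protocol Type, and the 32-bit Sequence
   Number. The helper names pack_gre_seq and unpack_gre_seq are
   hypothetical, and wrapping the sequence number modulo 2^32 is an
   assumption of this sketch.

```python
import struct

# First 16 bits of the header: |0|0|0|1| 000000000 | 000 |
# Only the fourth bit (Sequence Number Present) is set; version is 0.
GRE_FLAGS_SEQ_ONLY = 0x1000


def pack_gre_seq(protocol_type, seq):
    """Build the 8-byte GRE header carrying a 32-bit sequence number."""
    # Assumption of this sketch: the sequence number wraps modulo 2**32.
    return struct.pack("!HHI", GRE_FLAGS_SEQ_ONLY, protocol_type,
                       seq & 0xFFFFFFFF)


def unpack_gre_seq(header):
    """Return (protocol_type, seq); reject any other flag/version pattern."""
    flags, protocol_type, seq = struct.unpack("!HHI", header[:8])
    if flags != GRE_FLAGS_SEQ_ONLY:
        raise ValueError("unexpected GRE flags or version")
    return protocol_type, seq
```

   For example, pack_gre_seq(0x0800, 1) yields the header for an
   encapsulated IPv4 packet (EtherType 0x0800) with sequence number 1.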

   Rough ideas for both directions are given below in CDDL [RFC8610].
   This information set could be encoded in CBOR or in a bespoke
   encoding; details such as this can be defined later.

   forward-information = [
     [rel-psn, ack-desired, ? fec-info] /
     fec-repair-data
   ]

   rel-psn = uint; relative packet sequence number
      ; always given as a delta from the previous one in the array
      ; starting out with a "previous value" of 0

   ack-desired = bool

   fec-info = [
     sbn: uint, ; Source Block Number
     esi: uint, ; Encoding Symbol ID
     ? (
       nsssb: uint; number of symbols in a single source block
       ss: uint; symbol size
     )
   ]

   fec-repair-data = [
     repair-data: bytes
     ? (
       sbn: uint, ; Source Block Number
       esi: uint, ; Encoding Symbol ID
     )
   ]

   If left out for a sequence number, the fec-info block is constructed
   by adding one to the previous one. fec-repair-data contains repair
   symbols for the sbn/esi given (which, again, are reconstructed from
   context if not given).

   reverse-information = [
     block1 / block2
   ]

   block1 = [rel-psn, timestamp]
   block2 = [end-psn-delta: uint, acked-bits: bytes]

   The acked-bits in a block2 is a bitmap that gives acknowledgments for
   received data packets. The bitmap always comes as a multiple of 8
   bits (all bytes are filled in with 8 bits, each identifying a PSN).
   The end PSN of the bitmap (actually the first PSN that would be
   beyond it) is computed from the current PSN as set by rel-psn,
   rounded down to a multiple of 8, and adding 8*(end-psn-delta+1) to
   that value.

Acknowledgements

   Sami Boutros helped with sketching the use of Geneve (Section 6.1),
   and Tom Herbert helped with sketching the use of GUE (Section 6.2).

   Michael Welzl has been supported by the Research Council of Norway
   under its "Toppforsk" programme through the "OCARINA" project.

Authors' Addresses

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   Oslo N-0316
   Norway

   Phone: +47 22 85 24 20
   Email: michawe@ifi.uio.no

   Carsten Bormann (editor)
   Universitaet Bremen TZI
   Postfach 330440
   Bremen D-28359
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org