TSVWG                                                              Y. Li
Internet-Draft                                                   X. Zhou
Intended status: Informational                                    Huawei
Expires: November 25, 2019                                  May 24, 2019

 LOOPS (Localized Optimizations of Path Segments) Problem Statement and
                              Opportunities
              draft-li-tsvwg-loops-problem-opportunities-02

Abstract

   In various network deployments, end-to-end paths are partitioned
   into multiple segments.  In some cloud-based WAN connections,
   multiple overlay tunnels in series are used to achieve better path
   selection and lower latency.  In satellite communication, the end-
   to-end path is split into two terrestrial segments and a satellite
   segment.  In these various deployments, packet losses can be caused
   either by random events or by congestion.

   Traditional end-to-end transport layers respond to packet loss
   slowly, especially in long-haul networks: they either wait for some
   signal from the receiver to indicate a loss and then retransmit from
   the sender, or rely on the sender's timeout, which is often quite
   long.  Packet loss that is not caused by congestion may make the TCP
   sender over-reduce its sending rate.  With end-to-end encryption
   moving under the transport (QUIC), traditional PEP (performance
   enhancing proxy) techniques such as TCP splitting are no longer
   applicable.

   LOOPS (Local Optimizations on Path Segments) aims to provide non-
   end-to-end, locally based in-network recovery to achieve better data
   delivery by making packet loss recovery faster and by avoiding the
   senders over-reducing their sending rate.  In an overlay network
   scenario, LOOPS can be performed over the existing, or purposely
   created, overlay tunnel based path segments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 25, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Cloud-Internet Overlay Network
     2.1.  Tail Loss or Loss in Short Flows
     2.2.  Packet Loss in Real Time Media Streams
     2.3.  Packet Loss and Congestion Control in Bulk Data Transfer
     2.4.  Multipathing
   3.  Satellite Communication
   4.  Features and Impacts to be Considered for LOOPS
     4.1.  Local Recovery and End-to-end Retransmission
       4.1.1.  OE to OE Measurement, Recovery and Multipathing
     4.2.  Congestion Control Interaction
     4.3.  Overlay Protocol Extensions
     4.4.  Summary
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Overlay tunnels are widely deployed in various networks, including
   long-haul WAN interconnections, enterprise wireless access networks,
   etc.  The end-to-end connection is partitioned into multiple path
   segments using overlay tunnels.  This serves a number of purposes,
   for instance, selecting a better path over the WAN or delivering
   packets over heterogeneous networks, such as enterprise access and
   core networks.

   A reliable transport layer normally employs some end-to-end
   retransmission mechanism, which also addresses congestion control
   [RFC0793] [RFC5681].  The sender either waits for the receiver to
   send some signal on a packet loss or sets some form of timeout for
   retransmission.  For unreliable transport layer protocols such as
   RTP [RFC3550], optional and limited usage of end-to-end
   retransmission is employed to recover from packet loss [RFC4585]
   [RFC4588].

   End-to-end retransmission to recover lost packets is slow,
   especially when the network is long haul.  When a path is
   partitioned into multiple path segments that are realized as overlay
   tunnels, LOOPS (Local Optimizations on Path Segments) tries to
   provide local, segment-based in-network recovery to achieve better
   data delivery by making packet loss recovery faster and by avoiding
   the senders over-reducing their sending rate.  In an overlay network
   scenario, LOOPS can be performed over the existing, or purposely
   created, overlay tunnel based path segments.
   Some link types (satellite, microwave) may exhibit unusually high
   loss rates under special conditions (e.g., fades due to heavy rain).
   A traditional TCP sender interprets loss as congestion and over-
   reduces its sending rate, degrading the throughput.  LOOPS is also
   applicable to such scenarios to improve throughput.

   Section 2 presents some of the issues and opportunities found in
   Cloud-Internet overlay networks that require higher performance and
   more reliable packet transmission in best-effort networks.
   Section 3 discusses applications of LOOPS in satellite
   communication.  Section 4 describes the corresponding solution
   features and their impact on existing network technologies.

       ON=overlay node
       UN=underlay node

   +---------+                                               +---------+
   |   App   | <---------------- end-to-end ---------------> |   App   |
   +---------+                                               +---------+
   |Transport| <---------------- end-to-end ---------------> |Transport|
   +---------+                                               +---------+
   |         |        +--+  path  +--+ path segment2   +--+  |         |
   |         |        |  |<-seg1->|  |<--------------->|  |  |         |
   | Network |  +--+  |ON|  +--+  |ON|  +--+   +----+  |ON|  | Network |
   |         |--|UN|--|  |--|UN|--|  |--|UN|---| UN |--|  |--|         |
   +---------+  +--+  +--+  +--+  +--+  +--+   +----+  +--+  +---------+
   End Host                                                   End Host
                <--------------------------------->
                LOOPS domain: path segment enables
                optimizations for better local transport

           Figure 1: LOOPS in Overlay Network Usage Scenario

1.1.  Terminology

   LOOPS:  Local Optimizations on Path Segments.  LOOPS includes the
      local in-network (i.e., non-end-to-end) recovery functions, for
      instance, loss detection and measurements.

   LOOPS Node:  Node supporting LOOPS functions.

   Overlay Node (ON):  Node having overlay functions (like overlay
      protocol encapsulation/decapsulation, header modification, TLV
      inspection) and LOOPS functions in the LOOPS overlay network
      usage scenario.  Both OR and OE are Overlay Nodes.
   Overlay Tunnel:  A tunnel with designated ingress and egress nodes
      using some network overlay protocol as encapsulation, optionally
      with a specific traffic type.

   Overlay Path:  A channel within the overlay tunnel, where the
      traffic transmitted on the channel needs to pass through zero or
      more designated intermediate overlay nodes.  There may be more
      than one overlay path within an overlay tunnel when different
      sets of designated intermediate overlay nodes are specified.  An
      overlay path may contain multiple path segments.  When an overlay
      tunnel contains only one overlay path without any intermediate
      overlay node specified, "overlay path" and "overlay tunnel" are
      used interchangeably.

   Overlay Edge (OE):  Edge node of an overlay tunnel.

   Overlay Relay (OR):  Intermediate overlay node on an overlay path.
      An overlay path need not contain any OR.

   Path segment:  Part of an overlay path between two neighboring
      overlay nodes.  It is used interchangeably with "overlay segment"
      in this document when the context emphasizes its overlay-
      encapsulated nature.  An overlay path may contain multiple path
      segments.  When an overlay path contains only one path segment,
      i.e., the segment is between two OEs, the path segment is
      equivalent to the overlay path.  It is also called "segment" for
      simplicity in this document.

   Overlay segment:  Refers to path segment.

   Underlay Node (UN):  Node not participating in the overlay network
      function.

2.  Cloud-Internet Overlay Network

   The Internet is a huge network of networks.  The interconnections of
   end devices using this global network are normally provided by ISPs
   (Internet Service Providers).  The network created by the
   composition of the ISP networks is considered the traditional
   Internet.  CSPs (Cloud Service Providers) are connecting their data
   centers using the Internet or via self-constructed networks/links.
   This expands the Internet's infrastructure and, together with the
   original ISPs' infrastructure, forms the Internet underlay.

   NFV (network function virtualization) further makes it easier to
   dynamically provision a new virtual node as a workload in a cloud
   for CPU/storage-intensive functions.  With the aid of various
   mechanisms such as kernel bypassing and virtual I/O, forwarding
   based on virtual nodes is becoming more and more efficient.  The
   interconnections among the purposely positioned virtual nodes and/or
   the existing nodes with virtualization functions potentially form an
   overlay over the Internet.  It is called the Cloud-Internet Overlay
   Network (CION) in this document.

   CION makes use of overlay technologies to direct traffic through a
   specific overlay path regardless of the underlying physical
   topology, in order to achieve better service delivery.  It purposely
   creates or selects overlay nodes (ONs) from providers.  By
   continuously measuring the delay of path segments and using it as a
   metric for path selection, when the number of overlay nodes is
   sufficiently large, there is a high chance that a better path can be
   found [DOI_10.1109_ICDCS.2016.49] [DOI_10.1145_3038912.3052560].

   [DOI_10.1145_3038912.3052560] further shows that all cloud providers
   experience random loss episodes and that random loss accounts for
   more than 35% of total loss.

   Figure 2 shows an example of an overlay path over large geographic
   distances.  The path between two OEs (Overlay Edges) is an overlay
   path.  The OEs are ON1 and ON4 in Figure 2.  The part of the path
   between two neighboring ONs is a path segment; the figure shows an
   overlay path with three segments, i.e., ON1-ON2-ON3-ON4.  An ON is
   usually a virtual node, though it does not have to be.  An overlay
   path transmits packets in some form of network overlay protocol
   encapsulation.
   An ON has computing and memory resources that can be used for
   functions like packet loss detection, network measurement and
   feedback, and packet recovery.

                      _____________
                     /  domain 1   \
                    /               \
                ___/   -------------\
               /                     \
     PoP1 ->--ON1                     \
      |        |                      ON4------>-- PoP2
      |        |          ON2       ___|__/
      \__|_    |->|      _____     /   |
      |   \|__|__       /     \   /    |
      |   |   |  \_____/       \__/    |
     \|/  |   |          _____         |
      |   |   |      ___/     \        |
      |   |  \|/    /          \_____  |
      |   |   |    /  domain 2       \ /|\
      |   |   |    |      ON3        |  |
      |   |   |    \      |->|       |  |
      |   |   |     \_____|__|_______/  |
      |  /|\  |           |            \|/
      |   |   |           |             |
      |   |   |          /|\            |
     +--------------------------------------------------+
     |  |   |  |          |             |    Internet   |
     |  o---o  o---o->----o   o---o->--o--o  underlay   |
     +--------------------------------------------------+

           Figure 2: Cloud-Internet Overlay Network (CION)

   We ran tests using 37 overlay nodes from multiple cloud providers
   globally.  Each pair of overlay nodes was used as sender and
   receiver.  When the traffic is not intentionally directed to go
   through any intermediate virtual node, we call the path that the
   traffic takes the _default path_ in the test.  When any of the
   virtual nodes is intentionally used as an intermediate node to
   forward the traffic, the path that the traffic takes is an _overlay
   path_ in the test.  The preliminary experiments, probing with one
   ping packet per second for a week, showed that the delay of an
   overlay path is shorter than that of the default path in 69% of
   cases at the 99th percentile, and the improvement is 17.5% at the
   99th percentile.

   Lower delay does not necessarily mean higher throughput.  Different
   path segments may have different packet loss rates, and loss rate is
   another major factor impacting TCP throughput.  Based on some
   customer requirements, we set the target loss rate to be less than
   1% at the 99th percentile and the 99.9th percentile, respectively.
   The loss was measured between any two overlay nodes, i.e.
   any potential path segment.  Two thousand ping packets were sent
   every 20 seconds between two overlay nodes for 55 hours.  This
   preliminary experiment showed that the packet loss rate target was
   satisfied for 44.27% and 29.51% of segments at the 99th and 99.9th
   percentiles, respectively.

   Hence packet loss in an overlay segment is a key issue to be solved
   in CION.  In long-haul networks, the end-to-end retransmission of a
   lost packet can cost an extra round-trip time.  Such extra time is
   not acceptable in some cases.  As CION naturally consists of
   multiple overlay segments, LOOPS leverages this to perform local
   optimizations on a single hop between two overlay nodes.  ("Local"
   here is relative to end-to-end; it does not mean that such
   optimization is limited to LANs.)

   The following subsections present different scenarios using
   multiple-segment overlay paths with a common need for local in-
   network loss recovery in best-effort networks.

2.1.  Tail Loss or Loss in Short Flows

   When the lost segments are at the end of a transaction, TCP's fast
   retransmit algorithm does not work, as there are no ACKs to trigger
   it.  When a sender does not receive an ACK for a given segment
   within a certain amount of time called the retransmission timeout
   (RTO), it resends the segment [RFC6298].  The RTO can be as long as
   several seconds, so the recovery of lost segments triggered by the
   RTO is lengthy.  [I-D.dukkipati-tcpm-tcp-loss-probe] indicates that
   large RTOs make a significant contribution to the long tail of the
   latency statistics of short flows like web page transfers.

   A short flow often completes in one or two RTTs.  Even when the loss
   is not a tail loss, it can add another RTT because of end-to-end
   retransmission (not enough packets are in flight to trigger fast
   retransmit).  In long-haul networks, this can mean extra time of
   tens or even hundreds of milliseconds.
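A back-of-the-envelope comparison illustrates the latency argument.  The RTT values below are assumed purely for illustration; only the one-second minimum RTO is taken from RFC 6298.

```python
# Rough comparison of recovery latencies for one lost packet.  The RTT
# values are assumed for illustration; the 1-second lower bound on the
# retransmission timeout is the one recommended by RFC 6298.

E2E_RTT = 0.200      # assumed end-to-end RTT of a long-haul path (s)
SEG_RTT = 0.040      # assumed RTT of the overlay segment with the loss (s)
MIN_RTO = 1.0        # RFC 6298 recommended lower bound on the RTO (s)

# Tail loss: no later ACKs arrive, so the sender waits out a full RTO,
# then the resent segment needs half an RTT to reach the receiver.
tail_loss = MIN_RTO + E2E_RTT / 2

# Fast retransmit: roughly one extra end-to-end round trip
# (duplicate ACKs back, resent segment forward).
fast_retx = E2E_RTT + E2E_RTT / 2

# Segment-local recovery: the same exchange confined to one segment.
local_retx = SEG_RTT + SEG_RTT / 2

print(f"tail loss via RTO:   {tail_loss * 1000:.0f} ms")
print(f"e2e fast retransmit: {fast_retx * 1000:.0f} ms")
print(f"segment-local:       {local_retx * 1000:.0f} ms")
```

Whatever the exact numbers, the ordering is the point: a segment-scale exchange is bounded by the segment RTT, not by the end-to-end RTT or the RTO.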
   An overlay segment transmits the aggregated flows from ON to ON.  As
   short flows are aggregated, the probability of tail loss over this
   specific overlay segment decreases compared to an individual flow.
   The overlay segment is also much shorter than the end-to-end path in
   a Cloud-Internet overlay network, hence loss recovery over an
   overlay segment is faster.

2.2.  Packet Loss in Real Time Media Streams

   The Real-time Transport Protocol (RTP) is widely used for
   interactive audio and video.  Packet loss degrades the quality of
   the received media.  When the latency tolerance of the application
   is sufficiently large, the RTP sender may use RTCP NACK feedback
   from the receiver [RFC4585] to trigger the retransmission of lost
   packets before the playout time is reached at the receiver.

   In a Cloud-Internet overlay network, the end-to-end path delay can
   be hundreds of milliseconds.  End-to-end feedback-based
   retransmission may not be very useful when applications cannot
   tolerate one more RTT of this length.  Loss recovery over an overlay
   segment can then be used in scenarios where RTCP NACK-triggered
   retransmission is not appropriate.

2.3.  Packet Loss and Congestion Control in Bulk Data Transfer

   TCP congestion control algorithms such as Reno and CUBIC basically
   interpret packet loss as congestion experienced somewhere on the
   path.  When a loss is detected, the congestion window is decreased
   at the sender to slow the sending.  It has been observed that packet
   loss is not an accurate way to detect congestion in the current
   Internet [I-D.cardwell-iccrg-bbr-congestion-control].  On long-haul
   links, when the loss is caused by a non-persistent burst that is
   extremely short and fairly random, reducing the sending rate at the
   sender can neither respond in time to the instantaneous path
   situation nor mitigate such bursts.
   On the contrary, reducing the window size at the sender
   unnecessarily or too aggressively harms the throughput of long-
   lasting application traffic like bulk data transfer.

   The overlay nodes distributed over the path have computing
   capability, so they are in a better position than the end hosts to
   deduce the underlying links' instantaneous situation by measuring
   the delay, loss, or other metrics over the segment.  The shorter
   round-trip time over a path segment allows more accurate and more
   immediate measurement of the maximum recently available bandwidth,
   the minimum recent latency, or the trend of change.  ONs can further
   decide whether a sending rate reduction at the sender is necessary
   when a loss happens.  Section 4.2 discusses this in more detail.

2.4.  Multipathing

   As an overlay path may suffer from an impairment of the underlying
   network, two or more overlay paths between the same set of ingress
   and egress overlay nodes can be combined for reliability purposes.
   During the transient period in which a network impairment is
   detected, sending replicated traffic over two paths can improve
   reliability.

   When two or more disjoint overlay paths are available from ON1 to
   ON2, as shown in Figure 3, different sets of traffic may use
   different overlay paths.  For instance, one path may be used for low
   latency and the other for higher bandwidth, or they can simply be
   used for load balancing and better bandwidth utilization.

   Two disjoint paths can usually be found by measurement, identifying
   segments with very low mathematical correlation in latency change.
   When the number of overlay nodes is large, it is easy to find
   disjoint or partially disjoint segments.

   Different overlay paths may have varying characteristics.  The
   overlay tunnel should allow each overlay path to handle packet loss
   based on its own path measurements.
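The "low mathematical correlation in latency change" test can be sketched as follows.  The latency samples are made up for illustration; a real implementation would feed in the continuous probe measurements described in Section 2.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-probe latency samples (ms) for candidate segments.
seg_a = [40, 42, 55, 41, 60, 43, 40, 58]   # spikes at probes 2, 4, 7
seg_b = [70, 71, 86, 70, 88, 72, 71, 87]   # spikes at the same probes
seg_c = [30, 31, 32, 30, 30, 30, 32, 31]   # jitter unrelated to seg_a

# Segments whose latency changes move together likely share underlay
# links; low correlation suggests (partially) disjoint segments.
print(pearson(seg_a, seg_b))   # high: probably shares underlay links
print(pearson(seg_a, seg_c))   # low: good disjoint-path candidate
```

A pair of paths built from mutually low-correlation segments is then a reasonable candidate for the replication or load-balancing uses described above.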
                  ON-A
       +----------o------------------+
       |                             |
       |                             |
    A -----o ON1               ON2o----- B
       |                             |
       +-----------------------o-----+
                               ON-B

                  Figure 3: Multiple Overlay Paths

3.  Satellite Communication

   Traditionally, satellite communications deploy PEP (performance
   enhancing proxy) nodes around the satellite link to enhance end-to-
   end performance.  TCP splitting is a common approach employed by
   such PEPs, where the TCP connection is split into three parts: the
   segment before the satellite hop, the satellite section (uplink,
   downlink), and the segment after the satellite hop.  This requires
   heavy interaction with the end-to-end transport protocols, usually
   without the explicit consent of the end hosts.  Unfortunately, this
   is indistinguishable from a man-in-the-middle attack on TCP.  With
   end-to-end encryption moving under the transport (QUIC), this
   approach is no longer useful.

   Geosynchronous Earth Orbit (GEO) satellites have a one-way delay (up
   to the satellite and back) on the order of 250 milliseconds.  This
   does not include queueing, coding, and other delays in the satellite
   ground equipment.  The round-trip time for a TCP or QUIC connection
   going over a satellite hop in both directions will, in the best
   case, be on the order of 600 milliseconds, and it may be
   considerably longer.  RTTs of this order of magnitude have
   significant performance implications.

   Packet loss recovery is an area where splitting the TCP connection
   into different parts helps.  Packets lost on the terrestrial links
   can be recovered at terrestrial latencies.  Packet loss on the
   satellite link can be recovered more quickly by a satellite-
   optimized protocol between the PEPs and/or link-layer FEC than it
   could be end to end.  Again, encryption makes TCP splitting no
   longer applicable.
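The GEO delay figures above can be sanity-checked from the orbit geometry.  This is a simplified calculation: nadir slant range only, ignoring the ground-equipment delays the text mentions.

```python
# Simplified propagation-delay check for the GEO figures quoted above.
C = 299_792.458          # speed of light in vacuum, km/s
GEO_ALTITUDE = 35_786.0  # GEO altitude above the equator, km (nadir case)

# "One-way delay (up to the satellite and back)": up + down, one
# direction of travel.
one_way = 2 * GEO_ALTITUDE / C        # ~0.24 s, matching "order of 250 ms"

# A TCP/QUIC round trip crosses the satellite hop in both directions;
# queueing, coding, and the terrestrial legs push the practical best
# case toward the ~600 ms quoted above.
best_case_rtt = 2 * one_way           # ~0.48 s of pure propagation

print(f"one-way: {one_way * 1000:.0f} ms, "
      f"RTT (propagation only): {best_case_rtt * 1000:.0f} ms")
```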
   Enhanced error recovery at the satellite link layer helps with loss
   on the satellite link but does not help with loss on the terrestrial
   links.  Even when the terrestrial segments are short, any loss there
   must be recovered across the satellite link delay.  And there are
   cases where a satellite ground station connects to the general
   Internet with a potentially large terrestrial segment (e.g., to a
   correspondent host in another country).  Faster recovery over such
   long terrestrial segments is desirable.

   Another aspect of recovery is that terrestrial loss is highly likely
   to be congestion related, while satellite loss is more likely to
   result from transmission errors due to link conditions.  A transport
   endpoint slowing down because it misinterprets these errors as
   congestion losses unnecessarily reduces performance.  But at the
   endpoints, the two are not easily distinguished.  To elaborate on
   loss recovery for satellite communications: while the error rate on
   satellite paths is generally very low most of the time, it may get
   higher under special link conditions (e.g., fades due to heavy
   rain).  The satellite hop itself does know which losses are due to
   link conditions as opposed to congestion, but it has no mechanism to
   signal this difference to the end hosts.

   The protocol under QUIC will need to try to minimize non-congestion
   packet drops.  Specific link layers may have recovery techniques
   such as satellite FEC.  Where their capabilities may be exceeded
   (e.g., rain fade), LOOPS-like approaches can be considered.

   There are two high-level classes of solutions for making encrypted
   transport traffic like QUIC work well over satellite:

   o  Hooks in the protocol that can adapt to large BDPs, where both
      the bandwidth and the latency are large.  This would require end-
      to-end enhancement.
   o  Capabilities (such as LOOPS) under the protocol to improve
      performance over specific segments of the path; in particular,
      separating the terrestrial losses from the satellite losses,
      fixing terrestrial loss quickly, and keeping throughput high over
      the satellite segment by not causing the end hosts to over-reduce
      their sending window in case of non-congestion loss.

   This document focuses on the latter.

4.  Features and Impacts to be Considered for LOOPS

   LOOPS (Local Optimizations on Path Segments) aims to leverage the
   virtual nodes in a selected path to improve the transport
   performance "locally" instead of end-to-end, as those nodes
   partition the path into multiple segments.  With technologies like
   NFV (network function virtualization) and virtual I/O, it is easier
   to add functions to virtual nodes, and even the forwarding on those
   virtual nodes is getting more efficient.  Some overlay protocol such
   as VXLAN [RFC7348], GENEVE [I-D.ietf-nvo3-geneve], LISP [RFC6830],
   or CAPWAP [RFC5415] is assumed to be employed in the network.  In
   the overlay network usage scenario, LOOPS can extend a specific
   overlay protocol header to perform local measurement and local
   recovery functions, as in the example shown in Figure 4.

   +------------+------------+-----------------+---------+---------+
   |Outer IP hdr|Overlay hdr |LOOPS information|Inner hdr|payload  |
   +------------+------------+-----------------+---------+---------+

                Figure 4: LOOPS Extension Header Example

   LOOPS uses a packet number space independent of that of the
   transport layer.  Acknowledgments should be generated from the
   receiving ON to the sending ON for packet loss detection and local
   measurement.  To reduce overhead, a negative ACK (NACK) over each
   path segment is a good choice here.
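A minimal sketch of such per-segment NACK generation follows.  The per-segment sequence number and the class and method names are hypothetical, not part of any defined LOOPS header.

```python
# Sketch of NACK-based loss detection on one path segment, assuming
# (hypothetically) that the LOOPS header carries a per-segment sequence
# number independent of the transport layer's numbers.

class SegmentReceiver:
    """Receiving ON: detects gaps in the LOOPS packet number space."""

    def __init__(self):
        self.expected = 0          # next LOOPS sequence number expected

    def on_packet(self, seq):
        """Return the sequence numbers to NACK back to the sending ON."""
        nacks = list(range(self.expected, seq))   # a gap means likely loss
        self.expected = max(self.expected, seq + 1)
        return nacks

rx = SegmentReceiver()
assert rx.on_packet(0) == []        # in order, nothing to NACK
assert rx.on_packet(1) == []
assert rx.on_packet(4) == [2, 3]    # gap: NACK 2 and 3 for local resend
assert rx.on_packet(2) == []        # late arrival; the sending ON must
                                    # tolerate NACKs for packets that
                                    # were merely reordered
```

Real deployments would add a small reordering tolerance before emitting NACKs; the point here is only that loss detection and recovery stay confined to one segment.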
   A timestamp echo mechanism, analogous to TCP's Timestamp option,
   should be employed in band in the LOOPS extension to measure the
   local RTT and its variation for an overlay segment.  Local in-
   network recovery is performed.  The measurement over the segment is
   expected to give a hint on whether a lost and locally recovered
   packet was lost due to congestion.  Such a hint could be fed back
   further, for example via ECN Congestion Experienced (CE) markings,
   to the end-host sender.  It tells the end-host sender whether a
   congestion window adjustment is necessary.  LOOPS normally works on
   an overlay segment that aggregates the same type of traffic, for
   instance TCP traffic, or a finer granularity such as throughput-
   sensitive TCP traffic.  LOOPS does not look into the inner packet.
   Elements to be considered in LOOPS are discussed briefly here.

4.1.  Local Recovery and End-to-end Retransmission

   There are basically two ways to perform local recovery:
   retransmission and FEC (forward error correction).  They may be used
   together in some cases.  Such approaches between two overlay nodes
   recover the lost packet over a relatively shorter distance and thus
   with shorter latency.  Therefore local recovery is always faster
   than end-to-end recovery.

   At the same time, most transport layer protocols have their own end-
   to-end retransmission to recover lost packets.  Ideally, end-to-end
   retransmission at the sender would not be triggered if the local
   recovery was successful.

   End-to-end retransmission is normally triggered by a NACK as in RTCP
   or by multiple duplicate ACKs as in TCP.

   When FEC is used for local recovery, it may come with a buffer to
   make sure the recovered packets are subsequently delivered in order.
   Therefore the receiver side is unlikely to see out-of-order packets
   and then send a NACK or multiple duplicate ACKs.
   The side effect of unnecessarily triggering end-to-end
   retransmission is thus minimal.  When FEC is used, once the
   redundancy and block size are determined, the extra latency required
   to recover lost packets is also bounded, so the RTT variation it
   causes is predictable.  In some extreme cases, such as a large
   number of packets lost to a persistent burst, FEC may not be able to
   recover them; end-to-end retransmission then works as a last resort.
   In summary, when FEC is used for local recovery, the impact on end-
   to-end retransmission is limited.

   When retransmission is used, more care is required.

   For packet loss in RTP streaming, local retransmission can recover
   packets that would otherwise not be retransmitted end to end due to
   the long RTT.  It would be ideal if the retransmitted packet reached
   the receiver before the receiver sends back information that the
   sender would interpret as a NACK for the lost packet.  Therefore,
   when the segment(s) over which retransmission is performed are a
   small portion of the whole end-to-end path, the retransmission will
   significantly improve the quality at the receiver.  When the sender
   also retransmits the packet based on a received NACK, the receiver
   will receive duplicate retransmitted packets and should ignore the
   duplicates.

   For packet loss in TCP flows, TCP Reno and CUBIC use duplicate ACKs
   as a loss signal to trigger fast retransmit.  There are different
   ways to avoid triggering the sender's end-to-end retransmission
   prematurely:

   o  The egress overlay node can buffer the out-of-order packets for a
      while, giving a limited time for a packet being retransmitted
      somewhere in the overlay path to reach it.  The retransmitted
      packet and the packets buffered because of it may increase the
      RTT variation seen at the sender.  When the retransmission
      latency is a small portion of the RTT or the loss is rare, such
      RTT variation will be smoothed out without much impact.
      Another possible way is to make the sender exclude such packets
      from the RTT measurement.  The locally recovered packets can be
      specially marked, with this marking spun back to the end-host
      sender; the sender's RTT measurement should then not use those
      packets.

      Buffer management is nontrivial in this case.  It has to be
      determined how many out-of-order packets can be buffered at the
      egress overlay node before it gives up waiting for a successful
      local retransmission.  As a lost packet is not always recovered
      successfully by local retransmission, the sender may invoke end-
      to-end fast retransmit later than it would in classic TCP.

   o  If the LOOPS network does not buffer the out-of-order packets
      caused by packet loss, the TCP sender can use time-based loss
      detection like RACK [I-D.ietf-tcpm-rack] to avoid invoking fast
      retransmit too early.  RACK uses the notion of time to replace
      the conventional DUPACK-threshold approach to detecting losses.
      RACK needs to be tuned to fit local retransmission better.  If
      there are n similar segments over the path, segment
      retransmission will on average add at least RTT/n to the required
      reordering window when the packet is lost only once over the
      whole overlay path.  This approach is preferable to the one
      described in the previous bullet.  On the other hand, if time-
      based loss detection is not supported at the sender, end-to-end
      retransmission will be invoked as usual, wasting some bandwidth.

4.1.1.  OE to OE Measurement, Recovery and Multipathing

   When local recovery is performed between two neighboring ONs, it is
   called per-hop recovery.  It can be between two overlay relays or
   between an overlay relay and an overlay edge.  Another type of local
   recovery, called OE-to-OE recovery, is performed between overlay
   edge nodes.
   When the segments of an overlay path have similar characteristics
   and/or only the OEs have the expected processing capability, OE to
   OE local recovery can be used instead of per-hop recovery.

   If there is more than one overlay path in an overlay tunnel,
   multipathing splits and recombines the traffic.  Measurements such
   as round-trip time and loss rate between OEs have to be specific to
   each path.  The ingress OE can use the measurement feedback to
   determine the FEC parameter settings for each path.  FEC can also be
   configured to work over the combined path.  The egress OE must be
   able to remove replicated packets when the overlay path is switched
   during an impairment.

   OE to OE measurement can help determine each segment's proportion of
   the edge-to-edge delay.  This is useful for an ON to decide whether
   it is necessary to turn on per-hop recovery, or how to fine-tune the
   parameter settings.  When a segment's share of the delay is small,
   segment retransmission is more effective.

4.2.  Congestion Control Interaction

   When a TCP-like transport layer protocol is used, local recovery in
   LOOPS has to interact with the upper-layer transport congestion
   control.  Classic TCP adjusts the congestion window when a loss is
   detected and fast retransmit is invoked.

   The local recovery mechanism breaks the assumption, built into
   classic TCP, that detected packet loss is a necessary and sufficient
   condition for triggering congestion control at the sender.  A loss
   that is locally recovered can be caused by non-persistent congestion
   such as a microburst, or by random loss, neither of which should
   ideally make the sender invoke the congestion control mechanism.
   But it can also be caused by real persistent congestion, which
   should make the sender reduce its sending rate.  In either case, the
   sender does not see the locally recovered packet as a loss.
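   This masking effect can be illustrated with a toy model of a
   receiver's cumulative-ACK generation (a sketch only; the sequence
   numbers are illustrative, and the three-DUPACK threshold is the
   classic fast retransmit trigger):

```python
def dupacks_seen(arrivals):
    """Count the duplicate ACKs a receiver would emit for a stream of
    arriving segment numbers, using classic cumulative ACKs."""
    expected = 0          # next in-order segment the receiver wants
    buffered = set()      # out-of-order segments held by the receiver
    dupacks = 0
    for seg in arrivals:
        if seg == expected:
            expected += 1
            while expected in buffered:   # drain any now-in-order data
                buffered.discard(expected)
                expected += 1
        elif seg > expected:
            buffered.add(seg)
            dupacks += 1  # each out-of-order arrival re-ACKs `expected`
    return dupacks

# Segment 2 is lost in the underlay and arrives last, recovered by a
# local retransmission somewhere on the overlay path:
late_recovery = [0, 1, 3, 4, 5, 6, 2]
# With egress buffering, the overlay delivers everything in order:
in_order = [0, 1, 2, 3, 4, 5, 6]

print(dupacks_seen(late_recovery))  # 4 dupacks -> fast retransmit fires
print(dupacks_seen(in_order))       # 0 dupacks -> loss invisible to sender
```

   In the first case the sender would see four duplicate ACKs and
   retransmit end-to-end; with in-order delivery from the egress, the
   locally recovered loss never surfaces as a loss signal.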
   When local recovery takes effect, we consider the following two
   cases.  First, the classic TCP sender does not see enough duplicate
   ACKs to trigger fast retransmit.  This can be the result of in-order
   packet delivery to the receiver, including the locally recovered
   packets, as mentioned in the previous subsection.  A classic TCP
   sender in this case will not reduce the congestion window, as no
   loss is detected.  Second, if time-based loss detection such as RACK
   is used, the congestion window will not be reduced as long as the
   ACK for the locally recovered packet reaches the sender before the
   reordering window expires.

   Such behavior brings the desirable throughput improvement when the
   recovered packet was lost due to non-persistent congestion.  It
   solves the throughput problem mentioned in Section 2.3 and
   Section 3.  However, it also brings the risk that the sender fails
   to detect real persistent congestion in time and overshoots.
   Eventually a severe congestion that is not recoverable by a local
   recovery mechanism may occur.  In addition, it may be unfriendly to
   other flows (possibly pushing them out) if those flows are running
   over the same underlying bottleneck links.

   There is a spectrum of approaches.  At one end, each locally
   recovered packet can be treated exactly as a loss, by setting its CE
   (Congestion Experienced) bit, in order to invoke congestion control
   at the sender and guarantee the same fair sharing as classic TCP.
   Explicit Congestion Notification (ECN) can be used here, as ECN
   marking was required to be equivalent to a packet drop [RFC3168].
   Congestion control at the sender then works as usual and no
   throughput improvement is achieved (although the benefit of faster
   recovery remains).  At the other end, an ON can perform its own
   congestion measurement over the segment, for instance of the local
   RTT and its variation trend.
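   One hypothetical shape of such a segment-level measurement follows;
   the class name, window size, and the 1.5x threshold are all
   illustrative assumptions and not part of LOOPS:

```python
from collections import deque

class SegmentCongestionEstimator:
    """Classify a local loss as congestive or not from the trend of
    segment RTT samples (illustrative sketch)."""

    def __init__(self, window=16, factor=1.5):
        self.samples = deque(maxlen=window)  # recent segment RTTs
        self.factor = factor                 # growth threshold

    def on_rtt_sample(self, rtt):
        self.samples.append(rtt)

    def loss_looks_congestive(self):
        if len(self.samples) < 4:
            return False             # too little data: assume random loss
        base = min(self.samples)     # proxy for uncongested segment RTT
        recent = sum(list(self.samples)[-4:]) / 4
        # A recent mean well above the baseline suggests queue buildup.
        return recent > self.factor * base

est = SegmentCongestionEstimator()
for rtt in [10, 10, 11, 10, 10, 10]:
    est.on_rtt_sample(rtt)
print(est.loss_looks_congestive())   # False: flat RTT, treat as random
for rtt in [18, 22, 25, 30]:
    est.on_rtt_sample(rtt)
print(est.loss_looks_congestive())   # True: rising RTT, congestive
```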
   The ON can then determine whether a lost packet was caused by
   congestion or by other factors, and further decide whether to set
   the CE marking, or even what marking ratio to use, to make the
   sender adjust its sending rate more appropriately.

   There are cases in which the sender detects the loss even with local
   recovery functioning.  For example, when the reordering window in
   RACK is not optimally adapted, the sender may trigger congestion
   control at the same time as an end-to-end retransmission.  If
   spurious retransmission detection based on DSACK [RFC3708] is used,
   such an end-to-end retransmission will be found to be unnecessary
   when the locally recovered packet reaches the receiver successfully,
   and the congestion control changes will be undone at the sender.
   This has pros and cons similar to those described earlier.  The pros
   are preventing unnecessary window reduction and improving throughput
   when the loss is caused by non-persistent congestion or random loss.
   The cons are that mechanisms like ECN or its variants have to be
   used wisely to make sure congestion control is still invoked in case
   of persistent congestion.

   An approach where the losses on a path segment are not immediately
   made known to the end-to-end congestion control can be combined with
   a "circuit breaker" style congestion control on the path segment:
   when the overlay flow's usage of the path segment starts to become
   unfair, the path segment sends congestion signals up to the
   end-to-end congestion control.  This must be carefully tuned to
   avoid unwanted oscillation.

   In summary, local recovery can improve the Flow Completion Time
   (FCT) by eliminating tail loss in small flows.  As it turns loss
   events into out-of-order events in most cases from the TCP sender's
   perspective, there is some implication for throughput if the sender
   uses loss-based congestion control.
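   The DSACK-based undo mentioned above can be sketched as a toy piece
   of sender-side state.  This is a simplification of the [RFC3708]
   idea, not a complete algorithm, and the names are illustrative:

```python
class SpuriousRetxUndo:
    """Toy sender bookkeeping: halve cwnd on a retransmission, and
    undo it if a DSACK later shows the retransmission was spurious."""

    def __init__(self, cwnd=64):
        self.cwnd = cwnd
        self.retransmitted = set()
        self.saved_cwnd = None

    def on_retransmit(self, seq):
        self.retransmitted.add(seq)
        if self.saved_cwnd is None:
            self.saved_cwnd = self.cwnd
            self.cwnd //= 2          # loss-based window reduction

    def on_dsack(self, seq):
        # A DSACK for a sequence we retransmitted means the original
        # copy (e.g. locally recovered in the overlay) also arrived,
        # so the end-to-end retransmission was unnecessary.
        if seq in self.retransmitted and self.saved_cwnd is not None:
            self.cwnd = self.saved_cwnd   # undo the reduction
            self.saved_cwnd = None

s = SpuriousRetxUndo(cwnd=64)
s.on_retransmit(1000)
print(s.cwnd)    # 32: window halved on the end-to-end retransmission
s.on_dsack(1000)
print(s.cwnd)    # 64: DSACK revealed the retransmission was spurious
```

   The risk discussed above is visible here: if the loss was in fact
   congestive, the undo restores the window unless some other signal,
   such as ECN, still forces a reduction.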
   We suggest enabling both ECN and spurious retransmission detection
   when local recovery is in use; this gives the desirable throughput
   behavior, i.e., when a loss is caused by congestion, the congestion
   window is reduced; otherwise the sender's sending rate is kept.  We
   do not suggest using spurious retransmission detection alone
   together with local recovery, as it may cause the TCP sender to
   falsely undo a window reduction when congestion occurs.  If only ECN
   is enabled, or neither ECN nor spurious retransmission detection is
   enabled, the throughput with local recovery in use is not much
   different from that of traditional TCP.

4.3.  Overlay Protocol Extensions

   The overlay usually has no control over how packets are routed in
   the underlying network between two overlay nodes, but it can
   control, for example, the sequence of overlay nodes a message
   traverses before reaching its destination.  LOOPS assumes the
   overlay protocol can deliver the packets in such a designated
   sequence.  Most forms of overlay networking use some sort of
   "encapsulation".  The whole path can be formed by stitching together
   multiple short overlay paths, e.g., with VXLAN [RFC7348] or GENEVE
   [I-D.ietf-nvo3-geneve], or it can be a single overlay path with a
   sequence of intermediate overlay nodes specified, as in SRv6
   [I-D.ietf-6man-segment-routing-header].  Either way, LOOPS
   information needs to be embedded in those protocols to support data
   plane measurement and feedback.  Retransmission or FEC based loss
   recovery can be either per ON-hop or OE to OE.

   LOOPS alone has no setup requirement on the control plane.  Some
   overlay protocols, e.g., CAPWAP [RFC5415], have a session setup
   phase, which can be used to exchange information such as dynamic FEC
   parameters.

4.4.  Summary

   LOOPS is expected to extend existing overlay protocols in the data
   plane.
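   As a rough illustration of the kind of per-packet metadata such an
   encapsulation extension might carry, consider the following sketch.
   The field set and layout are purely hypothetical, not a defined
   LOOPS format:

```python
import struct

# Hypothetical LOOPS metadata block that an overlay encapsulation
# (e.g. a Geneve option) might carry: a packet sequence number for
# loss/reordering detection, a timestamp for segment RTT measurement,
# flags, and an FEC block identifier.
LOOPS_FMT = "!IIBB2x"   # seq, timestamp (us), flags, FEC block id, pad

FLAG_LOCALLY_RECOVERED = 0x01   # marking spun back so the sender can
                                # exclude this packet from RTT sampling

def pack_loops(seq, ts_us, flags, fec_block):
    return struct.pack(LOOPS_FMT, seq, ts_us, flags, fec_block)

def unpack_loops(blob):
    seq, ts_us, flags, fec_block = struct.unpack(LOOPS_FMT, blob)
    return {"seq": seq, "ts_us": ts_us,
            "recovered": bool(flags & FLAG_LOCALLY_RECOVERED),
            "fec_block": fec_block}

blob = pack_loops(seq=42, ts_us=1_000_000,
                  flags=FLAG_LOCALLY_RECOVERED, fec_block=7)
print(len(blob))            # 12-byte option body
print(unpack_loops(blob))
```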
   Path selection is assumed to be a feature provided by the overlay
   protocols, via SDN or other approaches, and is not part of LOOPS.
   LOOPS is a set of functions to be implemented on ONs in a long-haul
   overlay network.  LOOPS includes the following features:

   1.  Local recovery.  Retransmission, FEC, or a hybrid can be used as
       the local recovery method.  Such a recovery mechanism is
       in-network: it is performed by two network nodes with computing
       and memory resources.

   2.  Local congestion measurement.  The sending ON measures the local
       segment RTT, loss, and/or throughput to obtain the overlay
       segment status immediately.

   3.  Signaling to end-to-end congestion control.  A strategy to set
       or not set the ECN CE marking, or simply drop the packet, to
       signal the loss event to the end host sender and help it adjust
       the sending rate.

5.  Security Considerations

   LOOPS does not look at the traffic payload, so encrypted payloads do
   not affect the functionality of LOOPS.  The use of LOOPS introduces
   some issues that impact security.  An ON with LOOPS functions
   represents a point in the network where the traffic can potentially
   be manipulated.  A denial-of-service attack can be launched from an
   ON.  A rogue ON might be able to spoof packets as if they came from
   a legitimate ON.  It may also modify the ECN CE marking in packets
   to influence the sender's rate.  To protect against such attacks,
   the overlay protocol itself should have some built-in security
   protection that can inherently be used by LOOPS.  The operator
   should use an authentication mechanism to make sure ONs are valid
   and non-compromised.

6.  IANA Considerations

   No IANA action is required.

7.  Acknowledgements

   Thanks to the etosat mailing list for the discussion of the SatCom
   use case for LOOPS.

8.  Informative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.
   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3708]  Blanton, E. and M. Allman, "Using TCP Duplicate Selective
              Acknowledgement (DSACKs) and Stream Control Transmission
              Protocol (SCTP) Duplicate Transmission Sequence Numbers
              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
              DOI 10.17487/RFC3708, February 2004.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and
              J. Rey, "Extended RTP Profile for Real-time Transport
              Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
              RFC 4585, DOI 10.17487/RFC4585, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006.

   [RFC5415]  Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley,
              Ed., "Control And Provisioning of Wireless Access Points
              (CAPWAP) Protocol Specification", RFC 5415,
              DOI 10.17487/RFC5415, March 2009.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
              Locator/ID Separation Protocol (LISP)", RFC 6830,
              DOI 10.17487/RFC6830, January 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C.
              Wright, "Virtual eXtensible Local Area Network (VXLAN): A
              Framework for Overlaying Virtualized Layer 2 Networks
              over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348,
              August 2014.

   [I-D.dukkipati-tcpm-tcp-loss-probe]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01
              (work in progress), February 2013.

   [I-D.ietf-nvo3-geneve]
              Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic
              Network Virtualization Encapsulation",
              draft-ietf-nvo3-geneve-13 (work in progress), March 2019.

   [I-D.ietf-tcpm-rack]
              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha,
              "RACK: a time-based fast loss detection algorithm for
              TCP", draft-ietf-tcpm-rack-05 (work in progress),
              April 2019.

   [I-D.ietf-6man-segment-routing-header]
              Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
              Matsushima, S., and d. daniel.voyer@bell.ca, "IPv6
              Segment Routing Header (SRH)",
              draft-ietf-6man-segment-routing-header-19 (work in
              progress), May 2019.

   [I-D.cardwell-iccrg-bbr-congestion-control]
              Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
              "BBR Congestion Control",
              draft-cardwell-iccrg-bbr-congestion-control-00 (work in
              progress), July 2017.

   [DOI_10.1109_ICDCS.2016.49]
              Cai, C., Le, F., Sun, X., Xie, G., Jamjoom, H., and R.
              Campbell, "CRONets: Cloud-Routed Overlay Networks", 2016
              IEEE 36th International Conference on Distributed
              Computing Systems (ICDCS), DOI 10.1109/icdcs.2016.49,
              June 2016.

   [DOI_10.1145_3038912.3052560]
              Haq, O., Raja, M., and F. Dogar, "Measuring and Improving
              the Reliability of Wide-Area Cloud Paths", Proceedings of
              the 26th International Conference on World Wide Web -
              WWW '17, DOI 10.1145/3038912.3052560, 2017.
Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56624584
   Email: liyizhou@huawei.com


   Xingwang Zhou
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Email: zhouxingwang@huawei.com