TSVWG                                                              Y. Li
Internet-Draft                                                   X. Zhou
Intended status: Informational                                    Huawei
Expires: July 10, 2020                                      M. Boucadair
                                                                  Orange
                                                                 J. Wang
                                                           China Telecom
                                                        January 07, 2020

 LOOPS (Localized Optimizations on Path Segments) Problem Statement and
     Opportunities for Network-Assisted Performance Enhancement
            draft-li-tsvwg-loops-problem-opportunities-04

Abstract

   In various network deployments, end-to-end forwarding paths are
   partitioned into multiple segments.  For example, in some cloud-based
   WAN communications, multiple overlay tunnels are stitched together
   for traffic policy enforcement purposes, such as optimizing traffic
   distribution or selecting paths exposing a lower latency.  Likewise,
   in satellite communications, the communication path is decomposed
   into two terrestrial segments and a satellite segment.  Such long-
   haul paths are naturally composed of multiple network segments with
   various encapsulation schemes.  Packet loss may show different
   characteristics on different segments.

   Traditional transport protocols (e.g., TCP) respond to packet loss
   slowly, especially in long-haul networks: they either wait for some
   signal from the receiver to indicate a loss and then retransmit from
   the sender, or rely on the sender's timeout, which is often quite
   long.  Non-congestive loss may make the TCP sender reduce its
   sending rate unnecessarily.  With the increase of end-to-end
   transport encryption (e.g., QUIC), traditional PEP (performance
   enhancing proxy) techniques such as TCP splitting are no longer
   applicable.

   LOOPS (Local Optimizations on Path Segments) is a network-assisted
   performance enhancement over a path segment.  It aims to provide
   local in-network recovery to achieve better data delivery by making
   packet loss recovery faster and by avoiding the senders over-reducing
   their sending rate.  In an overlay network scenario, LOOPS can be
   performed over a variety of existing, or purposely created,
   tunnel-based path segments.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 10, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Problem
     1.2.  Sketching a Work Direction: Rationale & Goals
   2.  Terminology
   3.  Cloud-Internet Overlay Network
     3.1.  Tail Loss or Loss in Short Flows
     3.2.  Packet Loss in Real Time Media Streams
     3.3.  Packet Loss and Congestion Control in Bulk Data Transfer
     3.4.  Multipathing
   4.  Satellite Communication
   5.  Branch Office WAN Connection
   6.  Features and Impacts to be Considered for LOOPS
     6.1.  Local Recovery and End-to-end Retransmission
       6.1.1.  OE to OE Measurement, Recovery, and Multipathing
     6.2.  Congestion Control Interaction
     6.3.  Overlay Protocol Extensions
     6.4.  Summary
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

1.1.  The Problem

   Tunnels are widely deployed in many networks to achieve various
   engineering goals, including long-haul WAN interconnection or
   enterprise wireless access networks.  A connection between two
   endpoints can be decomposed into several connection legs.  As such,
   the corresponding forwarding path can be partitioned into multiple
   path segments, some of which use network overlays by means of
   tunnels.  This design serves a number of purposes, such as steering
   traffic, optimizing egress/ingress link utilization, optimizing
   traffic performance metrics (such as delay, delay variation, or
   loss), optimizing resource utilization by invoking resource bonding,
   and providing high availability.

   A reliable transport layer normally employs end-to-end
   retransmission mechanisms, which also address congestion control
   [RFC0793] [RFC5681].
   The sender either waits for the receiver to
   send some signal indicating a packet loss, or sets some form of
   timeout for retransmission.  For unreliable transport protocols such
   as RTP [RFC3550], optional and limited use of end-to-end
   retransmission is employed to recover from packet loss [RFC4585]
   [RFC4588].

   End-to-end retransmission to recover lost packets is slow, especially
   when the network is long-haul.  When a path is partitioned into
   multiple path segments, typically realized as overlay tunnels, LOOPS
   (Local Optimizations on Path Segments) aims to provide local,
   segment-based in-network recovery to achieve better data delivery by
   making packet loss recovery faster and by avoiding the senders
   over-reducing their sending rate.  In an overlay network scenario,
   LOOPS can be performed over existing, or purposely created, overlay
   tunnel based path segments.  Figure 1 shows a basic usage scenario
   of LOOPS.

   Some link types (satellite, microwave, drone-based networking, etc.)
   may exhibit an unusually high loss rate under special conditions
   (e.g., fades due to heavy rain).  A traditional TCP sender interprets
   loss as congestion and over-reduces its sending rate, degrading the
   throughput.  LOOPS is also applicable to such scenarios to improve
   the throughput.

   Also, multiple paths may be available in the network that could be
   used for better performance.  These paths are not visible to
   endpoints.  Means to make use of these paths while ensuring that the
   overall performance is enhanced would contribute to customer
   satisfaction.  Blindly implementing link aggregation may lead to
   undesired effects (e.g., underperforming compared to a single path).

1.2.  Sketching a Work Direction: Rationale & Goals

   This document sketches a proposal that is meant to experimentally
   investigate to what extent a network-assisted approach can help
   increase the overall perceived quality of experience in specific
   situations (e.g., Sections 3.5 and 3.6 of [RFC8517]) without
   requiring access to internal transport primitives.  The rationale
   behind this approach is that some information (loss detection,
   better visibility on available paths and their characteristics,
   etc.) can be used to trigger local actions while avoiding, as much
   as possible, undesired side effects (e.g., exposing a behavior that
   would be interpreted by an endpoint as an anomaly, such as corrupt
   data, and would thus exacerbate end-to-end recovery).  Such local
   actions would have a faster effect (e.g., faster recovery, use of
   multiple paths simultaneously).

   To that aim, the work is structured into two phased stages:

   o  Stage 1: Network-assisted optimization.  This stage assumes that
      optimizations (e.g., support for latency-sensitive applications)
      can be implemented in the network without defining new
      interactions with the endpoint.  Existing tools such as ECN will
      be used.  Some of these optimizations may be valuable in
      deployments where communications are established over paths that
      do not expose the same performance characteristics.

   o  Stage 2: Collaborative networking optimization.  This stage
      requires more interaction between the network and an endpoint to
      implement coordinated and more surgical network-assisted
      optimizations, based on information/instructions shared by an
      endpoint or on sharing locally-visible information with the
      endpoint for better and faster recovery.

   The document focuses on the first stage.  Effort related to the
   second stage is out of scope of the initial planned work.
   Nevertheless, future work will be planned once progress is
   (hopefully) made on the first stage.

   The proposed mechanism is not meant to be applied to all traffic, but
   only to a subset that is eligible for the network-assisted
   optimization service.

   Which traffic is eligible is deployment-specific and policy-based.
   For example, techniques for dynamic invocation of an optimization
   function (e.g., SFC) may be leveraged to unambiguously identify the
   aggregate of traffic that is eligible for the service.  Such
   identification may be triggered by subscription actions made by
   customers or be provided by a network provider (e.g., for specific
   applications, or during specific events such as a severe DDoS attack
   or flash crowds).

   Likewise, whether the optimization function is permanently
   instantiated or invoked on demand is deployment-specific.

   This document does not intend to provide a comprehensive list of
   target deployment cases.  Sample scenarios are described to
   illustrate some of the LOOPS potential.  Similar issues and
   optimizations may be helpful in other deployments, such as enhancing
   the reliability of data transfer when a fleet of drones is used for
   specific missions (e.g., site inspection, live streaming, and
   emergency service).  Captured data should be reliably transmitted
   via paths involving radio connections.

   It is not required that all segments be LOOPS-aware to benefit from
   LOOPS advantages.

   Section 3 presents some of the issues and opportunities found in
   Cloud-Internet overlay networks that require higher performance and
   more reliable packet transmission over best-effort networks.
   Section 4 discusses applications of LOOPS in satellite
   communication.  Section 6 describes the corresponding solution
   features and their impact on existing network technologies.
   ON=overlay node
   UN=underlay node

   +---------+                                             +---------+
   |   App   | <--------------- end-to-end --------------> |   App   |
   +---------+                                             +---------+
   |Transport| <--------------- end-to-end --------------> |Transport|
   +---------+                                             +---------+
   |         |      +--+  path  +--+ path segment2 +--+    |         |
   |         |      |  |<-seg1->|  |<------------->|  |    |         |
   | Network | +--+ |ON|  +--+  |ON| +--+  +----+  |ON|    | Network |
   |         |-|UN|-|  |--|UN|--|  |-|UN|--| UN |--|  |----|         |
   +---------+ +--+ +--+  +--+  +--+ +--+  +----+  +--+    +---------+
    End Host        <--------------------------------->     End Host
                     LOOPS domain: path segment enables
                  local optimizations for better experience

           Figure 1: LOOPS in Overlay Network Usage Scenario

2.  Terminology

   This document makes use of the following terms:

   LOOPS:  Local Optimizations on Path Segments.  LOOPS includes the
      local in-network (i.e., non end-to-end) recovery functions and
      other supporting features such as local measurement, loss
      detection, and congestion feedback.

   LOOPS Node:  A node supporting LOOPS functions.

   Overlay Node (ON):  A node having overlay functions (e.g., overlay
      protocol encapsulation/decapsulation, header modification, TLV
      inspection) and LOOPS functions in the LOOPS overlay network
      usage scenario.

   Overlay Tunnel:  A tunnel with designated ingress and egress nodes
      using some network overlay protocol as encapsulation, optionally
      with a specific traffic type.

   Overlay Edge (OE):  Edge node of an overlay tunnel.  It can behave
      as ingress or egress as a function of the traffic direction.

   Path segment:  A LOOPS-enabled tunnel-based network subpath.  It is
      used interchangeably with "overlay segment" in this document when
      the context emphasizes its overlay-encapsulated nature.  It is
      also called "segment" for simplicity in this document.

   Overlay segment:  Refers to path segment.
   Underlay Node (UN):  A node not participating in the overlay
      network.

3.  Cloud-Internet Overlay Network

   CSPs (Cloud Service Providers) connect their data centers using the
   Internet or via self-constructed networks/links.  This expands the
   traditional Internet's infrastructure and, together with the
   original ISPs' infrastructure, forms the Internet underlay.

   Automation techniques and NFV (Network Function Virtualization)
   make it easier to dynamically provision a new virtual node/function
   as a workload in a cloud for CPU/storage-intensive functions.  With
   the aid of mechanisms such as kernel bypassing and virtual I/O,
   forwarding based on virtual nodes is becoming more and more
   effective.  The interconnection among purposely positioned virtual
   nodes and/or existing nodes with virtualization functions
   potentially forms an overlay infrastructure, called the
   Cloud-Internet Overlay Network (CION) in this document for short.

   This architecture makes use of overlay technologies to direct
   traffic through a specific overlay path, regardless of the
   underlying physical topology, in order to achieve better service
   delivery.  It purposely creates or selects overlay nodes (ONs) from
   providers.  By continuously measuring the delay of path segments and
   using it as a metric for path selection, when the number of overlay
   nodes is sufficiently large, there is a high chance that a better
   path can be found [DOI_10.1109_ICDCS.2016.49]
   [DOI_10.1145_3038912.3052560].  [DOI_10.1145_3038912.3052560]
   further shows that all cloud providers experience random loss
   episodes and that random loss accounts for more than 35% of total
   loss.

   Some of the considerations discussed below may also apply to
   interconnecting DCs owned by a network provider.
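   The delay-based path selection idea above can be sketched as
   follows.  This is a toy illustration only: the node names and delay
   values are invented, and real deployments would consider multi-hop
   relays and continuously refreshed measurements.

```python
# Toy one-way delay matrix (ms) between overlay nodes.  ('S', 'D') is
# the default path; other nodes can serve as a one-hop overlay relay.
delay = {
    ('S', 'D'): 180,
    ('S', 'A'): 60, ('A', 'D'): 80,
    ('S', 'B'): 90, ('B', 'D'): 110,
}

def best_one_hop(delay, src, dst, relays):
    """Return the lowest-delay choice among the default path and
    single-relay overlay paths (hypothetical helper for illustration)."""
    best = ('default', delay[(src, dst)])
    for r in relays:
        d = delay[(src, r)] + delay[(r, dst)]
        if d < best[1]:
            best = (r, d)
    return best

print(best_one_hop(delay, 'S', 'D', ['A', 'B']))  # -> ('A', 140)
```

   With these invented numbers, relaying through node A (60 + 80 =
   140 ms) beats the 180 ms default path, which is the effect the
   measurement studies cited above report at scale.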
   Figure 2 shows an example of an overlay path over large geographic
   distances.  Three path segments, i.e., ON1-ON2, ON2-ON3, and
   ON3-ON4, are shown.  An ON is usually a virtual node, though it does
   not have to be.  Each segment transmits packets using some form of
   network overlay protocol encapsulation.  An ON has computing and
   memory resources that can be used for functions like packet loss
   detection, network measurement and feedback, and packet recovery.
   ONs are managed by a single administrator, though they can be
   workloads created from different CSPs.

                     _____________
                    /  domain 1   \
                   /               \
                ___/ -------------  \
               /                     \
      PoP1 ->--ON1                    \
       |      |                        ON4------>-- PoP2
       |      |       ON2          ___|__/
       |     \__|_   |->|   _____ /   |
       |        \|__|__/   /     \    |
       |         |  |  \____/ \__/    |
      \|/        |  |        _____    |
       |         |  |    ___/     \   |
       |         | \|/  /          \__|
       |         |  |  / domain 2   \ /|\
       |         |  |  |    ON3     |  |
       |         |  |  \   |->|     |  |
       |         |  |   \__|__|____/   |
       |        /|\ |      |  \|/      |
       |         |  |      |   |       |
       |         |  |     /|\  |       |
   +--------------------------------------------------+
   |   |         |  |      |   |       |   Internet   |
   |   o--o      o---o->---o o---o->--o--o underlay   |
   +--------------------------------------------------+

         Figure 2: Cloud-Internet Overlay Network (CION)

   We ran tests based on 37 overlay nodes from multiple cloud providers
   globally.  Each pair of overlay nodes was used as sender and
   receiver.  When the traffic is not intentionally directed through
   any intermediate virtual node, we call the path followed by the
   traffic the default path.  When any of the virtual nodes is
   intentionally used as an intermediate node to forward the traffic,
   the path that the traffic takes is called an overlay path.
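   The experiments summarize each node pair's delay and loss series at
   the 99th and 99.9th percentiles.  A minimal sketch of such a
   summary, with invented sample data and a nearest-rank percentile
   (function names and the 1% target are ours, used only to illustrate
   the computation):

```python
import random

def percentile(samples, pct):
    """Nearest-rank percentile of a list of measurement samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def satisfaction(pairs, pct, target=0.01):
    """Fraction of node pairs whose loss rate stays below `target`
    at the given percentile (hypothetical helper for illustration)."""
    ok = sum(1 for samples in pairs if percentile(samples, pct) < target)
    return ok / len(pairs)

# Simulated per-probe loss-rate series for 50 overlay-node pairs.
random.seed(1)
pairs = [[random.uniform(0, 0.02) for _ in range(100)] for _ in range(50)]
print(satisfaction(pairs, 99), satisfaction(pairs, 99.9))
```

   The reported figures (e.g., 44.27% satisfaction at the 99th
   percentile) are this kind of fraction computed over all measured
   node pairs.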
   The preliminary experiments showed that, when probing with one Ping
   packet per second for a week, the delay of an overlay path was
   shorter than that of the default path in 69% of cases at the 99th
   percentile, and the improvement was 17.5% at the 99th percentile.
   More experimental information can be found in [OCN].

   Lower delay does not necessarily mean higher throughput.  Different
   path segments may have different packet loss rates.  Loss rate is
   another major factor impacting overall TCP throughput.  Based on
   some customer requirements, the target loss rate in the test was set
   to be less than 1% at the 99th and 99.9th percentiles, respectively.
   The loss was measured between any two overlay nodes, i.e., any
   potential path segment.  Two thousand Ping packets were sent every
   20 seconds between two overlay nodes for 55 hours.  This preliminary
   experiment showed that the packet loss rate target was satisfied in
   44.27% and 29.51% of cases at the 99th and 99.9th percentiles,
   respectively.

   Hence, packet loss in an overlay segment is a key issue to be solved
   in such an architecture.  In long-haul networks, the end-to-end
   retransmission of lost packets can add an extra round-trip time
   (RTT).  Such extra time is not acceptable for some latency-sensitive
   applications.  As CION naturally consists of multiple overlay
   segments, LOOPS leverages this to perform local optimizations on a
   single hop between two overlay nodes.  ("Local" here is a concept
   relative to end-to-end; it does not mean such optimization is
   limited to LAN networks.)

   The following subsections present different scenarios using multiple
   segment-based overlay paths with a common need for local in-network
   loss recovery in best-effort networks.

3.1.  Tail Loss or Loss in Short Flows

   When the lost segments are at the end of a transaction, TCP's fast
   retransmit algorithm does not work, as there are no ACKs to trigger
   it.
   When a sender does not receive an ACK for a given segment within a
   certain amount of time, called the retransmission timeout (RTO), it
   re-sends the segment [RFC6298].  The RTO can be as long as several
   seconds, so the recovery of lost segments triggered by the RTO is
   lengthy.  [I-D.dukkipati-tcpm-tcp-loss-probe] indicates that large
   RTOs make a significant contribution to the long tail in the latency
   statistics of short flows such as web page loads.

   A short flow often completes in one or two RTTs.  Even when the loss
   is not a tail loss, it can add another RTT because of end-to-end
   retransmission (not enough packets are in flight to trigger fast
   retransmit).  In long-haul networks, this can result in extra time
   of tens or even hundreds of milliseconds.

   An overlay segment transmits the aggregated flows from ON to ON.  As
   short-lived flows are aggregated, the probability of tail loss over
   this specific overlay segment decreases compared to an individual
   flow.  The overlay segment is much shorter than the end-to-end path
   in a Cloud-Internet overlay network, hence loss recovery over an
   overlay segment is faster.

3.2.  Packet Loss in Real Time Media Streams

   The Real-time Transport Protocol (RTP) is widely used for
   interactive audio and video.  Packet loss degrades the quality of
   the received media.  When the latency tolerance of the application
   is sufficiently large, the RTP sender may use RTCP NACK feedback
   from the receiver [RFC4585] to trigger the retransmission of lost
   packets before the playout time is reached at the receiver.

   In a Cloud-Internet overlay network, the end-to-end path RTT can be
   hundreds of milliseconds.  End-to-end feedback-based retransmission
   may not be useful when applications cannot tolerate one more RTT of
   this length.
   Loss recovery over an overlay segment can then be used in scenarios
   where RTCP NACK-triggered retransmission is not appropriate.

3.3.  Packet Loss and Congestion Control in Bulk Data Transfer

   TCP congestion control algorithms such as Reno and CUBIC basically
   interpret packet loss as congestion experienced somewhere on the
   path.  When a loss is detected, the congestion window is decreased
   at the sender to slow down sending.  It has been observed that
   packet loss is not an accurate signal of congestion in the current
   Internet [I-D.cardwell-iccrg-bbr-congestion-control].  On long-haul
   links, when the loss is caused by a non-persistent burst, which is
   extremely short and rather random, the sender's reaction of reducing
   the sending rate can neither respond in time to the instantaneous
   path situation nor mitigate such bursts.  On the contrary, reducing
   the window size at the sender unnecessarily or too aggressively
   harms the throughput of long-lasting traffic like bulk data
   transfer.

   The overlay nodes distributed over the path have computing
   capability; they are in a better position than the end hosts to
   quickly deduce the underlying links' instantaneous situation by
   measuring the delay, loss, or other metrics over the segment.  A
   shorter round-trip time over a path segment enables more accurate
   and immediate measurements of the maximum recently available
   bandwidth, the minimum recent latency, or the trend of change.  ONs
   can further decide whether a sending rate reduction at the sender is
   necessary when a loss happens.  Section 6.2 discusses this in more
   detail.

3.4.  Multipathing

   As an overlay path may suffer from an impairment of the underlying
   network, two or more overlay paths between the same set of ingress
   and egress overlay nodes can be combined for reliability purposes.
   During the transient time when a network impairment is detected,
   sending replicated traffic over two paths can improve reliability.

   When two or more disjoint overlay paths are available, as shown in
   Figure 3 from ON1 to ON2, different sets of traffic may use
   different overlay paths.  For instance, one path is for low latency
   and the other for higher bandwidth, or they can simply be used for
   load balancing and better bandwidth utilization.

   Two disjoint paths can be found, for example, by measurement,
   identifying the segments with a very low "mathematical correlation"
   in latency change.  When the number of overlay nodes is large, it is
   easy to find disjoint or partially disjoint segments.  This
   information may be available if the ONs are managed by the network
   provider managing the underlying forwarding paths.

   Different overlay paths may obviously have varying characteristics.
   The overlay tunnel should allow each overlay path to handle packet
   loss depending on its own path measurements.

                    ON-A
       +----------o------------------+
       |                             |
       |                             |
   A -----o ON1                ON2 o----- B
       |                             |
       +-----------------------o-----+
                               ON-B

          Figure 3: Example of Multiple Overlay Paths

   In reference to Figure 3, neither A nor B is aware of the existence
   of these multiple paths.  Network assistance would be valuable for
   the sake of better resilience and performance.  Note that in a
   collaborative context (a.k.a. stage 2, mentioned in Section 1.2),
   LOOPS may target means to advertise the available path
   characteristics to endpoints A/B, to allow an endpoint to control
   the traffic distribution policy to be enforced by ON1/ON2, or to let
   an endpoint notify ON1/ON2 of its multipathing preference.

4.  Satellite Communication

   Traditionally, satellite communications deploy PEP (performance
   enhancing proxy [RFC3135]) nodes around the satellite link to
   enhance end-to-end performance.  TCP splitting is a common approach
   employed by such PEPs, where the TCP connection is split into three
   parts: the segment before the satellite hop, the satellite section
   (uplink, downlink), and the segment behind the satellite hop.  This
   requires heavy interaction with the end-to-end transport protocols,
   usually without the explicit consent of the end hosts.
   Unfortunately, this is indistinguishable from a man-in-the-middle
   attack on TCP.  With end-to-end encryption moving under the
   transport (QUIC), this approach is no longer useful.

   Geosynchronous Earth Orbit (GEO) satellites have a one-way delay (up
   to the satellite and back) on the order of 250 milliseconds.  This
   does not include queueing, coding, and other delays in the satellite
   ground equipment.  The round-trip time for a TCP or QUIC connection
   going over a satellite hop in both directions will, in the best
   case, be on the order of 600 milliseconds, and it may be
   considerably longer.  RTTs of this order of magnitude have
   significant performance implications.

   Packet loss recovery is an area where splitting the TCP connection
   into different parts helps.  Packets lost on the terrestrial links
   can be recovered at terrestrial latencies.  Packet loss on the
   satellite link can be recovered more quickly by an optimized
   satellite protocol between the PEPs and/or link-layer FEC than it
   could be end to end.  Again, encryption makes TCP splitting no
   longer applicable.  Enhanced error recovery at the satellite link
   layer helps with loss on the satellite link but does not help on the
   terrestrial links.  Even when the terrestrial segments are short,
   any loss must be recovered across the satellite link delay.
   And there are cases where a satellite ground station connects to the
   general Internet with a potentially larger terrestrial segment
   (e.g., to a correspondent host in another country).  Faster recovery
   over such long terrestrial segments is desirable.

   Another aspect of recovery is that terrestrial loss is highly likely
   to be congestion-related, while satellite loss is more likely to be
   caused by transmission errors due to link conditions.  A transport
   endpoint slowing down because it misinterprets these errors as
   congestion losses unnecessarily reduces performance.  But, at the
   end points, the difference between the two is not easily
   distinguished.  To elaborate on loss recovery for satellite
   communications: while the error rate on satellite paths is generally
   very low most of the time, it might get higher under special link
   conditions (e.g., fades due to heavy rain).  The satellite hop
   itself does know which losses are due to link conditions as opposed
   to congestion, but it has no mechanism to signal this difference to
   the end hosts.

   We will need the protocol under QUIC to try to minimize
   non-congestion packet drops.  Specific link layers may have
   techniques such as satellite FEC for recovery.  Where the
   capabilities of those may be exceeded (e.g., rain fade), we can look
   at LOOPS-like approaches.

   There are two high-level classes of solutions for making encrypted
   transport traffic like QUIC work well over satellite:

   o  Hooks in the transport protocol that can adapt to large BDPs,
      where both the bandwidth and the latency are large.  This would
      require end-to-end enhancement.

   o  Capabilities (such as LOOPS) under the transport protocol to
      improve performance over specific segments of the path.  In
      particular, separating the terrestrial from the satellite losses.
      Fixing the terrestrial loss quickly, and keeping throughput high
      over the satellite segment by not causing the end hosts to
      over-reduce their sending window in case of non-congestion loss.

   This document focuses on the latter.

5.  Branch Office WAN Connection

   Enterprises usually require network connections between branch
   offices, or between branch offices and a cloud data center, over
   geographic distances.  With the increasing deployment of vCPE
   (virtual CPE), some services usually hosted on the CPE are moved
   from the customer site to the provider network.  The vCPE approach
   enables value-added services to be provided, such as WAN
   optimization and traffic steering.

   Figure 4 shows an example of a WAN connection between two branch
   offices via the Internet.  Figure 5 shows a branch office accessing
   a public cloud via a selected PoP (point of presence).  The vCPE
   connects to that PoP, which can be hundreds of kilometers away, via
   the Internet.  In both cases, the path segments over the Internet
   are subject to loss.  Problems similar to those presented in the
   subsections of Section 3 should be solved.  GW1 may be reachable via
   multiple paths.

   Requirements to steer traffic through different sub-paths for
   latency optimization, resource optimization, load balancing, or
   other purposes are increasing; for example, directing the traffic
   from the vCPE to a lightly loaded PoP rather than to the closest
   one.  Mere best-effort transport is not sufficient.  New
   technologies like SFC (Service Function Chaining), SRv6 (segment
   routing over IPv6), and NFV/SDN, used together with vCPE, make it
   possible to embed more complicated loss recovery functions at
   intermediate nodes in the end-to-end path.
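   The "lightly loaded PoP rather than the closest one" policy above
   can be sketched as a simple scoring rule.  The PoP names, RTTs,
   load values, and the linear penalty below are all invented for
   illustration; a real controller would use richer metrics and
   continuously refreshed measurements.

```python
# Candidate PoPs with a measured RTT (ms) and a current load in [0, 1].
pops = {
    'PoP1': {'rtt_ms': 12, 'load': 0.95},  # closest, but heavily loaded
    'PoP2': {'rtt_ms': 25, 'load': 0.30},
    'PoP3': {'rtt_ms': 40, 'load': 0.20},
}

def pick_pop(pops, load_penalty_ms=100):
    """Choose a PoP by combining latency with a load penalty, so a
    lightly loaded PoP can win over the geographically closest one
    (hypothetical scoring rule for illustration)."""
    def score(p):
        return p['rtt_ms'] + load_penalty_ms * p['load']
    return min(pops, key=lambda name: score(pops[name]))

print(pick_pop(pops))  # -> 'PoP2'
```

   With a zero load penalty the rule degenerates to "closest PoP"; the
   penalty weight is the policy knob a provider would tune.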
600 +------+ +-----+ Internet +------+ +-----+ 601 | GW1 |-------|vCPE1|---------------| vCPE2|-------+ GW2 | 602 +------+ +-----+ +------+ +-----+ 604 Site A Site B 606 Figure 4: Branch Office WAN Connection via Internet 607 +-------------+ 608 | +------+ | 609 | | PoP1 | | 610 +------+ +-----+ Internet | +------+ | 611 | GW1 |------|vCPE1|------------------| | | 612 +------+ +-----+ | | | 613 | +------+ | 614 Site A | | vPC1 | | 615 | +------+ | 616 |public cloud | 617 +-------------+ 618 | 619 | 620 | DC 621 | Interconnection 622 | 623 +-------------+ 624 | +------+ | 625 | | vPC2 | | 626 | +------+ | 627 | | | 628 | | | 629 | +------+ | 630 | | PoP2 | | 631 | +------+ | 632 |public cloud | 633 +-------------+ 635 Figure 5: Enterprise Cloud Access 637 6. Features and Impacts to be Considered for LOOPS 639 This section provides an overview of the proposed LOOPS solution. 640 It is not meant to document a detailed specification, but 641 to highlight some design choices that may be followed 642 during the solution design phase. 644 LOOPS aims to improve transport performance "locally", in addition 645 to the native end-to-end mechanisms supported by a given transport 646 protocol. This is possible because LOOPS nodes will be instantiated 647 to partition the path into multiple segments. With the advent of 648 automation and technologies like NFV and virtual I/O, it is possible 649 to dynamically instantiate functions on nodes. Some overlay 650 protocols such as VXLAN [RFC7348], GENEVE [I-D.ietf-nvo3-geneve], 651 LISP [RFC6830], or CAPWAP [RFC5415] may be used in the network. In an 652 overlay network scenario, LOOPS can extend a specific overlay 653 protocol header to perform local measurement and local recovery 654 functions, as in the example shown in Figure 6.
656 +------------+------------+-----------------+---------+---------+ 657 |Outer IP hdr|Overlay hdr |LOOPS information|Inner hdr|payload | 658 +------------+------------+-----------------+---------+---------+ 660 Figure 6: LOOPS Extension Header Example 662 LOOPS should be designed to minimize its overhead while increasing 663 the benefit (e.g., reducing the completion time of a video 664 application, reducing the loss). Also, LOOPS should be designed to 665 auto-tune itself in case its overhead exceeds a threshold. 667 For example, LOOPS uses a packet number space independent of that of 668 the transport layer. Acknowledgments should be generated from the ON 669 receiver to the ON sender for packet loss detection and local 670 measurement. To reduce overhead, a negative ACK over each path segment 671 is a good choice here. A timestamp echo mechanism, analogous to 672 TCP's Timestamp option, should be employed in-band in the LOOPS extension 673 to measure the local RTT and its variation for an overlay segment. Local 674 in-network recovery is performed. The measurement over a segment is 675 expected to give a hint on whether a lost or locally 676 recovered packet was caused by congestion. Such a hint could be further 677 fed back, for instance by using ECN Congestion Experienced (CE) markings, to 678 the end-host sender. It tells the end-host sender whether congestion 679 window adjustment is necessary. LOOPS normally works on an overlay 680 segment that aggregates the same type of traffic, for instance TCP 681 traffic, or at a finer granularity, such as TCP throughput-sensitive traffic. 682 LOOPS does not look into the inner packet (when an encapsulation 683 scheme is used). Elements to be considered in LOOPS are discussed 684 briefly here. 686 6.1. Local Recovery and End-to-end Retransmission 688 There are basically two ways to perform local recovery: 689 retransmission and FEC (Forward Error Correction). They may be 690 used together in some cases.
Such approaches between two overlay 691 nodes recover the lost packet over a relatively shorter distance and thus 692 with shorter latency. Therefore, local recovery is always faster 693 than end-to-end recovery. 695 At the same time, most transport layer protocols have their own 696 end-to-end retransmission to recover lost packets. It would be ideal 697 if end-to-end retransmission at the sender were not triggered when 698 local recovery is successful. 700 End-to-end retransmission is normally triggered by a NACK, as in RTCP, 701 or by multiple duplicate ACKs, as in TCP. 703 When FEC is used for local recovery, it may come with a buffer to 704 make sure the recovered packets are subsequently delivered in order. 705 Therefore, the receiver side is unlikely to see out-of-order 706 packets and then send a NACK or multiple duplicate ACKs. The side 707 effect of unnecessarily triggering end-to-end retransmission is minimal. 708 When FEC is used with fixed redundancy and block size, the extra 709 latency required to recover lost packets is also bounded, and the RTT 710 variation it causes is predictable. In some extreme cases, such as a 711 large number of packet losses caused by a persistent burst, FEC may not 712 be able to recover the losses; end-to-end retransmission then works as a 713 last resort. In summary, when FEC is used as local recovery, the 714 impact on end-to-end retransmission is limited. 716 When local retransmission is used, more care is required. 718 For packet loss in RTP streaming, local retransmission can recover 719 packets that would otherwise not be retransmitted end-to-end 720 due to the long RTT. It would be ideal if the retransmitted packet 721 reached the receiver before the receiver sends back information that the sender 722 would interpret as a NACK for the lost packet. Therefore, when the 723 segment(s) over which retransmission occurs is a small portion of the whole 724 end-to-end path, the retransmission will have a significant effect in 725 improving the quality at the receiver.
When the sender also retransmits 726 the packet based on a received NACK, the receiver will receive 727 duplicate retransmitted packets and should ignore the duplicates. 729 For packet loss in TCP flows, TCP RENO and CUBIC use duplicate ACKs 730 as a loss signal to trigger fast retransmit. There are different 731 ways to avoid the sender's end-to-end retransmission being triggered 732 prematurely: 734 o The egress overlay node can buffer the out-of-order packets for a 735 while, giving a limited time for a packet being retransmitted 736 somewhere in the overlay path to reach it. The retransmitted 737 packet and the packets buffered because of it may increase the RTT 738 variation at the sender. When the retransmission latency is a 739 small portion of the RTT, or the loss is rare, such RTT variation will 740 be smoothed out without much impact. Another possible way is to make 741 the sender exclude such packets from the RTT measurement: 742 locally recovered packets can be specially marked, and this marking 743 is spun back to the end-host sender, which then excludes those packets from its 744 RTT measurement. 746 Buffer management is nontrivial in this case. It has to be 747 determined how many out-of-order packets can be buffered at the 748 egress overlay node before it gives up waiting for a successful 749 local retransmission. In extreme cases where the lost packet is not 750 successfully recovered locally, the sender may invoke end-to-end 751 fast retransmit later than it would in classic TCP. 753 o If the LOOPS network does not buffer the out-of-order packets caused 754 by packet loss, the TCP sender can use time-based loss detection 755 like RACK [I-D.ietf-tcpm-rack] to prevent it from 756 invoking fast retransmit too early. RACK uses the notion of time 757 to replace the conventional DUPACK threshold approach to detecting 758 losses. RACK needs to be tuned to better fit local 759 retransmission.
If there are n similar segments over the 760 path, segment retransmission will on average add at least RTT/n to the 761 reordering window when the packet is lost only once 762 over the whole overlay path. This approach is preferable to the 763 one described in the previous bullet. On the other hand, if 764 time-based loss detection is not supported at the sender, end-to-end 765 retransmission will be invoked as usual, which wastes some 766 bandwidth. 768 6.1.1. OE to OE Measurement, Recovery, and Multipathing 770 When multiple segments are stitched, another type of local recovery 771 can be performed between OEs (Overlay Edges). When the 772 segments of an overlay path have similar characteristics and/or only the 773 OEs have the expected processing capability, OE-to-OE based local 774 recovery can be used instead of per-segment recovery. 776 If there is more than one overlay path between two OEs, multipathing 777 can split and recombine the traffic. Measurements such as RTT and 778 loss rate between OEs have to be specific to each path. The ingress 779 OE can use the measurement feedback to determine the FEC parameter 780 settings for different paths. FEC can also be configured to work over 781 the combined path. FEC should not increase redundancy over a path 782 where congestion is found. The egress OE should be able to remove 783 duplicated packets when multipathing is in use. 785 OE-to-OE measurement can help each segment determine its proportion 786 of the edge-to-edge delay. It is useful for an ON to decide whether it is 787 necessary to turn on per-segment recovery, and how to fine-tune the 788 parameter settings. When the segment delay ratio is small, 789 segment retransmission is more effective. Such an approach requires a 790 nested LOOPS function. This draft does not focus on nested LOOPS 791 for now; more details will be discussed later if comments showing 792 interest in it are received. 794 6.2.
Congestion Control Interaction 796 When a TCP-like transport layer protocol is used, local recovery in 797 LOOPS has to interact with the upper-layer transport congestion 798 control. Classic TCP adjusts the congestion window when a loss is 799 detected and fast retransmit is invoked. 801 The local recovery mechanism breaks the assumption, made by classic TCP at the sender, of a necessary 802 and sufficient relationship between detected packet loss 803 and the triggering of congestion control. A 804 loss that is locally recovered can be caused by non-persistent 805 congestion, such as a random loss or a microburst, both of which 806 ideally should not make the sender invoke the congestion control 807 mechanism. However, a loss can also be caused by real 808 persistent congestion, of which the sender should be made aware so that it 809 reduces its sending rate. 811 When local recovery takes effect, we consider the following two 812 cases. Firstly, the classic TCP sender does not see enough 813 duplicate ACKs to trigger fast retransmit. This may be due to the 814 local recovery procedures, which hide out-of-order packets from the 815 receiver using mechanisms like a reordering buffer at the egress node. 816 The classic TCP sender in this case will not reduce the congestion window, as 817 no loss is detected. Secondly, if time-based loss detection such 818 as RACK is used, then as long as the locally recovered packet's ACK 819 reaches the sender before the reordering window expires, the 820 congestion window will not be reduced. 822 Such behavior brings the desired throughput improvement when the 823 recovered packet was lost due to non-persistent congestion. It solves 824 the throughput problem mentioned in Section 3.3 and Section 4. 825 However, it also brings the risk that the sender is not able to 826 detect real persistent congestion in time, so that overshooting 827 may occur.
Eventually, severe congestion that is not recoverable by 828 the local recovery mechanism will be detected by the sender. In addition, 829 this behavior may be unfriendly to other flows (possibly pushing them out) if 830 those flows are running over the same underlying bottleneck links. 832 There is a spectrum of approaches. At one end, each locally 833 recovered packet can be treated exactly as a loss, by setting its CE (Congestion Experienced) bit, in order to invoke 834 congestion control at the sender and guarantee fair sharing as in 835 classic TCP. Explicit 836 Congestion Notification (ECN) can be used here, as ECN marking was 837 required to be equivalent to a packet drop [RFC3168]. Congestion 838 control at the sender then works as usual and no throughput improvement 839 is achieved (although the benefit of faster recovery 840 remains). At the other end, an ON can perform its own congestion measurement 841 over the segment, for instance of the local RTT and its variation trend. 843 Such measurement can help determine whether a packet was lost due to 844 congestion. It can further be used to decide whether it is necessary to set the CE 845 marking, or even what marking ratio to use to make the sender adjust its 846 sending rate. 848 There are cases where the sender detects the loss even with 849 local recovery in operation. For example, when the reordering window 850 in RACK is not optimally adapted, the sender may trigger 851 congestion control at the same time as end-to-end retransmission. If 852 spurious retransmission detection based on DSACK [RFC3708] is used, 853 such end-to-end retransmission will be found to be unnecessary when 854 locally recovered packets reach the receiver successfully. The 855 congestion control changes will then be undone at the sender. This 856 results in similar pros and cons as described earlier. The pros are 857 preventing unnecessary window reduction and improving 858 throughput when the loss is non-congestive.
The cons are that 859 mechanisms like ECN or its variants have to be used wisely to 860 make sure that congestion control is invoked in case of persistent 861 congestion. 863 An approach where the losses on a path segment are not immediately 864 made known to the end-to-end congestion control can be combined with 865 a "circuit breaker" style congestion control on the path segment. 866 When the usage of the path segment by the overlay flow starts to become 867 unfair, the path segment sends congestion signals up to the 868 end-to-end congestion control. This must be carefully tuned to avoid 869 unwanted oscillation. 871 In summary, local recovery can improve Flow Completion Time (FCT) by 872 eliminating tail loss in small flows. As it mostly turns loss events into 873 out-of-order events from the TCP sender's perspective, if the TCP sender uses 874 loss-based congestion control, there is not much throughput 875 improvement. We suggest that ECN and spurious retransmission detection be 876 enabled when local recovery is in use; this would give the desired 877 throughput behavior, i.e., when loss is caused by congestion, the 878 congestion window is reduced; otherwise the sender keeps its sending rate. We 879 do not suggest using spurious retransmission detection alone together with 880 local recovery, as it may cause the TCP sender to falsely undo a window 881 reduction when congestion occurs. If only ECN is enabled, or neither 882 ECN nor spurious retransmission detection is enabled, the throughput with local 883 recovery in use is not much different from that of traditional TCP. 885 6.3. Overlay Protocol Extensions 887 The overlay usually has no control over how packets are routed in the 888 underlying network between two overlay nodes, but it can control, for 889 example, the sequence of overlay nodes a message traverses before 890 reaching its destination. LOOPS assumes the overlay protocol can 891 deliver the packets in such a designated sequence. Most forms of 892 overlay networking use some sort of "encapsulation".
The whole path 893 can be formed by stitching multiple overlay paths, as with 894 VXLAN [RFC7348] or GENEVE [I-D.ietf-nvo3-geneve], or it can be a single 895 overlay path with a sequence of intermediate overlay nodes specified, 896 as in SRv6 [I-D.ietf-6man-segment-routing-header]. Either way, 897 LOOPS information is required to be embedded in some form to support 898 data plane measurement and feedback. Retransmission- or FEC-based 899 loss recovery can be either per ON hop or OE-to-OE based. 901 LOOPS alone has no setup requirement on the control plane. Some overlay 902 protocols, e.g., CAPWAP [RFC5415], have a session setup phase, which can be 903 used to exchange information such as dynamic FEC parameters. 905 6.4. Summary 907 LOOPS is expected to extend the existing overlay protocols in the data 908 plane. Path selection is assumed to be a feature provided by the overlay 909 protocols via SDN techniques [RFC7149] or other approaches and is not 910 part of LOOPS. LOOPS is a set of functions to be implemented on 911 Overlay Nodes that are involved in forwarding packets in a 912 long-haul overlay network. LOOPS targets the following features. 914 1. Local recovery: Retransmission, FEC, or a combination thereof can 915 be used as the local recovery method. Such a recovery mechanism is 916 in-network. It is performed by two network nodes with computing and 917 memory resources. 919 2. Local congestion measurement: Ingress/egress overlay nodes 920 measure the local segment RTT, loss, and/or throughput to 921 immediately obtain the overlay segment status. 923 3. Signal to end-to-end congestion control: A strategy to set the ECN CE 924 marking, or simply not to recover a packet, to signal to the 925 end-host sender whether and/or how to adjust the sending rate 926 is required. 928 7. Security Considerations 930 LOOPS does not require access to the traffic payload in the clear, so 931 encrypted payloads do not affect the functionality of LOOPS.
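Because LOOPS keeps its own encapsulation-level packet number space, an egress ON can detect losses and build negative ACKs without ever inspecting the (possibly encrypted) inner payload, which is why encryption does not affect it. A minimal sketch of such gap detection, with hypothetical class and method names not taken from any LOOPS specification:

```python
class OnReceiver:
    """Egress overlay-node loss-detection state for one path segment.

    Only the LOOPS packet number from the encapsulation header is used;
    the inner payload stays opaque to the ON.
    """

    def __init__(self):
        self.highest = -1   # highest LOOPS packet number seen so far
        self.missing = set()  # detected gaps, i.e. NACK candidates

    def on_packet(self, pkt_num, payload):
        # `payload` is deliberately never examined.
        if pkt_num > self.highest:
            # Any skipped numbers become NACK candidates.
            self.missing.update(range(self.highest + 1, pkt_num))
            self.highest = pkt_num
        else:
            # A late, reordered, or locally retransmitted copy
            # fills a previously detected gap.
            self.missing.discard(pkt_num)

    def nack(self):
        # Negative ACK listing the currently outstanding gaps.
        return sorted(self.missing)
```

Feeding packets 0, 1, 3, 5 yields a NACK of [2, 4]; a later arrival of packet 2 (e.g., a local retransmission) shrinks it to [4].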
933 The use of LOOPS introduces some issues that impact security. An ON 934 with the LOOPS function represents a point in the network where 935 traffic can potentially be manipulated and intercepted by malicious 936 nodes. Means to ensure that only legitimate nodes are involved 937 should be considered. 939 A denial-of-service attack can be launched from an ON. A rogue ON 940 might be able to spoof packets as if they came from a legitimate ON. 941 It may also modify the ECN CE marking in packets to influence the 942 sender's rate. In order to protect against such attacks, the overlay 943 protocol itself should have some built-in security protection that can 944 be inherited by LOOPS. The operator should use an 945 authentication mechanism to make sure ONs are valid and 946 non-compromised. 948 8. IANA Considerations 950 No IANA action is required. 952 9. Acknowledgements 954 Thanks to the etosat mailing list for the discussion of the SatCom 955 and LOOPS use case. 957 10. Informative References 959 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 960 RFC 793, DOI 10.17487/RFC0793, September 1981, 961 . 963 [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. 964 Shelby, "Performance Enhancing Proxies Intended to 965 Mitigate Link-Related Degradations", RFC 3135, 966 DOI 10.17487/RFC3135, June 2001, 967 . 969 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 970 of Explicit Congestion Notification (ECN) to IP", 971 RFC 3168, DOI 10.17487/RFC3168, September 2001, 972 . 974 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 975 Jacobson, "RTP: A Transport Protocol for Real-Time 976 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 977 July 2003, . 979 [RFC3708] Blanton, E. and M.
Allman, "Using TCP Duplicate Selective 980 Acknowledgement (DSACKs) and Stream Control Transmission 981 Protocol (SCTP) Duplicate Transmission Sequence Numbers 982 (TSNs) to Detect Spurious Retransmissions", RFC 3708, 983 DOI 10.17487/RFC3708, February 2004, 984 . 986 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, 987 "Extended RTP Profile for Real-time Transport Control 988 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, 989 DOI 10.17487/RFC4585, July 2006, 990 . 992 [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. 993 Hakenberg, "RTP Retransmission Payload Format", RFC 4588, 994 DOI 10.17487/RFC4588, July 2006, 995 . 997 [RFC5415] Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley, 998 Ed., "Control And Provisioning of Wireless Access Points 999 (CAPWAP) Protocol Specification", RFC 5415, 1000 DOI 10.17487/RFC5415, March 2009, 1001 . 1003 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1004 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1005 . 1007 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 1008 "Computing TCP's Retransmission Timer", RFC 6298, 1009 DOI 10.17487/RFC6298, June 2011, 1010 . 1012 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The 1013 Locator/ID Separation Protocol (LISP)", RFC 6830, 1014 DOI 10.17487/RFC6830, January 2013, 1015 . 1017 [RFC7149] Boucadair, M. and C. Jacquenet, "Software-Defined 1018 Networking: A Perspective from within a Service Provider 1019 Environment", RFC 7149, DOI 10.17487/RFC7149, March 2014, 1020 . 1022 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1023 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 1024 eXtensible Local Area Network (VXLAN): A Framework for 1025 Overlaying Virtualized Layer 2 Networks over Layer 3 1026 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 1027 . 1029 [RFC8517] Dolson, D., Ed., Snellman, J., Boucadair, M., Ed., and C. 
1030 Jacquenet, "An Inventory of Transport-Centric Functions 1031 Provided by Middleboxes: An Operator Perspective", 1032 RFC 8517, DOI 10.17487/RFC8517, February 2019, 1033 . 1035 [I-D.dukkipati-tcpm-tcp-loss-probe] 1036 Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 1037 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 1038 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 1039 in progress), February 2013. 1041 [I-D.ietf-nvo3-geneve] 1042 Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic 1043 Network Virtualization Encapsulation", draft-ietf- 1044 nvo3-geneve-14 (work in progress), September 2019. 1046 [I-D.ietf-tcpm-rack] 1047 Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 1048 a time-based fast loss detection algorithm for TCP", 1049 draft-ietf-tcpm-rack-06 (work in progress), November 2019. 1051 [I-D.ietf-6man-segment-routing-header] 1052 Filsfils, C., Dukes, D., Previdi, S., Leddy, J., 1053 Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header 1054 (SRH)", draft-ietf-6man-segment-routing-header-26 (work in 1055 progress), October 2019. 1057 [I-D.cardwell-iccrg-bbr-congestion-control] 1058 Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson, 1059 "BBR Congestion Control", draft-cardwell-iccrg-bbr- 1060 congestion-control-00 (work in progress), July 2017. 1062 [DOI_10.1109_ICDCS.2016.49] 1063 Cai, C., Le, F., Sun, X., Xie, G., Jamjoom, H., and R. 1064 Campbell, "CRONets: Cloud-Routed Overlay Networks", 2016 1065 IEEE 36th International Conference on Distributed 1066 Computing Systems (ICDCS), DOI 10.1109/icdcs.2016.49, June 1067 2016. 1069 [DOI_10.1145_3038912.3052560] 1070 Haq, O., Raja, M., and F. Dogar, "Measuring and Improving 1071 the Reliability of Wide-Area Cloud Paths", Proceedings of 1072 the 26th International Conference on World Wide Web - 1073 WWW '17, DOI 10.1145/3038912.3052560, 2017. 1075 [OCN] Xu, Z., Ju, R., Gu, L., Wang, W., Li, J., Li, F., and L. 
1076 Han, "Using Overlay Cloud Network to Accelerate Global 1077 Communications", INFOCOM ICCN 2019, April 2019, 1078 . 1081 Authors' Addresses 1083 Yizhou Li 1084 Huawei Technologies 1085 101 Software Avenue, 1086 Nanjing 210012 1087 China 1089 Phone: +86-25-56624584 1090 Email: liyizhou@huawei.com 1092 Xingwang Zhou 1093 Huawei Technologies 1094 101 Software Avenue, 1095 Nanjing 210012 1096 China 1098 Email: zhouxingwang@huawei.com 1100 Mohamed Boucadair 1101 Orange 1103 Email: mohamed.boucadair@orange.com 1105 Jianglong Wang 1106 China Telecom 1108 Email: wangjl1.bri@chinatelecom.cn