Network Working Group                                           T. Pauly
Internet-Draft                                                E. Kinnear
Intended status: Informational                                Apple Inc.
Expires: January 3, 2019                                   July 02, 2018

                    TCP Encapsulation Considerations
                draft-pauly-tsvwg-tcp-encapsulation-00

Abstract

   Network protocols other than TCP, such as UDP, are often blocked or
   suboptimally handled by network middleboxes.  One strategy that
   applications can use to continue to send non-TCP traffic on such
   networks is to encapsulate datagrams or messages within a TCP
   stream.
   However, encapsulating datagrams within TCP streams can lead to
   performance degradation.  This document provides guidelines for how
   to use TCP for encapsulation, a summary of performance concerns, and
   some suggested mitigations for those concerns.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Motivations for Encapsulation
     2.1.  UDP Blocking
     2.2.  UDP NAT Timeouts
   3.  Encapsulation Formats
     3.1.  Multiplexing Flows
   4.  Deployment Considerations
   5.  Performance Considerations
     5.1.  Loss Recovery
       5.1.1.  Concern
       5.1.2.  Mitigation
     5.2.  Bufferbloat
       5.2.1.  Concern
       5.2.2.  Mitigation
     5.3.  Head of Line Blocking
       5.3.1.  Concern
       5.3.2.  Mitigation
   6.  Security Considerations
   7.  IANA Considerations
   8.  Informative References
   Authors' Addresses

1.  Introduction

   TCP streams are sometimes used as a mechanism for encapsulating
   datagrams or messages, a practice referred to in this document as
   "TCP encapsulation".  Encapsulation may be used to transmit data
   over networks that block or suboptimally handle non-TCP traffic.
   The current motivations for using encapsulation generally revolve
   around the treatment of UDP packets (Section 2).

   Implementing a TCP encapsulation strategy consists of mapping
   datagram messages into a stream protocol, often with a length-value
   record format (Section 3).  While these formats are described here
   as applying to encapsulating datagrams in a TCP stream, they are
   equally suited to encapsulating datagrams within any stream
   abstraction.  For example, the same format may be used for both raw
   TCP streams and TLS streams running over TCP.

2.  Motivations for Encapsulation

   The primary motivations for enabling TCP encapsulation that are
   explored in this document relate mainly to the treatment of UDP
   packets on a given network.  UDP can be used for real-time network
   traffic, as a mechanism for deploying non-TCP transport protocols,
   and as a tunneling protocol that is compatible with Network Address
   Translators (NATs).

2.1.  UDP Blocking

   Some network middleboxes block any IP packets that do not appear to
   be used for HTTP traffic, either as a security mechanism to block
   unknown traffic or as a way to restrict access to whitelisted
   services.  Network applications that rely on UDP to transmit data
   will be blocked by these middleboxes.  In this case, an application
   can attempt to use TCP encapsulation to transmit the same data over
   a TCP stream.

2.2.  UDP NAT Timeouts

   Other networks may not block non-TCP traffic altogether, but
   instead make other protocols unsuitable for use.  For example, many
   Network Address Translation (NAT) devices will maintain TCP port
   mappings for long periods of time, since the end of a TCP stream
   can be detected by the NAT.  Since UDP packet flows do not signal
   when no more packets will be sent, NATs often use short timeouts
   for UDP port mappings.  Thus, applications can attempt to use TCP
   encapsulation when long-lived flows are required on networks with
   NATs.

3.  Encapsulation Formats

   The simplest approach for encapsulating datagram messages within a
   TCP stream is to use a length-value record format: each record
   consists of a header containing a length field, followed by the
   datagram message itself.
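   As a non-normative illustration, such length-value framing can be
   sketched as follows.  This sketch assumes a 2-byte network-order
   length header that counts only the payload, not the header itself;
   the function names are illustrative and not part of any protocol.

```python
# Illustrative sketch of length-value framing for TCP encapsulation.
# Assumption: a 2-byte big-endian length field that covers only the
# payload (protocols such as RFC 8229 instead include the header in
# the length).
import struct

def encapsulate(datagram: bytes) -> bytes:
    """Prefix a datagram with a 16-bit length header for a TCP stream."""
    if len(datagram) > 0xFFFF:
        raise ValueError("datagram too large for a 16-bit length field")
    return struct.pack("!H", len(datagram)) + datagram

def decapsulate(buffer: bytes):
    """Extract complete datagrams from buffered stream data.

    Returns (datagrams, leftover), where leftover is any trailing
    partial record that must wait for more stream data.
    """
    datagrams = []
    while len(buffer) >= 2:
        (length,) = struct.unpack("!H", buffer[:2])
        if len(buffer) < 2 + length:
            break  # record still incomplete
        datagrams.append(buffer[2:2 + length])
        buffer = buffer[2 + length:]
    return datagrams, buffer
```

   Note that a receiver must buffer stream data until a complete
   record has arrived, since TCP delivers a byte stream with no
   message boundaries of its own.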
   For example, if an encapsulation protocol uses a 16-bit length
   field (allowing up to 65535 bytes of datagram payload), it will use
   a format like the following:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                        Datagram Payload                       ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The length field could be longer or shorter depending on the needs
   of the protocol.  16 bits is most appropriate when encapsulating
   datagrams that would otherwise be sent directly in IP packets,
   since the payload length field of an IP header is also 16 bits.

   The length field must be specified either to include itself in the
   length of the entire record, or to describe only the length of the
   payload field.  The protocol used for encapsulating IKE and ESP
   packets in TCP [RFC8229] does include the length field itself in
   the length of the record.  This may make it slightly easier for
   implementations to parse records, since they do not need to add the
   length of the length field when computing record offsets within a
   stream.

3.1.  Multiplexing Flows

   Since TCP encapsulation is used to avoid failures caused by NATs or
   firewalls, some implementations re-use one TCP port or one
   established TCP stream for multiple kinds of encapsulated traffic.
   Using a single port or stream allows re-use of NAT bindings and
   reduces the chance that a firewall will block some flows but not
   others.

   If multiple kinds of traffic are multiplexed on the same listening
   TCP port, individual streams opened to that port need to be
   differentiated.  This may require adding a one-time header that is
   sent on the stream to indicate the type of encapsulated traffic
   that will follow.
   For example, TCP-encapsulated IKE [RFC8229] uses a stream prefix to
   differentiate its encapsulation strategy from proprietary Virtual
   Private Network (VPN) protocols.

   Multiplexing multiple kinds of datagrams, or independent flows of
   datagrams, over a single TCP stream requires adding a per-record
   type field or marker to the encapsulation record format.  For ease
   of parsing, this value should be placed after the length field of
   the record format.  For example, various ESP packet flows are
   identified by the four-byte Security Parameter Index (SPI) that
   comprises the first four bytes of the datagram payload, while IKE
   packets in the same TCP-encapsulated stream are differentiated by
   using all zeros for the first four bytes.

4.  Deployment Considerations

   In general, any new TCP encapsulation protocol should allocate a
   new TCP port.  If TCP is being used to encapsulate traffic that is
   normally sent over UDP, then the most obvious port choice for the
   TCP-encapsulated version is the equivalent port value in the TCP
   port namespace.

   Simply using TCP instead of UDP may be enough in some cases to
   mitigate the connectivity problems of using UDP with NATs and other
   middleboxes.  However, it may also be useful to add a layer of
   encryption to the stream using TLS to obfuscate the contents of the
   stream.  This may be done for security and privacy reasons, or to
   prevent middleboxes from mishandling encapsulated traffic or
   ossifying around a particular encapsulation format.

5.  Performance Considerations

   Many encapsulation or tunnelling protocols utilize an underlying
   transport like UDP, which does not provide stateful features such
   as loss recovery or congestion control.  Because encapsulation
   using TCP involves an additional layer of state that is shared
   among all traffic inside the tunnel, there are additional
   performance considerations to address.
   Even though this document describes encapsulating datagrams or
   messages inside a TCP stream, some encapsulated protocols, such as
   ESP, often carry additional TCP streams themselves, such as when
   transmitting data for a VPN protocol [RFC8229].  This introduces
   several potential sources of suboptimal behavior, as multiple TCP
   contexts act upon the same traffic.

   For the purposes of this discussion, we will refer to the TCP
   encapsulation context as the "outer" TCP context, while the TCP
   context applicable to any encapsulated protocol will be referred to
   as the "inner" TCP context.

   The use of an outer TCP context may cause signals from the network
   to be hidden from the inner TCP contexts.  Depending on the signals
   that the inner TCP contexts use for indicating congestion, events
   that would otherwise result in a modification of behavior may go
   unnoticed, or may build up until a large modification of behavior
   is necessary.  Generally, the main areas of concern are the signals
   that inform loss recovery, bufferbloat and delay avoidance, and
   head of line blocking between streams.

5.1.  Loss Recovery

5.1.1.  Concern

   The outer TCP context experiences packet loss on the network
   directly, while any inner TCP contexts present observe the effects
   of that loss on the delivery of their packets by the encapsulation
   layer.  Furthermore, inner TCP contexts still observe direct
   network effects for any network segments that are traversed outside
   of the encapsulation, as is common with a VPN.

   In this way, the outer TCP context masks packet loss from the inner
   contexts by retransmitting encapsulated segments to recover from
   those losses.  An inner context observes this as a delay while the
   packets are retransmitted, rather than as a loss.  This can lead to
   spurious retransmissions if the recovery of the lost packets takes
   longer than the inner context's retransmission timeout (RTO).
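   As a rough, worked illustration of how this can happen (every
   timing value below is an assumption chosen for the example, and the
   RTO computation follows the shape of RFC 6298):

```python
# Illustrative only: compare an inner context's RTO against the time
# the outer context might need to recover a loss.  All timing values
# are assumptions for the example, not measurements.

def rto(srtt, rttvar, k=4, floor=1.0):
    """RFC 6298-style retransmission timeout: SRTT + K * RTTVAR,
    with a lower bound (the RFC suggests a 1-second floor)."""
    return max(floor, srtt + k * rttvar)

# The inner context sees a smooth 50 ms RTT through the tunnel.
inner_rto = rto(srtt=0.050, rttvar=0.010)

# Assume the outer context needs its own RTO (1 s floor) plus some
# queueing delay to repair a tail loss.
outer_recovery = 1.0 + 0.2

# If outer recovery outlasts the inner RTO, the inner context will
# retransmit spuriously even though no data was lost inside the tunnel.
spurious = outer_recovery > inner_rto
```

   Whether the inner retransmission actually fires depends on the real
   RTO estimators and timer granularity of both stacks; the point is
   only that outer recovery time can exceed the inner timeout.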
   Since the outer context is retransmitting the packets to make up
   for the losses, the spurious retransmissions waste bandwidth that
   could otherwise be used for packets that advance the progress of
   the flows being encapsulated.  An RTO event on an inner TCP context
   also hinders performance beyond generating spurious
   retransmissions, as many TCP congestion control algorithms
   dramatically reduce the sending rate after an RTO is observed.

   When recovery from a loss event on the outer TCP context completes,
   the network or endpoint on the other end of the encapsulation will
   receive a potentially large burst of packets as the retransmitted
   packets fill in any gaps and the entire set of pending data can be
   delivered.

   If content from multiple inner flows shares a single TCP packet in
   the outer context, the effects of losing that packet will be
   experienced by more than one inner flow at a time.  In practice,
   such a loss is shared by all inner flows, since forward progress
   for the entire encapsulation tunnel is generally blocked until the
   lost segments can be filled in.  This is discussed further in
   Section 5.3.

5.1.2.  Mitigation

   Generally, TCP congestion control and loss recovery algorithms are
   capable of recovering from loss events very efficiently, and the
   inner TCP contexts observe brief periods of added delay without
   much penalty.

   A TCP congestion control algorithm should be selected and tuned to
   gracefully handle extremely variable RTT values, which may already
   be the case for some algorithms, as RTT variance is often greatly
   increased in mobile and cellular networks.

   Additionally, use of a TCP congestion control algorithm that
   considers delay to be a sign of congestion may help the
   coordination between inner and outer TCP contexts.
   LEDBAT [RFC6817] and BBR
   [I-D.cardwell-iccrg-bbr-congestion-control] are two examples of
   delay-based congestion control algorithms that an inner TCP context
   could use to properly interpret loss events experienced by the
   outer TCP context.  Care must be taken to ensure that any TCP
   congestion control algorithm in use is also appropriate for an
   inner context to use on any network segments that are traversed
   outside of the encapsulation.

   Since any losses will be handled by the outer TCP context, it might
   seem reasonable to modify the inner TCP contexts' loss recovery
   algorithms to prevent retransmissions entirely.  However, there are
   often network segments outside of the encapsulation that still rely
   on the inner contexts' loss recovery algorithms.  Instead, spurious
   retransmissions can be reduced by ensuring that RTO values are
   tuned such that the outer TCP context will fully time out before
   any inner TCP context does.

5.2.  Bufferbloat

5.2.1.  Concern

   "Bufferbloat", or delay introduced by consistently full, large
   buffers along a network path [TSV2011] [BB2011], can increase
   observed RTTs and harm the performance of latency-sensitive
   applications.  Any spurious retransmissions sent on the network
   take up space in queues that would otherwise be filled by useful
   data.  In this case, any retransmission sent by an inner TCP
   context for a loss or timeout along the network segments also
   covered by the outer TCP context is considered to be spurious.
   This can pose a performance problem for implementations that rely
   on interactive data transfer.

   Additionally, because there may be multiple inner TCP contexts
   multiplexed over a single outer TCP context, even a minor reduction
   in sending rate by each of the inner contexts can result in a
   dramatic decrease in data sent through the outer context.
   Similarly, an increase in sending rate is also amplified.

5.2.2.  Mitigation

   Great care should be taken in tuning the inner TCP congestion
   control to avoid spurious retransmissions as much as possible.
   However, in order to provide effective loss recovery for the
   segments of the network outside the tunnel, the set of tuning
   parameters needs to be viable both inside and outside the tunnel.
   Adjusting the retransmission timeout (RTO) value of the inner TCP
   context to be greater than that of the outer TCP context will often
   help to reduce the number of spurious retransmissions generated
   while the outer TCP context catches up with lost or reordered
   packets.

   In most cases, fast retransmit will be sufficient to recover from
   losses on network segments after the inner flows leave the tunnel,
   although loss events that trigger a full RTO on those last-mile
   segments will carry a higher penalty with such tuning.  However, in
   many deployments, the last-mile segments will often observe lower
   loss rates than the first-mile segments, leading to a balance that
   often favors avoiding spurious retransmissions on the first mile
   over loss recovery speed on the last mile.

5.3.  Head of Line Blocking

5.3.1.  Concern

   Because TCP provides in-order delivery and reliability, even if
   multiple flows are multiplexed over the encapsulation layer, loss
   events, spurious retransmissions, or other recovery efforts will
   cause data for all other flows to back up and not be delivered to
   the client.  In deployments where there are additional network
   segments to traverse beyond the encapsulation boundary, this may
   mean that flows are not delivered onto those segments until
   recovery for the outer TCP context is complete.

   With UDP encapsulation, packet reordering and loss do not
   necessarily prevent data from being delivered, even if it is
   delivered out of order.
   Because TCP groups all data being encapsulated into one outer
   congestion control and loss recovery context, recovery events may
   cause significant delays for flows not directly impacted by them.

   Reordering on the network will also cause problems in this case, as
   it will often trigger fast retransmissions on the outer TCP
   context, blocking all inner contexts from being able to deliver
   data until the retransmissions are complete.  However, a well-
   behaved TCP receiver will reorder the data that arrived out of
   order and deliver it before the retransmissions arrive, reducing
   the detrimental impact of such reordering.

5.3.2.  Mitigation

   One option to help address head of line blocking is to run
   multiple tunnels: one for throughput-sensitive flows and one for
   latency-sensitive flows.  This can help to reduce the amount of
   time that a latency-sensitive flow can be blocked on recovery for
   any other flow.  Latency-sensitive flows should take extra care to
   ensure that only the necessary amount of data is in flight at any
   given time.

   Explicit Congestion Notification (ECN) ([RFC3168], [RFC5562])
   could also be used to communicate between outer and inner TCP
   contexts during any recovery scenario.  In a strategy similar to
   that taken by tunnelling of ECN fields in IP-in-IP tunnels
   [RFC6040], if an implementation supports such behavior, any ECN
   markings communicated to the outer TCP context by the network
   could be passed through to any inner TCP contexts transported by a
   given packet.  Alternatively, an implementation could elect to
   pass through such markings to all inner TCP contexts if a greater
   reduction in sending rate were deemed necessary.

6.  Security Considerations

   Any attacker on the path that observes the encapsulation could
   potentially discard packets from the outer TCP context and cause
   significant delays due to head of line blocking.
   However, an attacker in a position to arbitrarily discard packets
   could have a similar effect on the inner TCP contexts directly, or
   on any other encapsulation scheme.

7.  IANA Considerations

   This document has no request to IANA.

8.  Informative References

   [BB2011]   "Bufferbloat: Dark Buffers in the Internet", n.d.

   [I-D.cardwell-iccrg-bbr-congestion-control]
              Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
              "BBR Congestion Control", draft-cardwell-iccrg-bbr-
              congestion-control-00 (work in progress), July 2017.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
              DOI 10.17487/RFC5562, June 2009,
              <https://www.rfc-editor.org/info/rfc5562>.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010, <https://www.rfc-editor.org/info/rfc6040>.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)",
              RFC 6817, DOI 10.17487/RFC6817, December 2012,
              <https://www.rfc-editor.org/info/rfc6817>.

   [RFC8229]  Pauly, T., Touati, S., and R. Mantha, "TCP Encapsulation
              of IKE and IPsec Packets", RFC 8229,
              DOI 10.17487/RFC8229, August 2017,
              <https://www.rfc-editor.org/info/rfc8229>.

   [TSV2011]  "Bufferbloat: Dark Buffers in the Internet", March 2011.

Authors' Addresses

   Tommy Pauly
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: tpauly@apple.com

   Eric Kinnear
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: ekinnear@apple.com