idnits 2.17.1 draft-ietf-taps-impl-11.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 7 instances of too long lines in the document, the longest one being 41 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 1306: '... Implementations SHOULD ensure that th...' RFC 2119 keyword, line 1919: '... Transport Services API MUST provide a...' RFC 2119 keyword, line 1926: '... [RFC6525] MUST be supported by both the client and the server side....' RFC 2119 keyword, line 1928: '...locking, stream mapping SHOULD only be...' RFC 2119 keyword, line 1936: '... been created, MUST always use strea...' (3 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (9 January 2022) is 843 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Outdated reference: A later version (-19) exists of draft-ietf-taps-arch-12 == Outdated reference: A later version (-26) exists of draft-ietf-taps-interface-14 ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113) -- Obsolete informational reference (is this intentional?): RFC 5389 (Obsoleted by RFC 8489) -- Obsolete informational reference (is this intentional?): RFC 5766 (Obsoleted by RFC 8656) -- Obsolete informational reference (is this intentional?): RFC 7230 (Obsoleted by RFC 9110, RFC 9112) Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TAPS Working Group A. Brunstrom, Ed. 3 Internet-Draft Karlstad University 4 Intended status: Informational T. Pauly, Ed. 5 Expires: 13 July 2022 Apple Inc. 6 T. Enghardt 7 Netflix 8 P. Tiesel 9 SAP SE 10 M. Welzl 11 University of Oslo 12 9 January 2022 14 Implementing Interfaces to Transport Services 15 draft-ietf-taps-impl-11 17 Abstract 19 The Transport Services system enables applications to use transport 20 protocols flexibly for network communication and defines a protocol- 21 independent Transport Services Application Programming Interface 22 (API) that is based on an asynchronous, event-driven interaction 23 pattern. This document serves as a guide to implementation on how to 24 build such a system. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. 
The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on 13 July 2022. 43 Copyright Notice 45 Copyright (c) 2022 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 50 license-info) in effect on the date of publication of this document. 51 Please review these documents carefully, as they describe your rights 52 and restrictions with respect to this document. Code Components 53 extracted from this document must include Revised BSD License text as 54 described in Section 4.e of the Trust Legal Provisions and are 55 provided without warranty as described in the Revised BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 60 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 61 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5 62 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 63 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 64 4. Implementing Connection Establishment . . . . . . . . . . . . 7 65 4.1. Structuring Candidates as a Tree . . . . . . . . . . . . 8 66 4.1.1. Branch Types . . . . . . . . . . . . . . . . . . . . 10 67 4.1.2. Branching Order-of-Operations . . . . . . . . . . . . 12 68 4.1.3. Sorting Branches . . . . . . . . . . . . . . . . . . 14 69 4.2. Candidate Gathering . . . . . . . . . . . . . . . . . . . 15 70 4.2.1. Gathering Endpoint Candidates . . . . . . . . . . . . 15 71 4.3. Candidate Racing . . . . . . . . . . . . . . . . . . . 
. 17 72 4.3.1. Simultaneous . . . . . . . . . . . . . . . . . . . . 17 73 4.3.2. Staggered . . . . . . . . . . . . . . . . . . . . . . 18 74 4.3.3. Failover . . . . . . . . . . . . . . . . . . . . . . 19 75 4.4. Completing Establishment . . . . . . . . . . . . . . . . 19 76 4.4.1. Determining Successful Establishment . . . . . . . . 20 77 4.5. Establishing multiplexed connections . . . . . . . . . . 21 78 4.6. Handling connectionless protocols . . . . . . . . . . . . 21 79 4.7. Implementing listeners . . . . . . . . . . . . . . . . . 21 80 4.7.1. Implementing listeners for Connected Protocols . . . 22 81 4.7.2. Implementing listeners for Connectionless 82 Protocols . . . . . . . . . . . . . . . . . . . . . . 22 83 4.7.3. Implementing listeners for Multiplexed Protocols . . 22 84 5. Implementing Sending and Receiving Data . . . . . . . . . . . 23 85 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 23 86 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 23 87 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 25 88 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 25 89 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 25 90 5.3. Handling of data for fast-open protocols . . . . . . . . 26 91 6. Implementing Message Framers . . . . . . . . . . . . . . . . 27 92 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 28 93 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 29 94 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 29 95 7. Implementing Connection Management . . . . . . . . . . . . . 30 96 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 31 97 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 31 98 8. Implementing Connection Termination . . . . . . . . . . . . . 33 99 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 34 100 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 34 101 9.2. Performance caches . . . . . . . . 
. . . . . . . . . . . 35 102 10. Specific Transport Protocol Considerations . . . . . . . . . 36 103 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 37 104 10.2. MPTCP . . . . . . . . . . . . . . . . . . . . . . . . . 39 105 10.3. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 39 106 10.4. UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 40 107 10.5. UDP Multicast Receive . . . . . . . . . . . . . . . . . 40 108 10.6. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 42 109 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 110 12. Security Considerations . . . . . . . . . . . . . . . . . . . 45 111 12.1. Considerations for Candidate Gathering . . . . . . . . . 45 112 12.2. Considerations for Candidate Racing . . . . . . . . . . 45 113 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 46 114 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 46 115 14.1. Normative References . . . . . . . . . . . . . . . . . . 46 116 14.2. Informative References . . . . . . . . . . . . . . . . . 47 117 Appendix A. API Mapping Template . . . . . . . . . . . . . . . . 49 118 Appendix B. Additional Properties . . . . . . . . . . . . . . . 50 119 B.1. Properties Affecting Sorting of Branches . . . . . . . . 50 120 Appendix C. Reasons for errors . . . . . . . . . . . . . . . . . 51 121 Appendix D. Existing Implementations . . . . . . . . . . . . . . 52 122 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 52 124 1. Introduction 126 The Transport Services architecture [I-D.ietf-taps-arch] defines a 127 system that allows applications to flexibly use transport networking 128 protocols. The API that such a system exposes to applications is 129 defined as the Transport Services API [I-D.ietf-taps-interface]. 130 This API is designed to be generic across multiple transport 131 protocols and sets of protocol features.
133 This document serves as a guide to implementation on how to build a 134 system that provides a Transport Services API. It is the job of an 135 implementation of a Transport Services system to turn the requests of 136 an application into decisions on how to establish connections, and 137 how to transfer data over those connections once established. The 138 terminology used in this document is based on the Architecture 139 [I-D.ietf-taps-arch]. 141 2. Implementing Connection Objects 143 The connection objects that are exposed to applications for Transport 144 Services are: 146 * the Preconnection, the bundle of Properties that describes the 147 application constraints on, and preferences for, the transport; 149 * the Connection, the basic object that represents a flow of data as 150 Messages in either direction between the Local and Remote 151 Endpoints; 153 * and the Listener, a passive waiting object that delivers new 154 Connections. 156 Preconnection objects should be implemented as bundles of properties 157 that an application can both read and write. A Preconnection object 158 influences a Connection only at one point in time: when the 159 Connection is created. Connection objects represent the interface 160 between the application and the implementation to manage transport 161 state, and conduct data transfer. During the process of 162 establishment (Section 4), the Connection will not be bound to a 163 specific transport protocol instance, since multiple candidate 164 Protocol Stacks might be raced. 166 Once a Preconnection has been used to create an outbound Connection 167 or a Listener, the implementation should ensure that the copy of the 168 properties held by the Connection or Listener is not affected when 169 the application makes changes to a Preconnection object. This may 170 involve the implementation performing a deep-copy, copying the object 171 with all the objects that it references. 
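The deep-copy behaviour described above can be illustrated with a minimal sketch. The class layout, the attribute names, and the use of plain dictionaries for properties are assumptions made for illustration only; they are not defined by the Transport Services API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Preconnection:
    """Mutable bundle of properties (hypothetical, simplified layout)."""
    remote_endpoint: dict = field(default_factory=dict)
    selection_properties: dict = field(default_factory=dict)

    def initiate(self) -> "Connection":
        # Deep-copy so that nested objects are not shared with the
        # Preconnection; later application changes cannot leak through.
        return Connection(
            copy.deepcopy(self.remote_endpoint),
            copy.deepcopy(self.selection_properties),
        )


class Connection:
    """Holds its own snapshot of the properties taken at Initiate time."""

    def __init__(self, remote_endpoint: dict, selection_properties: dict):
        self._remote_endpoint = remote_endpoint
        self._selection_properties = selection_properties

    @property
    def selection_properties(self) -> dict:
        # Hand out a copy so callers cannot mutate the snapshot either.
        return copy.deepcopy(self._selection_properties)


pre = Preconnection(
    remote_endpoint={"hostname": "www.example.com", "port": 443},
    selection_properties={"interface": {"Wi-Fi": "prefer"}},
)
conn = pre.initiate()

# The application keeps editing the Preconnection after Initiate ...
pre.selection_properties["interface"]["Wi-Fi"] = "prohibit"

# ... but the Connection still sees the value it was created with.
print(conn.selection_properties["interface"]["Wi-Fi"])  # prints "prefer"
```

A shallow copy would not be sufficient in this sketch, because the nested "interface" dictionary would still be shared between the two objects.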
173 Once the Connection is established, the Transport Services implementation 174 maps actions and events to the details of the chosen Protocol Stack. 175 For example, the same Connection object may ultimately represent a 176 single instance of one transport protocol (e.g., a TCP connection, a 177 TLS session over TCP, a UDP flow with fully-specified Local and 178 Remote Endpoints, a DTLS session, an SCTP stream, a QUIC stream, or an 179 HTTP/2 stream). The properties held by a Connection or Listener are 180 independent of other connections that are not part of the same 181 Connection Group. 183 Connection establishment is only a local operation for a Datagram 184 transport (e.g., UDP(-Lite)), which serves to simplify the local 185 send/receive functions and to filter the traffic for the specified 186 addresses and ports [RFC8085]. 188 Once Initiate has been called, the Selection Properties and Endpoint 189 information are immutable (i.e., an application is not able to later 190 modify Selection Properties on the original Preconnection object). 191 Listener objects are created with a Preconnection, at which point 192 their configuration should be considered immutable by the 193 implementation. The process of listening is described in 194 Section 4.7. 196 3. Implementing Pre-Establishment 198 During pre-establishment the application specifies one or more 199 Endpoints to be used for communication as well as protocol 200 preferences and constraints via Selection Properties and, if desired, 201 also Connection Properties. Generally, Connection Properties should 202 be configured as early as possible, because they can serve as input 203 to decisions that are made by the implementation (e.g., the Capacity 204 Profile can guide usage of a protocol offering scavenger-type 205 congestion control). 207 The implementation stores these properties as a part of the 208 Preconnection object for use during connection establishment.
For 209 Selection Properties that are not provided by the application, the 210 implementation must use the default values specified in the Transport 211 Services API ([I-D.ietf-taps-interface]). 213 3.1. Configuration-time errors 215 The Transport Services system should have a list of supported 216 protocols available, each of which has transport features reflecting 217 the capabilities of the protocol. Once an application specifies its 218 Transport Properties, the transport system matches the required and 219 prohibited properties against the transport features of the available 220 protocols. 222 In the following cases, failure should be detected during 223 pre-establishment: 225 * A request by an application for Protocol Properties that cannot be 226 satisfied by any of the available protocols. For example, if an 227 application requires "Configure Reliability per Message", but no 228 such feature is available in any protocol on the host running the 229 transport system, this should result in an error, e.g., when SCTP 230 is not supported by the operating system. 233 * A request by an application for Protocol Properties that are in 234 conflict with each other, i.e., the required and prohibited 235 properties cannot be satisfied by the same protocol. For example, 236 if an application prohibits "Reliable Data Transfer" but then 237 requires "Configure Reliability per Message", this mismatch should 238 result in an error. 240 To avoid allocating resources that are ultimately not needed, it is 241 important that configuration-time errors be detected as early as possible. 243 3.2. Role of system policy 245 The properties specified during pre-establishment have a close 246 relationship to system policy. The implementation is responsible for 247 combining and reconciling several different sources of preferences 248 when establishing Connections. These include, but are not limited 249 to: 251 1.
Application preferences, i.e., preferences specified during the 252 pre-establishment via Selection Properties. 254 2. Dynamic system policy, i.e., policy compiled from internally and 255 externally acquired information about available network 256 interfaces, supported transport protocols, and current/previous 257 Connections. Examples of ways to externally retrieve policy- 258 support information are through OS-specific statistics/ 259 measurement tools and tools that reside on middleboxes and 260 routers. 262 3. Default implementation policy, i.e., predefined policy by OS or 263 application. 265 In general, any protocol or path used for a connection must conform 266 to all three sources of constraints. A violation that occurs at any 267 of the policy layers should cause a protocol or path to be considered 268 ineligible for use. For an example of application preferences 269 leading to constraints, an application may prohibit the use of 270 metered network interfaces for a given Connection to avoid user cost. 271 Similarly, the system policy at a given time may prohibit the use of 272 such a metered network interface from the application's process. 273 Lastly, the implementation itself may default to disallowing certain 274 network interfaces unless explicitly requested by the application and 275 allowed by the system. 277 It is expected that the database of system policies and the method of 278 looking up these policies will vary across various platforms. An 279 implementation should attempt to look up the relevant policies for 280 the system in a dynamic way to make sure it is reflecting an accurate 281 version of the system policy, since the system's policy regarding the 282 application's traffic may change over time due to user or 283 administrative changes. 285 4. 
Implementing Connection Establishment 287 The process of establishing a network connection begins when an 288 application expresses intent to communicate with a Remote Endpoint by 289 calling Initiate. (At this point, any constraints or requirements 290 the application may have on the connection are available from 291 pre-establishment.) The process can be considered complete once there is 292 at least one Protocol Stack that has completed any required setup to 293 the point that it can transmit and receive the application's data. 295 Connection establishment is divided into two top-level steps: 296 Candidate Gathering, to identify the paths, protocols, and endpoints 297 to use, and Candidate Racing (see Section 4.2.2 of 298 [I-D.ietf-taps-arch]), in which the necessary protocol handshakes are 299 conducted so that the transport system can select which set to use. 301 This document structures the candidates for racing as a tree as a 302 terminological convention. While a tree structure is not the only 303 way in which racing can be implemented, it does ease the illustration 304 of how racing works. 306 The simplest example of this process might involve identifying the 307 single IP address to which the implementation wishes to connect, 308 using the system's current default interface or path, and starting a 309 TCP handshake to establish a stream to the specified IP address. 310 However, each step may also differ depending on the requirements of 311 the connection: if the endpoint is defined as a hostname and port, 312 then there may be multiple resolved addresses that are available; 313 there may also be multiple interfaces or paths available, other than 314 the default system interface; and some protocols may not need any 315 transport handshake to be considered "established" (such as UDP), 316 while other connections may utilize layered protocol handshakes, such 317 as TLS over TCP.
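The remark that a protocol such as UDP needs no transport handshake to be considered "established" can be illustrated with the BSD sockets interface. The following is a minimal sketch, assuming an IPv4 remote address; the helper name open_udp_flow is hypothetical and not part of any API discussed in this document.

```python
import socket


def open_udp_flow(remote_addr: str, remote_port: int) -> socket.socket:
    """Return a UDP socket with a fully-specified remote endpoint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # connect() on a datagram socket performs no handshake and sends no
    # packets. It only records the default destination and makes the
    # kernel filter incoming datagrams to that remote address and port.
    sock.connect((remote_addr, remote_port))
    return sock


# No listener is required at the remote address: the flow is
# "established" as soon as the local operation completes.
flow = open_udp_flow("127.0.0.1", 9)
print(flow.getpeername())
flow.close()
```

A TCP or TLS-over-TCP candidate, by contrast, is only ready once its handshake (or handshakes) have completed.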
319 Whenever an implementation has multiple options for connection 320 establishment, it can view the set of all individual connection 321 establishment options as a single, aggregate connection 322 establishment. The aggregate set conceptually includes every valid 323 combination of endpoints, paths, and protocols. As an example, 324 consider an implementation that initiates a TCP connection to a 325 hostname + port endpoint, and has two valid interfaces available 326 (Wi-Fi and LTE). The hostname resolves to a single IPv4 address on the 327 Wi-Fi network, and resolves to the same IPv4 address on the LTE 328 network, as well as a single IPv6 address. The aggregate set of 329 connection establishment options can be viewed as follows: 331 Aggregate [Endpoint: www.example.com:80] [Interface: Any] [Protocol: TCP] 332 |-> [Endpoint: 192.0.2.1:80] [Interface: Wi-Fi] [Protocol: TCP] 333 |-> [Endpoint: 192.0.2.1:80] [Interface: LTE] [Protocol: TCP] 334 |-> [Endpoint: 2001:DB8::1.80] [Interface: LTE] [Protocol: TCP] 336 Any one of these sub-entries on the aggregate connection attempt 337 would satisfy the original application intent. The concern of this 338 section is the algorithm defining which of these options to try, 339 when, and in what order. 341 During Candidate Gathering, an implementation first excludes all 342 protocols and paths that match a Prohibit or do not match all Require 343 properties. Then, the implementation will sort branches according to 344 Preferred properties, Avoided properties, and possibly other 345 criteria. 347 4.1. Structuring Candidates as a Tree 349 As noted above, the consideration of multiple candidates in a 350 gathering and racing process can be conceptually structured as a 351 tree; this terminological convention is used throughout this 352 document.
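The aggregate set and the exclusion step of Candidate Gathering described above can be sketched as follows. This is only an illustration: the feature tables, the feature names (including "metered"), and the function name gather are assumptions, and a real implementation would derive the features from the protocols and interfaces actually present on the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    endpoint: str
    interface: str
    protocol: str


# Hypothetical feature tables, for illustration only.
PROTOCOL_FEATURES = {
    "TCP": {"reliability", "preserve-order"},
    "SCTP": {"reliability", "preserve-order", "per-message-reliability"},
}
INTERFACE_FEATURES = {
    "Wi-Fi": set(),
    "LTE": {"metered"},
}


def gather(resolved, protocols, require=frozenset(), prohibit=frozenset()):
    """Enumerate endpoint/interface/protocol combinations, then drop any
    candidate that has a prohibited feature or lacks a required one."""
    kept = []
    for interface, endpoints in resolved.items():
        for endpoint in endpoints:
            for protocol in protocols:
                features = (PROTOCOL_FEATURES[protocol]
                            | INTERFACE_FEATURES[interface])
                if features & prohibit:
                    continue  # matches a Prohibit
                if not require <= features:
                    continue  # does not match all Requires
                kept.append(Candidate(endpoint, interface, protocol))
    return kept


# Resolution results from the example above, keyed by interface.
RESOLVED = {
    "Wi-Fi": ["192.0.2.1:80"],
    "LTE": ["192.0.2.1:80", "2001:DB8::1.80"],
}

for candidate in gather(RESOLVED, ["TCP"]):
    print(candidate)
```

With no constraints, the sketch yields the three sub-entries of the aggregate shown above. Passing prohibit={"metered"} would leave only the Wi-Fi candidate. Sorting by Preferred and Avoided properties is a separate, later step.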
354 Each leaf node of the tree represents a single, coherent connection 355 attempt, with an endpoint, a path, and a set of protocols that can 356 directly negotiate and send data on the network. Each node in the 357 tree that is not a leaf represents a connection attempt that is 358 either underspecified, or else includes multiple distinct options. 359 For example, when connecting on an IP network, a connection attempt 360 to a hostname and port is underspecified, because the connection 361 attempt requires a resolved IP address as its Remote Endpoint. In 362 this case, the node represented by the connection attempt to the 363 hostname is a parent node, with child nodes for each IP address. 364 Similarly, an implementation that is allowed to connect using 365 multiple interfaces will have a parent node of the tree for the 366 decision between the paths, with a branch for each interface. 368 The example aggregate connection attempt above can be drawn as a tree 369 by grouping the addresses resolved on the same interface into 370 branches: 372 || 373 +==========================+ 374 | www.example.com:80/Any | 375 +==========================+ 376 // \\ 377 +==========================+ +==========================+ 378 | www.example.com:80/Wi-Fi | | www.example.com:80/LTE | 379 +==========================+ +==========================+ 380 || // \\ 381 +====================+ +====================+ +======================+ 382 | 192.0.2.1:80/Wi-Fi | | 192.0.2.1:80/LTE | | 2001:DB8::1.80/LTE | 383 +====================+ +====================+ +======================+ 385 The rest of this section will use a notation scheme to represent this 386 tree. The parent (or trunk) node of the tree will be represented by 387 a single integer, such as "1". Each child of that node will have an 388 integer that identifies it, from 1 to the number of children. 
That 389 child node will be uniquely identified by concatenating its integer 390 to its parent's identifier with a dot in between, such as "1.1" and 391 "1.2". Each node will be summarized by a tuple of three elements: 392 endpoint, path, and protocol. The above example can now be written 393 more succinctly as: 395 1 [www.example.com:80, Any, TCP] 396 1.1 [www.example.com:80, Wi-Fi, TCP] 397 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 398 1.2 [www.example.com:80, LTE, TCP] 399 1.2.1 [192.0.2.1:80, LTE, TCP] 400 1.2.2 [2001:DB8::1.80, LTE, TCP] 402 When an implementation views this aggregate set of connection 403 attempts as a single connection establishment, it will use only one 404 of the leaf nodes to transfer data. Thus, when a single leaf node 405 becomes ready to use, then the entire connection attempt is ready for 406 use by the application. Another way to represent this is that every 407 leaf node updates the state of its parent node when it becomes ready, 408 until the trunk node of the tree is ready, which then notifies the 409 application that the connection as a whole is ready to use. 411 A connection establishment tree may be degenerate, and only have a 412 single leaf node, such as a connection attempt to an IP address over 413 a single interface with a single protocol. 415 1 [192.0.2.1:80, Wi-Fi, TCP] 417 A parent node may also only have one child (or leaf) node, such as 418 when a hostname resolves to only a single IP address. 420 1 [www.example.com:80, Wi-Fi, TCP] 421 1.1 [192.0.2.1:80, Wi-Fi, TCP] 423 4.1.1. Branch Types 425 There are three types of branching from a parent node into one or 426 more child nodes. Any parent node of the tree must only use one type 427 of branching. 429 4.1.1.1. Derived Endpoints 431 If a connection originally targets a single endpoint, there may be 432 multiple endpoints of different types that can be derived from the 433 original.
The connection library creates an ordered list of the 434 derived endpoints according to application preference, system policy 435 and expected performance. 437 DNS hostname-to-address resolution is the most common method of 438 endpoint derivation. When trying to connect to a hostname endpoint 439 on a traditional IP network, the implementation should send DNS 440 queries for both A (IPv4) and AAAA (IPv6) records if both are 441 supported on the local interface. The algorithm for ordering and 442 racing these addresses should follow the recommendations in Happy 443 Eyeballs [RFC8305]. 445 1 [www.example.com:80, Wi-Fi, TCP] 446 1.1 [2001:DB8::1.80, Wi-Fi, TCP] 447 1.2 [192.0.2.1:80, Wi-Fi, TCP] 448 1.3 [2001:DB8::2.80, Wi-Fi, TCP] 449 1.4 [2001:DB8::3.80, Wi-Fi, TCP] 451 DNS-Based Service Discovery [RFC6763] can also provide an endpoint 452 derivation step. When trying to connect to a named service, the 453 client may discover one or more hostname and port pairs on the local 454 network using multicast DNS [RFC6762]. These hostnames should each 455 be treated as a branch that can be attempted independently from other 456 hostnames. Each of these hostnames might resolve to one or more 457 addresses, which would create multiple layers of branching. 459 1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP] 460 1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP] 461 1.1.1 [31.133.160.18.631, Wi-Fi, TCP] 463 4.1.1.2. Alternate Paths 465 If a client has multiple network interfaces available to it, e.g., a 466 mobile client with both Wi-Fi and Cellular connectivity, it can 467 attempt a connection over any of the interfaces. This represents a 468 branch point in the connection establishment. Similar to a derived 469 endpoint, the interfaces should be ranked based on preference, system 470 policy, and performance. 
Attempts should be started on one 471 interface, and then on other interfaces successively after delays 472 based on expected round-trip-time or other available metrics. 474 1 [192.0.2.1:80, Any, TCP] 475 1.1 [192.0.2.1:80, Wi-Fi, TCP] 476 1.2 [192.0.2.1:80, LTE, TCP] 478 This same approach applies to any situation in which the client is 479 aware of multiple links or views of the network. Multiple Paths, 480 each with a coherent set of addresses, routes, DNS server, and more, 481 may share a single interface. A path may also represent a virtual 482 interface service such as a Virtual Private Network (VPN). 484 The list of available paths should be constrained by any requirements 485 or prohibitions the application sets, as well as system policy. 487 4.1.1.3. Protocol Options 489 Differences in possible protocol compositions and options can also 490 provide a branching point in connection establishment. This allows 491 clients to be resilient to situations in which a certain protocol is 492 not functioning on a server or network. 494 This approach is commonly used for connections with optional proxy 495 server configurations. A single connection might have several 496 options available: an HTTP-based proxy, a SOCKS-based proxy, or no 497 proxy. These options should be ranked and attempted in succession. 499 1 [www.example.com:80, Any, HTTP/TCP] 500 1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP] 501 1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP] 502 1.3 [www.example.com:80, Any, HTTP/TCP] 503 1.3.1 [192.0.2.1:80, Any, HTTP/TCP] 505 This approach also allows a client to attempt different sets of 506 application and transport protocols that, when available, could 507 provide preferable features. 
For example, the protocol options could 508 involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, and 509 HTTP/2 [RFC7540] over TLS over TCP on the other: 511 1 [www.example.com:443, Any, Any HTTP] 512 1.1 [www.example.com:443, Any, QUIC/UDP] 513 1.1.1 [192.0.2.1:443, Any, QUIC/UDP] 514 1.2 [www.example.com:443, Any, HTTP2/TLS/TCP] 515 1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP] 517 Another example is racing SCTP with TCP: 519 1 [www.example.com:80, Any, Any Stream] 520 1.1 [www.example.com:80, Any, SCTP] 521 1.1.1 [192.0.2.1:80, Any, SCTP] 522 1.2 [www.example.com:80, Any, TCP] 523 1.2.1 [192.0.2.1:80, Any, TCP] 525 Implementations that support racing protocols and protocol options 526 should maintain a history of which protocols and protocol options 527 successfully established, on a per-network and per-endpoint basis 528 (see Section 9.2). This information can influence future racing 529 decisions to prioritize or prune branches. 531 4.1.2. Branching Order-of-Operations 533 Branch types must occur in a specific order relative to one another 534 to avoid creating leaf nodes with invalid or incompatible settings. 535 In the example above, it would be invalid to branch for derived 536 endpoints (the DNS results for www.example.com) before branching 537 between interface paths, since there are situations when the results 538 will be different across networks due to private names or different 539 supported IP versions. Implementations must be careful to branch in 540 an order that results in usable leaf nodes whenever there are 541 multiple branch types that could be used from a single node. 543 The order of operations for branching should be: 545 1. Alternate Paths 547 2. Protocol Options 549 3. Derived Endpoints 550 where a lower number indicates higher precedence and therefore higher 551 placement in the tree. 
Branching between paths is the first in the 552 list because results across multiple interfaces are likely not 553 related to one another: endpoint resolution may return different 554 results, especially when using locally resolved host and service 555 names, and which protocols are supported and preferred may differ 556 across interfaces. Thus, if multiple paths are attempted, the 557 overall connection can be seen as a race between the available paths 558 or interfaces. 560 Protocol options are checked next. Whether or not a set of 561 protocols, or protocol-specific options, can successfully connect is 562 generally not dependent on which specific IP address is used. 563 Furthermore, the protocol stacks being attempted may influence or 564 altogether change the endpoints being used. Adding a proxy to a 565 connection's branch will change the endpoint to the proxy's IP 566 address or hostname. Choosing an alternate protocol may also modify 567 the ports that should be selected. 569 Branching for derived endpoints is the final step, and may have 570 multiple layers of derivation or resolution, such as DNS service 571 resolution and DNS hostname resolution. 573 For example, if the application has indicated both a preference for 574 Wi-Fi over LTE and for a feature only available in SCTP, branches will 575 be first sorted according to path selection, with Wi-Fi at the top. 576 Then, branches with SCTP will be sorted to the top within their 577 subtree according to the properties influencing protocol selection. 578 However, if the implementation has current cache information that 579 SCTP is not available on the path over Wi-Fi, there is no SCTP node in 580 the Wi-Fi subtree. Here, the path over Wi-Fi will be tried first, and, 581 if connection establishment succeeds, TCP will be used. So the 582 Selection Property of preferring Wi-Fi takes precedence over the 583 Property that led to a preference for SCTP. 585 1
[www.example.com:80, Any, Any Stream] 586 1.1 [192.0.2.1:80, Wi-Fi, Any Stream] 587 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 588 1.2 [192.0.3.1:80, LTE, Any Stream] 589 1.2.1 [192.0.3.1:80, LTE, SCTP] 590 1.2.2 [192.0.3.1:80, LTE, TCP] 592 4.1.3. Sorting Branches 594 Implementations should sort the branches of the tree of connection 595 options in order of their preference rank, from most preferred to 596 least preferred. Leaf nodes on branches with higher rankings 597 represent connection attempts that will be raced first. 598 Implementations should order the branches to reflect the preferences 599 expressed by the application for its new connection, including 600 Selection Properties, which are specified in 601 [I-D.ietf-taps-interface]. 603 In addition to the properties provided by the application, an 604 implementation may include additional criteria in the ranking, such as cached 605 performance estimates (see Section 9.2) or system policy (see 606 Section 3.2). Two examples of how Selection and 607 Connection Properties may be used to sort branches are provided 608 below: 610 * "Interface Instance or Type": If the application specifies an 611 interface type to be preferred or avoided, implementations should 612 rank the paths accordingly. If the application specifies an 613 interface type to be required or prohibited, an implementation is 614 expected to exclude the non-conforming paths. 616 * "Capacity Profile": An implementation can use the Capacity Profile 617 to prefer paths that match an application's expected traffic 618 pattern.
   This match will use cached performance estimates, see
   Section 9.2:

   -  Scavenger: Prefer paths with the highest expected available
      capacity, but minimising impact on other traffic, based on the
      observed maximum throughput;

   -  Low Latency/Interactive: Prefer paths with the lowest expected
      Round Trip Time, based on observed round trip time estimates;

   -  Low Latency/Non-Interactive: Prefer paths with a low expected
      Round Trip Time, but can tolerate delay variation;

   -  Constant-Rate Streaming: Prefer paths that are expected to
      satisfy the requested Stream Send or Stream Receive Bitrate,
      based on the observed maximum throughput;

   -  Capacity-Seeking: Prefer adapting to paths to determine the
      highest available capacity, based on the observed maximum
      throughput.

Implementations process the Properties in the following order:
Prohibit, Require, Prefer, Avoid. If Selection Properties contain
any prohibited properties, the implementation should first purge
branches containing nodes with these properties. For required
properties, it should only keep branches that satisfy these
requirements. Then, it should order the branches according to the
preferred properties, and finally use any avoided properties as a
tiebreaker. When ordering branches, an implementation can give more
weight to properties that the application has explicitly set than to
properties that are left at their defaults.

The available protocols and paths on a specific system and in a
specific context can change; therefore, the result of sorting and the
outcome of racing may vary, even when using the same Selection and
Connection Properties. However, an implementation ought to provide a
consistent outcome to applications, e.g., by preferring protocols and
paths that are already used by existing Connections that specified
similar Properties.

4.2.
Candidate Gathering

The step of gathering candidates involves identifying which paths,
protocols, and endpoints may be used for a given Connection. This
list is determined by the requirements, prohibitions, and preferences
of the application as specified in the Selection Properties.

4.2.1. Gathering Endpoint Candidates

Both Local and Remote Endpoint Candidates must be discovered during
connection establishment. To support Interactive Connectivity
Establishment (ICE) [RFC8445], or similar protocols that involve
out-of-band indirect signalling to exchange candidates with the
Remote Endpoint, it is important to query the set of candidate Local
Endpoints, and provide the protocol stack with a set of candidate
Remote Endpoints, before the Local Endpoint attempts to establish
connections.

4.2.1.1. Local Endpoint Candidates

The set of possible Local Endpoints is gathered. In the simple case,
this merely enumerates the local interfaces and protocols, and
allocates ephemeral source ports. For example, a system that has
Wi-Fi and Ethernet and supports IPv4 and IPv6 might gather four
candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4
on Wi-Fi, and IPv6 on Wi-Fi) that can form the source for a transient.

If NAT traversal is required, the process of gathering Local
Endpoints becomes broadly equivalent to the ICE candidate gathering
phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its
server reflexive Local Endpoints (i.e., the translated address of a
Local Endpoint, on the other side of a NAT, e.g., via a STUN server
[RFC5389]) and relayed Local Endpoints (e.g., via a TURN server
[RFC5766] or other relay), for each interface and network protocol.
These are added to the set of candidate Local Endpoints for this
connection.
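
As a rough illustration of the simple case and the NAT traversal case
described above, the following Python sketch enumerates candidate
Local Endpoints. The names LocalCandidate and
gather_local_candidates are hypothetical and are not part of the
Transport Services API. The server reflexive and relayed entries are
placeholders only: a real implementation would obtain them through
exchanges with STUN and TURN servers.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LocalCandidate:
    """Hypothetical record for one candidate Local Endpoint."""
    interface: str
    family: str
    kind: str = "host"  # "host", "server-reflexive", or "relayed"


def gather_local_candidates(interfaces, families, nat_traversal=False):
    """Enumerate one host candidate per (interface, address family).

    When nat_traversal is set, add placeholder server reflexive and
    relayed candidates for each pair (assumption: real values would
    come from STUN/TURN exchanges, which are not modelled here).
    """
    pairs = list(product(interfaces, families))
    candidates = [LocalCandidate(i, f) for i, f in pairs]
    if nat_traversal:
        for i, f in pairs:
            candidates.append(LocalCandidate(i, f, "server-reflexive"))
            candidates.append(LocalCandidate(i, f, "relayed"))
    return candidates
```

With two interfaces and two address families, this yields the four
host candidates from the example above; enabling NAT traversal adds
one server reflexive and one relayed placeholder per pair.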

Gathering Local Endpoints is primarily a local operation, although it
might involve exchanges with a STUN server to derive server reflexive
Local Endpoints, or with a TURN server or other relay to derive
relayed Local Endpoints. However, it does not involve communication
with the Remote Endpoint.

4.2.1.2. Remote Endpoint Candidates

The Remote Endpoint is typically a name that needs to be resolved
into a set of possible addresses that can be used for communication.
Resolving the Remote Endpoint is the process of recursively
performing such name lookups, until fully resolved, to return the set
of candidates for the Remote Endpoint of this connection.

How this resolution is done will depend on the type of the Remote
Endpoint, and can also be specific to each Local Endpoint. A common
case is when the Remote Endpoint is a DNS name, in which case it is
resolved to give a set of IPv4 and IPv6 addresses representing that
name. Some types of Remote Endpoint might require more complex
resolution. Resolving the Remote Endpoint for a peer-to-peer
connection might involve communication with a rendezvous server,
which in turn contacts the peer to gain consent to communicate and
retrieve its set of candidate Local Endpoints, which are returned and
form the candidate remote addresses for contacting that peer.

Resolving the Remote Endpoint is not a local operation. It will
involve a directory service, and can require communication with the
Remote Endpoint to rendezvous and exchange peer addresses. This can
expose some or all of the candidate Local Endpoints to the Remote
Endpoint.

4.3. Candidate Racing

The primary goal of the Candidate Racing process is to successfully
negotiate a protocol stack to an endpoint over an interface to
connect a single leaf node of the tree with as little delay and as
few unnecessary connection attempts as possible.
Optimizing these two factors improves the user experience, while
minimizing network load.

This section covers the dynamic aspect of connection establishment.
The tree described above is a useful conceptual and architectural
model. However, an implementation is unable to know the full tree
before it is formed and many of the possible branches ultimately
might not be used.

There are three different approaches to racing the attempts for
different nodes of the connection establishment tree:

1. Simultaneous

2. Staggered

3. Failover

Each approach is appropriate in different use-cases and branch types.
However, to avoid consuming unnecessary network resources,
implementations should not use simultaneous racing as a default
approach.

The timing algorithms for racing should remain independent across
branches of the tree. Any timers or racing logic are isolated to a
given parent node, and are not ordered precisely with regard to
children of other nodes.

4.3.1. Simultaneous

Simultaneous racing is when multiple alternate branches are started
without waiting for any one branch to make progress before starting
the next alternative. This means the attempts are effectively
simultaneous. Simultaneous racing should be avoided by
implementations, since it consumes extra network resources and
establishes state that might not be used.

4.3.2. Staggered

Staggered racing can be used whenever a single node of the tree has
multiple child nodes. Based on the order determined when building
the tree, the first child node will be initiated immediately,
followed by the next child node after some delay. Once that second
child node is initiated, the third child node (if present) will begin
after another delay, and so on until all child nodes have been
initiated, or one of the child nodes successfully completes its
negotiation.
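
The staggering described above can be sketched as a small
simulation. This is a minimal illustration under stated assumptions,
not a prescribed algorithm: the function name staggered_race is
hypothetical, a single fixed delay is used between children, and each
child's time to connect is assumed to be known in advance, which a
real implementation would only learn as events arrive.

```python
def staggered_race(children, delay):
    """Simulate staggered racing among ordered child nodes.

    children: list of (name, seconds_to_connect), most preferred first.
    delay:    fixed stagger between successive starts (assumption).

    Returns (started, winner): the (name, start_time) pairs that were
    actually initiated, and the name of the first child to connect.
    """
    started = []
    winner, finish = None, float("inf")
    for index, (name, duration) in enumerate(children):
        start = index * delay
        if start >= finish:
            break  # an earlier child already connected; start no more
        started.append((name, start))
        # Earlier attempts keep running; only track the earliest finish.
        # Strict "<" keeps the more preferred child on a tie.
        if start + duration < finish:
            winner, finish = name, start + duration
    return started, winner


started, winner = staggered_race(
    [("1.1", 0.7), ("1.2", 0.1), ("1.3", 0.1)], delay=0.25)
```

In this example the slow first child is still in progress when the
second child starts, the second child connects first, and the third
child is never initiated.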

Staggered racing attempts can proceed in parallel. Implementations
should not terminate an earlier child connection attempt upon
starting a secondary child.

If a child node fails to establish connectivity (as in Section 4.4.1)
before the delay time has expired for the next child, the next child
should be started immediately.

Staggered racing between IP addresses for a generic Connection should
follow the Happy Eyeballs algorithm described in [RFC8305].
[RFC8421] provides guidance for racing when performing Interactive
Connectivity Establishment (ICE).

Generally, the delay before starting a given child node ought to be
based on the length of time the previously started child node is
expected to take before it succeeds or makes progress in connection
establishment. Algorithms like Happy Eyeballs choose a delay based
on how long the transport connection handshake is expected to take.
When performing staggered races in multiple branch types (such as
racing between network interfaces, and then racing between IP
addresses), a longer delay may be chosen for some branch types. For
example, when racing between network interfaces, the delay should
also take into account the amount of time it takes to prepare the
network interface (such as radio association) and name resolution
over that interface, in addition to the delay that would be added for
a single transport connection handshake.

Since the staggered delay can be chosen based on dynamic information,
such as predicted round-trip time, implementations should define
upper and lower bounds for delay times. These bounds are
implementation-specific, and may differ based on which branch type is
being used.

4.3.3. Failover

If an implementation or application has a strong preference for one
branch over another, the branching node may choose to wait until one
child has failed before starting the next.
Failure of a leaf node is determined by its protocol negotiation
failing or timing out; failure of a parent branching node is
determined by all of its children failing.

An example in which failover is recommended is a race between a
protocol stack that uses a proxy and a protocol stack that bypasses
the proxy. Failover is useful in case the proxy is down or
misconfigured, but any more aggressive type of racing may end up
unnecessarily avoiding a proxy that was preferred by policy.

4.4. Completing Establishment

The process of connection establishment completes when one leaf node
of the tree has successfully completed negotiation with the Remote
Endpoint, or else all nodes of the tree have failed to connect. The
first leaf node to complete its connection is then used by the
application to send and receive data.

Successes and failures of a given attempt should be reported up to
parent nodes (towards the trunk of the tree). For example, in the
following case, if 1.1.1 fails to connect, it reports the failure to
1.1. Since 1.1 has no other child nodes, it also has failed and
reports that failure to 1. Because 1.2 has not yet failed, 1 is not
considered to have failed. Since 1.2 has not yet started, it is
started and the process continues. Similarly, if 1.1.1 successfully
connects, then it marks 1.1 as connected, which propagates to the
trunk node 1. At this point, the connection as a whole is considered
to be successfully connected and ready to process application data.

1 [www.example.com:80, Any, TCP]
  1.1 [www.example.com:80, Wi-Fi, TCP]
    1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
  1.2 [www.example.com:80, LTE, TCP]
  ...

If a leaf node has successfully completed its connection, all other
attempts should be made ineligible for use by the application for the
original request.
New connection attempts that involve transmitting data on the network
ought not to be started after another leaf node has already
successfully completed, because the connection as a whole has now
been established. An implementation may choose to let certain
handshakes and negotiations complete in order to gather metrics to
influence future connections. Keeping additional connections is
generally not recommended since those attempts were slower to connect
and may exhibit less desirable properties.

4.4.1. Determining Successful Establishment

Implementations may select the criteria by which a leaf node is
considered to be successfully connected differently on a per-protocol
basis. If the only protocol being used is a transport protocol with
a clear handshake, like TCP, then the obvious choice is to declare
that node "connected" when the last packet of the three-way handshake
has been received. If the only protocol being used is a
connectionless protocol, like UDP, the implementation may consider
the node fully "connected" the moment it determines a route is
present, before sending any packets on the network (see
Section 4.6).

For protocol stacks with multiple handshakes, the decision becomes
more nuanced. If the protocol stack involves both TLS and TCP, an
implementation could determine that a leaf node is connected after
the TCP handshake is complete, or it can wait for the TLS handshake
to complete as well. The benefit of declaring completion when the
TCP handshake finishes, and thus stopping the race for other branches
of the tree, is reduced burden on the network and Remote Endpoints
from further connection attempts that are likely to be abandoned.
On the other hand, by waiting until the TLS handshake is complete, an
implementation avoids the scenario in which a TCP handshake completes
quickly, but TLS negotiation is either very slow or fails altogether
in particular network conditions or to a particular endpoint. To
avoid the issue of TLS possibly failing, the implementation should
not generate a Ready event for the Connection until TLS is
established.

If all of the leaf nodes fail to connect during racing, i.e., none of
the configurations that satisfy all requirements given in the
Transport Properties actually work over the available paths, then the
transport system should notify the application with an InitiateError
event. An InitiateError event should also be generated in case the
transport system finds no usable candidates to race.

4.5. Establishing multiplexed connections

Multiplexing several Connections over a single underlying transport
connection requires that the Connections to be multiplexed belong to
the same Connection Group (as is indicated by the application using
the Clone call). When the underlying transport connection supports
multi-streaming, the Transport Services System can map each
Connection in the Connection Group to a different stream. Thus, when
the Connections that are offered to an application by the Transport
Services API are multiplexed, the Transport Services implementation
can establish a new Connection by simply beginning to use a new
stream of an already established transport Connection and there is no
need for a connection establishment procedure. This, then, also
means that there may not be any "establishment" message (like a TCP
SYN), but the application can simply start sending or receiving.
Therefore, when the Initiate action of a Transport Services API is
called without Messages being handed over, it cannot be guaranteed
that the Remote Endpoint will have any way to know about this, and
hence a passive endpoint's ConnectionReceived event might not be
called until data is received. Instead, calling the
ConnectionReceived event could be delayed until the first Message
arrives.

4.6. Handling connectionless protocols

While protocols that use an explicit handshake to validate a
Connection to a peer can be used for racing multiple establishment
attempts in parallel, connectionless protocols such as raw UDP do not
offer a way to validate the presence of a peer or the usability of a
Connection without application feedback. An implementation should
consider such a protocol stack to be established as soon as the
Transport Services system has selected a path on which to send data.

However, if a peer is not reachable over the network using the
connectionless protocol, or data cannot be exchanged for any other
reason, the application may want to attempt using another candidate
Protocol Stack. The implementation should maintain the list of other
candidate Protocol Stacks that were eligible to use.

4.7. Implementing listeners

When an implementation is asked to Listen, it registers with the
system to wait for incoming traffic to the Local Endpoint. If no
Local Endpoint is specified, the implementation should use an
ephemeral port.

If the Selection Properties do not require a single network interface
or path, but allow the use of multiple paths, the Listener object
should register for incoming traffic on all of the network interfaces
or paths that conform to the Properties.
The set of available paths can change over time, so the
implementation should monitor network path changes, and change the
registration of the Listener across all usable paths as appropriate.
When using multiple paths, the Listener is generally expected to use
the same port for listening on each.

If the Selection Properties allow multiple protocols to be used for
listening, and the implementation supports it, the Listener object
should support receiving inbound connections for each eligible
protocol on each eligible path.

4.7.1. Implementing listeners for Connected Protocols

Connected protocols such as TCP and TLS-over-TCP have a strong
mapping between the Local and Remote Endpoints (four-tuple) and their
protocol connection state. These map into Connection objects.
Whenever a new inbound handshake is being started, the Listener
should generate a new Connection object and pass it to the
application.

4.7.2. Implementing listeners for Connectionless Protocols

Connectionless protocols such as UDP and UDP-Lite generally do not
provide the same mechanisms that connected protocols do to offer
Connection objects. Implementations should wait for incoming packets
for connectionless protocols on a listening port and should perform
four-tuple matching of packets, either to existing Connection objects
or to create new Connection objects. On platforms with facilities to
create a "virtual connection" for connectionless protocols,
implementations should use these mechanisms to minimise the handling
of datagrams intended for already created Connection objects.

4.7.3.
Implementing listeners for Multiplexed Protocols

Protocols that provide multiplexing of streams into a single
four-tuple can listen both for entirely new connections (a new HTTP/2
stream on a new TCP connection, for example) and for new
sub-connections (a new HTTP/2 stream on an existing connection). If
the abstraction of Connection presented to the application is mapped
to the multiplexed stream, then the Listener should deliver new
Connection objects in the same way for either case. The
implementation should allow the application to introspect the
Connection Group marked on the Connections to determine the grouping
of the multiplexing.

5. Implementing Sending and Receiving Data

The most basic mapping for sending a Message is an abstraction of
datagrams, in which the transport protocol naturally deals in
discrete packets. Each Message here corresponds to a single
datagram. Generally, these will be short enough that sending and
receiving will always use a complete Message.

For protocols that expose byte-streams, the only delineation provided
by the protocol is the end of the stream in a given direction. Each
Message in this case corresponds to the entire stream of bytes in a
direction. These Messages may be quite long, in which case they can
be sent in multiple parts.

Protocols that provide framing (such as length-value protocols, or
protocols that use delimiters) may support Message sizes that do not
fit within a single datagram. Each Message for framing protocols
corresponds to a single frame, which may be sent either as a complete
Message in the underlying protocol, or in multiple parts.

5.1. Sending Messages

The effect of the application sending a Message is determined by the
top-level protocol in the established Protocol Stack.
That is, if the top-level protocol provides an abstraction of framed
messages over a connection, the receiving application will be able to
obtain multiple Messages on that connection, even if the framing
protocol is built on a byte-stream protocol like TCP.

5.1.1. Message Properties

*  Lifetime: this should be implemented by removing the Message from
   the queue of pending Messages after the Lifetime has expired. A
   queue of pending Messages within the transport system
   implementation that have yet to be handed to the Protocol Stack
   can always support this property, but once a Message has been sent
   into the send buffer of a protocol, only certain protocols may
   support removing a message. For example, an implementation cannot
   remove bytes from a TCP send buffer, while it can remove data from
   a SCTP send buffer using the partial reliability extension
   [RFC8303]. When there is no standing queue of Messages within the
   system, and the Protocol Stack does not support the removal of a
   Message from the stack's send buffer, this property may be
   ignored.

*  Priority: this represents the ability to prioritize a Message over
   other Messages. This can be implemented by the system re-ordering
   Messages that have yet to be handed to the Protocol Stack, or by
   giving relative priority hints to protocols that support
   priorities per Message. For example, an implementation of HTTP/2
   could choose to send Messages of different Priority on streams of
   different priority.

*  Ordered: when this is false, this disables the requirement of
   in-order-delivery for protocols that support configurable ordering.
   When the protocol stack does not support configurable ordering,
   this property may be ignored.

*  Safely Replayable: when this is true, this means that the Message
   can be used by a transport mechanism that might transfer it
   multiple times - e.g., as a result of racing multiple transports
   or as part of TCP Fast Open. Also, protocols that do not protect
   against duplicated messages, such as UDP (when used directly,
   without a protocol layered atop), can only be used with Messages
   that are Safely Replayable. When a transport system is permitted
   to replay messages, replay protection could be provided by the
   application.

*  Final: when this is true, this means that the sender will not send
   any further messages. The Connection need not be closed (in case
   the Protocol Stack supports half-close operation, like TCP). Any
   messages sent after a Final message will result in a SendError.

*  Corruption Protection Length: when this is set to any value other
   than Full Coverage, it sets the minimum protection in protocols
   that allow limiting the checksum length (e.g., UDP-Lite). If the
   protocol stack does not support checksum length limitation, this
   property may be ignored.

*  Reliable Data Transfer (Message): When true, the property
   specifies that the Message must be reliably transmitted. When
   false, and if unreliable transmission is supported by the
   underlying protocol, then the Message should be unreliably
   transmitted. If the underlying protocol does not support
   unreliable transmission, the Message should be reliably
   transmitted.

*  Message Capacity Profile Override: When true, this expresses a
   wish to override the Generic Connection Property Capacity Profile
   for this Message.
   Depending on the value, this can, for example, be implemented by
   changing the DSCP value of the associated packet (note that the
   guidelines in Section 6 of [RFC7657] apply; e.g., the DSCP value
   should not be changed for different packets within a reliable
   transport protocol session or DCCP connection).

*  No Fragmentation: When set, this property limits the message size
   to the Maximum Message Size Before Fragmentation or Segmentation
   (see Section 10.1.7 of [I-D.ietf-taps-interface]). Messages
   larger than this size generate an error. Setting this avoids
   transport-layer segmentation or network-layer fragmentation. When
   used with transports running over IP version 4, the Don't Fragment
   bit will be set to avoid on-path IP fragmentation ([RFC8304]).

5.1.2. Send Completion

The application should be notified whenever a Message or partial
Message has been consumed by the Protocol Stack, or has failed to
send. The time at which a Message is considered to have been
consumed by the Protocol Stack may vary depending on the protocol.
For example, for a basic datagram protocol like UDP, this may
correspond to the time when the packet is sent into the interface
driver. For a protocol that buffers data in queues, like TCP, this
may correspond to when the data has entered the send buffer. The
time at which a Message failed to send is when the Transport Services
implementation (including the Protocol Stack) has not successfully
sent the entire Message content or partial Message content on any
open candidate connection; this can depend on protocol-specific
timeouts.

5.1.3. Batching Sends

Since sending a Message may involve a context switch between the
application and the Transport Services system, sending patterns that
involve multiple small Messages can incur high overhead if each needs
to be enqueued separately.
To avoid this, the application can indicate a batch of Send actions
through the API. When this is used, the implementation can defer the
processing of Messages until the batch is complete.

5.2. Receiving Messages

Similar to sending, Receiving a Message is determined by the
top-level protocol in the established Protocol Stack. The main
difference with Receiving is that the size and boundaries of the
Message are not known beforehand. The application can communicate in
its Receive action the parameters for the Message, which can help the
Transport Services implementation know how much data to deliver and
when. For example, if the application only wants to receive a
complete Message, the implementation should wait until an entire
Message (datagram, stream, or frame) is read before delivering any
Message content to the application. This requires the implementation
to understand where messages end, either via a supplied deframer or
because the top-level protocol in the established Protocol Stack
preserves message boundaries. If the top-level protocol only
supports a byte-stream and no framers were supported, the application
can control the flow of received data by specifying the minimum
number of bytes of Message content it wants to receive at one time.

If a Connection finishes before a requested Receive action can be
satisfied, the Transport Services API should deliver any partial
Message content outstanding, or if none is available, an indication
that there will be no more received Messages.

5.3. Handling of data for fast-open protocols

Several protocols allow sending higher-level protocol or application
data during their protocol establishment, such as TCP Fast Open
[RFC7413] and TLS 1.3 [RFC8446]. This approach is referred to as
sending Zero-RTT (0-RTT) data.
This is a desirable feature, but poses challenges to an
implementation that uses racing during connection establishment.

The amount of data that can be sent as 0-RTT data varies by protocol
and can be queried by the application using the Maximum Message Size
Concurrent with Connection Establishment Connection Property. An
implementation can set this property according to the protocols that
it will race based on the given Selection Properties when the
application requests to establish a connection.

If the application has 0-RTT data to send in any protocol handshakes,
it needs to provide this data before the handshakes have begun. When
racing, this means that the data should be provided before the
process of connection establishment has begun. If the application
wants to send 0-RTT data, it must indicate this to the implementation
by setting the Safely Replayable send parameter to true when sending
the data. In general, 0-RTT data may be replayed (for example, if a
TCP SYN contains data, and the SYN is retransmitted, the data will be
retransmitted as well but may be considered as a new connection
instead of a retransmission). Also, when racing connections,
different leaf nodes have the opportunity to send the same data
independently. If data is truly safely replayable, this should be
permissible.

Once the application has provided its 0-RTT data, a Transport
Services implementation should keep a copy of this data and provide
it to each new leaf node that is started and for which a 0-RTT
protocol is being used.

It is also possible that protocol stacks within a particular leaf
node use 0-RTT handshakes without any safely replayable application
data. For example, TCP Fast Open could use a Client Hello from TLS
as its 0-RTT data, shortening the cumulative handshake time.
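
One possible way to keep a copy of the 0-RTT data and hand it to
each leaf node is sketched below in Python. The class
EarlyDataStore and its method data_for_leaf are hypothetical names
chosen for illustration; the sketch assumes the implementation
already knows whether a given leaf node's protocol stack supports
0-RTT, and it only models application-provided data, not
protocol-generated data such as a TLS Client Hello.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EarlyDataStore:
    """Hypothetical holder for application-provided 0-RTT data."""
    payload: bytes
    safely_replayable: bool
    handed_to: List[str] = field(default_factory=list)

    def data_for_leaf(self, leaf_name: str,
                      supports_0rtt: bool) -> Optional[bytes]:
        """Return a copy of the early data for a newly started leaf.

        Data is only released when the application marked it Safely
        Replayable and the leaf's protocol stack uses 0-RTT.
        """
        if not (self.safely_replayable and supports_0rtt):
            return None
        self.handed_to.append(leaf_name)
        return bytes(self.payload)  # independent copy per leaf


store = EarlyDataStore(b"GET / HTTP/1.1\r\n\r\n", safely_replayable=True)
first = store.data_for_leaf("1.1.1", supports_0rtt=True)
second = store.data_for_leaf("1.2.1", supports_0rtt=False)
```

Here the first leaf receives the early data, while the second leaf,
whose stack does not use 0-RTT, starts without it.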

0-RTT handshakes often rely on previous state, such as TCP Fast Open
cookies, previously established TLS tickets, or out-of-band
distributed pre-shared keys (PSKs). Implementations should be aware
of security concerns around using these tokens across multiple
addresses or paths when racing. In the case of TLS, any given ticket
or PSK should only be used on one leaf node, since servers will
likely reject duplicate tickets in order to prevent replays (see
Section 8.1 of [RFC8446]). If implementations have multiple tickets
available from a previous connection, each leaf node attempt can use
a different ticket. In effect, each leaf node will send the same
early application data, yet encoded (encrypted) differently on the
wire.

6. Implementing Message Framers

Message Framers are functions that define simple transformations
between application Message data and raw transport protocol data. A
Framer can encapsulate or encode outbound Messages, and decapsulate
or decode inbound data into Messages.

While many protocols can be represented as Message Framers, for the
purposes of the Transport Services API, these are ways for
applications or application frameworks to define their own Message
parsing to be included within a Connection's Protocol Stack. As an
example, TLS is exposed as a protocol natively supported by the
Transport Services API, even though it could also serve the purpose
of framing data over TCP.

Most Message Framers fall into one of two categories:

*  Header-prefixed record formats, such as a basic Type-Length-Value
   (TLV) structure

*  Delimiter-separated formats, such as HTTP/1.1.

Common Message Framers can be provided by a Transport Services
implementation, but an implementation ought to allow custom Message
Framers to be defined by the application or some other piece of
software.
This section describes one possible API for defining
Message Framers as an example.

6.1. Defining Message Framers

A Message Framer is primarily defined by the code that handles events
for a framer implementation, specifically how it handles inbound and
outbound data parsing. The function that implements custom framing
logic will be referred to as the "framer implementation", which may
be provided by a Transport Services implementation or the application
itself. The Message Framer refers to the object or function within
the main Connection implementation that delivers events to the custom
framer implementation whenever data is ready to be parsed or framed.

When a Connection establishment attempt begins, an event can be
delivered to notify the framer implementation that a new Connection
is being created. Similarly, a stop event can be delivered when a
Connection is being torn down. The framer implementation can use the
Connection object to look up specific properties of the Connection or
the network being used that may influence how to frame Messages.

   MessageFramer -> Start(Connection)
   MessageFramer -> Stop(Connection)

When a Message Framer generates a Start event, the framer
implementation has the opportunity to start writing some data prior
to the Connection delivering its Ready event. This allows the
implementation to communicate control data to the Remote Endpoint
that can be used to parse Messages. Once this initial exchange is
complete, the framer implementation can indicate that the Connection
is ready.

   MessageFramer.MakeConnectionReady(Connection)

Similarly, when a Message Framer generates a Stop event, the framer
implementation has the opportunity to write some final data or clear
up its local state before the Closed event is delivered to the
application. The framer implementation can indicate that it has
finished with this.

   MessageFramer.MakeConnectionClosed(Connection)

If the framer implementation encounters a fatal error at any time, it
can also cause the Connection to fail and provide an error.

   MessageFramer.FailConnection(Connection, Error)

Should the framer implementation deem the candidate selected during
racing unsuitable, it can signal this to the Transport Services API
by failing the Connection prior to marking it as ready. If there are
no other candidates available, the Connection will fail. Otherwise,
the Connection will select a different candidate and the Message
Framer will generate a new Start event.

Before an implementation marks a Message Framer as ready, it can also
dynamically add a protocol or framer above it in the stack. This
allows protocols that need to add TLS conditionally, like STARTTLS
[RFC3207], to modify the Protocol Stack based on a handshake result.

   otherFramer := NewMessageFramer()
   MessageFramer.PrependFramer(Connection, otherFramer)

A Message Framer might also choose to go into a passthrough mode once
an initial exchange or handshake has been completed, such as the
STARTTLS case mentioned above. This can also be useful for proxy
protocols like SOCKS [RFC1928] or HTTP CONNECT [RFC7230]. In such
cases, a Message Framer implementation can intercept sending and
receiving of messages at first, but then indicate that no more
processing is needed.

   MessageFramer.StartPassthrough()

6.2. Sender-side Message Framing

Message Framers generate an event whenever a Connection sends a new
Message.

   MessageFramer -> NewSentMessage

Upon receiving this event, a framer implementation is responsible for
performing any necessary transformations and sending the resulting
data back to the Message Framer, which will in turn send it to the
next protocol.
Implementations SHOULD ensure that there is a way to
pass the original data through without copying to improve
performance.

   MessageFramer.Send(Connection, Data)

To provide an example, a simple protocol that adds a length as a
header would receive the NewSentMessage event, create a data
representation of the length of the Message data, and then send a
block of data that is the concatenation of the length header and the
original Message data.

6.3. Receiver-side Message Framing

In order to parse a received flow of data into Messages, the Message
Framer notifies the framer implementation whenever new data is
available to parse.

   MessageFramer -> HandleReceivedData

Upon receiving this event, the framer implementation can inspect the
inbound data. The data is parsed from a particular cursor
representing the unprocessed data. The framer implementation requests
a specific amount of data it needs to have available in order to
parse. If the data is not available, the parse fails.

   MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength)
      -> (Data, MessageContext, IsEndOfMessage)

The framer implementation can directly advance the receive cursor
once it has parsed data to effectively discard data (for example,
discard a header once the content has been parsed).

To deliver a Message to the application, the framer implementation
can either directly deliver data that it has allocated, or deliver a
range of data directly from the underlying transport and
simultaneously advance the receive cursor.

   MessageFramer.AdvanceReceiveCursor(Connection, Length)
   MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext,
                                                Length, IsEndOfMessage)
   MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage)

Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the
framer implementation to earmark bytes as part of a Message even
before they are received by the transport. This allows the delivery
of very large Messages without requiring the implementation to
directly inspect all of the bytes.

To provide an example, a simple protocol that parses a length as a
header value would receive the HandleReceivedData event, and call
Parse with a minimum and maximum set to the length of the header
field. Once the parse succeeds, it would call AdvanceReceiveCursor
with the length of the header field, and then call
DeliverAndAdvanceReceiveCursor with the length of the body that was
parsed from the header, marking the new Message as complete.

7. Implementing Connection Management

Once a Connection is established, the Transport Services API allows
applications to interact with the Connection by modifying or
inspecting Connection Properties. A Connection can also generate
events in the form of Soft Errors.

The set of Connection Properties that are supported for setting and
getting on a Connection is described in [I-D.ietf-taps-interface].
For any properties that are generic, and thus could apply to all
protocols being used by a Connection, the Transport Services
implementation should store the properties in storage common to all
protocols, and notify all protocol instances in the Protocol Stack
whenever the properties have been modified by the application.
For protocol-specific properties, such as the User Timeout that
applies to TCP, the Transport Services implementation only needs to
update the relevant protocol instance.

If an error is encountered in setting a property (for example, if the
application tries to set a TCP-specific property on a Connection that
is not using TCP), the action should fail gracefully. The
application may be informed of the error, but the Connection itself
should not be terminated.

The Transport Services API should allow protocol instances in the
Protocol Stack to pass up arbitrary generic or protocol-specific
errors that can be delivered to the application as Soft Errors.
These allow the application to be informed of ICMP errors, and other
similar events.

7.1. Pooled Connection

For applications that do not need in-order delivery of Messages, the
Transport Services implementation may distribute Messages of a single
Connection across several underlying transport connections or
multiple streams of multi-streaming connections between endpoints, as
long as all of these satisfy the Selection Properties. The Transport
Services implementation will then hide this connection management and
only expose a single Connection object, which we here call a "Pooled
Connection". This is in contrast to Connection Groups, which
explicitly expose combined treatment of Connections, giving the
application control over multiplexing, for example.

Pooled Connections can be useful when the application using the
Transport Services system implements a protocol such as HTTP, which
employs request/response pairs and does not require in-order delivery
of responses.
This enables implementations of Transport Services
systems to realize transparent connection coalescing and connection
migration, and to perform per-message endpoint and path selection by
choosing among multiple underlying connections.

7.2. Handling Path Changes

When a path change occurs, e.g., when the IP address of an interface
changes or a new interface becomes available, the Transport Services
implementation is responsible for notifying the Protocol Instance of
the change. The path change may interrupt connectivity on a path for
an active connection or provide an opportunity for a transport that
supports multipath or migration to adapt to the new paths. Note
that, in the model of the Transport Services API, migration is
considered a part of multipath connectivity; it is just a limiting
policy on multipath usage. If the multipath Selection Property is
set to Disabled, migration is disallowed.

For protocols that do not support multipath or migration, the
Protocol Instances should be informed of the path change, but should
not be forcibly disconnected if the previously used path becomes
unavailable. There are many common user scenarios that can lead to a
path becoming temporarily unavailable, and then recovering before the
transport protocol reaches a timeout error. These are particularly
common on mobile devices. Examples include: an Ethernet cable
becoming unplugged and then plugged back in; a device losing a Wi-Fi
signal while a user is in an elevator, and reattaching when the user
leaves the elevator; and a user losing the radio signal while riding
a train through a tunnel. If the device is able to rejoin a network
with the same IP address, a stateful transport connection can
generally resume.
Thus, while it is useful for a Protocol Instance
to be aware of a temporary loss of connectivity, the Transport
Services implementation should not aggressively close connections in
these scenarios.

If the Protocol Stack includes a transport protocol that supports
multipath connectivity, the Transport Services implementation should
also inform the Protocol Instance of potentially new paths that
become permissible based on the multipath Selection Property and the
multipath-policy Connection Property choices made by the application.
A protocol can then establish new subflows over new paths while an
active path is still available or, if migration is supported, also
after a break has been detected, and should attempt to tear down
subflows over paths that are no longer used. The Connection Property
multipath-policy of the Transport Services API allows an application
to indicate when and how different paths should be used. However,
detailed handling of these policies is still implementation-specific.
For example, if the multipath Selection Property is set to Active,
the decision about when to create a new path or to announce a new
path or set of paths to the Remote Endpoint, e.g., in the form of
additional IP addresses, is implementation-specific. If the Protocol
Stack includes a transport protocol that does not support multipath,
but does support migrating between paths, the update to the set of
available paths can trigger the connection to be migrated.

In the case of Pooled Connections (Section 7.1), the Transport
Services implementation may add connections over new paths to the
pool if permissible based on the multipath policy and Selection
Properties.
In case a previously used path becomes unavailable, the transport
system may disconnect all connections that require this path, but
should not disconnect the pooled connection object exposed to the
application. The strategy to do so is implementation-specific, but
should be consistent with the behavior of multipath transports.

8. Implementing Connection Termination

With TCP, when an application closes a connection, this means that it
has no more data to send (but expects all data that has been handed
over to be reliably delivered). However, with TCP only, "close" does
not mean that the application will stop receiving data. This is
related to TCP's ability to support half-closed connections.

SCTP is an example of a protocol that does not support such
half-closed connections. Hence, with SCTP, the meaning of "close" is
stricter: an application has no more data to send (but expects all
data that has been handed over to be reliably delivered), and will
also not receive any more data.

Implementing a protocol-independent transport system means that the
exposed semantics must be the strictest subset of the semantics of
all supported protocols. Hence, as is common with all reliable
transport protocols, after a Close action, the application can expect
to have its reliability requirements honored regarding the data
provided to the Transport Services API, but it cannot expect to be
able to read any more data after calling Close.

Abort differs from Close only in that no guarantees are given
regarding any data that the application sent to the Transport
Services API before calling Abort.

As explained in Section 4.5, when a new stream is multiplexed on an
already existing connection of a Transport Protocol Instance, there
is no need for a connection establishment procedure.
Because the
Connections that are offered by a Transport Services implementation
can be implemented as streams that are multiplexed on a transport
protocol's connection, it cannot be guaranteed that an Initiate
action from one endpoint provokes a ConnectionReceived event at its
peer.

For Close (provoking a Finished event) and Abort (provoking a
ConnectionError event), the same logic applies: while it is desirable
to be informed when a peer closes or aborts a Connection, whether
this is possible depends on the underlying protocol, and no
guarantees can be given. With SCTP, the transport system can use the
stream reset procedure to cause a Finished event upon a Close action
from the peer [NEAT-flow-mapping].

9. Cached State

Beyond a single Connection's lifetime, it is useful for an
implementation to keep state and history. This cached state can help
improve future Connection establishment due to re-using results and
credentials, and favoring paths and protocols that performed well in
the past.

Cached state may be associated with different endpoints for the same
Connection, depending on the protocol generating the cached content.
For example, session tickets for TLS are associated with specific
endpoints, and thus should be cached based on a Connection's hostname
endpoint (if applicable). However, performance characteristics of a
path are more likely tied to the IP address and subnet being used.

9.1. Protocol state caches

Some protocols will have long-term state to be cached in association
with endpoints. This state often has some time after which it is
expired, so the implementation should allow each protocol to specify
an expiration for cached content.
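
One way to let each protocol specify an expiration for its cached
content is sketched below in Python. This is only an illustration
under assumed names: ProtocolStateCache, put, get, and lifetime_s are
hypothetical and do not come from the Transport Services API. The
cache is keyed by protocol and endpoint, and the clock is injectable
so that expiry can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ProtocolStateCache:
    """Hypothetical protocol state cache keyed by (protocol, endpoint)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, protocol: str, endpoint: str, value: Any,
            lifetime_s: float) -> None:
        # Each protocol chooses the lifetime of the content it caches.
        expires_at = self._clock() + lifetime_s
        self._entries[(protocol, endpoint)] = (expires_at, value)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        entry = self._entries.get((protocol, endpoint))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired content is dropped instead of being returned.
            del self._entries[(protocol, endpoint)]
            return None
        return value
```

For instance, a TLS protocol instance might store a session ticket
under the Connection's hostname endpoint with the lifetime announced
for that ticket, while another protocol stores its own state under a
different key and lifetime.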

Examples of cached protocol state include:

*  The DNS protocol can cache resolution answers (A and AAAA queries,
   for example), associated with a Time To Live (TTL) to be used for
   future hostname resolutions without needing to ask the DNS
   resolver again.

*  TLS caches session state and tickets based on a hostname, which
   can be used for resuming sessions with a server.

*  TCP can cache cookies for use in TCP Fast Open.

Cached protocol state is primarily used during Connection
establishment for a single Protocol Stack, but may be used to
influence an implementation's preference between several candidate
Protocol Stacks. For example, if two IP address endpoints are
otherwise equally preferred, an implementation may choose to attempt
a connection to an address for which it has a TCP Fast Open cookie.

Applications can use the Transport Services API to request that a
Connection Group maintain a separate cache for protocol state.
Connections in the group will not use cached state from connections
outside the group, and connections outside the group will not use
state cached from connections inside the group. This may be
necessary, for example, if application-layer identifiers rotate and
clients wish to avoid linkability via trackable TLS tickets or TFO
cookies.

9.2. Performance caches

In addition to protocol state, Protocol Instances should provide data
into a performance-oriented cache to help guide future protocol and
path selection. Some performance information can be gathered
generically across several protocols to allow predictive comparisons
between protocols on given paths:

*  Observed Round Trip Time

*  Connection Establishment latency

*  Connection Establishment success rate

These items can be cached on a per-address and per-subnet
granularity, and averaged between different values.
The information
should be cached on a per-network basis, since it is expected that
different network attachments will have different performance
characteristics. Besides Protocol Instances, other system entities
may also provide data into performance-oriented caches. This could
for instance be signal strength information reported by radio modems
like Wi-Fi and mobile broadband or information about the
battery-level of the device. Furthermore, the system may cache the
observed maximum throughput on a path as an estimate of the available
bandwidth.

An implementation should use this information, when possible, to
influence preference between candidate paths, endpoints, and protocol
options. Eligible options that historically had significantly better
performance than others should be selected first when gathering
candidates (see Section 4.2) to ensure better performance for the
application.

The reasonable lifetime for cached performance values will vary
depending on the nature of the value. Certain information, like the
connection establishment success rate to a Remote Endpoint using a
given protocol stack, can be stored for a long period of time (hours
or longer), since it is expected that the capabilities of the Remote
Endpoint are not changing very quickly. On the other hand, the Round
Trip Time observed by TCP over a particular network path may vary
over a relatively short time interval. For such values, the
implementation should remove them from the cache more quickly, or
treat older values with less confidence/weight.

[I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP
Control Block information between connections on initialization.

10. Specific Transport Protocol Considerations

Each protocol that is supported by a Transport Services
implementation should have a well-defined API mapping.
API mappings
for a protocol are important for Connections in which a given
protocol is the "top" of the Protocol Stack. For example, the
mapping of the Send function for TCP applies to Connections in which
the application directly sends over TCP.

Each protocol has a notion of Connectedness. Possible values for
Connectedness are:

*  Connectionless. Connectionless protocols do not establish
   explicit state between endpoints, and do not perform a handshake
   during Connection establishment.

*  Connected. Connected protocols establish state between endpoints,
   and perform a handshake during Connection establishment. The
   handshake may be 0-RTT to send data or resume a session, but
   bidirectional traffic is required to confirm connectedness.

*  Multiplexing Connected. Multiplexing Connected protocols share
   properties with Connected protocols, but also explicitly support
   opening multiple application-level flows. This means that they
   can support cloning new Connection objects without a new explicit
   handshake.

Protocols also define a notion of Data Unit. Possible values for
Data Unit are:

*  Byte-stream. Byte-stream protocols do not define any Message
   boundaries of their own apart from the end of a stream in each
   direction.

*  Datagram. Datagram protocols define Message boundaries at the
   same level of transmission, such that only complete (not partial)
   Messages are supported.

*  Message. Message protocols support Message boundaries that can be
   sent and received either as complete or partial Messages. Maximum
   Message lengths can be defined, and Messages can be partially
   reliable.

Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to
the primitives with the same name in section 4 of [RFC8303].
For
further implementation details, the description of these primitives
in [RFC8303] points to section 3 of [RFC8303] and section 3 of
[RFC8304], which refers back to the relevant specifications for each
protocol. This back-tracking method applies to all elements of
[RFC8923] (see appendix D of [I-D.ietf-taps-interface]): they are
listed in appendix A of [RFC8923] with an implementation hint in the
same style, pointing back to section 4 of [RFC8303].

This document defines the API mappings for protocols defined in
[RFC8923]. Other protocol mappings can be provided as separate
documents, following the mapping template in Appendix A.

10.1. TCP

Connectedness: Connected

Data Unit: Byte-stream

API mappings for TCP are as follows:

Connection Object: TCP connections between two hosts map directly to
   Connection objects.

Initiate: CONNECT.TCP. Calling Initiate on a TCP Connection causes
   it to reserve a local port, and send a SYN to the Remote Endpoint.

InitiateWithSend: CONNECT.TCP with parameter user message. Early
   safely replayable data is sent on a TCP Connection in the SYN, as
   TCP Fast Open data.

Ready: A TCP Connection is ready once the three-way handshake is
   complete.

InitiateError: Failure of CONNECT.TCP. TCP can throw various errors
   during connection setup. Specifically, it is important to handle
   a RST being sent by the peer during the handshake.

ConnectionError: Once established, TCP throws errors whenever the
   connection is disconnected, such as due to receiving a RST from
   the peer.

Listen: LISTEN.TCP. Calling Listen for TCP binds a local port and
   prepares it to receive inbound SYN packets from peers.

ConnectionReceived: TCP Listeners will deliver new connections once
   they have replied to an inbound SYN with a SYN-ACK.

Clone: Calling Clone on a TCP Connection creates a new Connection
   with equivalent parameters. These Connections, and Connections
   generated via later calls to Clone on an Established Connection,
   form a Connection Group. To realize entanglement for these
   Connections, with the exception of Connection Priority, changing a
   Connection Property on one of them must affect the Connection
   Properties of the others too. No guarantees of honoring the
   Connection Property Connection Priority are given, and thus it is
   safe for an implementation of a transport system to ignore this
   property. When it is reasonable to assume that Connections
   traverse the same path (e.g., when they share the same
   encapsulation), support for it can also experimentally be
   implemented using a congestion control coupling mechanism (see for
   example [TCP-COUPLING] or [RFC3124]).

Send: SEND.TCP. TCP does not on its own preserve Message
   boundaries. Calling Send on a TCP connection lays out the bytes
   on the TCP send stream without any other delineation. Any Message
   marked as Final will cause TCP to send a FIN once the Message has
   been completely written, by calling CLOSE.TCP immediately upon
   successful termination of SEND.TCP. Note that transmitting a
   Message marked as Final should not cause the Closed event to be
   delivered to the application, as it will still be possible to
   receive data until the peer closes or aborts the TCP connection.

Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without
   any Message delineation. All data delivered in the Received or
   ReceivedPartial event will be part of a single stream-wide Message
   that is marked Final (unless a Message Framer is used).
   EndOfMessage will be delivered when the TCP Connection has
   received a FIN (CLOSE-EVENT.TCP) from the peer.
   Note that
   reception of a FIN should not cause the Closed event to be
   delivered to the application, as it will still be possible for the
   application to send data.

Close: Calling Close on a TCP Connection indicates that the
   Connection should be gracefully closed (CLOSE.TCP) by sending a
   FIN to the peer. It will then still be possible to receive data
   until the peer closes or aborts the TCP connection. The Closed
   event will be issued upon reception of a FIN.

Abort: Calling Abort on a TCP Connection indicates that the
   Connection should be immediately closed by sending a RST to the
   peer (ABORT.TCP).

10.2. MPTCP

Connectedness: Connected

Data Unit: Byte-stream

The Transport Services API mappings for MPTCP are identical to TCP.
MPTCP adds support for multipath properties, such as "Multipath
Transport" and "Policy for using Multipath Transports".

10.3. UDP

Connectedness: Connectionless

Data Unit: Datagram

API mappings for UDP are as follows:

Connection Object: UDP connections represent a pair of specific IP
   addresses and ports on two hosts.

Initiate: CONNECT.UDP. Calling Initiate on a UDP Connection causes
   it to reserve a local port, but does not generate any traffic.

InitiateWithSend: Early data on a UDP Connection does not have any
   special meaning. The data is sent whenever the Connection is
   Ready.

Ready: A UDP Connection is ready once the system has reserved a
   local port and has a path to send to the Remote Endpoint.

InitiateError: UDP Connections can only generate errors on
   initiation due to port conflicts on the local system.

ConnectionError: Once in use, UDP throws "soft errors"
   (ERROR.UDP(-Lite)) upon receiving ICMP notifications indicating
   failures in the network.

Listen: LISTEN.UDP. Calling Listen for UDP binds a local port and
   prepares it to receive inbound UDP datagrams from peers.

ConnectionReceived: UDP Listeners will deliver new connections once
   they have received traffic from a new Remote Endpoint.

Clone: Calling Clone on a UDP Connection creates a new Connection
   with equivalent parameters. The two Connections are otherwise
   independent.

Send: SEND.UDP(-Lite). Calling Send on a UDP connection sends the
   data as the payload of a complete UDP datagram. Marking Messages
   as Final does not change anything in the datagram's contents.
   Upon sending a UDP datagram, some relevant fields and flags in the
   IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in
   IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)).

Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to
   Received, each of which represents a single datagram received in a
   UDP packet. Upon receiving a UDP datagram, the ECN flag from the
   IP header can be obtained (GET_ECN.UDP(-Lite)).

Close: Calling Close on a UDP Connection (ABORT.UDP(-Lite)) releases
   the local port reservation.

Abort: Calling Abort on a UDP Connection (ABORT.UDP(-Lite)) is
   identical to calling Close.

10.4. UDP-Lite

Connectedness: Connectionless

Data Unit: Datagram

The Transport Services API mappings for UDP-Lite are identical to
UDP. UDP-Lite adds support for properties that control checksum
coverage, such as "Corruption Protection Length", "Full Checksum
Coverage on Sending", "Required Minimum Corruption Protection
Coverage for Receiving", and "Full Checksum Coverage on Receiving".

10.5. UDP Multicast Receive

Connectedness: Connectionless

Data Unit: Datagram

API mappings for Receiving Multicast UDP are as follows:

Connection Object: Established UDP Multicast Receive connections
   represent a pair of specific IP addresses and ports.
   The
   "unidirectional receive" transport property is required, and the
   Local Endpoint must be configured with a group IP address and a
   port.

Initiate: Calling Initiate on a UDP Multicast Receive Connection
   causes an immediate InitiateError. This is an unsupported
   operation.

InitiateWithSend: Calling InitiateWithSend on a UDP Multicast
   Receive Connection causes an immediate InitiateError. This is an
   unsupported operation.

Ready: A UDP Multicast Receive Connection is ready once the system
   has received traffic for the appropriate group and port.

InitiateError: UDP Multicast Receive Connections generate an
   InitiateError if Initiate is called.

ConnectionError: Once in use, UDP throws "soft errors"
   (ERROR.UDP(-Lite)) upon receiving ICMP notifications indicating
   failures in the network.

Listen: LISTEN.UDP. Calling Listen for UDP Multicast Receive binds
   a local port, prepares it to receive inbound UDP datagrams from
   peers, and issues a multicast host join. If a Remote Endpoint
   with an address is supplied, the join is Source-specific
   Multicast, and the path selection is based on the route to the
   Remote Endpoint. If a Remote Endpoint is not supplied, the join
   is Any-source Multicast, and the path selection is based on the
   outbound route to the group supplied in the Local Endpoint.

There are cases where it is required to open multiple connections for
the same address(es). For example, one Connection might be opened
to a multicast group for a multicast control bus, and another
application later opens a separate Connection to the same group to
send signals to and/or receive signals from the common bus. In such
cases, the Transport Services system needs to explicitly enable
re-use of the same set of addresses (equivalent to setting
SO_REUSEADDR in the socket API).

   ConnectionReceived: UDP Multicast Receive Listeners will deliver new
      connections once they have received traffic from a new Remote
      Endpoint.

   Clone: Calling Clone on a UDP Multicast Receive Connection creates a
      new Connection with equivalent parameters. The two Connections
      are otherwise independent.

   Send: SEND.UDP(-Lite). Calling Send on a UDP Multicast Receive
      connection causes an immediate SendError. This is an unsupported
      operation.

   Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP
      Multicast Receive connection only delivers complete Messages to
      Received, each of which represents a single datagram received in a
      UDP packet. Upon receiving a UDP datagram, the ECN flag from the
      IP header can be obtained (GET_ECN.UDP(-Lite)).

   Close: Calling Close on a UDP Multicast Receive Connection
      (ABORT.UDP(-Lite)) releases the local port reservation and leaves
      the group.

   Abort: Calling Abort on a UDP Multicast Receive Connection
      (ABORT.UDP(-Lite)) is identical to calling Close.

10.6. SCTP

   Connectedness: Connected

   Data Unit: Message

   API mappings for SCTP are as follows:

   Connection Object: Connection objects can be mapped to an SCTP
      association or a stream in an SCTP association. Mapping
      Connection objects to SCTP streams is called "stream mapping" and
      has additional requirements as follows. The following explanation
      assumes a client-server communication model.

   Stream mapping requires an association to already be in place between
   the client and the server, and it requires the server to understand
   that a new incoming stream should be represented as a new Connection
   Object by the Transport Services system. A new SCTP stream is
   created by sending an SCTP message with a new stream id.
   Thus, to implement stream mapping, the Transport Services API MUST
   provide a newly created Connection Object to the application upon the
   reception of such a message. The necessary semantics to implement
   the Transport Services system's Close and Abort primitives are
   provided by the stream reconfiguration (reset) procedure described in
   [RFC6525]. This also allows a stream id to be re-used after
   resetting ("closing") the stream. To implement this functionality,
   SCTP stream reconfiguration [RFC6525] MUST be supported by both the
   client and the server side.

   To avoid head-of-line blocking, stream mapping SHOULD only be
   implemented when both sides support message interleaving [RFC8260].
   This allows a sender to schedule transmissions between multiple
   streams without risking that transmission of a large message on one
   stream might block transmissions on other streams for a long time.

   To avoid conflicts between stream ids, the following procedure is
   recommended: the first Connection, for which the SCTP association has
   been created, MUST always use stream id zero. All additional
   Connections are assigned to unused stream ids in growing order. To
   avoid a conflict when both endpoints map new Connections
   simultaneously, the peer that initiated the association MUST use even
   stream ids whereas the remote side MUST map its Connections to odd
   stream ids. Both sides maintain a status map of the assigned stream
   ids. Generally, new streams SHOULD consume the lowest available
   (even or odd, depending on the side) stream id; this rule is relevant
   when lower ids become available because Connection objects associated
   with the streams are closed.

   SCTP stream mapping as described here has been implemented in a
   research prototype; a description of this implementation is given in
   [NEAT-flow-mapping].
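
   The stream id assignment procedure above can be sketched as follows.
   This is an illustrative sketch only, not the method used by the
   prototype in [NEAT-flow-mapping]: the class name StreamIdAllocator,
   its method names, and the num_streams parameter (standing in for the
   number of streams negotiated for the association) are assumptions
   made for the example.

```python
from typing import Optional, Set


class StreamIdAllocator:
    """Hypothetical status map of assigned SCTP stream ids for one side."""

    def __init__(self, is_initiator: bool, num_streams: int) -> None:
        # The side that initiated the association uses even ids,
        # the remote side uses odd ids.
        self.parity = 0 if is_initiator else 1
        self.num_streams = num_streams
        # Stream id zero always belongs to the first Connection,
        # for which the association was created.
        self.assigned: Set[int] = {0}

    def allocate(self) -> Optional[int]:
        """Return the lowest free id of this side's parity, or None.

        None means no stream is left; a real implementation would then
        have to request more streams (ADD_STREAM.SCTP).
        """
        for sid in range(self.parity, self.num_streams, 2):
            if sid not in self.assigned:
                self.assigned.add(sid)
                return sid
        return None

    def mark_remote(self, sid: int) -> None:
        """Record an id that the peer started using."""
        self.assigned.add(sid)

    def release(self, sid: int) -> None:
        """Make an id available again after the stream has been reset."""
        self.assigned.discard(sid)
```

   With six negotiated streams, the initiating side would hand out ids
   2 and 4 for its second and third Connections, and would hand out 2
   again once the Connection using it has been closed and the id
   released.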

   Initiate: If this is the only Connection object that is assigned to
      the SCTP Association or stream mapping is not used, CONNECT.SCTP
      is called. Else, unless the Selection Property
      activeReadBeforeSend is Preferred or Required, a new stream is
      used: if there are enough streams available, Initiate is a local
      operation that assigns a new stream id to the Connection object.
      The number of streams is negotiated as a parameter of the prior
      CONNECT.SCTP call, and it represents a trade-off between local
      resource usage and the number of Connection objects that can be
      mapped without requiring a reconfiguration signal. When running
      out of streams, ADD_STREAM.SCTP must be called.

   InitiateWithSend: If this is the only Connection object that is
      assigned to the SCTP association or stream mapping is not used,
      CONNECT.SCTP is called with the "user message" parameter. Else, a
      new stream is used (see Initiate for how to handle running out of
      streams), and this just sends the first message on a new stream.

   Ready: Initiate or InitiateWithSend returns without an error, i.e.
      SCTP's four-way handshake has completed. If an association with
      the peer already exists, stream mapping is used and enough streams
      are available, a Connection Object instantly becomes Ready after
      calling Initiate or InitiateWithSend.

   InitiateError: Failure of CONNECT.SCTP.

   ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP.

   Listen: LISTEN.SCTP. If an association with the peer already exists
      and stream mapping is used, Listen just expects to receive a new
      message with a new stream id (chosen in accordance with the stream
      id assignment procedure described above).

   ConnectionReceived: LISTEN.SCTP returns without an error (a result
      of successful CONNECT.SCTP from the peer), or, in case of stream
      mapping, the first message has arrived on a new stream (in this
      case, Receive is also invoked).

   Clone: Calling Clone on an SCTP association creates a new Connection
      object and assigns it a new stream id in accordance with the
      stream id assignment procedure described above. If there are not
      enough streams available, ADD_STREAM.SCTP must be called.

   Priority (Connection): When this value is changed, or a Message with
      Message Property Priority is sent, and there are multiple
      Connection objects assigned to the same SCTP association,
      CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
      of streams in the SCTP association.

   Send: SEND.SCTP. Message Properties such as Lifetime and Ordered
      map to parameters of this primitive.

   Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a
      ReceivedPartial event.

   Close: If this is the only Connection object that is assigned to the
      SCTP association, CLOSE.SCTP is called, and the Closed event will
      be delivered to the application upon the ensuing CLOSE-EVENT.SCTP.
      Else, the Connection object is one out of several Connection
      objects that are assigned to the same SCTP association, and
      RESET_STREAM.SCTP must be called, which informs the peer that the
      stream will no longer be used for mapping and can be used by
      future Initiate, InitiateWithSend or Listen calls. At the peer,
      the event RESET_STREAM-EVENT.SCTP will fire, which the peer must
      answer by issuing RESET_STREAM.SCTP too. The resulting local
      RESET_STREAM-EVENT.SCTP informs the Transport Services system that
      the stream id can now be re-used by the next Initiate,
      InitiateWithSend or Listen calls, and invokes a Closed event
      towards the application.

   Abort: If this is the only Connection object that is assigned to the
      SCTP association, ABORT.SCTP is called. Else, the Connection
      object is one out of several Connection objects that are assigned
      to the same SCTP association, and shutdown proceeds as described
      under Close.

11. IANA Considerations

   RFC-EDITOR: Please remove this section before publication.

   This document has no actions for IANA.

12. Security Considerations

   [I-D.ietf-taps-arch] outlines general security considerations and
   requirements for any system that implements the Transport Services
   architecture. [I-D.ietf-taps-interface] provides further discussion
   on security and privacy implications of the Transport Services API.
   This document provides additional guidance on implementation
   specifics for the Transport Services API and as such the security
   considerations in both of these documents apply. The next two
   subsections discuss further considerations that are specific to
   mechanisms specified in this document.

12.1. Considerations for Candidate Gathering

   Implementations should avoid downgrade attacks that allow network
   interference to cause the implementation to select less secure, or
   entirely insecure, combinations of paths and protocols.

12.2. Considerations for Candidate Racing

   See Section 5.3 for security considerations around racing with 0-RTT
   data.

   An attacker that knows a particular device is racing several options
   during connection establishment may be able to block packets for the
   first connection attempt, thus inducing the device to fall back to a
   secondary attempt. This is a problem if the secondary attempts have
   worse security properties that enable further attacks.
   Implementations should ensure that all options have equivalent
   security properties to avoid incentivizing attacks.

   Since results from the network can determine how a connection attempt
   tree is built, such as when DNS returns a list of resolved endpoints,
   it is possible for the network to cause an implementation to consume
   significant on-device resources. Implementations should limit the
   maximum amount of state allowed for any given node, including the
   number of child nodes, especially when the state is based on results
   from the network.

13. Acknowledgements

   This work has received funding from the European Union's Horizon 2020
   research and innovation programme under grant agreement No. 644334
   (NEAT) and No. 815178 (5GENESIS).

   This work has been supported by Leibniz Prize project funds of DFG -
   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
   FE 570/4-1).

   This work has been supported by the UK Engineering and Physical
   Sciences Research Council under grant EP/R04144X/1.

   This work has been supported by the Research Council of Norway under
   its "Toppforsk" programme through the "OCARINA" project.

   Thanks to Colin Perkins, Tom Jones, Karl-Johan Grinnemo, and Gorry
   Fairhurst for their contributions to the design of this
   specification. Thanks also to Stuart Cheshire, Josh Graessley, David
   Schinazi, and Eric Kinnear for their implementation and design
   efforts, including Happy Eyeballs, that heavily influenced this work.

14. References

14.1. Normative References

   [I-D.ietf-taps-arch]
              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., and
              C. Perkins, "An Architecture for Transport Services", Work
              in Progress, Internet-Draft, draft-ietf-taps-arch-12, 3
              January 2022.

   [I-D.ietf-taps-interface]
              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
              Kuehlewind, M., Perkins, C., Tiesel, P. S., Wood, C. A.,
              Pauly, T., and K.
              Rose, "An Abstract Application Layer Interface to
              Transport Services", Work in Progress, Internet-Draft,
              draft-ietf-taps-interface-14, 3 January 2022.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
              Transport Features Provided by IETF Transport Protocols",
              RFC 8303, DOI 10.17487/RFC8303, February 2018.

   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
              User Datagram Protocol (UDP) and Lightweight UDP
              (UDP-Lite)", RFC 8304, DOI 10.17487/RFC8304, February
              2018.

   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
              Better Connectivity Using Concurrency", RFC 8305,
              DOI 10.17487/RFC8305, December 2017.

   [RFC8421]  Martinsen, P., Reddy, T., and P. Patil, "Guidelines for
              Multihomed and IPv4/IPv6 Dual-Stack Interactive
              Connectivity Establishment (ICE)", BCP 217, RFC 8421,
              DOI 10.17487/RFC8421, July 2018.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

   [RFC8923]  Welzl, M. and S. Gjessing, "A Minimal Set of Transport
              Services for End Systems", RFC 8923, DOI 10.17487/RFC8923,
              October 2020.

14.2. Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", Work in Progress, Internet-Draft,
              draft-ietf-quic-transport-34, 14 January 2021.

   [I-D.ietf-tcpm-2140bis]
              Touch, J., Welzl, M., and S.
              Islam, "TCP Control Block Interdependence", Work in
              Progress, Internet-Draft, draft-ietf-tcpm-2140bis-11, 12
              April 2021.

   [NEAT-flow-mapping]
              "Transparent Flow Mapping for NEAT", IFIP NETWORKING 2017
              Workshop on Future of Internet Transport (FIT 2017), 2017.

   [RFC1928]  Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and
              L. Jones, "SOCKS Protocol Version 5", RFC 1928,
              DOI 10.17487/RFC1928, March 1996.

   [RFC3124]  Balakrishnan, H. and S. Seshan, "The Congestion Manager",
              RFC 3124, DOI 10.17487/RFC3124, June 2001.

   [RFC3207]  Hoffman, P., "SMTP Service Extension for Secure SMTP over
              Transport Layer Security", RFC 3207, DOI 10.17487/RFC3207,
              February 2002.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
              DOI 10.17487/RFC5389, October 2008.

   [RFC5766]  Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
              Relays around NAT (TURN): Relay Extensions to Session
              Traversal Utilities for NAT (STUN)", RFC 5766,
              DOI 10.17487/RFC5766, April 2010.

   [RFC6525]  Stewart, R., Tuexen, M., and P. Lei, "Stream Control
              Transmission Protocol (SCTP) Stream Reconfiguration",
              RFC 6525, DOI 10.17487/RFC6525, February 2012.

   [RFC6762]  Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
              DOI 10.17487/RFC6762, February 2013.

   [RFC6763]  Cheshire, S. and M. Krochmal, "DNS-Based Service
              Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
              (Diffserv) and Real-Time Communication", RFC 7657,
              DOI 10.17487/RFC7657, November 2015.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017.

   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
              "Stream Schedulers and User Message Interleaving for the
              Stream Control Transmission Protocol", RFC 8260,
              DOI 10.17487/RFC8260, November 2017.

   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
              Connectivity Establishment (ICE): A Protocol for Network
              Address Translator (NAT) Traversal", RFC 8445,
              DOI 10.17487/RFC8445, July 2018.

   [TCP-COUPLING]
              "ctrlTCP: Reducing Latency through Coupled, Heterogeneous
              Multi-Flow TCP Congestion Control", IEEE INFOCOM Global
              Internet Symposium (GI) workshop (GI 2018), n.d.

Appendix A. API Mapping Template

   Any protocol mapping for the Transport Services API should follow a
   common template.

   Connectedness: (Connectionless/Connected/Multiplexing Connected)

   Data Unit: (Byte-stream/Datagram/Message)

   Connection Object:

   Initiate:

   InitiateWithSend:

   Ready:

   InitiateError:

   ConnectionError:

   Listen:

   ConnectionReceived:

   Clone:

   Send:

   Receive:

   Close:

   Abort:

Appendix B. Additional Properties

   This appendix discusses implementation considerations for additional
   parameters and properties that could be used to enhance transport
   protocol and/or path selection, or the transmission of messages given
   a Protocol Stack that implements them. These are not part of the
   interface, and may be removed from the final document, but are
   presented here to support discussion within the TAPS working group as
   to whether they should be added to a future revision of the base
   specification.

B.1. Properties Affecting Sorting of Branches

   In addition to the Protocol and Path Selection Properties discussed
   in Section 4.1.3, the following properties under discussion can
   influence branch sorting:

   * Bounds on Send or Receive Rate: If the application indicates a
     bound on the expected Send or Receive bitrate, an implementation
     may prefer a path that can likely provide the desired bandwidth,
     based on cached maximum throughput, see Section 9.2. The
     application may know the Send or Receive Bitrate from metadata in
     adaptive HTTP streaming, such as MPEG-DASH.

   * Cost Preferences: If the application indicates a preference to
     avoid expensive paths, and some paths are associated with a
     monetary cost, an implementation should decrease the ranking of
     such paths. If the application indicates that it prohibits using
     expensive paths, paths that are associated with a cost should be
     purged from the decision tree.

Appendix C. Reasons for errors

   The Transport Services API [I-D.ietf-taps-interface] allows several
   generic error types to specify a more detailed reason as to why an
   error occurred. This appendix lists some of the possible reasons.

   * InvalidConfiguration: The transport properties and endpoints
     provided by the application are either contradictory or
     incomplete. Examples include the lack of a Remote Endpoint on an
     active open or using a multicast group address while not
     requesting a unidirectional receive.

   * NoCandidates: The configuration is valid, but none of the
     available transport protocols can satisfy the transport properties
     provided by the application.

   * ResolutionFailed: The remote or local specifier provided by the
     application cannot be resolved.

   * EstablishmentFailed: The Transport Services system was unable to
     establish a transport-layer connection to the Remote Endpoint
     specified by the application.

   * PolicyProhibited: The system policy prevents the transport system
     from performing the action requested by the application.

   * NotCloneable: The protocol stack is not capable of being cloned.

   * MessageTooLarge: The message size is too big for the transport
     system to handle.

   * ProtocolFailed: The underlying protocol stack failed.

   * InvalidMessageProperties: The message properties are either
     contradictory to the transport properties or they cannot be
     satisfied by the transport system.

   * DeframingFailed: The data that was received by the underlying
     protocol stack could not be deframed.

   * ConnectionAborted: The connection was aborted by the peer.

   * Timeout: Delivery of a message was not possible after a timeout.

Appendix D. Existing Implementations

   This appendix gives an overview of existing implementations, at the
   time of writing, of transport systems that are (to some degree) in
   line with this document.

   * Apple's Network.framework:

      - Network.framework is a transport-level API built for C,
        Objective-C, and Swift. It is a connect-by-name API that
        supports transport security protocols. It provides userspace
        implementations of TCP, UDP, TLS, DTLS, proxy protocols, and
        allows extension via custom framers.

      - Documentation: https://developer.apple.com/documentation/network

   * NEAT and NEATPy:

      - NEAT is the output of the European H2020 research project
        "NEAT"; it is a user-space library for protocol-independent
        communication on top of TCP, UDP and SCTP, with many more
        features such as a policy manager.

      - Code: https://github.com/NEAT-project/neat

      - NEAT project: https://www.neat-project.org

      - NEATPy is a Python shim over NEAT which updates the NEAT API to
        be in line with version 6 of the Transport Services API draft.

      - Code: https://github.com/theagilepadawan/NEATPy

   * PyTAPS:

      - A TAPS implementation based on Python asyncio, offering
        protocol-independent communication to applications on top of
        TCP, UDP and TLS, with support for multicast.

      - Code: https://github.com/fg-inet/python-asyncio-taps

Authors' Addresses

   Anna Brunstrom (editor)
   Karlstad University
   Universitetsgatan 2
   651 88 Karlstad
   Sweden

   Email: anna.brunstrom@kau.se

   Tommy Pauly (editor)
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014,
   United States of America

   Email: tpauly@apple.com

   Theresa Enghardt
   Netflix
   121 Albright Way
   Los Gatos, CA 95032,
   United States of America

   Email: ietf@tenghardt.net

   Philipp S. Tiesel
   SAP SE
   Konrad-Zuse-Ring 10
   14469 Potsdam
   Germany

   Email: philipp@tiesel.net

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   0316 Oslo
   Norway

   Email: michawe@ifi.uio.no