TAPS Working Group                                     A. Brunstrom, Ed.
Internet-Draft                                       Karlstad University
Intended status: Informational                             T. Pauly, Ed.
Expires: 14 January 2021                                      Apple Inc.
                                                             T. Enghardt
                                                                 Netflix
                                                           K-J. Grinnemo
                                                     Karlstad University
                                                                T. Jones
                                                  University of Aberdeen
                                                               P. Tiesel
                                                               TU Berlin
                                                              C. Perkins
                                                   University of Glasgow
                                                                M. Welzl
                                                      University of Oslo
                                                            13 July 2020

             Implementing Interfaces to Transport Services
                        draft-ietf-taps-impl-07

Abstract

   The Transport Services (TAPS) system enables applications to use
   transport protocols flexibly for network communication and defines a
   protocol-independent TAPS Application Programming Interface (API)
   that is based on an asynchronous, event-driven interaction pattern.
   This document serves as a guide to implementation on how to build
   such a system.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Implementing Connection Objects
   3.  Implementing Pre-Establishment
     3.1.  Configuration-time errors
     3.2.  Role of system policy
   4.  Implementing Connection Establishment
     4.1.  Candidate Gathering
       4.1.1.  Gathering Endpoint Candidates
       4.1.2.  Structuring Options as a Tree
       4.1.3.  Branch Types
       4.1.4.  Branching Order-of-Operations
       4.1.5.  Sorting Branches
     4.2.  Candidate Racing
       4.2.1.  Immediate
       4.2.2.  Delayed
       4.2.3.  Failover
     4.3.  Completing Establishment
       4.3.1.  Determining Successful Establishment
     4.4.  Establishing multiplexed connections
     4.5.  Handling racing with "unconnected" protocols
     4.6.  Implementing listeners
       4.6.1.  Implementing listeners for Connected Protocols
       4.6.2.  Implementing listeners for Unconnected Protocols
       4.6.3.  Implementing listeners for Multiplexed Protocols
   5.  Implementing Sending and Receiving Data
     5.1.  Sending Messages
       5.1.1.  Message Properties
       5.1.2.  Send Completion
       5.1.3.  Batching Sends
     5.2.  Receiving Messages
     5.3.  Handling of data for fast-open protocols
   6.  Implementing Message Framers
     6.1.  Defining Message Framers
     6.2.  Sender-side Message Framing
     6.3.  Receiver-side Message Framing
   7.  Implementing Connection Management
     7.1.  Pooled Connection
     7.2.  Handling Path Changes
   8.  Implementing Connection Termination
   9.  Cached State
     9.1.  Protocol state caches
     9.2.  Performance caches
   10. Specific Transport Protocol Considerations
     10.1.  TCP
     10.2.  UDP
     10.3.  UDP Multicast Receive
     10.4.  TLS
     10.5.  DTLS
     10.6.  HTTP
     10.7.  QUIC
     10.8.  HTTP/2 transport
     10.9.  SCTP
   11. IANA Considerations
   12. Security Considerations
     12.1.  Considerations for Candidate Gathering
     12.2.  Considerations for Candidate Racing
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Additional Properties
     A.1.  Properties Affecting Sorting of Branches
   Appendix B.  Reasons for errors
   Appendix C.  Existing Implementations
   Authors' Addresses

1.  Introduction

   The Transport Services architecture [I-D.ietf-taps-arch] defines a
   system that allows applications to use transport networking protocols
   flexibly. The interface such a system exposes to applications is
   defined as the Transport Services API [I-D.ietf-taps-interface].
   This API is designed to be generic across multiple transport
   protocols and sets of protocol features.

   This document serves as a guide to implementation on how to build a
   system that provides a Transport Services API. It is the job of an
   implementation of a Transport Services system to turn the requests of
   an application into decisions on how to establish connections, and
   how to transfer data over those connections once established. The
   terminology used in this document is based on the Architecture
   [I-D.ietf-taps-arch].

2.  Implementing Connection Objects

   The connection objects that are exposed to applications for Transport
   Services are:

   *  the Preconnection, the bundle of Properties that describes the
      application constraints on the transport;

   *  the Connection, the basic object that represents a flow of data as
      Messages in either direction between the Local and Remote
      Endpoints;

   *  and the Listener, a passive waiting object that delivers new
      Connections.

   Preconnection objects should be implemented as bundles of properties
   that an application can both read and write. Once a Preconnection
   has been used to create an outbound Connection or a Listener, the
   implementation should ensure that the copy of the properties held by
   the Connection or Listener is immutable. This may involve performing
   a deep-copy if the application is still able to modify properties on
   the original Preconnection object.

   Connection objects represent the interface between the application
   and the implementation to manage transport state, and conduct data
   transfer. During the process of establishment (Section 4), the
   Connection will not be bound to a specific transport flow, since
   there may be multiple candidate Protocol Stacks being raced. Once
   the Connection is established, the object should be considered mapped
   to a specific Protocol Stack. The notion of a Connection maps to
   many different protocols, depending on the Protocol Stack. For
   example, the Connection may ultimately represent the interface into a
   TCP connection, a TLS session over TCP, a UDP flow with
   fully-specified local and remote endpoints, a DTLS session, an SCTP
   stream, a QUIC stream, or an HTTP/2 stream.

   Listener objects are created with a Preconnection, at which point
   their configuration should be considered immutable by the
   implementation. The process of listening is described in
   Section 4.6.

3.  Implementing Pre-Establishment

   During pre-establishment the application specifies the Endpoints to
   be used for communication as well as its preferences via Selection
   Properties and, if desired, also Connection Properties. Generally,
   Connection Properties should be configured as early as possible,
   because they can serve as input to decisions that are made by the
   implementation (e.g., the Capacity Profile can guide usage of a
   protocol offering scavenger-type congestion control).

   The implementation stores these properties as a part of the
   Preconnection object for use during connection establishment. For
   Selection Properties that are not provided by the application, the
   implementation must use the default values specified in the Transport
   Services API ([I-D.ietf-taps-interface]).

3.1.  Configuration-time errors

   The transport system should have a list of supported protocols
   available, each of which has transport features reflecting the
   capabilities of the protocol. Once an application specifies its
   Transport Properties, the transport system matches the required and
   prohibited properties against the transport features of the available
   protocols.

   In the following cases, failure should be detected during
   pre-establishment:

   *  A request by an application for Protocol Properties that include
      requirements or prohibitions that cannot be satisfied by any of
      the available protocols. For example, if an application requires
      "Configure Reliability per Message", but no such protocol is
      available on the host running the transport system (e.g., when
      SCTP is not supported by the operating system), this should
      result in an error.

   *  A request by an application for Protocol Properties that are in
      conflict with each other, i.e., the required and prohibited
      properties cannot be satisfied by the same protocol.
      For example, if an application prohibits "Reliable Data Transfer"
      but then requires "Configure Reliability per Message", this
      mismatch should result in an error.

   It is important that such cases fail as early as possible, to avoid
   allocating resources (e.g., to endpoint resolution) only to find out
   later that there is no protocol that satisfies the requirements.

3.2.  Role of system policy

   The properties specified during pre-establishment have a close
   relationship to system policy. The implementation is responsible for
   combining and reconciling several different sources of preferences
   when establishing Connections. These include, but are not limited
   to:

   1.  Application preferences, i.e., preferences specified during the
       pre-establishment via Selection Properties.

   2.  Dynamic system policy, i.e., policy compiled from internally and
       externally acquired information about available network
       interfaces, supported transport protocols, and current/previous
       Connections. Examples of ways to externally retrieve
       policy-support information are through OS-specific
       statistics/measurement tools and tools that reside on middleboxes
       and routers.

   3.  Default implementation policy, i.e., predefined policy by OS or
       application.

   In general, any protocol or path used for a connection must conform
   to all three sources of constraints. A violation of any of the
   layers should cause a protocol or path to be considered ineligible
   for use. As an example of application preferences leading to
   constraints, an application may prohibit the use of metered network
   interfaces for a given Connection to avoid user cost. Similarly, the
   system policy at a given time may prohibit the use of such a metered
   network interface from the application's process.
   Lastly, the implementation itself may default to disallowing certain
   network interfaces unless explicitly requested by the application and
   allowed by the system.

   It is expected that the database of system policies and the method of
   looking up these policies will vary across various platforms. An
   implementation should attempt to look up the relevant policies for
   the system in a dynamic way to make sure it is reflecting an accurate
   version of the system policy, since the system's policy regarding the
   application's traffic may change over time due to user or
   administrative changes.

4.  Implementing Connection Establishment

   The process of establishing a network connection begins when an
   application expresses intent to communicate with a remote endpoint by
   calling Initiate. (At this point, any constraints or requirements
   the application may have on the connection are available from
   pre-establishment.) The process can be considered complete once
   there is at least one Protocol Stack that has completed any required
   setup to the point that it can transmit and receive the application's
   data.

   Connection establishment is divided into two top-level steps:
   Candidate Gathering, to identify the paths, protocols, and endpoints
   to use, and Candidate Racing, in which the necessary protocol
   handshakes are conducted so that the transport system can select
   which set to use. This document structures candidates for racing as
   a tree.

   The simplest example of this process might involve identifying the
   single IP address to which the implementation wishes to connect,
   using the system's current default interface or path, and starting a
   TCP handshake to establish a stream to the specified IP address.
   However, each step may also vary depending on the requirements of the
   connection: if the endpoint is defined as a hostname and port, then
   there may be multiple resolved addresses that are available; there
   may also be multiple interfaces or paths available, other than the
   default system interface; and some protocols may not need any
   transport handshake to be considered "established" (such as UDP),
   while other connections may utilize layered protocol handshakes, such
   as TLS over TCP.

   Whenever an implementation has multiple options for connection
   establishment, it can view the set of all individual connection
   establishment options as a single, aggregate connection
   establishment. The aggregate set conceptually includes every valid
   combination of endpoints, paths, and protocols. As an example,
   consider an implementation that initiates a TCP connection to a
   hostname + port endpoint, and has two valid interfaces available
   (Wi-Fi and LTE). The hostname resolves to a single IPv4 address on
   the Wi-Fi network, and resolves to the same IPv4 address on the LTE
   network, as well as a single IPv6 address. The aggregate set of
   connection establishment options can be viewed as follows:

   Aggregate [Endpoint: www.example.com:80] [Interface: Any]   [Protocol: TCP]
   |-> [Endpoint: 192.0.2.1:80]             [Interface: Wi-Fi] [Protocol: TCP]
   |-> [Endpoint: 192.0.2.1:80]             [Interface: LTE]   [Protocol: TCP]
   |-> [Endpoint: 2001:DB8::1.80]           [Interface: LTE]   [Protocol: TCP]

   Any one of these sub-entries on the aggregate connection attempt
   would satisfy the original application intent. The concern of this
   section is the algorithm defining which of these options to try,
   when, and in what order.

   During Candidate Gathering, an implementation first excludes all
   protocols and paths that match a Prohibit or do not match all Require
   properties.
   Then, the implementation will sort branches according to Preferred
   properties, Avoided properties, and possibly other criteria.

4.1.  Candidate Gathering

   The step of gathering candidates involves identifying which paths,
   protocols, and endpoints may be used for a given Connection. This
   list is determined by the requirements, prohibitions, and preferences
   of the application as specified in the Selection Properties.

4.1.1.  Gathering Endpoint Candidates

   Both Local and Remote Endpoint Candidates must be discovered during
   connection establishment. To support Interactive Connectivity
   Establishment (ICE) [RFC8445], or similar protocols that involve
   out-of-band indirect signalling to exchange candidates with the
   Remote Endpoint, it is important to be able to query the set of
   candidate Local Endpoints, and give the protocol stack a set of
   candidate Remote Endpoints, before it attempts to establish
   connections.

4.1.1.1.  Local Endpoint candidates

   The set of possible Local Endpoints is gathered. In the simple case,
   this merely enumerates the local interfaces and protocols, and
   allocates ephemeral source ports. For example, a system that has
   Wi-Fi and Ethernet and supports IPv4 and IPv6 might gather four
   candidate locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on Wi-Fi,
   and IPv6 on Wi-Fi) that can form the source for a transient.

   If NAT traversal is required, the process of gathering Local
   Endpoints becomes broadly equivalent to the ICE candidate gathering
   phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its
   server reflexive Local Endpoints (i.e., the translated address of a
   local, on the other side of a NAT, e.g., via a STUN server
   [RFC5389]) and relayed locals (e.g., via a TURN server [RFC5766] or
   other relay), for each interface and network protocol. These are
   added to the set of candidate Local Endpoints for this connection.
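
   The simple (non-NAT) case of Local Endpoint gathering can be
   sketched in Python as a cross product of interfaces and IP versions.
   This is only an illustration under stated assumptions: the
   LocalCandidate type, the gather_local_candidates function, and the
   interface names are hypothetical and are not defined by the
   Transport Services API, and port 0 merely stands in for "let the
   system allocate an ephemeral source port".

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LocalCandidate:
    """One candidate Local Endpoint (hypothetical type for illustration)."""
    interface: str
    ip_version: str
    port: int = 0  # assumption: 0 means "system picks an ephemeral port"


def gather_local_candidates(interfaces, ip_versions):
    """Return one candidate per (interface, IP version) combination."""
    return [LocalCandidate(i, v) for i, v in product(interfaces, ip_versions)]


# Two interfaces and two IP versions give four candidate locals.
candidates = gather_local_candidates(["Ethernet", "Wi-Fi"], ["IPv4", "IPv6"])
for c in candidates:
    print(f"{c.ip_version} on {c.interface}")
```

   Server reflexive and relayed locals are not modelled here; under NAT
   traversal they would be appended to the same list after the
   corresponding STUN or TURN exchanges.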

   Gathering Local Endpoints is primarily a local operation, although it
   might involve exchanges with a STUN server to derive server reflexive
   locals, or with a TURN server or other relay to derive relayed
   locals. However, it does not involve communication with the Remote
   Endpoint.

4.1.1.2.  Remote Endpoint Candidates

   The Remote Endpoint is typically a name that needs to be resolved
   into a set of possible addresses that can be used for communication.
   Resolving the Remote Endpoint is the process of recursively
   performing such name lookups, until fully resolved, to return the set
   of candidates for the remote of this connection.

   How this is done will depend on the type of the Remote Endpoint, and
   can also be specific to each Local Endpoint. A common case is when
   the Remote Endpoint is a DNS name, in which case it is resolved to
   give a set of IPv4 and IPv6 addresses representing that name. Some
   types of remote might require more complex resolution. Resolving the
   Remote Endpoint for a peer-to-peer connection might involve
   communication with a rendezvous server, which in turn contacts the
   peer to gain consent to communicate and retrieve its set of candidate
   locals, which are returned and form the candidate remote addresses
   for contacting that peer.

   Resolving the remote is not a local operation. It will involve a
   directory service, and can require communication with the remote to
   rendezvous and exchange peer addresses. This can expose some or all
   of the candidate locals to the remote.

4.1.2.  Structuring Options as a Tree

   When an implementation responsible for connection establishment needs
   to consider multiple options, it should logically structure these
   options as a hierarchical tree.
   Each leaf node of the tree represents a single, coherent connection
   attempt, with an Endpoint, a Path, and a set of protocols that can
   directly negotiate and send data on the network. Each node in the
   tree that is not a leaf represents a connection attempt that is
   either underspecified, or else includes multiple distinct options.
   For example, when connecting on an IP network, a connection attempt
   to a hostname and port is underspecified, because the connection
   attempt requires a resolved IP address as its remote endpoint. In
   this case, the node represented by the connection attempt to the
   hostname is a parent node, with child nodes for each IP address.
   Similarly, an implementation that is allowed to connect using
   multiple interfaces will have a parent node of the tree for the
   decision between the paths, with a branch for each interface.

   The example aggregate connection attempt above can be drawn as a tree
   by grouping the addresses resolved on the same interface into
   branches:

                             ||
                +==========================+
                |  www.example.com:80/Any  |
                +==========================+
                  //                    \\
+==========================+       +==========================+
| www.example.com:80/Wi-Fi |       |  www.example.com:80/LTE  |
+==========================+       +==========================+
             ||                      //                    \\
+====================+  +====================+  +======================+
| 192.0.2.1:80/Wi-Fi |  |  192.0.2.1:80/LTE  |  |  2001:DB8::1.80/LTE  |
+====================+  +====================+  +======================+

   The rest of this section will use a notation scheme to represent this
   tree. The parent (or trunk) node of the tree will be represented by
   a single integer, such as "1". Each child of that node will have an
   integer that identifies it, from 1 to the number of children.
   That child node will be uniquely identified by concatenating its
   integer to its parent's identifier with a dot in between, such as
   "1.1" and "1.2". Each node will be summarized by a tuple of three
   elements: Endpoint, Path, and Protocol. The above example can now be
   written more succinctly as:

   1 [www.example.com:80, Any, TCP]
     1.1 [www.example.com:80, Wi-Fi, TCP]
       1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [www.example.com:80, LTE, TCP]
       1.2.1 [192.0.2.1:80, LTE, TCP]
       1.2.2 [2001:DB8::1.80, LTE, TCP]

   When an implementation views this aggregate set of connection
   attempts as a single connection establishment, it will use only one
   of the leaf nodes to transfer data. Thus, when a single leaf node
   becomes ready to use, the entire connection attempt is ready to be
   used by the application. Another way to represent this is that every
   leaf node updates the state of its parent node when it becomes ready,
   until the trunk node of the tree is ready, which then notifies the
   application that the connection as a whole is ready to use.

   A connection establishment tree may be degenerate, and only have a
   single leaf node, such as a connection attempt to an IP address over
   a single interface with a single protocol.

   1 [192.0.2.1:80, Wi-Fi, TCP]

   A parent node may also only have one child (or leaf) node, such as
   when a hostname resolves to only a single IP address.

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]

4.1.3.  Branch Types

   There are three types of branching from a parent node into one or
   more child nodes. Any parent node of the tree must only use one type
   of branching.

4.1.3.1.  Derived Endpoints

   If a connection originally targets a single endpoint, there may be
   multiple endpoints of different types that can be derived from the
   original.
   The connection library creates an ordered list of the derived
   endpoints according to application preference, system policy and
   expected performance.

   DNS hostname-to-address resolution is the most common method of
   endpoint derivation. When trying to connect to a hostname endpoint
   on a traditional IP network, the implementation should send DNS
   queries for both A (IPv4) and AAAA (IPv6) records if both are
   supported on the local link. The algorithm for ordering and racing
   these addresses should follow the recommendations in Happy Eyeballs
   [RFC8305].

   1 [www.example.com:80, Wi-Fi, TCP]
     1.1 [2001:DB8::1.80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, Wi-Fi, TCP]
     1.3 [2001:DB8::2.80, Wi-Fi, TCP]
     1.4 [2001:DB8::3.80, Wi-Fi, TCP]

   DNS-Based Service Discovery [RFC6763] can also provide an endpoint
   derivation step. When trying to connect to a named service, the
   client may discover one or more hostname and port pairs on the local
   network using multicast DNS [RFC6762]. These hostnames should each
   be treated as a branch that can be attempted independently from other
   hostnames. Each of these hostnames might resolve to one or more
   addresses, which would create multiple layers of branching.

   1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
     1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
       1.1.1 [31.133.160.18:631, Wi-Fi, TCP]

4.1.3.2.  Alternate Paths

   If a client has multiple network interfaces available to it, e.g., a
   mobile client with both Wi-Fi and Cellular connectivity, it can
   attempt a connection over any of the interfaces. This represents a
   branch point in the connection establishment. Similar to a derived
   endpoint, the interfaces should be ranked based on preference, system
   policy, and performance.
   Attempts should be started on one interface, and then on other
   interfaces successively after delays based on expected
   round-trip-time or other available metrics.

   1 [192.0.2.1:80, Any, TCP]
     1.1 [192.0.2.1:80, Wi-Fi, TCP]
     1.2 [192.0.2.1:80, LTE, TCP]

   This same approach applies to any situation in which the client is
   aware of multiple links or views of the network. Multiple Paths,
   each with a coherent set of addresses, routes, DNS server, and more,
   may share a single interface. A path may also represent a virtual
   interface service such as a Virtual Private Network (VPN).

   The list of available paths should be constrained by any requirements
   or prohibitions the application sets, as well as system policy.

4.1.3.3.  Protocol Options

   Differences in possible protocol compositions and options can also
   provide a branching point in connection establishment. This allows
   clients to be resilient to situations in which a certain protocol is
   not functioning on a server or network.

   This approach is commonly used for connections with optional proxy
   server configurations. A single connection might have several
   options available: an HTTP-based proxy, a SOCKS-based proxy, or no
   proxy. These options should be ranked and attempted in succession.

   1 [www.example.com:80, Any, HTTP/TCP]
     1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP]
     1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP]
     1.3 [www.example.com:80, Any, HTTP/TCP]
       1.3.1 [192.0.2.1:80, Any, HTTP/TCP]

   This approach also allows a client to attempt different sets of
   application and transport protocols that, when available, could
   provide preferable features.
For example, the protocol options could 559 involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, and 560 HTTP/2 [RFC7540] over TLS over TCP on the other: 562 1 [www.example.com:443, Any, Any HTTP] 563 1.1 [www.example.com:443, Any, QUIC/UDP] 564 1.1.1 [192.0.2.1:443, Any, QUIC/UDP] 565 1.2 [www.example.com:443, Any, HTTP2/TLS/TCP] 566 1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP] 568 Another example is racing SCTP with TCP: 570 1 [www.example.com:80, Any, Any Stream] 571 1.1 [www.example.com:80, Any, SCTP] 572 1.1.1 [192.0.2.1:80, Any, SCTP] 573 1.2 [www.example.com:80, Any, TCP] 574 1.2.1 [192.0.2.1:80, Any, TCP] 576 Implementations that support racing protocols and protocol options 577 should maintain a history of which protocols and protocol options 578 successfully established, on a per-network and per-endpoint basis 579 (see Section 9.2). This information can influence future racing 580 decisions to prioritize or prune branches. 582 4.1.4. Branching Order-of-Operations 584 Branch types must occur in a specific order relative to one another 585 to avoid creating leaf nodes with invalid or incompatible settings. 586 In the example above, it would be invalid to branch for derived 587 endpoints (the DNS results for www.example.com) before branching 588 between interface paths, since there are situations when the results 589 will be different across networks due to private names or different 590 supported IP versions. Implementations must be careful to branch in 591 an order that results in usable leaf nodes whenever there are 592 multiple branch types that could be used from a single node. 594 The order of operations for branching, where lower numbers are acted 595 upon first, should be: 597 1. Alternate Paths 599 2. Protocol Options 601 3. 
Derived Endpoints 603 Branching between paths is the first in the list because results 604 across multiple interfaces are likely not related to one another: 605 endpoint resolution may return different results, especially when 606 using locally resolved host and service names, and which protocols 607 are supported and preferred may differ across interfaces. Thus, if 608 multiple paths are attempted, the overall connection can be seen as a 609 race between the available paths or interfaces. 611 Protocol options are checked next, in order. Whether or not a set of 612 protocols, or protocol-specific options, can successfully connect is 613 generally not dependent on which specific IP address is used. 614 Furthermore, the protocol stacks being attempted may influence or 615 altogether change the endpoints being used. Adding a proxy to a 616 connection's branch will change the endpoint to the proxy's IP 617 address or hostname. Choosing an alternate protocol may also modify 618 the ports that should be selected. 620 Branching for derived endpoints is the final step, and may have 621 multiple layers of derivation or resolution, such as DNS service 622 resolution and DNS hostname resolution. 624 For example, if the application has indicated both a preference for 625 Wi-Fi over LTE and for a feature only available in SCTP, branches will 626 be first sorted according to path selection, with Wi-Fi at the top. 627 Then, branches with SCTP will be sorted to the top within their 628 subtree according to the properties influencing protocol selection. 629 However, if the implementation has current cache information that 630 SCTP is not available on the path over Wi-Fi, there is no SCTP node in 631 the Wi-Fi subtree. Here, the path over Wi-Fi will be tried first, and, 632 if connection establishment succeeds, TCP will be used. So the 633 Selection Property of preferring Wi-Fi takes precedence over the 634 Property that led to a preference for SCTP. 636 1.
[www.example.com:80, Any, Any Stream] 637 1.1 [192.0.2.1:80, Wi-Fi, Any Stream] 638 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 639 1.2 [192.0.3.1:80, LTE, Any Stream] 640 1.2.1 [192.0.3.1:80, LTE, SCTP] 641 1.2.2 [192.0.3.1:80, LTE, TCP] 643 4.1.5. Sorting Branches 645 Implementations should sort the branches of the tree of connection 646 options in order of their preference rank, from most preferred to 647 least preferred. Leaf nodes on branches with higher rankings 648 represent connection attempts that will be raced first. 649 Implementations should order the branches to reflect the preferences 650 expressed by the application for its new connection, including 651 Selection Properties, which are specified in 652 [I-D.ietf-taps-interface]. 654 In addition to the properties provided by the application, an 655 implementation may include additional criteria in the ranking, such as cached 656 performance estimates (see Section 9.2) or system policy (see 657 Section 3.2). Two examples of how Selection and 658 Connection Properties may be used to sort branches are provided 659 below: 661 * "Interface Instance or Type": If the application specifies an 662 interface type to be preferred or avoided, implementations should 663 rank the paths accordingly. If the application specifies an 664 interface type to be required or prohibited, an implementation is 665 expected to exclude the non-conforming paths. 667 * "Capacity Profile": An implementation can use the Capacity Profile 668 to prefer paths that match an application's expected traffic 669 pattern.
This match will use cached performance estimates, see 670 Section 9.2: 672 - Scavenger: Prefer paths with the highest expected available 673 capacity, based on the observed maximum throughput; 675 - Low Latency/Interactive: Prefer paths with the lowest expected 676 Round Trip Time, based on observed round trip time estimates; 678 - Constant-Rate Streaming: Prefer paths that are expected to 679 satisfy the requested Stream Send or Stream Receive Bitrate, 680 based on the observed maximum throughput. 682 Implementations process the Properties in the following order: 683 Prohibit, Require, Prefer, Avoid. If Selection Properties contain 684 any prohibited properties, the implementation should first purge 685 branches containing nodes with these properties. For required 686 properties, it should only keep branches that satisfy these 687 requirements. Next, it should order the branches according to the 688 preferred properties, and finally use any avoided properties as a 689 tiebreaker. When ordering branches, an implementation can give more 690 weight to properties that the application has explicitly set than to 691 properties left at their default values. 693 The available protocols and paths on a specific system and in a 694 specific context can change; therefore, the result of sorting and the 695 outcome of racing may vary, even when using the same Selection and 696 Connection Properties. However, an implementation ought to provide a 697 consistent outcome to applications, e.g., by preferring protocols and 698 paths that are already used by existing Connections that specified 699 similar Properties. 701 4.2. Candidate Racing 703 The primary goal of the Candidate Racing process is to successfully 704 negotiate a protocol stack to an endpoint over an interface--to 705 connect a single leaf node of the tree--with as little delay and as 706 few unnecessary connection attempts as possible.
Optimizing these 707 two factors improves the user experience, while minimizing network 708 load. 710 This section covers the dynamic aspect of connection establishment. 711 The tree described above is a useful conceptual and architectural 712 model. However, an implementation is unable to know the full tree 713 before it is formed and many of the possible branches ultimately 714 might not be used. 716 There are three different approaches to racing the attempts for 717 different nodes of the connection establishment tree: 719 1. Immediate 721 2. Delayed 723 3. Failover 725 Each approach is appropriate in different use-cases and branch types. 726 However, to avoid consuming unnecessary network resources, 727 implementations should not use immediate racing as a default 728 approach. 730 The timing algorithms for racing should remain independent across 731 branches of the tree. Any timers or racing logic is isolated to a 732 given parent node, and is not ordered precisely with regards to other 733 children of other nodes. 735 4.2.1. Immediate 737 Immediate racing is when multiple alternate branches are started 738 without waiting for any one branch to make progress before starting 739 the next alternative. This means the attempts are effectively 740 simultaneous. Immediate racing should be avoided by implementations, 741 since it consumes extra network resources and establishes state that 742 might not be used. 744 4.2.2. Delayed 746 Delayed racing can be used whenever a single node of the tree has 747 multiple child nodes. Based on the order determined when building 748 the tree, the first child node will be initiated immediately, 749 followed by the next child node after some delay. Once that second 750 child node is initiated, the third child node (if present) will begin 751 after another delay, and so on until all child nodes have been 752 initiated, or one of the child nodes successfully completes its 753 negotiation. 
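The staggered start described above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of any Transport Services API: the name delayed_race, the single fixed delay, and the use of Python's asyncio are hypothetical choices made for the example. A real implementation would likely derive each delay from the previously started child, for instance from historical round-trip time. The sketch also starts the next child right away when an earlier child fails, instead of waiting out the remaining delay.

```python
import asyncio


async def delayed_race(attempts, delay):
    """Race connection attempts with a staggered start.

    attempts: zero-argument coroutine functions, most preferred first.
    delay: seconds to wait before starting the next child (a fixed value
           is an assumption of this sketch).
    Returns (index, result) of the first attempt that succeeds, or raises
    ConnectionError if every attempt fails.
    """
    running = {}  # maps each in-flight task to the index of its attempt
    next_index = 0
    try:
        while next_index < len(attempts) or running:
            if next_index < len(attempts):
                # Start the next child; earlier children keep running.
                task = asyncio.ensure_future(attempts[next_index]())
                running[task] = next_index
                next_index += 1
            # While children remain unstarted, wait at most `delay`;
            # afterwards wait until some in-flight attempt finishes.
            timeout = delay if next_index < len(attempts) else None
            done, _ = await asyncio.wait(
                list(running), timeout=timeout,
                return_when=asyncio.FIRST_COMPLETED)
            successes = []
            for task in done:
                index = running.pop(task)
                if task.exception() is None:
                    successes.append((index, task.result()))
            if successes:
                # If several finished together, keep the most preferred.
                return min(successes, key=lambda pair: pair[0])
            # No success yet: a failure or an expired delay both lead to
            # starting the next child on the following loop iteration.
        raise ConnectionError("all candidates failed")
    finally:
        for task in running:
            task.cancel()  # stop attempts that lost the race
```

Each entry in attempts stands for one child node of a single parent; racing a whole tree would apply the same logic recursively, with each parent keeping its own timers.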
755 Delayed racing attempts occur in parallel. Implementations should 756 not terminate an earlier child connection attempt upon starting a 757 secondary child. 759 The delay between starting child nodes should be based on the 760 properties of the previously started child node. For example, if the 761 first child represents an IP address with a known route, and the 762 second child represents another IP address, the delay between 763 starting the first and second IP addresses can be based on the 764 expected retransmission cadence for the first child's connection 765 (derived from historical round-trip-time). Alternatively, if the 766 first child represents a branch on a Wi-Fi interface, and the second 767 child represents a branch on an LTE interface, the delay should be 768 based on the expected time in which the branch for the first 769 interface would be able to establish a connection, based on link 770 quality and historical round-trip-time. 772 Any delay should have a defined minimum and maximum value based on 773 the branch type. Generally, branches between paths and protocols 774 should have longer delays than branches between derived endpoints. 775 The maximum delay should be considered with regards to how long a 776 user is expected to wait for the connection to complete. 778 If a child node fails to connect before the delay timer has fired for 779 the next child, the next child should be started immediately. 781 4.2.3. Failover 783 If an implementation or application has a strong preference for one 784 branch over another, the branching node may choose to wait until one 785 child has failed before starting the next. Failure of a leaf node is 786 determined by its protocol negotiation failing or timing out; failure 787 of a parent branching node is determined by all of its children 788 failing. 790 An example in which failover is recommended is a race between a 791 protocol stack that uses a proxy and a protocol stack that bypasses 792 the proxy. 
Failover is useful in case the proxy is down or 793 misconfigured, but any more aggressive type of racing may end up 794 unnecessarily avoiding a proxy that was preferred by policy. 796 4.3. Completing Establishment 798 The process of connection establishment completes when one leaf node 799 of the tree has completed negotiation with the remote endpoint 800 successfully, or else all nodes of the tree have failed to connect. 801 The first leaf node to complete its connection is then used by the 802 application to send and receive data. 804 Successes and failures of a given attempt should be reported up to 805 parent nodes (towards the trunk of the tree). For example, in the 806 following case, if 1.1.1 fails to connect, it reports the failure to 807 1.1. Since 1.1 has no other child nodes, it has also failed and 808 reports that failure to 1. Because 1.2 has not yet failed, 1 is not 809 considered to have failed. Since 1.2 has not yet started, it is 810 started and the process continues. Similarly, if 1.1.1 successfully 811 connects, then it marks 1.1 as connected, which propagates to the 812 trunk node 1. At this point, the connection as a whole is considered 813 to be successfully connected and ready to process application data. 815 1 [www.example.com:80, Any, TCP] 816 1.1 [www.example.com:80, Wi-Fi, TCP] 817 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 818 1.2 [www.example.com:80, LTE, TCP] 819 ... 821 If a leaf node has successfully completed its connection, all other 822 attempts should be made ineligible for use by the application for the 823 original request. New connection attempts that involve transmitting 824 data on the network ought not to be started after another leaf node 825 has already successfully completed, because the connection as a whole 826 has now been established. An implementation may choose to let 827 certain handshakes and negotiations complete in order to gather 828 metrics to influence future connections.
Keeping additional 829 connections is generally not recommended since those attempts were 830 slower to connect and may exhibit less desirable properties. 832 4.3.1. Determining Successful Establishment 834 Implementations may select the criteria by which a leaf node is 835 considered to be successfully connected differently on a per-protocol 836 basis. If the only protocol being used is a transport protocol with 837 a clear handshake, like TCP, then the obvious choice is to declare 838 that node "connected" when the last packet of the three-way handshake 839 has been received. If the only protocol being used is an 840 "unconnected" protocol, like UDP, the implementation may consider the 841 node fully "connected" the moment it determines a route is present, 842 before sending any packets on the network, see further Section 4.5. 844 For protocol stacks with multiple handshakes, the decision becomes 845 more nuanced. If the protocol stack involves both TLS and TCP, an 846 implementation could determine that a leaf node is connected after 847 the TCP handshake is complete, or it can wait for the TLS handshake 848 to complete as well. The benefit of declaring completion when the 849 TCP handshake finishes, and thus stopping the race for other branches 850 of the tree, is that there will be less burden on the network from 851 other connection attempts. On the other hand, by waiting until the 852 TLS handshake is complete, an implementation avoids the scenario in 853 which a TCP handshake completes quickly, but TLS negotiation is 854 either very slow or fails altogether in particular network conditions 855 or to a particular endpoint. To avoid the issue of TLS possibly 856 failing, the implementation should not generate a Ready event for the 857 Connection until TLS is established. 859 If all of the leaf nodes fail to connect during racing, i.e. 
none of 860 the configurations that satisfy all requirements given in the 861 Transport Properties actually work over the available paths, then the 862 transport system should notify the application with an InitiateError 863 event. An InitiateError event should also be generated in case the 864 transport system finds no usable candidates to race. 866 4.4. Establishing multiplexed connections 868 Multiplexing several Connections over a single underlying transport 869 connection requires that the Connections to be multiplexed belong to 870 the same Connection Group (as is indicated by the application using 871 the Clone call). When the underlying transport connection supports 872 multi-streaming, the Transport System can map each Connection in the 873 Connection Group to a different stream. Thus, when the Connections 874 that are offered to an application by the Transport System are 875 multiplexed, the Transport System may implement the establishment of 876 a new Connection by simply beginning to use a new stream of an 877 already established transport connection, and there is no need for a 878 connection establishment procedure. This, then, also means that 879 there may not be any "establishment" message (like a TCP SYN), but 880 the application can simply start sending or receiving. Therefore, 881 when the Initiate action of a Transport System is called without 882 Messages being handed over, it cannot be guaranteed that the other 883 endpoint will have any way to know about this, and hence a passive 884 endpoint's ConnectionReceived event may not be generated upon an active 885 endpoint's Initiate. Instead, the ConnectionReceived event 886 may be delayed until the first Message arrives. 888 4.5.
Handling racing with "unconnected" protocols 890 While protocols that use an explicit handshake to validate a 891 Connection to a peer can be used for racing multiple establishment 892 attempts in parallel, "unconnected" protocols such as raw UDP do not 893 offer a way to validate the presence of a peer or the usability of a 894 Connection without application feedback. An implementation should 895 consider such a protocol stack to be established as soon as a local 896 route to the peer endpoint is confirmed. 898 However, if a peer is not reachable over the network using the 899 unconnected protocol, or data cannot be exchanged for any other 900 reason, the application may want to attempt using another candidate 901 Protocol Stack. The implementation should maintain the list of other 902 candidate Protocol Stacks that were eligible to use. 904 4.6. Implementing listeners 906 When an implementation is asked to Listen, it registers with the 907 system to wait for incoming traffic to the Local Endpoint. If no 908 Local Endpoint is specified, the implementation should use an 909 ephemeral port. 911 If the Selection Properties do not require a single network interface 912 or path, but allow the use of multiple paths, the Listener object 913 should register for incoming traffic on all of the network interfaces 914 or paths that conform to the Properties. The set of available paths 915 can change over time, so the implementation should monitor network 916 path changes and register and de-register the Listener across all 917 usable paths. When using multiple paths, the Listener is generally 918 expected to use the same port for listening on each. 920 If the Selection Properties allow multiple protocols to be used for 921 listening, and the implementation supports it, the Listener object 922 should support receiving inbound connections for each eligible 923 protocol on each eligible path. 925 4.6.1. 
Implementing listeners for Connected Protocols 927 Connected protocols such as TCP and TLS-over-TCP have a strong 928 mapping between the Local and Remote Endpoints (five-tuple) and their 929 protocol connection state. These map into Connection objects. 930 Whenever a new inbound handshake is being started, the Listener 931 should generate a new Connection object and pass it to the 932 application. 934 4.6.2. Implementing listeners for Unconnected Protocols 936 Unconnected protocols such as UDP and UDP-lite generally do not 937 provide the same mechanisms that connected protocols do to offer 938 Connection objects. Implementations should wait for incoming packets 939 for unconnected protocols on a listening port and should perform 940 five-tuple matching of packets to either existing Connection objects 941 or the creation of new Connection objects. On platforms with 942 facilities to create a "virtual connection" for unconnected protocols 943 implementations should use these mechanisms to minimise the handling 944 of datagrams intended for already created Connection objects. 946 4.6.3. Implementing listeners for Multiplexed Protocols 948 Protocols that provide multiplexing of streams into a single five- 949 tuple can listen both for entirely new connections (a new HTTP/2 950 stream on a new TCP connection, for example) and for new sub- 951 connections (a new HTTP/2 stream on an existing connection). If the 952 abstraction of Connection presented to the application is mapped to 953 the multiplexed stream, then the Listener should deliver new 954 Connection objects in the same way for either case. The 955 implementation should allow the application to introspect the 956 Connection Group marked on the Connections to determine the grouping 957 of the multiplexing. 959 5. Implementing Sending and Receiving Data 961 The most basic mapping for sending a Message is an abstraction of 962 datagrams, in which the transport protocol naturally deals in 963 discrete packets. 
Each Message here corresponds to a single 964 datagram. Generally, these will be short enough that sending and 965 receiving will always use a complete Message. 967 For protocols that expose byte-streams, the only delineation provided 968 by the protocol is the end of the stream in a given direction. Each 969 Message in this case corresponds to the entire stream of bytes in a 970 direction. These Messages may be quite long, in which case they can 971 be sent in multiple parts. 973 Protocols that provide the framing (such as length-value protocols, 974 or protocols that use delimiters) provide data boundaries that may be 975 longer than a traditional packet datagram. Each Message for framing 976 protocols corresponds to a single frame, which may be sent either as 977 a complete Message, or in multiple parts. 979 5.1. Sending Messages 981 The effect of the application sending a Message is determined by the 982 top-level protocol in the established Protocol Stack. That is, if 983 the top-level protocol provides an abstraction of framed messages 984 over a connection, the receiving application will be able to obtain 985 multiple Messages on that connection, even if the framing protocol is 986 built on a byte-stream protocol like TCP. 988 5.1.1. Message Properties 990 * Lifetime: this should be implemented by removing the Message from 991 the queue of pending Messages after the Lifetime has expired. A 992 queue of pending Messages within the transport system 993 implementation that have yet to be handed to the Protocol Stack 994 can always support this property, but once a Message has been sent 995 into the send buffer of a protocol, only certain protocols may 996 support removing a message. For example, an implementation cannot 997 remove bytes from a TCP send buffer, while it can remove data from 998 a SCTP send buffer using the partial reliability extension 999 [RFC8303]. 
When there is no standing queue of Messages within the 1000 system, and the Protocol Stack does not support the removal of a 1001 Message from the stack's send buffer, this property may be 1002 ignored. 1004 * Priority: this represents the ability to prioritize a Message over 1005 other Messages. This can be implemented by the system re-ordering 1006 Messages that have yet to be handed to the Protocol Stack, or by 1007 giving relative priority hints to protocols that support 1008 priorities per Message. For example, an implementation of HTTP/2 1009 could choose to send Messages of different Priority on streams of 1010 different priority. 1012 * Ordered: when this is false, this disables the requirement of in- 1013 order-delivery for protocols that support configurable ordering. 1015 * Safely Replayable: when this is true, this means that the Message 1016 can be used by mechanisms that might transfer it multiple times - 1017 e.g., as a result of racing multiple transports or as part of TCP 1018 Fast Open. Also, protocols that do not protect against duplicated 1019 messages, such as UDP, can only be used with Messages that are 1020 Safely Replayable. 1022 * Final: when this is true, this means that a transport connection 1023 can be closed immediately after transmission of the message. 1025 * Corruption Protection Length: when this is set to any value other 1026 than "Full Coverage", it sets the minimum protection in protocols 1027 that allow limiting the checksum length (e.g. UDP-Lite). 1029 * Reliable Data Transfer (Message): When true, the property 1030 specifies that the Message must be reliably transmitted. When 1031 false, and if unreliable transmission is supported by the 1032 underlying protocol, then the Message should be unreliably 1033 transmitted. If the underlying protocol does not support 1034 unreliable transmission, the Message should be reliably 1035 transmitted. 
1037 * Message Capacity Profile Override: When true, this expresses a 1038 wish to override the Generic Connection Property "Capacity 1039 Profile" for this Message. Depending on the value, this can, for 1040 example, be implemented by changing the DSCP value of the 1041 associated packet (note that the guidelines in Section 6 of 1042 [RFC7657] apply; e.g., the DSCP value should not be changed for 1043 different packets within a reliable transport protocol session or 1044 DCCP connection). 1046 * No Fragmentation: When set, this property limits the message size 1047 to the Maximum Message Size Before Fragmentation or Segmentation 1048 (see Section 10.1.7 of [I-D.ietf-taps-interface]). Messages 1049 larger than this size generate an error. Setting this avoids 1050 transport-layer segmentation or network-layer fragmentation. When 1051 used with transports running over IP version 4, the Don't Fragment 1052 bit will be set to avoid on-path IP fragmentation ([RFC8304]). 1054 5.1.2. Send Completion 1056 The application should be notified whenever a Message or partial 1057 Message has been consumed by the Protocol Stack, or has failed to 1058 send. The meaning of the Message being consumed by the stack may 1059 vary depending on the protocol. For a basic datagram protocol like 1060 UDP, this may correspond to the time when the packet is sent into the 1061 interface driver. For a protocol that buffers data in queues, like 1062 TCP, this may correspond to when the data has entered the send 1063 buffer. 1065 5.1.3. Batching Sends 1067 Since sending a Message may involve a context switch between the 1068 application and the transport system, sending patterns that involve 1069 multiple small Messages can incur high overhead if each needs to be 1070 enqueued separately. To avoid this, the application can indicate a 1071 batch of Send actions through the API. When this is used, the 1072 implementation should hold off on processing Messages until the batch 1073 is complete.
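One way to picture batched sends is the sketch below. It is an illustration only: the Connection class, its batch() context manager, and the hand_to_stack callback are hypothetical names invented for this example and are not defined by the Transport Services interface. The callback stands in for the single, potentially expensive hand-off from the application into the transport system.

```python
from contextlib import contextmanager


class Connection:
    """Toy Connection that can defer Messages until a batch is complete."""

    def __init__(self, hand_to_stack):
        # hand_to_stack: callable that receives a list of Messages in one
        # call (hypothetical stand-in for the hand-off to the stack).
        self._hand_to_stack = hand_to_stack
        self._batch = None  # list of queued Messages while batching

    def send(self, message):
        if self._batch is not None:
            self._batch.append(message)  # hold until the batch ends
        else:
            self._hand_to_stack([message])

    @contextmanager
    def batch(self):
        self._batch = []
        try:
            yield self
        finally:
            # Batch complete: process everything queued in one step.
            # This also runs if the body raises, so Messages queued up
            # to that point are still handed over.
            queued, self._batch = self._batch, None
            if queued:
                self._hand_to_stack(queued)


handed = []
conn = Connection(handed.append)
with conn.batch():
    conn.send(b"first")
    conn.send(b"second")
# Both Messages reach the stack together, in a single hand-off.
```

Whether Messages queued before an error should still be sent is a design choice; this sketch flushes them, but discarding the partial batch would be equally defensible.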
1075 5.2. Receiving Messages 1077 Similar to sending, Receiving a Message is determined by the top- 1078 level protocol in the established Protocol Stack. The main 1079 difference with Receiving is that the size and boundaries of the 1080 Message are not known beforehand. The application can communicate in 1081 its Receive action the parameters for the Message, which can help the 1082 implementation know how much data to deliver and when. For example, 1083 if the application only wants to receive a complete Message, the 1084 implementation should wait until an entire Message (datagram, stream, 1085 or frame) is read before delivering any Message content to the 1086 application. This requires the implementation to understand where 1087 messages end, either via a supplied deframer or because the top-level 1088 protocol in the established Protocol Stack preserves message 1089 boundaries. If the top-level protocol only supports a byte-stream 1090 and no framers were supported, the application can control the flow 1091 of received data by specifying the minimum number of bytes of Message 1092 content it wants to receive at one time. 1094 If a Connection becomes finished before a requested Receive action 1095 can be satisfied, the implementation should deliver any partial 1096 Message content outstanding, or if none is available, an indication 1097 that there will be no more received Messages. 1099 5.3. Handling of data for fast-open protocols 1101 Several protocols allow sending higher-level protocol or application 1102 data within the first packet of their protocol establishment, such as 1103 TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is 1104 referred to as sending Zero-RTT (0-RTT) data. This is a desirable 1105 property, but poses challenges to an implementation that uses racing 1106 during connection establishment. 
1108 If the application has 0-RTT data to send in any protocol handshakes, 1109 it needs to provide this data before the handshakes have begun. When 1110 racing, this means that the data should be provided before the 1111 process of connection establishment has begun. If the application 1112 wants to send 0-RTT data, it must indicate this to the implementation 1113 by setting the "Safely Replayable" send parameter to true when 1114 sending the data. In general, 0-RTT data may be replayed (for 1115 example, if a TCP SYN contains data, and the SYN is retransmitted, 1116 the data will be retransmitted as well but may be treated as belonging to a new 1117 connection instead of a retransmission). Also, when racing 1118 connections, different leaf nodes have the opportunity to send the 1119 same data independently. If data is truly safely replayable, this 1120 should be permissible. 1122 Once the application has provided its 0-RTT data, an implementation 1123 should keep a copy of this data and provide it to each new leaf node 1124 that is started and for which a 0-RTT protocol is being used. 1126 It is also possible that protocol stacks within a particular leaf 1127 node use 0-RTT handshakes without any safely replayable application 1128 data. For example, TCP Fast Open could use a Client Hello from TLS 1129 as its 0-RTT data, shortening the cumulative handshake time. 1131 0-RTT handshakes often rely on previous state, such as TCP Fast Open 1132 cookies, previously established TLS tickets, or out-of-band 1133 distributed pre-shared keys (PSKs). Implementations should be aware 1134 of security concerns around using these tokens across multiple 1135 addresses or paths when racing. In the case of TLS, any given ticket 1136 or PSK should only be used on one leaf node, since servers will 1137 likely reject duplicate tickets in order to prevent replays (see 1138 Section 8.1 of [RFC8446]).
If implementations have multiple tickets 1139 available from a previous connection, each leaf node attempt can use 1140 a different ticket. In effect, each leaf node will send the same 1141 early application data, yet encoded (encrypted) differently on the 1142 wire. 1144 6. Implementing Message Framers 1146 Message Framers are pieces of code that define simple transformations 1147 between application Message data and raw transport protocol data. A 1148 Framer can encapsulate or encode outbound Messages, and decapsulate 1149 or decode inbound data into Messages. 1151 While many protocols can be represented as Message Framers, for the 1152 purposes of the Transport Services interface these are ways for 1153 applications or application frameworks to define their own Message 1154 parsing to be included within a Connection's Protocol Stack. As an 1155 example, TLS can serve the purpose of framing data over TCP, but is 1156 exposed as a protocol natively supported by the Transport Services 1157 interface. 1159 Most Message Framers fall into one of two categories: 1161 * Header-prefixed record formats, such as a basic Type-Length-Value 1162 (TLV) structure 1164 * Delimiter-separated formats, such as HTTP/1.1. 1166 Common Message Framers can be provided by the Transport Services 1167 implementation, but an implementation ought to allow custom Message 1168 Framers to be defined by the application or some other piece of 1169 software. This section describes one possible interface for defining 1170 Message Framers as an example. 1172 6.1. Defining Message Framers 1174 A Message Framer is primarily defined by the set of code that handles 1175 events for a framer implementation, specifically how it handles 1176 inbound and outbound data parsing. The piece of code that implements 1177 custom framing logic will be referred to as the "framer 1178 implementation", which may be provided by the Transport Services 1179 implementation or the application itself. 
The Message Framer refers 1180 to the object or piece of code within the main Connection 1181 implementation that delivers events to the custom framer 1182 implementation whenever data is ready to be parsed or framed. 1184 When a Connection establishment attempt begins, an event can be 1185 delivered to notify the framer implementation that a new Connection 1186 is being created. Similarly, a stop event can be delivered when a 1187 Connection is being torn down. The framer implementation can use the 1188 Connection object to look up specific properties of the Connection or 1189 the network being used that may influence how to frame Messages. 1191 MessageFramer -> Start(Connection) 1192 MessageFramer -> Stop(Connection) 1194 When a Message Framer generates a "Start" event, the framer 1195 implementation has the opportunity to start writing some data prior 1196 to the Connection delivering its "Ready" event. This allows the 1197 implementation to communicate control data to the remote endpoint 1198 that can be used to parse Messages. 1200 MessageFramer.MakeConnectionReady(Connection) 1202 Similarly, when a Message Framer generates a "Stop" event, the framer 1203 implementation has the opportunity to write some final data or clear 1204 up its local state before the "Closed" event is delivered to the 1205 Application. The framer implementation can indicate that it has 1206 finished with this. 1208 MessageFramer.MakeConnectionClosed(Connection) 1209 At any time if the implementation encounters a fatal error, it can 1210 also cause the Connection to fail and provide an error. 1212 MessageFramer.FailConnection(Connection, Error) 1214 Should the framer implementation deem the candidate selected during 1215 racing unsuitable it can signal this by failing the Connection prior 1216 to marking it as ready. If there are no other candidates available, 1217 the Connection will fail. 
Otherwise, the Connection will select a 1218 different candidate and the Message Framer will generate a new 1219 "Start" event. 1221 Before an implementation marks a Message Framer as ready, it can also 1222 dynamically add a protocol or framer above it in the stack. This 1223 allows protocols like STARTTLS, that need to add TLS conditionally, 1224 to modify the Protocol Stack based on a handshake result. 1226 otherFramer := NewMessageFramer() 1227 MessageFramer.PrependFramer(Connection, otherFramer) 1229 6.2. Sender-side Message Framing 1231 Message Framers generate an event whenever a Connection sends a new 1232 Message. 1234 MessageFramer -> NewSentMessage 1236 Upon receiving this event, a framer implementation is responsible for 1237 performing any necessary transformations and sending the resulting 1238 data back to the Message Framer, which will in turn send it to the 1239 next protocol. Implementations SHOULD ensure that there is a way to 1240 pass the original data through without copying to improve 1241 performance. 1243 MessageFramer.Send(Connection, Data) 1245 To provide an example, a simple protocol that adds a length as a 1246 header would receive the "NewSentMessage" event, create a data 1247 representation of the length of the Message data, and then send a 1248 block of data that is the concatenation of the length header and the 1249 original Message data. 1251 6.3. Receiver-side Message Framing 1253 In order to parse a received flow of data into Messages, the Message 1254 Framer notifies the framer implementation whenever new data is 1255 available to parse. 1257 MessageFramer -> HandleReceivedData 1259 Upon receiving this event, the framer implementation can inspect the 1260 inbound data. The data is parsed from a particular cursor 1261 representing the unprocessed data. The application requests a 1262 specific amount of data it needs to have available in order to parse. 1263 If the data is not available, the parse fails. 
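The length-header framing used as the example in Section 6.2, together with the cursor-based parsing introduced above, can be sketched as follows. This is a minimal illustration and not part of the Transport Services interface: the 4-byte big-endian header and the names "frame", "LengthPrefixParser" and "handle_received_data" are assumptions made for this sketch.

```python
import struct
from typing import List

# Assumption for this sketch: a 4-byte, big-endian length header.
HEADER = struct.Struct("!I")


def frame(message: bytes) -> bytes:
    """Sender side: concatenate the length header and the Message data."""
    return HEADER.pack(len(message)) + message


class LengthPrefixParser:
    """Receiver side: buffer inbound bytes and parse from the front."""

    def __init__(self) -> None:
        self._buffer = bytearray()  # unprocessed inbound data

    def handle_received_data(self, data: bytes) -> List[bytes]:
        """Append new data and return every complete Message now available."""
        self._buffer += data
        messages = []
        while True:
            # A parse needs at least a full header. If it is not there yet,
            # the parse fails for now and is retried when more data arrives.
            if len(self._buffer) < HEADER.size:
                break
            (length,) = HEADER.unpack_from(self._buffer, 0)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break
            messages.append(bytes(self._buffer[HEADER.size:end]))
            # Advance past the header and body that were just consumed.
            del self._buffer[:end]
        return messages
```

Feeding the parser arbitrary slices of the byte stream yields the same Messages regardless of where the slices fall, which is the property a framer over a byte-stream transport needs.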
1265 MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage) 1267 The framer implementation can directly advance the receive cursor 1268 once it has parsed data to effectively discard data (for example, 1269 discard a header once the content has been parsed). 1271 To deliver a Message to the application, the framer implementation 1272 can either directly deliver data that it has allocated, or deliver a 1273 range of data directly from the underlying transport and 1274 simultaneously advance the receive cursor. 1276 MessageFramer.AdvanceReceiveCursor(Connection, Length) 1277 MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) 1278 MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) 1280 Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the 1281 framer implementation to earmark bytes as part of a Message even 1282 before they are received by the transport. This allows the delivery 1283 of very large Messages without requiring the implementation to 1284 directly inspect all of the bytes. 1286 To provide an example, a simple protocol that parses a length as a 1287 header value would receive the "HandleReceivedData" event, and call 1288 "Parse" with a minimum and maximum set to the length of the header 1289 field. Once the parse succeeded, it would call 1290 "AdvanceReceiveCursor" with the length of the header field, and then 1291 call "DeliverAndAdvanceReceiveCursor" with the length of the body 1292 that was parsed from the header, marking the new Message as complete. 1294 7. Implementing Connection Management 1296 Once a Connection is established, the Transport Services system 1297 allows applications to interact with the Connection by modifying or 1298 inspecting Connection Properties. A Connection can also generate 1299 events in the form of Soft Errors. 
1301 The set of Connection Properties that are supported for setting and 1302 getting on a Connection is described in [I-D.ietf-taps-interface]. 1303 For any properties that are generic, and thus could apply to all 1304 protocols being used by a Connection, the Transport System should 1305 store the properties in generic storage, and notify all protocol 1306 instances in the Protocol Stack whenever the properties have been 1307 modified by the application. For protocol-specific properties, such 1308 as the User Timeout that applies to TCP, the Transport System only 1309 needs to update the relevant protocol instance. 1311 If an error is encountered in setting a property (for example, if the 1312 application tries to set a TCP-specific property on a Connection that 1313 is not using TCP), the action should fail gracefully. The 1314 application may be informed of the error, but the Connection itself 1315 should not be terminated. 1317 The Transport Services implementation should allow protocol instances 1318 in the Protocol Stack to pass up arbitrary generic or protocol- 1319 specific errors that can be delivered to the application as Soft 1320 Errors. These allow the application to be informed of ICMP errors 1321 and other similar events. 1323 7.1. Pooled Connection 1325 For protocols that employ request/response pairs and do not require 1326 in-order delivery of the responses, like HTTP, the transport 1327 implementation may distribute interactions across several underlying 1328 transport connections. For these kinds of protocols, implementations 1329 may hide the connection management and only expose a single 1330 Connection object and the individual requests/responses as messages. 1331 These Pooled Connections can use multiple connections or multiple 1332 streams of multi-streaming connections between endpoints, as long as 1333 all of these satisfy the requirements and prohibitions specified in 1334 the Selection Properties of the Pooled Connection.
This enables 1335 implementations to realize transparent connection coalescing and 1336 connection migration, and to perform per-message endpoint and path 1337 selection by choosing among these underlying connections. 1339 7.2. Handling Path Changes 1341 When a path change occurs, the Transport Services implementation is 1342 responsible for notifying Protocol Instances in the Protocol Stack. 1343 If the Protocol Stack includes a transport protocol that supports 1344 multipath connectivity, an update to the available paths should 1345 inform the Protocol Instance of the new set of paths that are 1346 permissible based on the Selection Properties passed by the 1347 application. A multipath protocol can establish new subflows over 1348 new paths, and should tear down subflows over paths that are no 1349 longer available. Pooled Connections (Section 7.1) may add or remove 1350 underlying transport connections in a similar manner. If the 1351 Protocol Stack includes a transport protocol that does not support 1352 multipath, but supports migrating between paths, the update to 1353 available paths can be used as the trigger for migrating the 1354 connection. For protocols that do not support multipath or 1355 migration, the Protocol Instances may be informed of the path change, 1356 but should not be forcibly disconnected if the previously used path 1357 becomes unavailable. An exception to this case is if the System 1358 Policy changes to prohibit traffic from the Connection based on its 1359 properties, in which case the Protocol Stack should be disconnected. 1361 8. Implementing Connection Termination 1363 With TCP, when an application closes a connection, this means that it 1364 has no more data to send (but expects all data that has been handed 1365 over to be reliably delivered). However, with TCP only, "close" does 1366 not mean that the application will stop receiving data. This is 1367 related to TCP's ability to support half-closed connections.
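The half-closed behavior of TCP described above can be observed directly with the Berkeley sockets API. The following is a small sketch over a loopback connection, assuming that local TCP sockets are available; the helper names "half_close_demo" and "_read_until_fin" are chosen for this illustration only. It shows plain TCP behavior, not the semantics that a Transport Services implementation exposes.

```python
import socket


def _read_until_fin(sock: socket.socket) -> bytes:
    """Read until recv() returns b"", which signals the peer's FIN."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    """Return (bytes seen by server, bytes seen by client) over loopback TCP."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"request")
        # Half-close: sends a FIN, meaning "no more data to send".
        client.shutdown(socket.SHUT_WR)

        request = _read_until_fin(server)

        # The reverse direction is still open, so the server can reply.
        server.sendall(b"response")
        server.close()  # closes the server side, which sends its own FIN

        response = _read_until_fin(client)
    finally:
        client.close()
        server.close()
        listener.close()
    return request, response
```

The client still receives the full reply after it has signalled that it will send nothing further, which is exactly the half-closed state.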
1369 SCTP is an example of a protocol that does not support such half- 1370 closed connections. Hence, with SCTP, the meaning of "close" is 1371 stricter: an application has no more data to send (but expects all 1372 data that has been handed over to be reliably delivered), and will 1373 also not receive any more data. 1375 Implementing a protocol-independent transport system means that the 1376 exposed semantics must be the strictest subset of the semantics of 1377 all supported protocols. Hence, as is common with all reliable 1378 transport protocols, after a Close action, the application can expect 1379 to have its reliability requirements honored regarding the data it 1380 has given to the Transport System, but it cannot expect to be able to 1381 read any more data after calling Close. 1383 Abort differs from Close only in that no guarantees are given 1384 regarding data that the application has handed over to the Transport 1385 System before calling Abort. 1387 As explained in Section 4.4, when a new stream is multiplexed on an 1388 already existing connection of a Transport Protocol Instance, there 1389 is no need for a connection establishment procedure. Because the 1390 Connections that are offered by the Transport System can be 1391 implemented as streams that are multiplexed on a transport protocol's 1392 connection, it cannot be guaranteed that one Endpoint's 1393 Initiate action provokes a ConnectionReceived event at its peer. 1395 For Close (provoking a Finished event) and Abort (provoking a 1396 ConnectionError event), the same logic applies: while it is desirable 1397 to be informed when a peer closes or aborts a Connection, whether 1398 this is possible depends on the underlying protocol, and no 1399 guarantees can be given. With SCTP, the transport system can use the 1400 stream reset procedure to cause a Finished event upon a Close action 1401 from the peer [NEAT-flow-mapping]. 1403 9.
Cached State 1405 Beyond a single Connection's lifetime, it is useful for an 1406 implementation to keep state and history. This cached state can help 1407 improve future Connection establishment due to re-using results and 1408 credentials, and favoring paths and protocols that performed well in 1409 the past. 1411 Cached state may be associated with different Endpoints for the same 1412 Connection, depending on the protocol generating the cached content. 1413 For example, session tickets for TLS are associated with specific 1414 endpoints, and thus should be cached based on a Connection's hostname 1415 Endpoint (if applicable). On the other hand, performance 1416 characteristics of a path are more likely tied to the IP address and 1417 subnet being used. 1419 9.1. Protocol state caches 1421 Some protocols will have long-term state to be cached in association 1422 with Endpoints. This state often has some time after which it is 1423 expired, so the implementation should allow each protocol to specify 1424 an expiration for cached content. 1426 Examples of cached protocol state include: 1428 * The DNS protocol can cache resolution answers (A and AAAA queries, 1429 for example), associated with a Time To Live (TTL) to be used for 1430 future hostname resolutions without requiring asking the DNS 1431 resolver again. 1433 * TLS caches session state and tickets based on a hostname, which 1434 can be used for resuming sessions with a server. 1436 * TCP can cache cookies for use in TCP Fast Open. 1438 Cached protocol state is primarily used during Connection 1439 establishment for a single Protocol Stack, but may be used to 1440 influence an implementation's preference between several candidate 1441 Protocol Stacks. For example, if two IP address Endpoints are 1442 otherwise equally preferred, an implementation may choose to attempt 1443 a connection to an address for which it has a TCP Fast Open cookie. 
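One possible shape for such a protocol state cache is sketched below. It is an illustration under stated assumptions and not a prescribed design: the class "ProtocolStateCache", the helper "prefer_cached", the string protocol labels and the injectable clock are all hypothetical names introduced here. Each entry carries a per-protocol lifetime, and candidates with unexpired cached state are tried first while otherwise keeping their original order.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple


class ProtocolStateCache:
    """Per-protocol, per-Endpoint cache whose entries expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[Tuple[str, Hashable], Tuple[Any, float]] = {}

    def put(self, protocol: str, endpoint: Hashable, value: Any,
            lifetime: float) -> None:
        """Store a value that expires `lifetime` seconds from now."""
        expires_at = self._clock() + lifetime
        self._entries[(protocol, endpoint)] = (value, expires_at)

    def get(self, protocol: str, endpoint: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is absent or expired."""
        key = (protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop expired state lazily
            return None
        return value

    def flush(self) -> None:
        """Discard all cached protocol state on application request."""
        self._entries.clear()


def prefer_cached(cache: ProtocolStateCache, protocol: str,
                  candidates: List[Hashable]) -> List[Hashable]:
    """Order otherwise equally preferred candidates, cached state first.

    sorted() is stable, so candidates keep their relative order within
    the "has state" and "has no state" groups.
    """
    return sorted(candidates, key=lambda ep: cache.get(protocol, ep) is None)
```

For example, after storing a TCP Fast Open cookie for one of two addresses, "prefer_cached" moves that address to the front until the cookie's lifetime runs out or the cache is flushed.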
1445 Applications must have a way to flush protocol cache state if 1446 desired. This may be necessary, for example, if application-layer 1447 identifiers rotate and clients wish to avoid linkability via 1448 trackable TLS tickets or TFO cookies. 1450 9.2. Performance caches 1452 In addition to protocol state, Protocol Instances should provide data 1453 into a performance-oriented cache to help guide future protocol and 1454 path selection. Some performance information can be gathered 1455 generically across several protocols to allow predictive comparisons 1456 between protocols on given paths: 1458 * Observed Round Trip Time 1460 * Connection Establishment latency 1462 * Connection Establishment success rate 1464 These items can be cached on a per-address and per-subnet 1465 granularity, and averaged between different values. The information 1466 should be cached on a per-network basis, since it is expected that 1467 different network attachments will have different performance 1468 characteristics. Besides Protocol Instances, other system entities 1469 may also provide data into performance-oriented caches. This could 1470 for instance be signal strength information reported by radio modems 1471 like Wi-Fi and mobile broadband or information about the battery- 1472 level of the device. Furthermore, the system may cache the observed 1473 maximum throughput on a path as an estimate of the available 1474 bandwidth. 1476 An implementation should use this information, when possible, to 1477 determine preference between candidate paths, endpoints, and protocol 1478 options. Eligible options that historically had significantly better 1479 performance than others should be selected first when gathering 1480 candidates (see Section 4.1) to ensure better performance for the 1481 application. 1483 The reasonable lifetime for cached performance values will vary 1484 depending on the nature of the value. 
Certain information, like the 1485 connection establishment success rate to a Remote Endpoint using a 1486 given protocol stack, can be stored for a long period of time (hours 1487 or longer), since it is expected that the capabilities of the Remote 1488 Endpoint do not change very quickly. On the other hand, the Round 1489 Trip Time observed by TCP over a particular network path may vary 1490 over a relatively short time interval. For such values, the 1491 implementation should remove them from the cache more quickly, or 1492 treat older values with less confidence/weight. 1494 [I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP 1495 Control Block information between connections on initialization. 1497 10. Specific Transport Protocol Considerations 1499 Each protocol that can run as part of a Transport Services 1500 implementation defines both its API mapping and its implementation 1501 details. API mappings for a protocol apply most to Connections in 1502 which the given protocol is the "top" of the Protocol Stack. For 1503 example, the mapping of the "Send" function for TCP applies to 1504 Connections in which the application directly sends over TCP. If 1505 HTTP/2 is used on top of TCP, the HTTP/2 mappings take precedence. 1507 Each protocol has a notion of Connectedness. Possible values for 1508 Connectedness are: 1510 * Unconnected. Unconnected protocols do not establish explicit 1511 state between endpoints, and do not perform a handshake during 1512 Connection establishment. 1514 * Connected. Connected protocols establish state between endpoints, 1515 and perform a handshake during Connection establishment. The 1516 handshake may be 0-RTT to send data or resume a session, but 1517 bidirectional traffic is required to confirm connectedness. 1519 * Multiplexing Connected. Multiplexing Connected protocols share 1520 properties with Connected protocols, but also explicitly support 1521 opening multiple application-level flows.
This means that they 1522 can support cloning new Connection objects without a new explicit 1523 handshake. 1525 Protocols also define a notion of Data Unit. Possible values for 1526 Data Unit are: 1528 * Byte-stream. Byte-stream protocols do not define any Message 1529 boundaries of their own apart from the end of a stream in each 1530 direction. 1532 * Datagram. Datagram protocols define Message boundaries at the 1533 same level of transmission, such that only complete (not partial) 1534 Messages are supported. 1536 * Message. Message protocols support Message boundaries that can be 1537 sent and received either as complete or partial Messages. Maximum 1538 Message lengths can be defined, and Messages can be partially 1539 reliable. 1541 Below, terms in capitals with a dot (e.g., "CONNECT.SCTP") refer to 1542 the primitives with the same name in section 4 of [RFC8303]. For 1543 further implementation details, the description of these primitives 1544 in [RFC8303] points to section 3 of [RFC8303] and section 3 of 1545 [RFC8304], which refers back to the relevant specifications for each 1546 protocol. This back-tracking method applies to all elements of 1547 [I-D.ietf-taps-minset] (see appendix D of [I-D.ietf-taps-interface]): 1548 they are listed in appendix A of [I-D.ietf-taps-minset] with an 1549 implementation hint in the same style, pointing back to section 4 of 1550 [RFC8303]. 1552 10.1. TCP 1554 Connectedness: Connected 1556 Data Unit: Byte-stream 1558 API mappings for TCP are as follows: 1560 Connection Object: TCP connections between two hosts map directly to 1561 Connection objects. 1563 Initiate: CONNECT.TCP. Calling "Initiate" on a TCP Connection 1564 causes it to reserve a local port, and send a SYN to the Remote 1565 Endpoint. 1567 InitiateWithSend: CONNECT.TCP with parameter "user message". Early 1568 safely replayable data is sent on a TCP Connection in the SYN, as 1569 TCP Fast Open data. 
1571 Ready: A TCP Connection is ready once the three-way handshake is 1572 complete. 1574 InitiateError: Failure of CONNECT.TCP. TCP can throw various errors 1575 during connection setup. Specifically, it is important to handle 1576 a RST being sent by the peer during the handshake. 1578 ConnectionError: Once established, TCP throws errors whenever the 1579 connection is disconnected, such as due to receiving a RST from 1580 the peer; or hitting a TCP retransmission timeout. 1582 Listen: LISTEN.TCP. Calling "Listen" for TCP binds a local port and 1583 prepares it to receive inbound SYN packets from peers. 1585 ConnectionReceived: TCP Listeners will deliver new connections once 1586 they have replied to an inbound SYN with a SYN-ACK. 1588 Clone: Calling "Clone" on a TCP Connection creates a new Connection 1589 with equivalent parameters. The two Connections are otherwise 1590 independent. 1592 Send: SEND.TCP. TCP does not on its own preserve Message 1593 boundaries. Calling "Send" on a TCP connection lays out the bytes 1594 on the TCP send stream without any other delineation. Any Message 1595 marked as Final will cause TCP to send a FIN once the Message has 1596 been completely written, by calling CLOSE.TCP immediately upon 1597 successful termination of SEND.TCP. 1599 Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without 1600 any Message delineation. All data delivered in the "Received" or 1601 "ReceivedPartial" event will be part of a single stream-wide 1602 Message that is marked Final (unless a Message Framer is used). 1603 EndOfMessage will be delivered when the TCP Connection has 1604 received a FIN (CLOSE-EVENT.TCP or ABORT-EVENT.TCP) from the peer. 1606 Close: Calling "Close" on a TCP Connection indicates that the 1607 Connection should be gracefully closed (CLOSE.TCP) by sending a 1608 FIN to the peer and waiting for a FIN-ACK before delivering the 1609 "Closed" event. 
1611 Abort: Calling "Abort" on a TCP Connection indicates that the 1612 Connection should be immediately closed by sending a RST to the 1613 peer (ABORT.TCP). 1615 10.2. UDP 1617 Connectedness: Unconnected 1619 Data Unit: Datagram 1621 API mappings for UDP are as follows: 1623 Connection Object: UDP connections represent a pair of specific IP 1624 addresses and ports on two hosts. 1626 Initiate: CONNECT.UDP. Calling "Initiate" on a UDP Connection 1627 causes it to reserve a local port, but does not generate any 1628 traffic. 1630 InitiateWithSend: Early data on a UDP Connection does not have any 1631 special meaning. The data is sent whenever the Connection is 1632 Ready. 1634 Ready: A UDP Connection is ready once the system has reserved a 1635 local port and has a path to send to the Remote Endpoint. 1637 InitiateError: UDP Connections can only generate errors on 1638 initiation due to port conflicts on the local system. 1640 ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- 1641 Lite)) upon receiving ICMP notifications indicating failures in 1642 the network. 1644 Listen: LISTEN.UDP. Calling "Listen" for UDP binds a local port and 1645 prepares it to receive inbound UDP datagrams from peers. 1647 ConnectionReceived: UDP Listeners will deliver new connections once 1648 they have received traffic from a new Remote Endpoint. 1650 Clone: Calling "Clone" on a UDP Connection creates a new Connection 1651 with equivalent parameters. The two Connections are otherwise 1652 independent. 1654 Send: SEND.UDP(-Lite). Calling "Send" on a UDP connection sends the 1655 data as the payload of a complete UDP datagram. Marking Messages 1656 as Final does not change anything in the datagram's contents. 1657 Upon sending a UDP datagram, some relevant fields and flags in the 1658 IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in 1659 IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)). 1661 Receive: RECEIVE.UDP(-Lite). 
UDP only delivers complete Messages to 1662 "Received", each of which represents a single datagram received in 1663 a UDP packet. Upon receiving a UDP datagram, the ECN flag from 1664 the IP header can be obtained (GET_ECN.UDP(-Lite)). 1666 Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) 1667 releases the local port reservation. 1669 Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is 1670 identical to calling "Close". 1672 10.3. UDP Multicast Receive 1674 Connectedness: Unconnected 1676 Data Unit: Datagram 1678 API mappings for Receiving Multicast UDP are as follows: 1680 Connection Object: Established UDP Multicast Receive connections 1681 represent a pair of specific IP addresses and ports. The 1682 "unidirectional receive" transport property is required, and the 1683 local endpoint must be configured with a group IP address and a 1684 port. 1686 Initiate: Calling "Initiate" on a UDP Multicast Receive Connection 1687 causes an immediate InitiateError. This is an unsupported 1688 operation. 1690 InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast 1691 Receive Connection causes an immediate InitiateError. This is an 1692 unsupported operation. 1694 Ready: A UDP Multicast Receive Connection is ready once the system 1695 has received traffic for the appropriate group and port. 1697 InitiateError: UDP Multicast Receive Connections generate an 1698 InitiateError if Initiate is called. 1700 ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- 1701 Lite)) upon receiving ICMP notifications indicating failures in 1702 the network. 1704 Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive 1705 binds a local port, prepares it to receive inbound UDP datagrams 1706 from peers, and issues a multicast host join. If a remote 1707 endpoint with an address is supplied, the join is Source-specific 1708 Multicast, and the path selection is based on the route to the 1709 remote endpoint. 
If a remote endpoint is not supplied, the join 1710 is Any-source Multicast, and the path selection is based on the 1711 outbound route to the group supplied in the local endpoint. 1713 ConnectionReceived: UDP Multicast Receive Listeners will deliver new 1714 connections once they have received traffic from a new Remote 1715 Endpoint. 1717 Clone: Calling "Clone" on a UDP Multicast Receive Connection creates 1718 a new Connection with equivalent parameters. The two Connections 1719 are otherwise independent. 1721 Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive 1722 connection causes an immediate SendError. This is an unsupported 1723 operation. 1725 Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP 1726 Multicast Receive connection only delivers complete Messages to 1727 "Received", each of which represents a single datagram received in 1728 a UDP packet. Upon receiving a UDP datagram, the ECN flag from 1729 the IP header can be obtained (GET_ECN.UDP(-Lite)). 1731 Close: Calling "Close" on a UDP Multicast Receive Connection 1732 (ABORT.UDP(-Lite)) releases the local port reservation and leaves 1733 the group. 1735 Abort: Calling "Abort" on a UDP Multicast Receive Connection 1736 (ABORT.UDP(-Lite)) is identical to calling "Close". 1738 10.4. TLS 1740 The mapping of a TLS stream abstraction into the application is 1741 equivalent to the contract provided by TCP (see Section 10.1), and 1742 builds upon many of the actions of TCP connections. 1744 Connectedness: Connected 1746 Data Unit: Byte-stream 1748 Connection Object: Connection objects represent a single TLS 1749 connection running over a TCP connection between two hosts. 1751 Initiate: Calling "Initiate" on a TLS Connection causes it to first 1752 initiate a TCP connection. Once the TCP protocol is Ready, the 1753 TLS handshake will be performed as a client (starting by sending a 1754 "client_hello", and so on). 
1756 InitiateWithSend: Early safely replayable data is supported by TLS 1757 1.3, which sends encrypted application data in the first TLS message 1758 when performing session resumption. For older versions of TLS, or 1759 if a session is not being resumed, the initial data will be 1760 delayed until the TLS handshake is complete. TCP Fast Open can 1761 also be enabled automatically. 1763 Ready: A TLS Connection is ready once the underlying TCP connection 1764 is Ready, the TLS handshake is also complete, and keys have been 1765 established to encrypt application data. 1767 InitiateError: In addition to TCP initiation errors, TLS can 1768 generate errors during its handshake. Examples of errors include a 1769 failure of the peer to successfully authenticate, the peer 1770 rejecting the local authentication, or a failure to match versions 1771 or algorithms. 1773 ConnectionError: TLS connections will generate TCP errors, or errors 1774 due to failures to rekey or decrypt received messages. 1776 Listen: Calling "Listen" for TLS listens on TCP, and sets up 1777 received connections to perform server-side TLS handshakes. 1779 ConnectionReceived: TLS Listeners will deliver new connections once 1780 they have successfully completed both TCP and TLS handshakes. 1782 Clone: As with TCP, calling "Clone" on a TLS Connection creates a 1783 new Connection with equivalent parameters. The two Connections 1784 are otherwise independent. 1786 Send: Like TCP, TLS does not preserve message boundaries. Although 1787 application data is framed natively in TLS, there is no general 1788 guarantee that these TLS messages represent semantically 1789 meaningful application stream boundaries. Rather, sending data on 1790 a TLS Connection only guarantees that the application data will be 1791 transmitted in an encrypted form. Marking Messages as Final 1792 causes a "close_notify" to be generated once the data has been 1793 written.
1795 Receive: Like TCP, TLS delivers a stream of bytes without any 1796 Message delineation. The data is decrypted prior to being 1797 delivered to the application. If a "close_notify" is received, 1798 the stream-wide Message will be delivered with EndOfMessage set. 1800 Close: Calling "Close" on a TLS Connection indicates that the 1801 Connection should be gracefully closed by sending a "close_notify" 1802 to the peer and waiting for a corresponding "close_notify" before 1803 delivering the "Closed" event. 1805 Abort: Calling "Abort" on a TLS Connection indicates that the 1806 Connection should be immediately closed by sending a 1807 "close_notify", optionally preceded by "user_canceled", to the 1808 peer. Implementations do not need to wait to receive 1809 "close_notify" before delivering the "Closed" event. 1811 10.5. DTLS 1813 DTLS follows the same behavior as TLS (Section 10.4), with the 1814 notable exception of not inheriting behavior directly from TCP. 1815 Differences from TLS are detailed below, and all cases not explicitly 1816 mentioned should be considered the same as TLS. 1818 Connectedness: Connected 1820 Data Unit: Datagram 1822 Connection Object: Connection objects represent a single DTLS 1823 connection running over a set of UDP ports between two hosts. 1825 Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve 1826 a UDP local port, and to begin sending handshake messages to the peer 1827 over UDP. These messages are reliable, and will be automatically 1828 retransmitted. 1830 Ready: A DTLS Connection is ready once the TLS handshake is complete 1831 and keys have been established to encrypt application data. 1833 Send: Sending over DTLS does preserve message boundaries in the same 1834 way that UDP datagrams do. Marking a Message as Final does send a 1835 "close_notify" like TLS. 1837 Receive: Receiving over DTLS delivers one decrypted Message for each 1838 received DTLS datagram.
If a "close_notify" is received, a 1839 Message will be delivered that is marked as Final. 1841 10.6. HTTP 1843 HTTP requests and responses map naturally into Messages, since they 1844 are delineated chunks of data with metadata that can be sent over a 1845 transport. To that end, HTTP can be seen as the most prevalent 1846 framing protocol that runs on top of streams like TCP, TLS, etc. 1848 In order to use a transport Connection that provides HTTP Message 1849 support, the establishment and closing of the connection can be 1850 treated as they would be without the framing protocol. Sending and 1851 receiving of Messages, however, change to treat each Message as a 1852 well-delineated HTTP request or response, with the content of the 1853 Message representing the body, and the Headers being provided in 1854 Message metadata. 1856 Connectedness: Multiplexing Connected 1858 Data Unit: Message 1859 Connection Object: Connection objects represent a flow of HTTP 1860 messages between a client and a server, which may be an HTTP/1.1 1861 connection over TCP, or a single stream in an HTTP/2 connection. 1863 Initiate: Calling "Initiate" on an HTTP connection initiates a TCP or 1864 TLS connection as a client. 1866 Clone: Calling "Clone" on an HTTP Connection opens a new stream on 1867 an existing HTTP/2 connection when possible. If the underlying 1868 version does not support multiplexed streams, calling "Clone" 1869 simply creates a new parallel connection. 1871 Send: When an application sends an HTTP Message, it is expected to 1872 provide HTTP header values as a MessageContext in a canonical 1873 form, along with any associated HTTP message body as the Message 1874 data. The HTTP header values are encoded in the specific version 1875 format upon sending. 1877 Receive: HTTP Connections deliver Messages in which HTTP header 1878 values are attached to MessageContexts, and HTTP bodies are carried in Message 1879 data.
1881 Close: Calling "Close" on an HTTP Connection will only close the 1882 underlying TLS or TCP connection if the HTTP version does not 1883 support multiplexing. For HTTP/2, for example, closing the 1884 connection only closes a specific stream. 1886 10.7. QUIC 1888 QUIC provides a multi-streaming interface to an encrypted transport. 1889 Each stream can be viewed as equivalent to a TLS stream over TCP, so 1890 a natural mapping is to present each QUIC stream as an individual 1891 Connection. The protocol for the stream will be considered Ready 1892 whenever the underlying QUIC connection is established to the point 1893 that this stream's data can be sent. For streams after the first 1894 stream, this will likely be an immediate operation. 1896 Closing a single QUIC stream, presented to the application as a 1897 Connection, does not imply closing the underlying QUIC connection 1898 itself. Rather, the implementation may choose to close the QUIC 1899 connection once all streams have been closed (often after some 1900 timeout), or after an individual stream Connection sends an Abort. 1902 Connectedness: Multiplexing Connected 1904 Data Unit: Stream 1906 Connection Object: Connection objects represent a single QUIC stream 1907 on a QUIC connection. 1909 10.8. HTTP/2 transport 1911 Similar to QUIC (Section 10.7), HTTP/2 provides a multi-streaming 1912 interface. This will generally use HTTP as the unit of Messages over 1913 the streams, in which each stream can be represented as a transport 1914 Connection. The lifetime of streams and the HTTP/2 connection should 1915 be managed as described for QUIC. 1917 It is possible to treat each HTTP/2 stream as a raw byte-stream 1918 instead of a carrier for HTTP messages, in which case the Messages 1919 over the streams can be represented similarly to the TCP stream (one 1920 Message per direction, see Section 10.1). 

   Connectedness:  Multiplexing Connected

   Data Unit:  Stream

   Connection Object:  Connection objects represent a single HTTP/2
      stream on an HTTP/2 connection.

10.9.  SCTP

   Connectedness:  Connected

   Data Unit:  Message

   API mappings for SCTP are as follows:

   Connection Object:  Connection objects represent a flow of SCTP
      messages between a client and a server, which may be an SCTP
      association or a stream in an SCTP association.  How to map
      Connection objects to streams is described in [NEAT-flow-mapping];
      in the following, a similar method is described.  To map
      Connection objects to SCTP streams without head-of-line blocking
      on the sender side, both the sending and receiving SCTP
      implementations must support message interleaving [RFC8260].  Both
      SCTP implementations must also support stream reconfiguration.
      Finally, both communicating endpoints must be aware of this
      intended multiplexing; [NEAT-flow-mapping] describes a way for a
      Transport System to negotiate the stream mapping capability using
      SCTP's adaptation layer indication, such that this functionality
      only takes effect if both sides are aware of it.  The first flow,
      for which the SCTP association has been created, will always use
      stream id zero.  All additional flows are assigned to unused
      stream ids in increasing order.  To avoid a conflict when both
      endpoints map new flows simultaneously, the peer which initiated
      the transport connection will use even stream numbers whereas the
      remote side will map its flows to odd stream numbers.  Both sides
      maintain a status map of the assigned stream numbers.  Generally,
      new streams must consume the lowest available (even or odd,
      depending on the side) stream number; this rule is relevant when
      lower numbers become available because Connection objects
      associated with the streams are closed.
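
      The stream number assignment described above can be sketched as
      follows.  This is a minimal illustration, not part of
      [NEAT-flow-mapping] or of any SCTP API: the class
      "StreamNumberMap", its method names, and the "num_streams"
      parameter (standing in for the negotiated number of streams) are
      hypothetical.

```python
class StreamNumberMap:
    """Hypothetical status map of assigned SCTP stream numbers."""

    def __init__(self, is_initiator, num_streams):
        # The initiator of the association maps new flows to even
        # numbers, the remote side to odd numbers.
        self.parity = 0 if is_initiator else 1
        self.num_streams = num_streams
        # The first flow, for which the association was created,
        # always uses stream id zero.
        self.assigned = {0}

    def allocate(self):
        """Return the lowest free stream number of our parity.

        Returns None when no stream is free; a real Transport System
        would have to request more streams at this point.
        """
        for number in range(self.parity, self.num_streams, 2):
            if number not in self.assigned:
                self.assigned.add(number)
                return number
        return None

    def peer_assigned(self, number):
        """Record a stream number that the peer has started using."""
        self.assigned.add(number)

    def release(self, number):
        """Mark a stream number as reusable after its Connection closes."""
        self.assigned.discard(number)
```

      With eight streams, the initiating side in this sketch hands out
      2 and then 4 for additional flows, and hands out 2 again once the
      Connection using it has been closed and released.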

   Initiate:  If this is the only Connection object that is assigned to
      the SCTP association or stream mapping has not been negotiated,
      CONNECT.SCTP is called.  Else, a new stream is used: if there are
      enough streams available, "Initiate" is just a local operation
      that assigns a new stream number to the Connection object.  The
      number of streams is negotiated as a parameter of the prior
      CONNECT.SCTP call, and it represents a trade-off between local
      resource usage and the number of Connection objects that can be
      mapped without requiring a reconfiguration signal.  When running
      out of streams, ADD_STREAM.SCTP must be called.

   InitiateWithSend:  If this is the only Connection object that is
      assigned to the SCTP association or stream mapping has not been
      negotiated, CONNECT.SCTP is called with the "user message"
      parameter.  Else, a new stream is used (see "Initiate" for how to
      handle running out of streams), and this just sends the first
      message on a new stream.

   Ready:  "Initiate" or "InitiateWithSend" returns without an error,
      i.e., SCTP's four-way handshake has completed.  If an association
      with the peer already exists, stream mapping has been negotiated,
      and enough streams are available, a Connection Object instantly
      becomes Ready after calling "Initiate" or "InitiateWithSend".

   InitiateError:  Failure of CONNECT.SCTP.

   ConnectionError:  TIMEOUT.SCTP or ABORT-EVENT.SCTP.

   Listen:  LISTEN.SCTP.  If an association with the peer already exists
      and stream mapping has been negotiated, "Listen" just expects to
      receive a new message on a new stream id (chosen in accordance
      with the stream number assignment procedure described above).

   ConnectionReceived:  LISTEN.SCTP returns without an error (a result
      of successful CONNECT.SCTP from the peer), or, in case of stream
      mapping, the first message has arrived on a new stream (in this
      case, "Receive" is also invoked).

   Clone:  Calling "Clone" on an SCTP association creates a new
      Connection object and assigns it a new stream number in accordance
      with the stream number assignment procedure described above.  If
      there are not enough streams available, ADD_STREAM.SCTP must be
      called.

   Priority (Connection):  When this value is changed, or a Message with
      Message Property "Priority" is sent, and there are multiple
      Connection objects assigned to the same SCTP association,
      CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
      of streams in the SCTP association.

   Send:  SEND.SCTP.  Message Properties such as "Lifetime" and
      "Ordered" map to parameters of this primitive.

   Receive:  RECEIVE.SCTP.  The "partial flag" of RECEIVE.SCTP invokes a
      "ReceivedPartial" event.

   Close:  If this is the only Connection object that is assigned to the
      SCTP association, CLOSE.SCTP is called.  Else, the Connection
      object is one out of several Connection objects that are assigned
      to the same SCTP association, and RESET_STREAM.SCTP must be
      called, which informs the peer that the stream will no longer be
      used for mapping and can be used by future "Initiate",
      "InitiateWithSend" or "Listen" calls.  At the peer, the event
      RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by
      issuing RESET_STREAM.SCTP too.  The resulting local
      RESET_STREAM-EVENT.SCTP informs the transport system that the
      stream number can now be re-used by the next "Initiate",
      "InitiateWithSend" or "Listen" calls.

   Abort:  If this is the only Connection object that is assigned to the
      SCTP association, ABORT.SCTP is called.
      Else, the Connection object is one out of several Connection
      objects that are assigned to the same SCTP association, and
      shutdown proceeds as described under "Close".

11.  IANA Considerations

   RFC-EDITOR: Please remove this section before publication.

   This document has no actions for IANA.

12.  Security Considerations

   [I-D.ietf-taps-arch] outlines general security considerations and
   requirements for any system that implements the TAPS architecture.
   [I-D.ietf-taps-interface] provides further discussion on security and
   privacy implications of the TAPS API.  This document provides
   additional guidance on implementation specifics for the TAPS API and
   as such the security considerations in both of these documents apply.
   The next two subsections discuss further considerations that are
   specific to mechanisms specified in this document.

12.1.  Considerations for Candidate Gathering

   Implementations should avoid downgrade attacks that allow network
   interference to cause the implementation to select less secure, or
   entirely insecure, combinations of paths and protocols.

12.2.  Considerations for Candidate Racing

   See Section 5.3 for security considerations around racing with 0-RTT
   data.

   An attacker that knows a particular device is racing several options
   during connection establishment may be able to block packets for the
   first connection attempt, thus inducing the device to fall back to a
   secondary attempt.  This is a problem if the secondary attempts have
   worse security properties that enable further attacks.
   Implementations should ensure that all options have equivalent
   security properties to avoid incentivizing attacks.
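
   One possible reading of this recommendation is that candidates whose
   security properties differ from what the application requires are
   removed before racing starts, so that no fallback is weaker than the
   first attempt.  The sketch below illustrates that reading only; the
   "Candidate" type, its two boolean fields, and the function
   "candidates_to_race" are hypothetical and are not defined by the TAPS
   documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical racing candidate; the fields are illustrative only."""
    name: str
    encrypted: bool
    authenticated: bool

    def security(self):
        # Reduce the candidate's security properties to a comparable pair.
        return (self.encrypted, self.authenticated)


def candidates_to_race(candidates, required):
    """Keep only candidates whose security properties equal `required`.

    `required` is an (encrypted, authenticated) pair assumed to be
    derived from the application's security parameters.  The order of
    the input list is preserved.
    """
    return [c for c in candidates if c.security() == required]
```

   With this filter in place, blocking the first attempt can only push
   the implementation towards another candidate with the same security
   properties.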

   Since results from the network can determine how a connection attempt
   tree is built, such as when DNS returns a list of resolved endpoints,
   it is possible for the network to cause an implementation to consume
   significant on-device resources.  Implementations should limit the
   maximum amount of state allowed for any given node, including the
   number of child nodes, especially when the state is based on results
   from the network.

13.  Acknowledgements

   This work has received funding from the European Union's Horizon 2020
   research and innovation programme under grant agreement No. 644334
   (NEAT).

   This work has been supported by Leibniz Prize project funds of DFG -
   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
   FE 570/4-1).

   This work has been supported by the UK Engineering and Physical
   Sciences Research Council under grant EP/R04144X/1.

   This work has been supported by the Research Council of Norway under
   its "Toppforsk" programme through the "OCARINA" project.

   Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
   Kinnear for their implementation and design efforts, including Happy
   Eyeballs, that heavily influenced this work.

14.  References

14.1.  Normative References

   [I-D.ietf-taps-arch]
              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
              Perkins, C., Tiesel, P., and C. Wood, "An Architecture for
              Transport Services", Work in Progress, Internet-Draft,
              draft-ietf-taps-arch-07, 9 March 2020.

   [I-D.ietf-taps-interface]
              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
              Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T.
              Pauly, "An Abstract Application Layer Interface to
              Transport Services", Work in Progress, Internet-Draft,
              draft-ietf-taps-interface-06, 9 March 2020.

   [I-D.ietf-taps-minset]
              Welzl, M. and S.
              Gjessing, "A Minimal Set of Transport Services for End
              Systems", Work in Progress, Internet-Draft,
              draft-ietf-taps-minset-11, 27 September 2018.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
              "Stream Schedulers and User Message Interleaving for the
              Stream Control Transmission Protocol", RFC 8260,
              DOI 10.17487/RFC8260, November 2017.

   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
              Transport Features Provided by IETF Transport Protocols",
              RFC 8303, DOI 10.17487/RFC8303, February 2018.

   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
              User Datagram Protocol (UDP) and Lightweight UDP
              (UDP-Lite)", RFC 8304, DOI 10.17487/RFC8304, February
              2018.

   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
              Better Connectivity Using Concurrency", RFC 8305,
              DOI 10.17487/RFC8305, December 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

14.2.  Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", Work in Progress, Internet-Draft,
              draft-ietf-quic-transport-29, 9 June 2020.

   [I-D.ietf-tcpm-2140bis]
              Touch, J., Welzl, M., and S. Islam, "TCP Control Block
              Interdependence", Work in Progress, Internet-Draft,
              draft-ietf-tcpm-2140bis-05, 29 April 2020.

   [NEAT-flow-mapping]
              "Transparent Flow Mapping for NEAT", Workshop on Future of
              Internet Transport (FIT 2017), 2017.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
              DOI 10.17487/RFC5389, October 2008.

   [RFC5766]  Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
              Relays around NAT (TURN): Relay Extensions to Session
              Traversal Utilities for NAT (STUN)", RFC 5766,
              DOI 10.17487/RFC5766, April 2010.

   [RFC6762]  Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
              DOI 10.17487/RFC6762, February 2013.

   [RFC6763]  Cheshire, S. and M. Krochmal, "DNS-Based Service
              Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013.

   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
              (Diffserv) and Real-Time Communication", RFC 7657,
              DOI 10.17487/RFC7657, November 2015.

   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
              Connectivity Establishment (ICE): A Protocol for Network
              Address Translator (NAT) Traversal", RFC 8445,
              DOI 10.17487/RFC8445, July 2018.

Appendix A.  Additional Properties

   This appendix discusses implementation considerations for additional
   parameters and properties that could be used to enhance transport
   protocol and/or path selection, or the transmission of messages given
   a Protocol Stack that implements them.  These are not part of the
   interface, and may be removed from the final document, but are
   presented here to support discussion within the TAPS working group as
   to whether they should be added to a future revision of the base
   specification.

A.1.  Properties Affecting Sorting of Branches

   In addition to the Protocol and Path Selection Properties discussed
   in Section 4.1.5, the following properties under discussion can
   influence branch sorting:

   *  Bounds on Send or Receive Rate: If the application indicates a
      bound on the expected Send or Receive bitrate, an implementation
      may prefer a path that can likely provide the desired bandwidth,
      based on cached maximum throughput (see Section 9.2).  The
      application may know the Send or Receive Bitrate from metadata in
      adaptive HTTP streaming, such as MPEG-DASH.

   *  Cost Preferences: If the application indicates a preference to
      avoid expensive paths, and some paths are associated with a
      monetary cost, an implementation should decrease the ranking of
      such paths.  If the application indicates that it prohibits using
      expensive paths, paths that are associated with a cost should be
      purged from the decision tree.

Appendix B.  Reasons for errors

   The Transport Services API [I-D.ietf-taps-interface] allows several
   generic error types to specify a more detailed reason as to why an
   error occurred.  This appendix lists some of the possible reasons.

   *  InvalidConfiguration: The transport properties and endpoints
      provided by the application are either contradictory or
      incomplete.  Examples include the lack of a remote endpoint on an
      active open or using a multicast group address while not
      requesting a unidirectional receive.

   *  NoCandidates: The configuration is valid, but none of the
      available transport protocols can satisfy the transport properties
      provided by the application.

   *  ResolutionFailed: The remote or local specifier provided by the
      application cannot be resolved.

   *  EstablishmentFailed: The TAPS system was unable to establish a
      transport-layer connection to the remote endpoint specified by the
      application.

   *  PolicyProhibited: The system policy prevents the transport system
      from performing the action requested by the application.

   *  NotCloneable: The protocol stack is not capable of being cloned.

   *  MessageTooLarge: The message size is too big for the transport
      system to handle.

   *  ProtocolFailed: The underlying protocol stack failed.

   *  InvalidMessageProperties: The message properties are either
      contradictory to the transport properties or they cannot be
      satisfied by the transport system.

   *  DeframingFailed: The data that was received by the underlying
      protocol stack could not be deframed.

   *  ConnectionAborted: The connection was aborted by the peer.

   *  Timeout: Delivery of a message was not possible after a timeout.

Appendix C.  Existing Implementations

   This appendix gives an overview of existing implementations, at the
   time of writing, of transport systems that are (to some degree) in
   line with this document.

   *  Apple's Network.framework:

      -  Network.framework is a transport-level API built for C,
         Objective-C, and Swift.  It is a connect-by-name API that
         supports transport security protocols.  It provides userspace
         implementations of TCP, UDP, TLS, DTLS, proxy protocols, and
         allows extension via custom framers.

      -  Documentation: https://developer.apple.com/documentation/network

   *  NEAT and NEATPy:

      -  NEAT is the output of the European H2020 research project
         "NEAT"; it is a user-space library for protocol-independent
         communication on top of TCP, UDP and SCTP, with many more
         features such as a policy manager.

      -  Code: https://github.com/NEAT-project/neat

      -  NEAT project: https://www.neat-project.org

      -  NEATPy is a Python shim over NEAT which updates the NEAT API to
         be in line with version 6 of the TAPS interface draft.

      -  Code: https://github.com/theagilepadawan/NEATPy

   *  PyTAPS:

      -  A TAPS implementation based on Python asyncio, offering
         protocol-independent communication to applications on top of
         TCP, UDP and TLS, with support for multicast.

      -  Code: https://github.com/fg-inet/python-asyncio-taps

Authors' Addresses

   Anna Brunstrom (editor)
   Karlstad University
   Universitetsgatan 2
   651 88 Karlstad
   Sweden

   Email: anna.brunstrom@kau.se

   Tommy Pauly (editor)
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014,
   United States of America

   Email: tpauly@apple.com

   Theresa Enghardt
   Netflix
   121 Albright Way
   Los Gatos, CA 95032,
   United States of America

   Email: ietf@tenghardt.net

   Karl-Johan Grinnemo
   Karlstad University
   Universitetsgatan 2
   651 88 Karlstad
   Sweden

   Email: karl-johan.grinnemo@kau.se

   Tom Jones
   University of Aberdeen
   Fraser Noble Building
   Aberdeen, AB24 3UE
   United Kingdom

   Email: tom@erg.abdn.ac.uk

   Philipp S. Tiesel
   TU Berlin
   Einsteinufer 25
   10587 Berlin
   Germany

   Email: philipp@tiesel.net

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   0316 Oslo
   Norway

   Email: michawe@ifi.uio.no