idnits 2.17.1 draft-ietf-taps-impl-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 8 instances of too long lines in the document, the longest one being 44 characters in excess of 72. ** The abstract seems to contain references ([I-D.ietf-taps-arch]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 1233: '... Implementations SHOULD ensure that th...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (9 March 2020) is 1508 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'SUBCATEGORY' is mentioned on line 1532, but not defined == Outdated reference: A later version (-19) exists of draft-ietf-taps-arch-06 == Outdated reference: A later version (-26) exists of draft-ietf-taps-interface-05 ** Obsolete normative reference: RFC 7540 (Obsoleted by RFC 9113) == Outdated reference: A later version (-34) exists of draft-ietf-quic-transport-27 -- Obsolete informational reference (is this intentional?): RFC 5245 (Obsoleted by RFC 8445, RFC 8839) Summary: 4 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TAPS Working Group A. Brunstrom, Ed. 3 Internet-Draft Karlstad University 4 Intended status: Informational T. Pauly, Ed. 5 Expires: 10 September 2020 Apple Inc. 6 T. Enghardt 7 TU Berlin 8 K-J. Grinnemo 9 Karlstad University 10 T. Jones 11 University of Aberdeen 12 P. Tiesel 13 TU Berlin 14 C. Perkins 15 University of Glasgow 16 M. Welzl 17 University of Oslo 18 9 March 2020 20 Implementing Interfaces to Transport Services 21 draft-ietf-taps-impl-06 23 Abstract 25 The Transport Services architecture [I-D.ietf-taps-arch] defines a 26 system that allows applications to use transport networking protocols 27 flexibly. This document serves as a guide to implementation on how 28 to build such a system. 30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at https://datatracker.ietf.org/drafts/current/. 
40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on 10 September 2020. 47 Copyright Notice 49 Copyright (c) 2020 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 54 license-info) in effect on the date of publication of this document. 55 Please review these documents carefully, as they describe your rights 56 and restrictions with respect to this document. Code Components 57 extracted from this document must include Simplified BSD License text 58 as described in Section 4.e of the Trust Legal Provisions and are 59 provided without warranty as described in the Simplified BSD License. 61 Table of Contents 63 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 64 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 65 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5 66 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 67 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 68 4. Implementing Connection Establishment . . . . . . . . . . . . 7 69 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 8 70 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 8 71 4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 72 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 11 73 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 74 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 75 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 16 76 4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 
16 77 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 17 78 4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 79 4.5.1. Determining Successful Establishment . . . . . . . . 18 80 4.6. Establishing multiplexed connections . . . . . . . . . . 19 81 4.7. Handling racing with "unconnected" protocols . . . . . . 19 82 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 20 83 4.8.1. Implementing listeners for Connected Protocols . . . 20 84 4.8.2. Implementing listeners for Unconnected Protocols . . 21 85 4.8.3. Implementing listeners for Multiplexed Protocols . . 21 86 5. Implementing Sending and Receiving Data . . . . . . . . . . . 21 87 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 22 88 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 22 89 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 23 90 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 23 91 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 24 92 5.3. Handling of data for fast-open protocols . . . . . . . . 24 93 6. Implementing Message Framers . . . . . . . . . . . . . . . . 25 94 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 26 95 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 27 96 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 27 97 7. Implementing Connection Management . . . . . . . . . . . . . 28 98 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 29 99 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 29 100 8. Implementing Connection Termination . . . . . . . . . . . . . 30 101 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 31 102 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 31 103 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 32 104 10. Specific Transport Protocol Considerations . . . . . . . . . 33 105 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 106 10.2. 
UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 35 107 10.3. UDP Multicast Receive . . . . . . . . . . . . . . . . . 36 108 10.4. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 38 109 10.5. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 39 110 10.6. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 40 111 10.7. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 41 112 10.8. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 41 113 10.9. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 42 114 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 115 12. Security Considerations . . . . . . . . . . . . . . . . . . . 44 116 12.1. Considerations for Candidate Gathering . . . . . . . . . 44 117 12.2. Considerations for Candidate Racing . . . . . . . . . . 44 118 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 119 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 120 14.1. Normative References . . . . . . . . . . . . . . . . . . 45 121 14.2. Informative References . . . . . . . . . . . . . . . . . 46 122 Appendix A. Additional Properties . . . . . . . . . . . . . . . 47 123 A.1. Properties Affecting Sorting of Branches . . . . . . . . 47 124 Appendix B. Reasons for errors . . . . . . . . . . . . . . . . . 47 125 Appendix C. Existing Implementations . . . . . . . . . . . . . . 48 126 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 49 128 1. Introduction 130 The Transport Services architecture [I-D.ietf-taps-arch] defines a 131 system that allows applications to use transport networking protocols 132 flexibly. The interface such a system exposes to applications is 133 defined as the Transport Services API [I-D.ietf-taps-interface]. 134 This API is designed to be generic across multiple transport 135 protocols and sets of protocol features. 
137 This document serves as a guide to implementation on how to build a 138 system that provides a Transport Services API. It is the job of an 139 implementation of a Transport Services system to turn the requests of 140 an application into decisions on how to establish connections, and 141 how to transfer data over those connections once established. The 142 terminology used in this document is based on the Architecture 143 [I-D.ietf-taps-arch]. 145 2. Implementing Connection Objects 147 The connection objects that are exposed to applications for Transport 148 Services are: 150 * the Preconnection, the bundle of properties that describes the 151 application constraints on the transport; 153 * the Connection, the basic object that represents a flow of data in 154 either direction between the Local and Remote Endpoints; 156 * and the Listener, a passive waiting object that delivers new 157 Connections. 159 Preconnection objects should be implemented as bundles of properties 160 that an application can both read and write. Once a Preconnection 161 has been used to create an outbound Connection or a Listener, the 162 implementation should ensure that the copy of the properties held by 163 the Connection or Listener is immutable. This may involve performing 164 a deep-copy if the application is still able to modify properties on 165 the original Preconnection object. 167 Connection objects represent the interface between the application 168 and the implementation to manage transport state, and conduct data 169 transfer. During the process of establishment (Section 4), the 170 Connection will be unbound to a specific transport flow, since there 171 may be multiple candidate Protocol Stacks being raced. Once the 172 Connection is established, the object should be considered mapped to 173 a specific Protocol Stack. The notion of a Connection maps to many 174 different protocols, depending on the Protocol Stack. 
For example, 175 the Connection may ultimately represent the interface into a TCP 176 connection, a TLS session over TCP, a UDP flow with fully-specified 177 local and remote endpoints, a DTLS session, a SCTP stream, a QUIC 178 stream, or an HTTP/2 stream. 180 Listener objects are created with a Preconnection, at which point 181 their configuration should be considered immutable by the 182 implementation. The process of listening is described in 183 Section 4.8. 185 3. Implementing Pre-Establishment 187 During pre-establishment the application specifies the Endpoints to 188 be used for communication as well as its preferences via Selection 189 Properties and, if desired, also Connection Properties. Generally, 190 Connection Properties should be configured as early as possible, as 191 they may serve as input to decisions that are made by the 192 implementation (the Capacity Profile may guide usage of a protocol 193 offering scavenger-type congestion control, for example). In the 194 remainder of this document, we only refer to Selection Properties 195 because they are the more typical case and have to be handled by all 196 implementations. 198 The implementation stores these objects and properties as part of the 199 Preconnection object for use during connection establishment. For 200 Selection Properties that are not provided by the application, the 201 implementation must use the default values specified in the Transport 202 Services API ([I-D.ietf-taps-interface]). 204 3.1. Configuration-time errors 206 The transport system should have a list of supported protocols 207 available, which each have transport features reflecting the 208 capabilities of the protocol. Once an application specifies its 209 Transport Parameters, the transport system should match the required 210 and prohibited properties against the transport features of the 211 available protocols. 
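The matching step described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `Protocol` class, the `ConfigurationError` exception, the `eligible_protocols` function, and the feature names are all hypothetical and are not taken from the Transport Services API.

```python
from dataclasses import dataclass


class ConfigurationError(Exception):
    """Signals a failure that can be detected during pre-establishment."""


@dataclass(frozen=True)
class Protocol:
    name: str
    features: frozenset  # transport features this protocol offers


def eligible_protocols(protocols, required, prohibited):
    """Return the protocols offering every required feature and no prohibited one."""
    required, prohibited = frozenset(required), frozenset(prohibited)
    matches = [
        p for p in protocols
        if required <= p.features and not (prohibited & p.features)
    ]
    if not matches:
        # Fail early, before any resources are spent on endpoint resolution.
        raise ConfigurationError(
            f"no available protocol satisfies require={sorted(required)} "
            f"prohibit={sorted(prohibited)}"
        )
    return matches


# Illustrative feature sets only (assumptions made for this sketch).
TCP = Protocol("TCP", frozenset({"reliability", "preserve-order"}))
UDP = Protocol("UDP", frozenset({"preserve-msg-boundaries"}))

usable = eligible_protocols([TCP, UDP], required={"reliability"}, prohibited=set())
```

Raising an exception as soon as the candidate list is empty is one way to surface the error at configuration time rather than during establishment; an implementation could equally return an error object to the application.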
213 In the following cases, failure should be detected during pre- 214 establishment: 216 * The application requested Protocol Properties that include 217 requirements or prohibitions that cannot be satisfied by any of 218 the available protocols. For example, if an application requires 219 "Configure Reliability per Message", but no such protocol is 220 available on the host running the transport system, e.g., because 221 SCTP is not supported by the operating system, this should result 222 in an error. 224 * The application requested Protocol Properties that are in conflict 225 with each other, i.e., the required and prohibited properties 226 cannot be satisfied by the same protocol. For example, if an 227 application prohibits "Reliable Data Transfer" but then requires 228 "Configure Reliability per Message", this mismatch should result 229 in an error. 231 It is important to fail as early as possible in such cases in order 232 to avoid allocating resources, e.g., to endpoint resolution, only to 233 find out later that there is no protocol that satisfies the 234 requirements. 236 3.2. Role of system policy 238 The properties specified during pre-establishment have a close 239 connection to system policy. The implementation is responsible for 240 combining and reconciling several different sources of preferences 241 when establishing Connections. These include, but are not limited 242 to: 244 1. Application preferences, i.e., preferences specified during the 245 pre-establishment via Selection Properties. 247 2. Dynamic system policy, i.e., policy compiled from internally and 248 externally acquired information about available network 249 interfaces, supported transport protocols, and current/previous 250 Connections. Examples of ways to externally retrieve policy- 251 support information are through OS-specific statistics/ 252 measurement tools and tools that reside on middleboxes and 253 routers. 255 3. 
Default implementation policy, i.e., predefined policy by OS or 256 application. 258 In general, any protocol or path used for a connection must conform 259 to all three sources of constraints. Any violation of any of the 260 layers should cause a protocol or path to be considered ineligible 261 for use. For an example of application preferences leading to 262 constraints, an application may prohibit the use of metered network 263 interfaces for a given Connection to avoid user cost. Similarly, the 264 system policy at a given time may prohibit the use of such a metered 265 network interface from the application's process. Lastly, the 266 implementation itself may default to disallowing certain network 267 interfaces unless explicitly requested by the application and allowed 268 by the system. 270 It is expected that the database of system policies and the method of 271 looking up these policies will vary across various platforms. An 272 implementation should attempt to look up the relevant policies for 273 the system in a dynamic way to make sure it is reflecting an accurate 274 version of the system policy, since the system's policy regarding the 275 application's traffic may change over time due to user or 276 administrative changes. 278 4. Implementing Connection Establishment 280 The process of establishing a network connection begins when an 281 application expresses intent to communicate with a remote endpoint by 282 calling Initiate. (At this point, any constraints or requirements 283 the application may have on the connection are available from pre- 284 establishment.) The process can be considered complete once there is 285 at least one Protocol Stack that has completed any required setup to 286 the point that it can transmit and receive the application's data. 
288 Connection establishment is divided into two top-level steps: 289 Candidate Gathering, to identify the paths, protocols, and endpoints 290 to use, and Candidate Racing, in which the necessary protocol 291 handshakes are conducted so that the transport system can select 292 which set to use. This document structures candidates for racing as 293 a tree. 295 The simplest example of this process might involve identifying the 296 single IP address to which the implementation wishes to connect, 297 using the system's current default interface or path, and starting a 298 TCP handshake to establish a stream to the specified IP address. 299 However, each step may also vary depending on the requirements of the 300 connection: if the endpoint is defined as a hostname and port, then 301 there may be multiple resolved addresses that are available; there 302 may also be multiple interfaces or paths available, other than the 303 default system interface; and some protocols may not need any 304 transport handshake to be considered "established" (such as UDP), 305 while other connections may utilize layered protocol handshakes, such 306 as TLS over TCP. 308 Whenever an implementation has multiple options for connection 309 establishment, it can view the set of all individual connection 310 establishment options as a single, aggregate connection 311 establishment. The aggregate set conceptually includes every valid 312 combination of endpoints, paths, and protocols. As an example, 313 consider an implementation that initiates a TCP connection to a 314 hostname + port endpoint, and has two valid interfaces available (Wi- 315 Fi and LTE). The hostname resolves to a single IPv4 address on the 316 Wi-Fi network, and resolves to the same IPv4 address on the LTE 317 network, as well as a single IPv6 address. 
The aggregate set of 318 connection establishment options can be viewed as follows: 320 Aggregate [Endpoint: www.example.com:80] [Interface: Any] [Protocol: TCP] 321 |-> [Endpoint: 192.0.2.1:80] [Interface: Wi-Fi] [Protocol: TCP] 322 |-> [Endpoint: 192.0.2.1:80] [Interface: LTE] [Protocol: TCP] 323 |-> [Endpoint: 2001:DB8::1.80] [Interface: LTE] [Protocol: TCP] 324 Any one of these sub-entries on the aggregate connection attempt 325 would satisfy the original application intent. The concern of this 326 section is the algorithm defining which of these options to try, 327 when, and in what order. 329 During Candidate Gathering, an implementation first excludes all 330 protocols and paths that match a Prohibit or do not match all Require 331 properties. Then, the implementation will sort branches according to 332 Preferred properties, Avoided properties, and possibly other 333 criteria. 335 4.1. Candidate Gathering 337 The step of gathering candidates involves identifying which paths, 338 protocols, and endpoints may be used for a given Connection. This 339 list is determined by the requirements, prohibitions, and preferences 340 of the application as specified in the Selection Properties. 342 4.1.1. Gathering Endpoint Candidates 344 Both Local and Remote Endpoint Candidates must be discovered during 345 connection establishment. To support ICE, or similar protocols, that 346 involve out-of-band indirect signalling to exchange candidates with 347 the Remote Endpoint, it's important to be able to query the set of 348 candidate Local Endpoints, and give the protocol stack a set of 349 candidate Remote Endpoints, before it attempts to establish 350 connections. 352 4.1.1.1. Local Endpoint candidates 354 The set of possible Local Endpoints is gathered. In the simple case, 355 this merely enumerates the local interfaces and protocols and allocates 356 ephemeral source ports. 
For example, a system that has WiFi and 357 Ethernet and supports IPv4 and IPv6 might gather four candidate 358 locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on WiFi, and IPv6 on 359 WiFi) that can form the source for a transient. 361 If NAT traversal is required, the process of gathering Local 362 Endpoints becomes broadly equivalent to the ICE candidate gathering 363 phase [RFC5245]. The endpoint determines its server reflexive Local 364 Endpoints (i.e., the translated address of a local, on the other side 365 of a NAT) and relayed locals (e.g., via a TURN server or other 366 relay), for each interface and network protocol. These are added to 367 the set of candidate Local Endpoints for this connection. 369 Gathering Local Endpoints is primarily a local operation, although it 370 might involve exchanges with a STUN server to derive server reflexive 371 locals, or with a TURN server or other relay to derive relayed 372 locals. It does not involve communication with the Remote Endpoint. 374 4.1.1.2. Remote Endpoint Candidates 376 The Remote Endpoint is typically a name that needs to be resolved 377 into a set of possible addresses that can be used for communication. 378 Resolving the Remote Endpoint is the process of recursively 379 performing such name lookups, until fully resolved, to return the set 380 of candidates for the remote of this connection. 382 How this is done will depend on the type of the Remote Endpoint, and 383 can also be specific to each Local Endpoint. A common case is when 384 the Remote Endpoint is a DNS name, in which case it is resolved to 385 give a set of IPv4 and IPv6 addresses representing that name. Some 386 types of remote might require more complex resolution. 
Resolving the 387 Remote Endpoint for a peer-to-peer connection might involve 388 communication with a rendezvous server, which in turn contacts the 389 peer to gain consent to communicate and retrieve its set of candidate 390 locals, which are returned and form the candidate remote addresses 391 for contacting that peer. 393 Resolving the remote is not a local operation. It will involve a 394 directory service, and can require communication with the remote to 395 rendezvous and exchange peer addresses. This can expose some or all 396 of the candidate locals to the remote. 398 4.1.2. Structuring Options as a Tree 400 When an implementation responsible for connection establishment needs 401 to consider multiple options, it should logically structure these 402 options as a hierarchical tree. Each leaf node of the tree 403 represents a single, coherent connection attempt, with an Endpoint, a 404 Path, and a set of protocols that can directly negotiate and send 405 data on the network. Each node in the tree that is not a leaf 406 represents a connection attempt that is either underspecified, or 407 else includes multiple distinct options. For example, when 408 connecting on an IP network, a connection attempt to a hostname and 409 port is underspecified, because the connection attempt requires a 410 resolved IP address as its remote endpoint. In this case, the node 411 represented by the connection attempt to the hostname is a parent 412 node, with child nodes for each IP address. Similarly, an 413 implementation that is allowed to connect using multiple interfaces 414 will have a parent node of the tree for the decision between the 415 paths, with a branch for each interface. 
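One possible in-memory shape for such a tree is sketched here, using the Wi-Fi and LTE example introduced earlier in this section. The `Node` class and its `add` and `leaves` methods are hypothetical names chosen for illustration; the draft does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One connection attempt: fully specified if it is a leaf, otherwise a parent."""
    endpoint: str
    path: str
    protocol: str
    children: List["Node"] = field(default_factory=list)

    def add(self, endpoint: str, path: str, protocol: str) -> "Node":
        """Attach a more specific child attempt and return it."""
        child = Node(endpoint, path, protocol)
        self.children.append(child)
        return child

    def leaves(self) -> List["Node"]:
        """Return the attempts that can actually be started, in tree order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# The hostname is underspecified, so it becomes a parent with one branch
# per interface and one leaf per resolved address.
root = Node("www.example.com:80", "Any", "TCP")
wifi = root.add("www.example.com:80", "Wi-Fi", "TCP")
lte = root.add("www.example.com:80", "LTE", "TCP")
wifi.add("192.0.2.1:80", "Wi-Fi", "TCP")
lte.add("192.0.2.1:80", "LTE", "TCP")
lte.add("2001:DB8::1.80", "LTE", "TCP")

attempts = root.leaves()
```

Keeping children in an ordered list means that the order of `leaves()` can double as the racing order once branches have been sorted.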
417 The example aggregate connection attempt above can be drawn as a tree 418 by grouping the addresses resolved on the same interface into 419 branches: 421 || 422 +==========================+ 423 | www.example.com:80/Any | 424 +==========================+ 425 // \\ 426 +==========================+ +==========================+ 427 | www.example.com:80/Wi-Fi | | www.example.com:80/LTE | 428 +==========================+ +==========================+ 429 || // \\ 430 +====================+ +====================+ +======================+ 431 | 192.0.2.1:80/Wi-Fi | | 192.0.2.1:80/LTE | | 2001:DB8::1.80/LTE | 432 +====================+ +====================+ +======================+ 434 The rest of this section will use a notation scheme to represent this 435 tree. The parent (or trunk) node of the tree will be represented by 436 a single integer, such as "1". Each child of that node will have an 437 integer that identifies it, from 1 to the number of children. That 438 child node will be uniquely identified by concatenating its integer 439 to its parent's identifier with a dot in between, such as "1.1" and 440 "1.2". Each node will be summarized by a tuple of three elements: 441 Endpoint, Path, and Protocol. The above example can now be written 442 more succinctly as: 444 1 [www.example.com:80, Any, TCP] 445 1.1 [www.example.com:80, Wi-Fi, TCP] 446 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 447 1.2 [www.example.com:80, LTE, TCP] 448 1.2.1 [192.0.2.1:80, LTE, TCP] 449 1.2.2 [2001:DB8::1.80, LTE, TCP] 451 When an implementation views this aggregate set of connection 452 attempts as a single connection establishment, it will use only one 453 of the leaf nodes to transfer data. Thus, when a single leaf node 454 becomes ready to use, then the entire connection attempt is ready to 455 use by the application. 
Another way to represent this is that every 456 leaf node updates the state of its parent node when it becomes ready, 457 until the trunk node of the tree is ready, which then notifies the 458 application that the connection as a whole is ready to use. 460 A connection establishment tree may be degenerate, and only have a 461 single leaf node, such as a connection attempt to an IP address over 462 a single interface with a single protocol. 464 1 [192.0.2.1:80, Wi-Fi, TCP] 465 A parent node may also only have one child (or leaf) node, such as 466 when a hostname resolves to only a single IP address. 468 1 [www.example.com:80, Wi-Fi, TCP] 469 1.1 [192.0.2.1:80, Wi-Fi, TCP] 471 4.1.3. Branch Types 473 There are three types of branching from a parent node into one or 474 more child nodes. Any parent node of the tree must only use one type 475 of branching. 477 4.1.3.1. Derived Endpoints 479 If a connection originally targets a single endpoint, there may be 480 multiple endpoints of different types that can be derived from the 481 original. The connection library should order the derived endpoints 482 according to application preference, system policy and expected 483 performance. 485 DNS hostname-to-address resolution is the most common method of 486 endpoint derivation. When trying to connect to a hostname endpoint 487 on a traditional IP network, the implementation should send DNS 488 queries for both A (IPv4) and AAAA (IPv6) records if both are 489 supported on the local link. The algorithm for ordering and racing 490 these addresses should follow the recommendations in Happy Eyeballs 491 [RFC8305]. 493 1 [www.example.com:80, Wi-Fi, TCP] 494 1.1 [2001:DB8::1.80, Wi-Fi, TCP] 495 1.2 [192.0.2.1:80, Wi-Fi, TCP] 496 1.3 [2001:DB8::2.80, Wi-Fi, TCP] 497 1.4 [2001:DB8::3.80, Wi-Fi, TCP] 499 DNS-Based Service Discovery can also provide an endpoint derivation 500 step. 
When trying to connect to a named service, the client may 501 discover one or more hostname and port pairs on the local network 502 using multicast DNS. These hostnames should each be treated as a 503 branch which can be attempted independently from other hostnames. 504 Each of these hostnames may also resolve to one or more addresses, 505 thus creating multiple layers of branching. 507 1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP] 508 1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP] 509 1.1.1 [31.133.160.18.631, Wi-Fi, TCP] 511 4.1.3.2. Alternate Paths 513 If a client has multiple network interfaces available to it, such as 514 a mobile client with both Wi-Fi and Cellular connectivity, it can 515 attempt a connection over either interface. This represents a branch 516 point in the connection establishment. As with derived endpoints, 517 the interfaces should be ranked based on preference, system policy, 518 and performance. Attempts should be started on one interface, and 519 then on other interfaces successively after delays based on expected 520 round-trip-time or other available metrics. 522 1 [192.0.2.1:80, Any, TCP] 523 1.1 [192.0.2.1:80, Wi-Fi, TCP] 524 1.2 [192.0.2.1:80, LTE, TCP] 526 This same approach applies to any situation in which the client is 527 aware of multiple links or views of the network. Multiple Paths, 528 each with a coherent set of addresses, routes, DNS server, and more, 529 may share a single interface. A path may also represent a virtual 530 interface service such as a Virtual Private Network (VPN). 532 The list of available paths should be constrained by any requirements 533 or prohibitions the application sets, as well as system policy. 535 4.1.3.3. Protocol Options 537 Differences in possible protocol compositions and options can also 538 provide a branching point in connection establishment. This allows 539 clients to be resilient to situations in which a certain protocol is 540 not functioning on a server or network. 
542 This approach is commonly used for connections with optional proxy 543 server configurations. A single connection may be allowed to use an 544 HTTP-based proxy, a SOCKS-based proxy, or connect directly. These 545 options should be ranked and attempted in succession. 547 1 [www.example.com:80, Any, HTTP/TCP] 548 1.1 [192.0.2.8:80, Any, HTTP/HTTP Proxy/TCP] 549 1.2 [192.0.2.7:10234, Any, HTTP/SOCKS/TCP] 550 1.3 [www.example.com:80, Any, HTTP/TCP] 551 1.3.1 [192.0.2.1:80, Any, HTTP/TCP] 553 This approach also allows a client to attempt different sets of 554 application and transport protocols that may provide preferable 555 characteristics when available. For example, the protocol options 556 could involve QUIC [I-D.ietf-quic-transport] over UDP on one branch, 557 and HTTP/2 [RFC7540] over TLS over TCP on the other: 559 1 [www.example.com:443, Any, Any HTTP] 560 1.1 [www.example.com:443, Any, QUIC/UDP] 561 1.1.1 [192.0.2.1:443, Any, QUIC/UDP] 562 1.2 [www.example.com:443, Any, HTTP2/TLS/TCP] 563 1.2.1 [192.0.2.1:443, Any, HTTP2/TLS/TCP] 565 Another example is racing SCTP with TCP: 567 1 [www.example.com:80, Any, Any Stream] 568 1.1 [www.example.com:80, Any, SCTP] 569 1.1.1 [192.0.2.1:80, Any, SCTP] 570 1.2 [www.example.com:80, Any, TCP] 571 1.2.1 [192.0.2.1:80, Any, TCP] 573 Implementations that support racing protocols and protocol options 574 should maintain a history of which protocols and protocol options 575 successfully established, on a per-network basis (see Section 9.2). 576 This information can influence future racing decisions to prioritize 577 or prune branches. 579 4.2. Branching Order-of-Operations 581 Branch types must occur in a specific order relative to one another 582 to avoid creating leaf nodes with invalid or incompatible settings. 
583 In the example above, it would be invalid to branch for derived 584 endpoints (the DNS results for www.example.com) before branching 585 between interface paths, since usable DNS results on one network may 586 not necessarily be the same as DNS results on another network due to 587 local network entities, supported address families, or enterprise 588 network configurations. Implementations must be careful to branch in 589 an order that results in usable leaf nodes whenever there are 590 multiple branch types that could be used from a single node. 592 The order of operations for branching, where lower numbers are acted 593 upon first, should be: 595 1. Alternate Paths 597 2. Protocol Options 599 3. Derived Endpoints 601 Branching between paths is the first in the list because results 602 across multiple interfaces are likely not related to one another: 603 endpoint resolution may return different results, especially when 604 using locally resolved host and service names, and which protocols 605 are supported and preferred may differ across interfaces. Thus, if 606 multiple paths are attempted, the overall connection can be seen as a 607 race between the available paths or interfaces. 609 Protocol options are checked next in order. Whether or not a set of 610 protocols, or protocol-specific options, can successfully connect is 611 generally not dependent on which specific IP address is used. 612 Furthermore, the protocol stacks being attempted may influence or 613 altogether change the endpoints being used. Adding a proxy to a 614 connection's branch will change the endpoint to the proxy's IP 615 address or hostname. Choosing an alternate protocol may also modify 616 the ports that should be selected. 618 Branching for derived endpoints is the final step, and may have 619 multiple layers of derivation or resolution, such as DNS service 620 resolution and DNS hostname resolution. 
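The branching order above can be sketched as three nested expansion steps. This is an illustrative sketch under stated assumptions: the `expand` function, its callback parameters, and the `PROTOCOLS` and `ADDRESSES` tables are hypothetical, and real resolution would be asynchronous rather than a dictionary lookup.

```python
from typing import Callable, Iterable, List, Tuple

Leaf = Tuple[str, str, str]  # (endpoint, path, protocol)


def expand(
    endpoint: str,
    paths: Iterable[str],
    protocols_for: Callable[[str], Iterable[str]],
    resolve: Callable[[str, str, str], Iterable[str]],
) -> List[Leaf]:
    """Enumerate leaf candidates, branching in the recommended order."""
    leaves: List[Leaf] = []
    for path in paths:                                    # 1. Alternate Paths
        for protocol in protocols_for(path):              # 2. Protocol Options
            # 3. Derived Endpoints: resolution happens last because its
            # result may depend on both the path and the protocol stack.
            for address in resolve(endpoint, path, protocol):
                leaves.append((address, path, protocol))
    return leaves


# Hypothetical per-path knowledge: SCTP is assumed unavailable over Wi-Fi,
# and each network is assumed to resolve the name to a different address.
PROTOCOLS = {"Wi-Fi": ["TCP"], "LTE": ["SCTP", "TCP"]}
ADDRESSES = {"Wi-Fi": ["192.0.2.1:80"], "LTE": ["192.0.3.1:80"]}

candidates = expand(
    "www.example.com:80",
    ["Wi-Fi", "LTE"],
    lambda path: PROTOCOLS[path],
    lambda endpoint, path, protocol: ADDRESSES[path],
)
```

Because the resolver receives the path and protocol as arguments, a branch that adds a proxy or changes the port can return different endpoints without affecting sibling branches.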
622 For example, if the application has indicated both a preference for 623 Wi-Fi over LTE and for a feature only available in SCTP, branches will 624 first be sorted according to path selection, with Wi-Fi at the top. 625 Then, branches with SCTP will be sorted to the top within their 626 subtree according to the properties influencing protocol selection. 627 However, if the implementation has cached the information that SCTP 628 is not available on the path over Wi-Fi, there is no SCTP node in the 629 Wi-Fi subtree. Here, the path over Wi-Fi will be tried first, and, if 630 connection establishment succeeds, TCP will be used. So the 631 Selection Property of preferring Wi-Fi takes precedence over the 632 Property that led to a preference for SCTP. 634 1 [www.example.com:80, Any, Any Stream] 635 1.1 [192.0.2.1:80, Wi-Fi, Any Stream] 636 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 637 1.2 [192.0.3.1:80, LTE, Any Stream] 638 1.2.1 [192.0.3.1:80, LTE, SCTP] 639 1.2.2 [192.0.3.1:80, LTE, TCP] 641 4.3. Sorting Branches 643 Implementations should sort the branches of the tree of connection 644 options in order of their preference rank. Leaf nodes on branches 645 with higher rankings represent connection attempts that will be raced 646 first. Implementations should order the branches to reflect the 647 preferences expressed by the application for its new connection, 648 including Selection Properties, which are specified in 649 [I-D.ietf-taps-interface]. 651 In addition to the properties provided by the application, an 652 implementation may include additional criteria in the ranking, such as 653 cached performance estimates (see Section 9.2) or system policy (see 654 Section 3.2). Two examples of how Selection and 655 Connection Properties may be used to sort branches are provided 656 below: 658 * "Interface Instance or Type": If the application specifies an 659 interface type to be preferred or avoided, implementations should 660 rank paths accordingly.
If the application specifies an interface 661 type to be required or prohibited, an implementation is expected 662 not to include the non-conforming paths in the tree. 664 * "Capacity Profile": An implementation may use the Capacity Profile 665 to prefer paths optimized for the application's expected traffic 666 pattern according to cached performance estimates (see 667 Section 9.2): 669 - Scavenger: Prefer paths with the highest expected available 670 bandwidth, based on observed maximum throughput 672 - Low Latency/Interactive: Prefer paths with the lowest expected 673 Round Trip Time 675 - Constant-Rate Streaming: Prefer paths that can satisfy the 676 requested Stream Send or Stream Receive Bitrate, based on 677 observed maximum throughput 679 Implementations should process properties in the following order: 680 Prohibit, Require, Prefer, Avoid. If Selection Properties contain 681 any prohibited properties, the implementation should first purge 682 branches containing nodes with these properties. For required 683 properties, it should only keep branches that satisfy these 684 requirements. Then, it should order branches according to 685 preferred properties, and finally use avoided properties as a 686 tiebreaker. When ordering branches, an implementation may give more 687 weight to properties that the application has explicitly set than to 688 properties that are default. 690 As the available protocols and paths on a specific system and in a 691 specific context may vary, the result of sorting and the outcome of 692 racing may vary even given the same Selection and Connection 693 Properties. However, an implementation ought to aim to provide a 694 consistent outcome to applications, e.g., by preferring protocols and 695 paths that existing Connections with similar Properties are already 696 using. 698 4.4.
Candidate Racing 700 The primary goal of the Candidate Racing process is to successfully 701 negotiate a protocol stack to an endpoint over an interface--to 702 connect a single leaf node of the tree--with as little delay and as 703 few unnecessary connection attempts as possible. Optimizing these 704 two factors improves the user experience, while minimizing network 705 load. 707 This section covers the dynamic aspect of connection establishment. 708 While the tree described above is a useful conceptual and 709 architectural model, an implementation does not know what the full 710 tree may become up front, nor will many of the possible branches be 711 used in the common case. 713 There are three different approaches to racing the attempts for 714 different nodes of the connection establishment tree: 716 1. Immediate 718 2. Delayed 720 3. Failover 722 Each approach is appropriate in different use-cases and branch types. 723 However, to avoid consuming unnecessary network resources, 724 implementations should not use immediate racing as a default 725 approach. 727 The timing algorithms for racing should remain independent across 728 branches of the tree. Any timers or racing logic are isolated to a 729 given parent node, and are not ordered precisely with regard to the 730 children of other nodes. 732 4.4.1. Delayed 734 Delayed racing can be used whenever a single node of the tree has 735 multiple child nodes. Based on the order determined when building 736 the tree, the first child node will be initiated immediately, 737 followed by the next child node after some delay. Once that second 738 child node is initiated, the third child node (if present) will begin 739 after another delay, and so on until all child nodes have been 740 initiated, or one of the child nodes successfully completes its 741 negotiation. 743 Delayed racing attempts occur in parallel.
Implementations should 744 not terminate an earlier child connection attempt upon starting a 745 subsequent child. 747 The delay between starting child nodes should be based on the 748 properties of the previously started child node. For example, if the 749 first child represents an IP address with a known route, and the 750 second child represents another IP address, the delay between 751 starting the first and second IP addresses can be based on the 752 expected retransmission cadence for the first child's connection 753 (derived from historical round-trip time). Alternatively, if the 754 first child represents a branch on a Wi-Fi interface, and the second 755 child represents a branch on an LTE interface, the delay should be 756 based on the expected time in which the branch for the first 757 interface would be able to establish a connection, based on link 758 quality and historical round-trip time. 760 Any delay should have a defined minimum and maximum value based on 761 the branch type. Generally, branches between paths and protocols 762 should have longer delays than branches between derived endpoints. 763 The maximum delay should be chosen with regard to how long a 764 user is expected to wait for the connection to complete. 766 If a child node fails to connect before the delay timer has fired for 767 the next child, the next child should be started immediately. 769 4.4.2. Failover 771 If an implementation or application has a strong preference for one 772 branch over another, the branching node may choose to wait until one 773 child has failed before starting the next. Failure of a leaf node is 774 determined by its protocol negotiation failing or timing out; failure 775 of a parent branching node is determined by all of its children 776 failing. 778 An example in which failover is recommended is a race between a 779 protocol stack that uses a proxy and a protocol stack that bypasses 780 the proxy.
Failover is useful in case the proxy is down or 781 misconfigured, but any more aggressive type of racing may end up 782 unnecessarily avoiding a proxy that was preferred by policy. 784 4.5. Completing Establishment 786 The process of connection establishment completes when one leaf node 787 of the tree has completed negotiation with the remote endpoint 788 successfully, or else all nodes of the tree have failed to connect. 789 The first leaf node to complete its connection is then used by the 790 application to send and receive data. 792 It is useful to process success and failure throughout the tree by 793 having child nodes report to their parent nodes (towards the trunk of the 794 tree). For example, in the following case, if 1.1.1 fails to 795 connect, it reports the failure to 1.1. Since 1.1 has no other child 796 nodes, it has also failed and reports that failure to 1. Because 1.2 797 has not yet failed, 1 is not considered to have failed. Since 1.2 798 has not yet started, it is started and the process continues. 799 Similarly, if 1.1.1 successfully connects, then it marks 1.1 as 800 connected, which propagates to the trunk node 1. At this point, the 801 connection as a whole is considered to be successfully connected and 802 ready to process application data. 804 1 [www.example.com:80, Any, TCP] 805 1.1 [www.example.com:80, Wi-Fi, TCP] 806 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 807 1.2 [www.example.com:80, LTE, TCP] 808 ... 810 If a leaf node has successfully completed its connection, all other 811 attempts should be made ineligible for use by the application for the 812 original request. New connection attempts that involve transmitting 813 data on the network should not be started after another leaf node has 814 completed successfully, as the connection as a whole has been 815 established. An implementation may choose to let certain handshakes 816 and negotiations complete in order to gather metrics to influence 817 future connections.
Similarly, an implementation may choose to hold 818 onto fully established leaf nodes that were not the first to 819 establish for use as part of a Pooled Connection, see Section 7.1, or 820 in future connections. In both cases, keeping additional connections 821 is generally not recommended since those attempts were slower to 822 connect and may exhibit less desirable properties. 824 4.5.1. Determining Successful Establishment 826 Implementations may select the criteria by which a leaf node is 827 considered to be successfully connected differently on a per-protocol 828 basis. If the only protocol being used is a transport protocol with 829 a clear handshake, like TCP, then the obvious choice is to declare 830 that node "connected" when the last packet of the three-way handshake 831 has been received. If the only protocol being used is an 832 "unconnected" protocol, like UDP, the implementation may consider the 833 node fully "connected" the moment it determines a route is present, 834 before sending any packets on the network, see further Section 4.7. 836 For protocol stacks with multiple handshakes, the decision becomes 837 more nuanced. If the protocol stack involves both TLS and TCP, an 838 implementation could determine that a leaf node is connected after 839 the TCP handshake is complete, or it can wait for the TLS handshake 840 to complete as well. The benefit of declaring completion when the 841 TCP handshake finishes, and thus stopping the race for other branches 842 of the tree, is that there will be less burden on the network from 843 other connection attempts. On the other hand, by waiting until the 844 TLS handshake is complete, an implementation avoids the scenario in 845 which a TCP handshake completes quickly, but TLS negotiation is 846 either very slow or fails altogether in particular network conditions 847 or to a particular endpoint. 
To avoid the issue of TLS possibly 848 failing, the implementation should not generate a Ready event for the 849 Connection until TLS is established. 851 If all of the leaf nodes fail to connect during racing, i.e., none of 852 the configurations that satisfy all requirements given in the 853 Transport Parameters actually work over the available paths, then the 854 transport system should notify the application with an InitiateError 855 event. An InitiateError event should also be generated in case the 856 transport system finds no usable candidates to race. 858 4.6. Establishing multiplexed connections 860 Multiplexing several Connections over a single underlying transport 861 connection requires that the Connections to be multiplexed belong to 862 the same Connection Group (as is indicated by the application using 863 the Clone call). When the underlying transport connection supports 864 multi-streaming, the Transport System can map each Connection in the 865 Connection Group to a different stream. Thus, when the Connections 866 that are offered to an application by the Transport System are 867 multiplexed, the Transport System may implement the establishment of 868 a new Connection by simply beginning to use a new stream of an 869 already established transport connection, with no need for a 870 connection establishment procedure. This also means that 871 there may not be any "establishment" message (like a TCP SYN), but 872 the application can simply start sending or receiving. Therefore, 873 when the Initiate action of a Transport System is called without 874 Messages being handed over, it cannot be guaranteed that the other 875 endpoint will have any way to know about this, and hence a passive 876 endpoint's ConnectionReceived event may not be generated upon an active 877 endpoint's Initiate. Instead, delivery of the ConnectionReceived event 878 may be delayed until the first Message arrives. 880 4.7.
Handling racing with "unconnected" protocols 882 While protocols that use an explicit handshake to validate a 883 Connection to a peer can be used for racing multiple establishment 884 attempts in parallel, "unconnected" protocols such as raw UDP do not 885 offer a way to validate the presence of a peer or the usability of a 886 Connection without application feedback. An implementation should 887 consider such a protocol stack to be established as soon as a local 888 route to the peer endpoint is confirmed. 890 However, if a peer is not reachable over the network using the 891 unconnected protocol, or data cannot be exchanged for any other 892 reason, the application may want to attempt using another candidate 893 Protocol Stack. The implementation should maintain the list of other 894 candidate Protocol Stacks that were eligible to use. In the case 895 that the application signals that the initial Protocol Stack is 896 failing for some reason and that another option should be attempted, 897 the Connection can be updated to point to the next candidate Protocol 898 Stack. This can be viewed as an application-driven form of Protocol 899 Stack racing. 901 4.8. Implementing listeners 903 When an implementation is asked to Listen, it registers with the 904 system to wait for incoming traffic to the Local Endpoint. If no 905 Local Endpoint is specified, the implementation should either use an 906 ephemeral port or generate an error. 908 If the Selection Properties do not require a single network interface 909 or path, but allow the use of multiple paths, the Listener object 910 should register for incoming traffic on all of the network interfaces 911 or paths that conform to the Properties. The set of available paths 912 can change over time, so the implementation should monitor network 913 path changes and register and de-register the Listener across all 914 usable paths. 
When using multiple paths, the Listener is generally 915 expected to use the same port for listening on each. 917 If the Selection Properties allow multiple protocols to be used for 918 listening, and the implementation supports it, the Listener object 919 should register across the eligible protocols for each path. This 920 means that inbound Connections delivered by the implementation may 921 have heterogeneous protocol stacks. 923 4.8.1. Implementing listeners for Connected Protocols 925 Connected protocols such as TCP and TLS-over-TCP have a strong 926 mapping between the Local and Remote Endpoints (five-tuple) and their 927 protocol connection state. These map well into Connection objects. 928 Whenever a new inbound handshake is being started, the Listener 929 should generate a new Connection object and pass it to the 930 application. 932 4.8.2. Implementing listeners for Unconnected Protocols 934 Unconnected protocols such as UDP and UDP-Lite generally do not 935 provide the same mechanisms that connected protocols do to offer 936 Connection objects. Implementations should wait for incoming packets 937 for unconnected protocols on a listening port and should perform 938 five-tuple matching of packets, either delivering them to existing Connection objects 939 or creating new Connection objects. On platforms with 940 facilities to create a "virtual connection" for unconnected protocols, 941 implementations should use these mechanisms to minimise the handling 942 of datagrams intended for already created Connection objects. 944 4.8.3. Implementing listeners for Multiplexed Protocols 946 Protocols that provide multiplexing of streams into a single five- 947 tuple can listen both for entirely new connections (a new HTTP/2 948 stream on a new TCP connection, for example) and for new sub- 949 connections (a new HTTP/2 stream on an existing connection).
If the 950 abstraction of Connection presented to the application is mapped to 951 the multiplexed stream, then the Listener should deliver new 952 Connection objects in the same way for either case. The 953 implementation should allow the application to introspect the 954 Connection Group marked on the Connections to determine the grouping 955 of the multiplexing. 957 5. Implementing Sending and Receiving Data 959 The most basic mapping for sending a Message is an abstraction of 960 datagrams, in which the transport protocol naturally deals in 961 discrete packets. Each Message here corresponds to a single 962 datagram. Generally, these will be short enough that sending and 963 receiving will always use a complete Message. 965 For protocols that expose byte-streams, the only delineation provided 966 by the protocol is the end of the stream in a given direction. Each 967 Message in this case corresponds to the entire stream of bytes in a 968 direction. These Messages may be quite long, in which case they can 969 be sent in multiple parts. 971 Protocols that provide the framing (such as length-value protocols, 972 or protocols that use delimiters) provide data boundaries that may be 973 longer than a traditional packet datagram. Each Message for framing 974 protocols corresponds to a single frame, which may be sent either as 975 a complete Message, or in multiple parts. 977 5.1. Sending Messages 979 The effect of the application sending a Message is determined by the 980 top-level protocol in the established Protocol Stack. That is, if 981 the top-level protocol provides an abstraction of framed messages 982 over a connection, the receiving application will be able to obtain 983 multiple Messages on that connection, even if the framing protocol is 984 built on a byte-stream protocol like TCP. 986 5.1.1. Message Properties 988 * Lifetime: this should be implemented by removing the Message from 989 its queue of pending Messages after the Lifetime has expired. 
A 990 queue of pending Messages within the transport system 991 implementation that have yet to be handed to the Protocol Stack 992 can always support this property, but once a Message has been sent 993 into the send buffer of a protocol, only certain protocols may 994 support de-queueing a message. For example, TCP cannot remove 995 bytes from its send buffer, while in case of SCTP, such control 996 over the SCTP send buffer can be exercised using the partial 997 reliability extension [RFC8303]. When there is no standing queue 998 of Messages within the system, and the Protocol Stack does not 999 support removing a Message from its buffer, this property may be 1000 ignored. 1002 * Priority: this represents the ability to prioritize a Message over 1003 other Messages. This can be implemented by the system re-ordering 1004 Messages that have yet to be handed to the Protocol Stack, or by 1005 giving relative priority hints to protocols that support 1006 priorities per Message. For example, an implementation of HTTP/2 1007 could choose to send Messages of different Priority on streams of 1008 different priority. 1010 * Ordered: when this is false, it disables the requirement of in- 1011 order-delivery for protocols that support configurable ordering. 1013 * Idempotent: when this is true, it means that the Message can be 1014 used by mechanisms that might transfer it multiple times - e.g., 1015 as a result of racing multiple transports or as part of TCP Fast 1016 Open. 1018 * Final: when this is true, it means that a transport connection can 1019 be closed immediately after its transmission. 1021 * Corruption Protection Length: when this is set to any value other 1022 than -1, it limits the required checksum in protocols that allow 1023 limiting the checksum length (e.g. UDP-Lite). 1025 * Transmission Profile: TBD - because it's not final in the API yet. 
1026 Old text follows: when this is set to "Interactive/Low Latency", 1027 the Message should be sent immediately, even when this comes at 1028 the cost of using the network capacity less efficiently. For 1029 example, small messages can sometimes be bundled to fit into a 1030 single data packet for the sake of reducing header overhead; such 1031 bundling should not be used. In the case of TCP, for instance, the 1032 Nagle algorithm should be disabled when Interactive/Low Latency is 1033 selected as the capacity profile. Scavenger/Bulk can translate 1034 into usage of a congestion control mechanism such as LEDBAT, and/ 1035 or the capacity profile can lead to a choice of a DSCP value as 1036 described in [I-D.ietf-taps-minset]. 1038 * Singular Transmission: when this is true, the application requests 1039 to avoid transport-layer segmentation or network-layer 1040 fragmentation. Some transports implement network-layer 1041 fragmentation avoidance (Path MTU Discovery) without exposing this 1042 functionality to the application; in this case, only transport- 1043 layer segmentation should be avoided, by fitting the message into 1044 a single transport-layer segment or otherwise failing. Otherwise, 1045 network-layer fragmentation should be avoided, e.g., by requesting 1046 the IP Don't Fragment bit to be set in case of UDP(-Lite) and IPv4 1047 (SET_DF in [RFC8304]). 1049 5.1.2. Send Completion 1051 The application should be notified whenever a Message or partial 1052 Message has been consumed by the Protocol Stack, or has failed to 1053 send. The meaning of the Message being consumed by the stack may 1054 vary depending on the protocol. For a basic datagram protocol like 1055 UDP, this may correspond to the time when the packet is sent into the 1056 interface driver. For a protocol that buffers data in queues, like 1057 TCP, this may correspond to when the data has entered the send 1058 buffer. 1060 5.1.3.
Batching Sends 1062 Since sending a Message may involve a context switch between the 1063 application and the transport system, sending patterns that involve 1064 multiple small Messages can incur high overhead if each needs to be 1065 enqueued separately. To avoid this, the application should have a 1066 way to indicate a batch of Send actions, during which time the 1067 implementation will hold off on processing Messages until the batch 1068 is complete. This can also help reduce context switches when enqueuing data 1069 in the interface driver if the operation can be batched. 1071 5.2. Receiving Messages 1073 Similar to sending, the effect of Receiving a Message is determined by the top- 1074 level protocol in the established Protocol Stack. The main 1075 difference with Receiving is that the size and boundaries of the 1076 Message are not known beforehand. The application can communicate in 1077 its Receive action the parameters for the Message, which can help the 1078 implementation know how much data to deliver and when. For example, 1079 if the application only wants to receive a complete Message, the 1080 implementation should wait until an entire Message (datagram, stream, 1081 or frame) is read before delivering any Message content to the 1082 application. This requires the implementation to understand where 1083 messages end, either via a supplied deframer or because the top-level 1084 protocol in the established Protocol Stack preserves message 1085 boundaries. If, on the other hand, the top-level protocol only 1086 supports a byte-stream and no deframer was supplied, the 1087 application must specify the minimum number of bytes of Message 1088 content it wants to receive (which may be just a single byte) to 1089 control the flow of received data.
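The two receive modes described above can be sketched as follows. This is an illustration only, not the interface defined in [I-D.ietf-taps-interface]: the ReceiveBuffer class, the Deframer signature, the length_prefixed deframer (a 2-byte big-endian length header), and the min_incomplete_length parameter are all assumptions made for the example. With a deframer, content is delivered only once a complete Message is buffered; without one, delivery is gated on a minimum byte count.

```python
from typing import Callable, Optional, Tuple

# Hypothetical deframer signature: given the buffered bytes, return the
# (start, end) offsets of the first complete Message body, or None.
Deframer = Callable[[bytes], Optional[Tuple[int, int]]]


def length_prefixed(buf: bytes) -> Optional[Tuple[int, int]]:
    """Example deframer: 2-byte big-endian length followed by the body."""
    if len(buf) < 2:
        return None
    body_len = int.from_bytes(buf[:2], "big")
    if len(buf) < 2 + body_len:
        return None
    return 2, 2 + body_len


class ReceiveBuffer:
    """Buffers inbound bytes and decides when content may be delivered."""

    def __init__(self, deframer: Optional[Deframer] = None) -> None:
        self._buf = bytearray()
        self._deframer = deframer

    def feed(self, data: bytes) -> None:
        """Called when the Protocol Stack hands up newly received bytes."""
        self._buf += data

    def receive(self, min_incomplete_length: int = 1) -> Optional[bytes]:
        """Return deliverable content, or None if the request cannot be met yet."""
        if self._deframer is not None:
            # Message boundaries are known: deliver only complete Messages.
            span = self._deframer(bytes(self._buf))
            if span is None:
                return None
            start, end = span
            message = bytes(self._buf[start:end])
            del self._buf[:end]
            return message
        # Byte-stream without a deframer: wait for the requested minimum.
        if len(self._buf) < max(1, min_incomplete_length):
            return None
        chunk = bytes(self._buf)
        self._buf.clear()
        return chunk
```

A real implementation would deliver content through events rather than return values; the return value is used here only to keep the sketch short.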
1091 If a Connection becomes finished before a requested Receive action 1092 can be satisfied, the implementation should deliver any partial 1093 Message content outstanding, or if none is available, an indication 1094 that there will be no more received Messages. 1096 5.3. Handling of data for fast-open protocols 1098 Several protocols allow sending higher-level protocol or application 1099 data within the first packet of their protocol establishment, such as 1100 TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is 1101 referred to as sending Zero-RTT (0-RTT) data. This is a desirable 1102 property, but poses challenges to an implementation that uses racing 1103 during connection establishment. 1105 If the application has 0-RTT data to send in any protocol handshakes, 1106 it needs to provide this data before the handshakes have begun. When 1107 racing, this means that the data should be provided before the 1108 process of connection establishment has begun. If the application 1109 wants to send 0-RTT data, it must indicate this to the implementation 1110 by setting the Idempotent send parameter to true when sending the 1111 data. In general, 0-RTT data may be replayed (for example, if a TCP 1112 SYN contains data, and the SYN is retransmitted, the data will be 1113 retransmitted as well), but racing means that different leaf nodes 1114 have the opportunity to send the same data independently. If data is 1115 truly idempotent, this should be permissible. 1117 Once the application has provided its 0-RTT data, an implementation 1118 should keep a copy of this data and provide it to each new leaf node 1119 that is started and for which a 0-RTT protocol is being used. 1121 It is also possible that protocol stacks within a particular leaf 1122 node use 0-RTT handshakes without any idempotent application data. 1123 For example, TCP Fast Open could use a Client Hello from TLS as its 1124 0-RTT data, shortening the cumulative handshake time. 
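The handling of application 0-RTT data during racing, as described above, could be sketched along the following lines. The names EarlyDataStore, LeafAttempt, offer, and start are hypothetical and not part of any defined interface. The sketch assumes that data is accepted for 0-RTT use only when the application has marked it idempotent, and that a copy is handed to each leaf node whose protocol stack supports 0-RTT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafAttempt:
    """A single connection attempt for one leaf node of the tree."""
    name: str
    supports_0rtt: bool
    early_data: Optional[bytes] = None


class EarlyDataStore:
    """Keeps the application's 0-RTT data for the duration of racing."""

    def __init__(self) -> None:
        self._data: Optional[bytes] = None

    def offer(self, data: bytes, idempotent: bool) -> bool:
        """Accept data for 0-RTT use only if it is marked idempotent."""
        if not idempotent:
            return False
        self._data = bytes(data)    # keep a private copy
        return True

    def start(self, leaf: LeafAttempt) -> LeafAttempt:
        """Provide the stored data to a leaf that uses a 0-RTT protocol."""
        if leaf.supports_0rtt and self._data is not None:
            leaf.early_data = self._data
        return leaf
```

Data that was not accepted for 0-RTT use would be sent in the usual way once a leaf node has completed establishment; that path is omitted from the sketch.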
1126 0-RTT handshakes often rely on previous state, such as TCP Fast Open 1127 cookies, previously established TLS tickets, or out-of-band 1128 distributed pre-shared keys (PSKs). Implementations should be aware 1129 of security concerns around using these tokens across multiple 1130 addresses or paths when racing. In the case of TLS, any given ticket 1131 or PSK should only be used on one leaf node. If implementations have 1132 multiple tickets available from a previous connection, each leaf node 1133 attempt must use a different ticket. In effect, each leaf node will 1134 send the same early application data, yet encoded (encrypted) 1135 differently on the wire. 1137 6. Implementing Message Framers 1139 Message Framers are pieces of code that define simple transformations 1140 between application Message data and raw transport protocol data. A 1141 Framer can encapsulate or encode outbound Messages, and decapsulate 1142 or decode inbound data into Messages. 1144 While many protocols can be represented as Message Framers, for the 1145 purposes of the Transport Services interface these are ways for 1146 applications or application frameworks to define their own Message 1147 parsing to be included within a Connection's Protocol Stack. As an 1148 example, TLS can serve the purpose of framing data over TCP, but is 1149 exposed as a protocol natively supported by the Transport Services 1150 interface. 1152 Most Message Framers fall into one of two categories: 1154 * Header-prefixed record formats, such as a basic Type-Length-Value 1155 (TLV) structure 1157 * Delimiter-separated formats, such as HTTP/1.1. 1159 Common Message Framers can be provided by the Transport Services 1160 implementation, but an implementation ought to allow custom Message 1161 Framers to be defined by the application or some other piece of 1162 software. This section describes one possible interface for defining 1163 Message Framers as an example. 1165 6.1. 
Defining Message Framers 1167 A Message Framer is primarily defined by the set of code that handles 1168 events for a framer implementation, specifically how it handles 1169 inbound and outbound data parsing. The piece of code that implements 1170 custom framing logic will be referred to as the "framer 1171 implementation", which may be provided by the Transport Services 1172 implementation or the application itself. The Message Framer refers 1173 to the object or piece of code within the main Connection 1174 implementation that delivers events to the custom framer 1175 implementation whenever data is ready to be parsed or framed. 1177 When a Connection establishment attempt begins, an event can be 1178 delivered to notify the framer implementation that a new Connection 1179 is being created. Similarly, a stop event can be delivered when a 1180 Connection is being torn down. The framer implementation can use the 1181 Connection object to look up specific properties of the Connection or 1182 the network being used that may influence how to frame Messages. 1184 MessageFramer -> Start(Connection) 1185 MessageFramer -> Stop(Connection) 1187 When a Message Framer generates a "Start" event, the framer 1188 implementation has the opportunity to start writing some data prior 1189 to the Connection delivering its "Ready" event. This allows the 1190 implementation to communicate control data to the remote endpoint 1191 that can be used to parse Messages. 1193 MessageFramer.MakeConnectionReady(Connection) 1195 Similarly, when a Message Framer generates a "Stop" event, the framer 1196 implementation has the opportunity to write some final data or clear 1197 up its local state before the "Closed" event is delivered to the 1198 Application. The framer implementation can indicate that it has 1199 finished with this. 
1201 MessageFramer.MakeConnectionClosed(Connection) 1203 At any time, if the implementation encounters a fatal error, it can 1204 also cause the Connection to fail and provide an error. 1206 MessageFramer.FailConnection(Connection, Error) 1208 Should the framer implementation deem the candidate selected during 1209 racing unsuitable, it can signal this by failing the Connection prior 1210 to marking it as ready. If there are no other candidates available, 1211 the Connection will fail. Otherwise, the Connection will select a 1212 different candidate and the Message Framer will generate a new 1213 "Start" event. 1215 Before an implementation marks a Message Framer as ready, it can also 1216 dynamically add a protocol or framer above it in the stack. This 1217 allows protocols like STARTTLS, which need to add TLS conditionally, 1218 to modify the Protocol Stack based on a handshake result. 1220 otherFramer := NewMessageFramer() 1221 MessageFramer.PrependFramer(Connection, otherFramer) 1223 6.2. Sender-side Message Framing 1225 Message Framers generate an event whenever a Connection sends a new 1226 Message. 1228 MessageFramer -> NewSentMessage 1230 Upon receiving this event, a framer implementation is responsible for 1231 performing any necessary transformations and sending the resulting 1232 data back to the Message Framer, which will in turn send it to the 1233 next protocol. Implementations should ensure that there is a way to 1234 pass the original data through without copying to improve 1235 performance. 1237 MessageFramer.Send(Connection, Data) 1239 To provide an example, a simple protocol that adds a length as a 1240 header would receive the "NewSentMessage" event, create a data 1241 representation of the length of the Message data, and then send a 1242 block of data that is the concatenation of the length header and the 1243 original Message data. 1245 6.3.
Receiver-side Message Framing 1247 In order to parse a received flow of data into Messages, the Message 1248 Framer notifies the framer implementation whenever new data is 1249 available to parse. 1251 MessageFramer -> HandleReceivedData 1253 Upon receiving this event, the framer implementation can inspect the 1254 inbound data. The data is parsed from a particular cursor 1255 representing the unprocessed data. The application requests a 1256 specific amount of data it needs to have available in order to parse. 1257 If the data is not available, the parse fails. 1259 MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage) 1260 The framer implementation can directly advance the receive cursor 1261 once it has parsed data to effectively discard data (for example, 1262 discard a header once the content has been parsed). 1264 To deliver a Message to the application, the framer implementation 1265 can either directly deliver data that it has allocated, or deliver a 1266 range of data directly from the underlying transport and 1267 simultaneously advance the receive cursor. 1269 MessageFramer.AdvanceReceiveCursor(Connection, Length) 1270 MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) 1271 MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) 1273 Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the 1274 framer implementation to earmark bytes as part of a Message even 1275 before they are received by the transport. This allows the delivery 1276 of very large Messages without requiring the implementation to 1277 directly inspect all of the bytes. 1279 To provide an example, a simple protocol that parses a length as a 1280 header value would receive the "HandleReceivedData" event, and call 1281 "Parse" with a minimum and maximum set to the length of the header 1282 field. 
Once the parse succeeds, it would call 1283 "AdvanceReceiveCursor" with the length of the header field, and then 1284 call "DeliverAndAdvanceReceiveCursor" with the length of the body 1285 that was parsed from the header, marking the new Message as complete. 1287 7. Implementing Connection Management 1289 Once a Connection is established, the Transport Services system 1290 allows applications to interact with the Connection by modifying or 1291 inspecting Connection Properties. A Connection can also generate 1292 events in the form of Soft Errors. 1294 The set of Connection Properties that are supported for setting and 1295 getting on a Connection is described in [I-D.ietf-taps-interface]. 1296 For any properties that are generic, and thus could apply to all 1297 protocols being used by a Connection, the Transport System should 1298 store the properties in generic storage, and notify all protocol 1299 instances in the Protocol Stack whenever the properties have been 1300 modified by the application. For protocol-specific properties, such 1301 as the User Timeout that applies to TCP, the Transport System only 1302 needs to update the relevant protocol instance. 1304 If an error is encountered in setting a property (for example, if the 1305 application tries to set a TCP-specific property on a Connection that 1306 is not using TCP), the action should fail gracefully. The 1307 application may be informed of the error, but the Connection itself 1308 should not be terminated. 1310 The Transport Services implementation should allow protocol instances 1311 in the Protocol Stack to pass up arbitrary generic or protocol- 1312 specific errors that can be delivered to the application as Soft 1313 Errors. These allow the application to be informed of ICMP errors, 1314 and other similar events. 1316 7.1.
Pooled Connection 1318 For protocols that employ request/response pairs and do not require 1319 in-order delivery of the responses, like HTTP, the transport 1320 implementation may distribute interactions across several underlying 1321 transport connections. For these kinds of protocols, implementations 1322 may hide the connection management and only expose a single 1323 Connection object and the individual requests/responses as messages. 1324 These Pooled Connections can use multiple connections or multiple 1325 streams of multi-streaming connections between endpoints, as long as 1326 all of these satisfy the requirements and prohibitions specified in 1327 the Selection Properties of the Pooled Connection. This enables 1328 implementations to realize transparent connection coalescing and 1329 connection migration, and to perform per-message endpoint and path 1330 selection by choosing among these underlying connections. 1332 7.2. Handling Path Changes 1334 When a path change occurs, the Transport Services implementation is 1335 responsible for notifying Protocol Instances in the Protocol Stack. 1336 If the Protocol Stack includes a transport protocol that supports 1337 multipath connectivity, an update to the available paths should 1338 inform the Protocol Instance of the new set of paths that are 1339 permissible based on the Selection Properties passed by the 1340 application. A multipath protocol can establish new subflows over 1341 new paths, and should tear down subflows over paths that are no 1342 longer available. Pooled Connections (Section 7.1) may add or remove 1343 underlying transport connections in a similar manner. If the 1344 Protocol Stack includes a transport protocol that does not support 1345 multipath, but supports migrating between paths, the update to 1346 available paths can be used as the trigger for migrating the 1347 connection.
For protocols that do not support multipath or 1348 migration, the Protocol Instances may be informed of the path change, 1349 but should not be forcibly disconnected if the previously used path 1350 becomes unavailable. An exception to this case is if the System 1351 Policy changes to prohibit traffic from the Connection based on its 1352 properties, in which case the Protocol Stack should be disconnected. 1354 8. Implementing Connection Termination 1356 With TCP, when an application closes a connection, this means that it 1357 has no more data to send (but expects all data that has been handed 1358 over to be reliably delivered). However, with TCP only, "close" does 1359 not mean that the application will stop receiving data. This is 1360 related to TCP's ability to support half-closed connections. 1362 SCTP is an example of a protocol that does not support such half- 1363 closed connections. Hence, with SCTP, the meaning of "close" is 1364 stricter: an application has no more data to send (but expects all 1365 data that has been handed over to be reliably delivered), and will 1366 also not receive any more data. 1368 Implementing a protocol independent transport system means that the 1369 exposed semantics must be the strictest subset of the semantics of 1370 all supported protocols. Hence, as is common with all reliable 1371 transport protocols, after a Close action, the application can expect 1372 to have its reliability requirements honored regarding the data it 1373 has given to the Transport System, but it cannot expect to be able to 1374 read any more data after calling Close. 1376 Abort differs from Close only in that no guarantees are given 1377 regarding data that the application has handed over to the Transport 1378 System before calling Abort. 1380 As explained in Section 4.6, when a new stream is multiplexed on an 1381 already existing connection of a Transport Protocol Instance, there 1382 is no need for a connection establishment procedure. 
Because the 1383 Connections that are offered by the Transport System can be 1384 implemented as streams that are multiplexed on a transport protocol's 1385 connection, it cannot be guaranteed that one Endpoint's 1386 Initiate action provokes a ConnectionReceived event at its peer. 1388 For Close (provoking a Finished event) and Abort (provoking a 1389 ConnectionError event), the same logic applies: while it is desirable 1390 to be informed when a peer closes or aborts a Connection, whether 1391 this is possible depends on the underlying protocol, and no 1392 guarantees can be given. With SCTP, the transport system can use the 1393 stream reset procedure to cause a Finished event upon a Close action 1394 from the peer [NEAT-flow-mapping]. 1396 9. Cached State 1398 Beyond a single Connection's lifetime, it is useful for an 1399 implementation to keep state and history. This cached state can help 1400 improve future Connection establishment by re-using results and 1401 credentials, and favoring paths and protocols that performed well in 1402 the past. 1404 Cached state may be associated with different Endpoints for the same 1405 Connection, depending on the protocol generating the cached content. 1406 For example, session tickets for TLS are associated with specific 1407 endpoints, and thus should be cached based on a Connection's hostname 1408 Endpoint (if applicable). On the other hand, performance 1409 characteristics of a path are more likely tied to the IP address and 1410 subnet being used. 1412 9.1. Protocol state caches 1414 Some protocols will have long-term state to be cached in association 1415 with Endpoints. This state often has some time after which it 1416 expires, so the implementation should allow each protocol to specify 1417 an expiration for cached content.
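As a minimal sketch of the per-protocol expiration described above, the following Python class stores cached state keyed by protocol and Endpoint, with a lifetime chosen by the protocol that inserts the entry. The class name "ProtocolStateCache", its methods, and the injectable clock are assumptions made for illustration only; they are not defined by this document or by the Transport Services interface.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ProtocolStateCache:
    """Hypothetical cache keyed by (protocol, endpoint) with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so that expiry can be exercised in tests.
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def put(self, protocol: str, endpoint: str, value: Any,
            ttl_seconds: float) -> None:
        # Each protocol chooses its own lifetime for the state it caches.
        expires_at = self._clock() + ttl_seconds
        self._entries[(protocol, endpoint)] = (value, expires_at)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        key = (protocol, endpoint)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired state is dropped lazily, on lookup.
            del self._entries[key]
            return None
        return value
```

A TLS Protocol Instance could, for example, call put("tls", "example.com", ticket, ttl_seconds=3600) after a handshake and consult get("tls", "example.com") before the next one. Expiring lazily on lookup keeps the sketch short; a real implementation might also purge entries periodically.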
1419 Examples of cached protocol state include: 1421 * The DNS protocol can cache resolution answers (to A and AAAA queries, 1422 for example), associated with a Time To Live (TTL) to be used for 1423 future hostname resolutions without having to query the DNS 1424 resolver again. 1426 * TLS caches session state and tickets based on a hostname, which 1427 can be used for resuming sessions with a server. 1429 * TCP can cache cookies for use in TCP Fast Open. 1431 Cached protocol state is primarily used during Connection 1432 establishment for a single Protocol Stack, but may be used to 1433 influence an implementation's preference between several candidate 1434 Protocol Stacks. For example, if two IP address Endpoints are 1435 otherwise equally preferred, an implementation may choose to attempt 1436 a connection to an address for which it has a TCP Fast Open cookie. 1438 Applications must have a way to flush protocol cache state if 1439 desired. This may be necessary, for example, if application-layer 1440 identifiers rotate and clients wish to avoid linkability via 1441 trackable TLS tickets or TFO cookies. 1443 9.2. Performance caches 1445 In addition to protocol state, Protocol Instances should provide data 1446 into a performance-oriented cache to help guide future protocol and 1447 path selection. Some performance information can be gathered 1448 generically across several protocols to allow predictive comparisons 1449 between protocols on given paths: 1451 * Observed Round Trip Time 1453 * Connection Establishment latency 1455 * Connection Establishment success rate 1457 These items can be cached at per-address and per-subnet 1458 granularity, and averaged between different values. The information 1459 should be cached on a per-network basis, since it is expected that 1460 different network attachments will have different performance 1461 characteristics.
Besides Protocol Instances, other system entities 1462 may also provide data into performance-oriented caches. This could 1463 for instance be signal strength information reported by radio modems 1464 like Wi-Fi and mobile broadband or information about the battery- 1465 level of the device. Furthermore, the system may cache the observed 1466 maximum throughput on a path as an estimate of the available 1467 bandwidth. 1469 An implementation should use this information, when possible, to 1470 determine preference between candidate paths, endpoints, and protocol 1471 options. Eligible options that historically had significantly better 1472 performance than others should be selected first when gathering 1473 candidates (see Section 4.1) to ensure better performance for the 1474 application. 1476 The reasonable lifetime for cached performance values will vary 1477 depending on the nature of the value. Certain information, like the 1478 connection establishment success rate to a Remote Endpoint using a 1479 given protocol stack, can be stored for a long period of time (hours 1480 or longer), since it is expected that the capabilities of the Remote 1481 Endpoint are not changing very quickly. On the other hand, Round 1482 Trip Time observed by TCP over a particular network path may vary 1483 over a relatively short time interval. For such values, the 1484 implementation should remove them from the cache more quickly, or 1485 treat older values with less confidence/weight. 1487 10. Specific Transport Protocol Considerations 1489 Each protocol that can run as part of a Transport Services 1490 implementation defines both its API mapping as well as implementation 1491 details. API mappings for a protocol apply most to Connections in 1492 which the given protocol is the "top" of the Protocol Stack. For 1493 example, the mapping of the "Send" function for TCP applies to 1494 Connections in which the application directly sends over TCP. 
If 1495 HTTP/2 is used on top of TCP, the HTTP/2 mappings take precedence. 1497 Each protocol has a notion of Connectedness. Possible values for 1498 Connectedness are: 1500 * Unconnected. Unconnected protocols do not establish explicit 1501 state between endpoints, and do not perform a handshake during 1502 Connection establishment. 1504 * Connected. Connected protocols establish state between endpoints, 1505 and perform a handshake during Connection establishment. The 1506 handshake may be 0-RTT to send data or resume a session, but 1507 bidirectional traffic is required to confirm connectedness. 1509 * Multiplexing Connected. Multiplexing Connected protocols share 1510 properties with Connected protocols, but also explicitly support 1511 opening multiple application-level flows. This means that they 1512 can support cloning new Connection objects without a new explicit 1513 handshake. 1515 Protocols also define a notion of Data Unit. Possible values for 1516 Data Unit are: 1518 * Byte-stream. Byte-stream protocols do not define any Message 1519 boundaries of their own apart from the end of a stream in each 1520 direction. 1522 * Datagram. Datagram protocols define Message boundaries at the 1523 same level of transmission, such that only complete (not partial) 1524 Messages are supported. 1526 * Message. Message protocols support Message boundaries that can be 1527 sent and received either as complete or partial Messages. Maximum 1528 Message lengths can be defined, and Messages can be partially 1529 reliable. 1531 Below, primitives in the style of 1532 "CATEGORY.[SUBCATEGORY].PRIMITIVENAME.PROTOCOL" (e.g., 1533 "CONNECT.SCTP") refer to the primitives with the same name in section 1534 4 of [RFC8303]. For further implementation details, the description 1535 of these primitives in [RFC8303] points to section 3, which refers 1536 back to the specifications for each protocol.
This back-tracking 1537 method applies to all elements of [I-D.ietf-taps-minset] (see 1538 appendix D of [I-D.ietf-taps-interface]): they are listed in appendix 1539 A of [I-D.ietf-taps-minset] with an implementation hint in the same 1540 style, pointing back to section 4 of [RFC8303]. 1542 10.1. TCP 1544 Connectedness: Connected 1546 Data Unit: Byte-stream 1548 API mappings for TCP are as follows: 1550 Connection Object: TCP connections between two hosts map directly to 1551 Connection objects. 1553 Initiate: CONNECT.TCP. Calling "Initiate" on a TCP Connection 1554 causes it to reserve a local port, and send a SYN to the Remote 1555 Endpoint. 1557 InitiateWithSend: CONNECT.TCP with parameter "user message". Early 1558 idempotent data is sent on a TCP Connection in the SYN, as TCP 1559 Fast Open data. 1561 Ready: A TCP Connection is ready once the three-way handshake is 1562 complete. 1564 InitiateError: Failure of CONNECT.TCP. TCP can throw various errors 1565 during connection setup. Specifically, it is important to handle 1566 a RST being sent by the peer during the handshake. 1568 ConnectionError: Once established, TCP throws errors whenever the 1569 connection is disconnected, such as due to receiving a RST from 1570 the peer; or hitting a TCP retransmission timeout. 1572 Listen: LISTEN.TCP. Calling "Listen" for TCP binds a local port and 1573 prepares it to receive inbound SYN packets from peers. 1575 ConnectionReceived: TCP Listeners will deliver new connections once 1576 they have replied to an inbound SYN with a SYN-ACK. 1578 Clone: Calling "Clone" on a TCP Connection creates a new Connection 1579 with equivalent parameters. The two Connections are otherwise 1580 independent. 1582 Send: SEND.TCP. TCP does not on its own preserve Message 1583 boundaries. Calling "Send" on a TCP connection lays out the bytes 1584 on the TCP send stream without any other delineation. 
Any Message 1585 marked as Final will cause TCP to send a FIN once the Message has 1586 been completely written, by calling CLOSE.TCP immediately upon 1587 successful termination of SEND.TCP. 1589 Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without 1590 any Message delineation. All data delivered in the "Received" or 1591 "ReceivedPartial" event will be part of a single stream-wide 1592 Message that is marked Final (unless a Message Framer is used). 1593 EndOfMessage will be delivered when the TCP Connection has 1594 received a FIN (CLOSE-EVENT.TCP or ABORT-EVENT.TCP) from the peer. 1596 Close: Calling "Close" on a TCP Connection indicates that the 1597 Connection should be gracefully closed (CLOSE.TCP) by sending a 1598 FIN to the peer and waiting for a FIN-ACK before delivering the 1599 "Closed" event. 1601 Abort: Calling "Abort" on a TCP Connection indicates that the 1602 Connection should be immediately closed by sending a RST to the 1603 peer (ABORT.TCP). 1605 10.2. UDP 1607 Connectedness: Unconnected 1609 Data Unit: Datagram 1611 API mappings for UDP are as follows: 1613 Connection Object: UDP connections represent a pair of specific IP 1614 addresses and ports on two hosts. 1616 Initiate: CONNECT.UDP. Calling "Initiate" on a UDP Connection 1617 causes it to reserve a local port, but does not generate any 1618 traffic. 1620 InitiateWithSend: Early data on a UDP Connection does not have any 1621 special meaning. The data is sent whenever the Connection is 1622 Ready. 1624 Ready: A UDP Connection is ready once the system has reserved a 1625 local port and has a path to send to the Remote Endpoint. 1627 InitiateError: UDP Connections can only generate errors on 1628 initiation due to port conflicts on the local system. 1630 ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- 1631 Lite)) upon receiving ICMP notifications indicating failures in 1632 the network. 1634 Listen: LISTEN.UDP. 
Calling "Listen" for UDP binds a local port and 1635 prepares it to receive inbound UDP datagrams from peers. 1637 ConnectionReceived: UDP Listeners will deliver new connections once 1638 they have received traffic from a new Remote Endpoint. 1640 Clone: Calling "Clone" on a UDP Connection creates a new Connection 1641 with equivalent parameters. The two Connections are otherwise 1642 independent. 1644 Send: SEND.UDP(-Lite). Calling "Send" on a UDP connection sends the 1645 data as the payload of a complete UDP datagram. Marking Messages 1646 as Final does not change anything in the datagram's contents. 1647 Upon sending a UDP datagram, some relevant fields and flags in the 1648 IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in 1649 IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)). 1651 Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to 1652 "Received", each of which represents a single datagram received in 1653 a UDP packet. Upon receiving a UDP datagram, the ECN flag from 1654 the IP header can be obtained (GET_ECN.UDP(-Lite)). 1656 Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) 1657 releases the local port reservation. 1659 Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is 1660 identical to calling "Close". 1662 10.3. UDP Multicast Receive 1664 Connectedness: Unconnected 1666 Data Unit: Datagram 1668 API mappings for Receiving Multicast UDP are as follows: 1670 Connection Object: Established UDP Multicast Receive connections 1671 represent a pair of specific IP addresses and ports. The 1672 "unidirectional receive" transport property is required, and the 1673 local endpoint must be configured with a group IP address and a 1674 port. 1676 Initiate: Calling "Initiate" on a UDP Multicast Receive Connection 1677 causes an immediate InitiateError. This is an unsupported 1678 operation. 
1680 InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast 1681 Receive Connection causes an immediate InitiateError. This is an 1682 unsupported operation. 1684 Ready: A UDP Multicast Receive Connection is ready once the system 1685 has received traffic for the appropriate group and port. 1687 InitiateError: UDP Multicast Receive Connections generate an 1688 InitiateError if Initiate is called. 1690 ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- 1691 Lite)) upon receiving ICMP notifications indicating failures in 1692 the network. 1694 Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive 1695 binds a local port, prepares it to receive inbound UDP datagrams 1696 from peers, and issues a multicast host join. If a remote 1697 endpoint with an address is supplied, the join is Source-specific 1698 Multicast, and the path selection is based on the route to the 1699 remote endpoint. If a remote endpoint is not supplied, the join 1700 is Any-source Multicast, and the path selection is based on the 1701 outbound route to the group supplied in the local endpoint. 1703 ConnectionReceived: UDP Multicast Receive Listeners will deliver new 1704 connections once they have received traffic from a new Remote 1705 Endpoint. 1707 Clone: Calling "Clone" on a UDP Multicast Receive Connection creates 1708 a new Connection with equivalent parameters. The two Connections 1709 are otherwise independent. 1711 Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive 1712 connection causes an immediate SendError. This is an unsupported 1713 operation. 1715 Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP 1716 Multicast Receive connection only delivers complete Messages to 1717 "Received", each of which represents a single datagram received in 1718 a UDP packet. Upon receiving a UDP datagram, the ECN flag from 1719 the IP header can be obtained (GET_ECN.UDP(-Lite)). 
1721 Close: Calling "Close" on a UDP Multicast Receive Connection 1722 (ABORT.UDP(-Lite)) releases the local port reservation and leaves 1723 the group. 1725 Abort: Calling "Abort" on a UDP Multicast Receive Connection 1726 (ABORT.UDP(-Lite)) is identical to calling "Close". 1728 10.4. TLS 1730 The mapping of a TLS stream abstraction into the application is 1731 equivalent to the contract provided by TCP (see Section 10.1), and 1732 builds upon many of the actions of TCP connections. 1734 Connectedness: Connected 1736 Data Unit: Byte-stream 1738 Connection Object: Connection objects represent a single TLS 1739 connection running over a TCP connection between two hosts. 1741 Initiate: Calling "Initiate" on a TLS Connection causes it to first 1742 initiate a TCP connection. Once the TCP protocol is Ready, the 1743 TLS handshake will be performed as a client (starting by sending a 1744 "client_hello", and so on). 1746 InitiateWithSend: Early idempotent data is supported by TLS 1.3, and 1747 is sent as encrypted application data in the first TLS message when 1748 performing session resumption. For older versions of TLS, or if a 1749 session is not being resumed, the initial data will be delayed 1750 until the TLS handshake is complete. TCP Fast Open can also be 1751 enabled automatically. 1753 Ready: A TLS Connection is ready once the underlying TCP connection 1754 is Ready, and the TLS handshake is also complete and keys have been 1755 established to encrypt application data. 1757 InitiateError: In addition to TCP initiation errors, TLS can 1758 generate errors during its handshake. Examples of errors include a 1759 failure of the peer to successfully authenticate, the peer 1760 rejecting the local authentication, or a failure to match versions 1761 or algorithms. 1763 ConnectionError: TLS connections will generate TCP errors, or errors 1764 due to failures to rekey or decrypt received messages.
1766 Listen: Calling "Listen" for TLS listens on TCP, and sets up 1767 received connections to perform server-side TLS handshakes. 1769 ConnectionReceived: TLS Listeners will deliver new connections once 1770 they have successfully completed both TCP and TLS handshakes. 1772 Clone: As with TCP, calling "Clone" on a TLS Connection creates a 1773 new Connection with equivalent parameters. The two Connections 1774 are otherwise independent. 1776 Send: Like TCP, TLS does not preserve message boundaries. Although 1777 application data is framed natively in TLS, there is no general 1778 guarantee that these TLS messages represent semantically 1779 meaningful application stream boundaries. Rather, sending data on 1780 a TLS Connection only guarantees that the application data will be 1781 transmitted in an encrypted form. Marking Messages as Final 1782 causes a "close_notify" to be generated once the data has been 1783 written. 1785 Receive: Like TCP, TLS delivers a stream of bytes without any 1786 Message delineation. The data is decrypted prior to being 1787 delivered to the application. If a "close_notify" is received, 1788 the stream-wide Message will be delivered with EndOfMessage set. 1790 Close: Calling "Close" on a TLS Connection indicates that the 1791 Connection should be gracefully closed by sending a "close_notify" 1792 to the peer and waiting for a corresponding "close_notify" before 1793 delivering the "Closed" event. 1795 Abort: Calling "Abort" on a TLS Connection indicates that the 1796 Connection should be immediately closed by sending a 1797 "close_notify", optionally preceded by "user_canceled", to the 1798 peer. Implementations do not need to wait to receive 1799 "close_notify" before delivering the "Closed" event. 1801 10.5. DTLS 1803 DTLS follows the same behavior as TLS (Section 10.4), with the 1804 notable exception of not inheriting behavior directly from TCP.
1805 Differences from TLS are detailed below, and all cases not explicitly 1806 mentioned should be considered the same as TLS. 1808 Connectedness: Connected 1810 Data Unit: Datagram 1812 Connection Object: Connection objects represent a single DTLS 1813 connection running over a set of UDP ports between two hosts. 1815 Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve 1816 a local UDP port and begin sending handshake messages to the peer 1817 over UDP. These messages are reliable, and will be automatically 1818 retransmitted. 1820 Ready: A DTLS Connection is ready once the DTLS handshake is complete 1821 and keys have been established to encrypt application data. 1823 Send: Sending over DTLS preserves message boundaries in the same 1824 way that UDP datagrams do. Marking a Message as Final sends a 1825 "close_notify", as in TLS. 1827 Receive: Receiving over DTLS delivers one decrypted Message for each 1828 received DTLS datagram. If a "close_notify" is received, a 1829 Message will be delivered that is marked as Final. 1831 10.6. HTTP 1833 HTTP requests and responses map naturally into Messages, since they 1834 are delineated chunks of data with metadata that can be sent over a 1835 transport. To that end, HTTP can be seen as the most prevalent 1836 framing protocol that runs on top of streams like TCP, TLS, etc. 1838 In order to use a transport Connection that provides HTTP Message 1839 support, the establishment and closing of the connection can be 1840 treated as they would be without the framing protocol. Sending and 1841 receiving of Messages, however, change to treat each Message as a 1842 well-delineated HTTP request or response, with the content of the 1843 Message representing the body, and the Headers being provided in 1844 Message metadata.
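To make the split between Message content (the body) and Message metadata (the headers) concrete, the following Python sketch serializes one request in HTTP/1.1 wire format. The "HTTPMessageContext" class, its fields, and the "encode_http1_request" function are hypothetical names introduced only for this illustration; the document does not define such an API, and an HTTP/2 framer would encode the same metadata differently.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HTTPMessageContext:
    """Hypothetical metadata container; not defined by this document."""
    method: str = "GET"
    target: str = "/"
    headers: Dict[str, str] = field(default_factory=dict)


def encode_http1_request(context: HTTPMessageContext, body: bytes) -> bytes:
    """Frame one Message as an HTTP/1.1 request (illustrative only)."""
    # Copy so the application's metadata is not modified by framing.
    headers = dict(context.headers)
    if body:
        # The framer, not the application, delimits the body on the wire.
        headers["Content-Length"] = str(len(body))
    lines = [f"{context.method} {context.target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

The application supplies only header values and a body. The delimiting of the Message on the byte stream is added by the framing step, which is what lets the same Message be carried over a different HTTP version.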
1846 Connectedness: Multiplexing Connected 1848 Data Unit: Message 1850 Connection Object: Connection objects represent a flow of HTTP 1851 messages between a client and a server, which may be an HTTP/1.1 1852 connection over TCP, or a single stream in an HTTP/2 connection. 1854 Initiate: Calling "Initiate" on an HTTP connection initiates a TCP or 1855 TLS connection as a client. 1857 Clone: Calling "Clone" on an HTTP Connection opens a new stream on 1858 an existing HTTP/2 connection when possible. If the underlying 1859 version does not support multiplexed streams, calling "Clone" 1860 simply creates a new parallel connection. 1862 Send: When an application sends an HTTP Message, it is expected to 1863 provide HTTP header values as a MessageContext in a canonical 1864 form, along with any associated HTTP message body as the Message 1865 data. The HTTP header values are encoded in the format of the 1866 specific HTTP version upon sending. 1868 Receive: HTTP Connections deliver Messages in which HTTP header 1869 values are attached to MessageContexts, and HTTP bodies are carried in Message 1870 data. 1872 Close: Calling "Close" on an HTTP Connection will only close the 1873 underlying TLS or TCP connection if the HTTP version does not 1874 support multiplexing. For HTTP/2, for example, closing the 1875 connection only closes a specific stream. 1877 10.7. QUIC 1879 QUIC provides a multi-streaming interface to an encrypted transport. 1880 Each stream can be viewed as equivalent to a TLS stream over TCP, so 1881 a natural mapping is to present each QUIC stream as an individual 1882 Connection. The protocol for the stream will be considered Ready 1883 whenever the underlying QUIC connection is established to the point 1884 that this stream's data can be sent. For streams after the first 1885 stream, this will likely be an immediate operation. 1887 Closing a single QUIC stream, presented to the application as a 1888 Connection, does not imply closing the underlying QUIC connection 1889 itself.
Rather, the implementation may choose to close the QUIC 1890 connection once all streams have been closed (often after some 1891 timeout), or after an individual stream Connection sends an Abort. 1893 Connectedness: Multiplexing Connected 1895 Data Unit: Stream 1897 Connection Object: Connection objects represent a single QUIC stream 1898 on a QUIC connection. 1900 10.8. HTTP/2 transport 1902 Similar to QUIC (Section 10.7), HTTP/2 provides a multi-streaming 1903 interface. This will generally use HTTP as the unit of Messages over 1904 the streams, in which each stream can be represented as a transport 1905 Connection. The lifetime of streams and the HTTP/2 connection should 1906 be managed as described for QUIC. 1908 It is possible to treat each HTTP/2 stream as a raw byte-stream 1909 instead of a carrier for HTTP messages, in which case the Messages 1910 over the streams can be represented similarly to the TCP stream (one 1911 Message per direction, see Section 10.1). 1913 Connectedness: Multiplexing Connected 1915 Data Unit: Stream 1916 Connection Object: Connection objects represent a single HTTP/2 1917 stream on an HTTP/2 connection. 1919 10.9. SCTP 1921 Connectedness: Connected 1923 Data Unit: Message 1925 API mappings for SCTP are as follows: 1927 Connection Object: Connection objects represent a flow of SCTP 1928 messages between a client and a server, which may be an SCTP 1929 association or a stream in an SCTP association. How to map 1930 Connection objects to streams is described in [NEAT-flow-mapping]; 1931 in the following, a similar method is described. To map 1932 Connection objects to SCTP streams without head-of-line blocking 1933 on the sender side, both the sending and receiving SCTP 1934 implementations must support message interleaving [RFC8260]. Both 1935 SCTP implementations must also support stream reconfiguration.
      Finally, both communicating endpoints must be aware of this
      intended multiplexing; [NEAT-flow-mapping] describes a way for a
      Transport System to negotiate the stream mapping capability using
      SCTP's adaptation layer indication, such that this functionality
      would only take effect if both sides are aware of it. The
      first flow, for which the SCTP association has been created, will
      always use stream id zero. All additional flows are assigned to
      unused stream ids in increasing order. To avoid a conflict when
      both endpoints map new flows simultaneously, the peer which
      initiated the transport connection will use even stream numbers
      whereas the remote side will map its flows to odd stream numbers.
      Both sides maintain a status map of the assigned stream numbers.
      Generally, new streams must consume the lowest available (even or
      odd, depending on the side) stream number; this rule is relevant
      when lower numbers become available because Connection objects
      associated with the streams are closed.

   Initiate: If this is the only Connection object that is assigned to
      the SCTP association or stream mapping has not been negotiated,
      CONNECT.SCTP is called. Else, a new stream is used: if there are
      enough streams available, "Initiate" is just a local operation
      that assigns a new stream number to the Connection object. The
      number of streams is negotiated as a parameter of the prior
      CONNECT.SCTP call, and it represents a trade-off between local
      resource usage and the number of Connection objects that can be
      mapped without requiring a reconfiguration signal. When running
      out of streams, ADD_STREAM.SCTP must be called.

   InitiateWithSend: If this is the only Connection object that is
      assigned to the SCTP association or stream mapping has not been
      negotiated, CONNECT.SCTP is called with the "user message"
      parameter.
      Else, a new stream is used (see "Initiate" for how to
      handle running out of streams), and this just sends the first
      message on a new stream.

   Ready: "Initiate" or "InitiateWithSend" returns without an error,
      i.e., SCTP's four-way handshake has completed. If an association
      with the peer already exists, and stream mapping has been
      negotiated and enough streams are available, a Connection Object
      instantly becomes Ready after calling "Initiate" or
      "InitiateWithSend".

   InitiateError: Failure of CONNECT.SCTP.

   ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP.

   Listen: LISTEN.SCTP. If an association with the peer already exists
      and stream mapping has been negotiated, "Listen" just expects to
      receive a new message on a new stream id (chosen in accordance
      with the stream number assignment procedure described above).

   ConnectionReceived: LISTEN.SCTP returns without an error (a result
      of successful CONNECT.SCTP from the peer), or, in case of stream
      mapping, the first message has arrived on a new stream (in this
      case, "Receive" is also invoked).

   Clone: Calling "Clone" on an SCTP association creates a new
      Connection object and assigns it a new stream number in accordance
      with the stream number assignment procedure described above. If
      there are not enough streams available, ADD_STREAM.SCTP must be
      called.

   Priority (Connection): When this value is changed, or a Message with
      Message Property "Priority" is sent, and there are multiple
      Connection objects assigned to the same SCTP association,
      CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
      of streams in the SCTP association.

   Send: SEND.SCTP. Message Properties such as "Lifetime" and
      "Ordered" map to parameters of this primitive.

   Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a
      "ReceivedPartial" event.
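
   As an illustration only, the stream number assignment procedure
   described under "Connection Object", as used by "Initiate",
   "InitiateWithSend" and "Clone", could be sketched as follows. The
   class StreamMap, its methods, and the stream_count parameter are
   hypothetical names chosen for this sketch; they are not part of any
   SCTP or Transport Services API. The sketch assumes that stream ids
   0 to stream_count - 1 are usable without reconfiguration, and it
   only reports when ADD_STREAM.SCTP would be needed rather than
   calling any SCTP primitive.

```python
from typing import Set, Tuple


class StreamMap:
    """Status map of assigned stream numbers on one side (hypothetical)."""

    def __init__(self, initiated_association: bool, stream_count: int):
        # The side that initiated the association uses even stream
        # numbers; the remote side uses odd stream numbers.
        self.parity = 0 if initiated_association else 1
        # Assumption: ids 0 .. stream_count - 1 were negotiated.
        self.stream_count = stream_count
        # The first flow always uses stream id zero.
        self.assigned: Set[int] = {0}

    def assign(self) -> Tuple[int, bool]:
        """Pick the lowest free id of this side's parity.

        Returns (stream_id, needs_add_stream); the flag is True when
        the id lies beyond the negotiated streams, i.e. when
        ADD_STREAM.SCTP would have to be called first.
        """
        candidate = self.parity
        while candidate in self.assigned:
            candidate += 2
        self.assigned.add(candidate)
        return candidate, candidate >= self.stream_count

    def release(self, stream_id: int) -> None:
        """Mark an id as free again once its Connection is closed."""
        self.assigned.discard(stream_id)
```

   With stream_count set to 4, an initiating side would be handed
   stream 2 for its second Connection and stream 4 (with the
   ADD_STREAM.SCTP flag set) for its third; if the Connection on
   stream 2 is closed in between, the next assignment reuses stream 2,
   which is the "lowest available" rule in action.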

   Close: If this is the only Connection object that is assigned to
      the SCTP association, CLOSE.SCTP is called. Else, the Connection
      object is one of several Connection objects that are assigned to
      the same SCTP association, and RESET_STREAM.SCTP must be called,
      which informs the peer that the stream will no longer be used for
      mapping and can be used by future "Initiate", "InitiateWithSend"
      or "Listen" calls. At the peer, the event RESET_STREAM-EVENT.SCTP
      will fire, which the peer must answer by issuing
      RESET_STREAM.SCTP too. The resulting local
      RESET_STREAM-EVENT.SCTP informs the transport system that the
      stream number can now be re-used by the next "Initiate",
      "InitiateWithSend" or "Listen" calls.

   Abort: If this is the only Connection object that is assigned to
      the SCTP association, ABORT.SCTP is called. Else, the Connection
      object is one of several Connection objects that are assigned to
      the same SCTP association, and shutdown proceeds as described
      under "Close".

11. IANA Considerations

   RFC-EDITOR: Please remove this section before publication.

   This document has no actions for IANA.

12. Security Considerations

12.1. Considerations for Candidate Gathering

   Implementations should avoid downgrade attacks that allow network
   interference to cause the implementation to select less secure, or
   entirely insecure, combinations of paths and protocols.

12.2. Considerations for Candidate Racing

   See Section 5.3 for security considerations around racing with 0-RTT
   data.

   An attacker that knows a particular device is racing several options
   during connection establishment may be able to block packets for the
   first connection attempt, thus inducing the device to fall back to a
   secondary attempt. This is a problem if the secondary attempts have
   worse security properties that enable further attacks.
   Implementations should ensure that all options have equivalent
   security properties to avoid incentivizing attacks.

   Since results from the network can determine how a connection attempt
   tree is built, such as when DNS returns a list of resolved endpoints,
   it is possible for the network to cause an implementation to consume
   significant on-device resources. Implementations should limit the
   maximum amount of state allowed for any given node, including the
   number of child nodes, especially when the state is based on results
   from the network.

13. Acknowledgements

   This work has received funding from the European Union's Horizon 2020
   research and innovation programme under grant agreement No. 644334
   (NEAT).

   This work has been supported by Leibniz Prize project funds of DFG -
   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
   FE 570/4-1).

   This work has been supported by the UK Engineering and Physical
   Sciences Research Council under grant EP/R04144X/1.

   This work has been supported by the Research Council of Norway under
   its "Toppforsk" programme through the "OCARINA" project.

   Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
   Kinnear for their implementation and design efforts, including Happy
   Eyeballs, that heavily influenced this work.

14. References

14.1. Normative References

   [I-D.ietf-taps-arch]
              Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
              Perkins, C., Tiesel, P., and C. Wood, "An Architecture for
              Transport Services", Work in Progress, Internet-Draft,
              draft-ietf-taps-arch-06, 23 December 2019.

   [I-D.ietf-taps-interface]
              Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
              Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T.
              Pauly, "An Abstract Application Layer Interface to
              Transport Services", Work in Progress, Internet-Draft,
              draft-ietf-taps-interface-05, 4 November 2019.

   [I-D.ietf-taps-minset]
              Welzl, M. and S. Gjessing, "A Minimal Set of Transport
              Services for End Systems", Work in Progress, Internet-
              Draft, draft-ietf-taps-minset-11, 27 September 2018.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC8260]  Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
              "Stream Schedulers and User Message Interleaving for the
              Stream Control Transmission Protocol", RFC 8260,
              DOI 10.17487/RFC8260, November 2017.

   [RFC8303]  Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
              Transport Features Provided by IETF Transport Protocols",
              RFC 8303, DOI 10.17487/RFC8303, February 2018.

   [RFC8304]  Fairhurst, G. and T. Jones, "Transport Features of the
              User Datagram Protocol (UDP) and Lightweight UDP (UDP-
              Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018.

   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
              Better Connectivity Using Concurrency", RFC 8305,
              DOI 10.17487/RFC8305, December 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

14.2. Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", Work in Progress, Internet-Draft,
              draft-ietf-quic-transport-27, 21 February 2020.

   [NEAT-flow-mapping]
              "Transparent Flow Mapping for NEAT (in Workshop on Future
              of Internet Transport (FIT 2017))", 2017.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              DOI 10.17487/RFC5245, April 2010.

Appendix A. Additional Properties

   This appendix discusses implementation considerations for additional
   parameters and properties that could be used to enhance transport
   protocol and/or path selection, or the transmission of messages given
   a Protocol Stack that implements them. These are not part of the
   interface, and may be removed from the final document, but are
   presented here to support discussion within the TAPS working group as
   to whether they should be added to a future revision of the base
   specification.

A.1. Properties Affecting Sorting of Branches

   In addition to the Protocol and Path Selection Properties discussed
   in Section 4.3, the following properties under discussion can
   influence branch sorting:

   * Bounds on Send or Receive Rate: If the application indicates a
     bound on the expected Send or Receive bitrate, an implementation
     may prefer a path that can likely provide the desired bandwidth,
     based on cached maximum throughput, see Section 9.2. The
     application may know the Send or Receive Bitrate from metadata in
     adaptive HTTP streaming, such as MPEG-DASH.

   * Cost Preferences: If the application indicates a preference to
     avoid expensive paths, and some paths are associated with a
     monetary cost, an implementation should decrease the ranking of
     such paths. If the application indicates that it prohibits using
     expensive paths, paths that are associated with a cost should be
     purged from the decision tree.

Appendix B. Reasons for errors

   The Transport Services API [I-D.ietf-taps-interface] allows several
   generic error types to specify a more detailed reason as to why an
   error occurred. This appendix lists some of the possible reasons.

   * InvalidConfiguration: The transport properties and endpoints
     provided by the application are either contradictory or
     incomplete. Examples include the lack of a remote endpoint on an
     active open or using a multicast group address while not
     requesting a unidirectional receive.

   * NoCandidates: The configuration is valid, but none of the
     available transport protocols can satisfy the transport properties
     provided by the application.

   * ResolutionFailed: The remote or local specifier provided by the
     application cannot be resolved.

   * EstablishmentFailed: The TAPS system was unable to establish a
     transport-layer connection to the remote endpoint specified by the
     application.

   * PolicyProhibited: The system policy prevents the transport system
     from performing the action requested by the application.

   * NotCloneable: The protocol stack is not capable of being cloned.

   * MessageTooLarge: The message size is too big for the transport
     system to handle.

   * ProtocolFailed: The underlying protocol stack failed.

   * InvalidMessageProperties: The message properties are either
     contradictory to the transport properties or they cannot be
     satisfied by the transport system.

   * DeframingFailed: The data that was received by the underlying
     protocol stack could not be deframed.

   * ConnectionAborted: The connection was aborted by the peer.

   * Timeout: Delivery of a message was not possible after a timeout.

Appendix C. Existing Implementations

   This appendix gives an overview of existing implementations, at the
   time of writing, of transport systems that are (to some degree) in
   line with this document.

   * Apple's Network.framework:

     - Network.framework is a transport-level API built for C,
       Objective-C, and Swift. It is a connect-by-name API that
       supports transport security protocols. It provides userspace
       implementations of TCP, UDP, TLS, DTLS, proxy protocols, and
       allows extension via custom framers.

     - Documentation: https://developer.apple.com/documentation/network

   * NEAT:

     - NEAT is the output of the European H2020 research project
       "NEAT"; it is a user-space library for protocol-independent
       communication on top of TCP, UDP and SCTP, with many more
       features such as a policy manager.

     - Code: https://github.com/NEAT-project/neat

     - NEAT project: https://www.neat-project.org

   * PyTAPS:

     - A TAPS implementation based on Python asyncio, offering
       protocol-independent communication to applications on top of
       TCP, UDP and TLS, with support for multicast.

     - Code: https://github.com/fg-inet/python-asyncio-taps

Authors' Addresses

   Anna Brunstrom (editor)
   Karlstad University
   Universitetsgatan 2
   SE- 651 88 Karlstad
   Sweden

   Email: anna.brunstrom@kau.se

   Tommy Pauly (editor)
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014,
   United States of America

   Email: tpauly@apple.com

   Theresa Enghardt
   TU Berlin
   Marchstrasse 23
   10587 Berlin
   Germany

   Email: theresa@inet.tu-berlin.de

   Karl-Johan Grinnemo
   Karlstad University
   Universitetsgatan 2
   SE- 651 88 Karlstad
   Sweden

   Email: karl-johan.grinnemo@kau.se

   Tom Jones
   University of Aberdeen
   Fraser Noble Building
   Aberdeen, AB24 3UE
   United Kingdom

   Email: tom@erg.abdn.ac.uk

   Philipp S. Tiesel
   TU Berlin
   Einsteinufer 25
   10587 Berlin
   Germany

   Email: philipp@tiesel.net

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   0316 Oslo
   Norway

   Email: michawe@ifi.uio.no