TAPS Working Group                                           B. Trammell
Internet-Draft                                                ETH Zurich
Intended status: Informational                                C. Perkins
Expires: April 30, 2018                            University of Glasgow
                                                                T. Pauly
                                                              Apple Inc.
                                                           M. Kuehlewind
                                                              ETH Zurich
                                                                 C. Wood
                                                              Apple Inc.
                                                        October 27, 2017


 Post Sockets, An Abstract Programming Interface for the Transport Layer
                  draft-trammell-taps-post-sockets-03

Abstract

   This document describes Post Sockets, an asynchronous abstract
   programming interface for the atomic transmission of messages in an
   inherently multipath environment.  Post replaces connections with
   long-lived associations between endpoints, with the possibility to
   cache cryptographic state in order to reduce amortized connection
   latency.  We present this abstract interface as an illustration of
   what is possible with present developments in transport protocols
   when freed from the strictures of the current sockets API.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Abstractions and Terminology
     2.1.  Message Carrier
     2.2.  Message
     2.3.  Association
     2.4.  Remote
     2.5.  Local
     2.6.  Configuration
     2.7.  Transient
     2.8.  Path
   3.  Abstract Programming Interface
     3.1.  Example Connection Patterns
       3.1.1.  Client-Server
       3.1.2.  Client-Server with Happy Eyeballs and 0-RTT
               establishment
       3.1.3.  Peer to Peer with Network Address Translation
       3.1.4.  Multicast Receiver
       3.1.5.  Association Bootstrapping
     3.2.  API Dynamics
   4.  Implementation Considerations
     4.1.  Protocol Stack Instance (PSI)
     4.2.  Message Framing, Parsing, and Serialization
     4.3.  Message Size Limitations
     4.4.  Back-pressure
     4.5.  Associations, Transients, Racing, and Rendezvous
   5.  Acknowledgments
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Open Issues
   Authors' Addresses

1.  Introduction

   The BSD Unix Sockets API's SOCK_STREAM abstraction, by bringing
   network sockets into the UNIX programming model, allowing anyone who
   knew how to write programs that dealt with sequential-access files to
   also write network applications, was a revolution in simplicity.  It
   would not be an overstatement to say that this simple API is the
   reason the Internet won the protocol wars of the 1980s.  SOCK_STREAM
   is tied to the Transmission Control Protocol (TCP), specified in 1981
   [RFC0793].  TCP has scaled remarkably well over the past three and a
   half decades, but its total ubiquity has hidden an uncomfortable
   fact: the network is not really a file, and stream abstractions are
   too simplistic for many modern application programming models.

   In the meantime, the nature of Internet access, and the variety of
   Internet transport protocols, is evolving.  The challenges that new
   protocols and access paradigms present to the sockets API and to
   programming models based on them inspire the design elements of a new
   approach.

   Many end-user devices are connected to the Internet via multiple
   interfaces, which suggests it is time to promote the paths by which
   two endpoints are connected to each other to a first-order object.
   While implicit multipath communication is available for these
   multihomed nodes in the present Internet architecture with the
   Multipath TCP extension (MPTCP) [RFC6824], MPTCP was specifically
   designed to hide multipath communication from the application for
   purposes of compatibility.  Since many multihomed nodes are connected
   to the Internet through access paths with widely different properties
   with respect to bandwidth, latency and cost, adding explicit path
   control to MPTCP's API would be useful in many situations.

   Another trend straining the traditional layering of the transport
   stack associated with the SOCK_STREAM interface is the widespread
   interest in ubiquitous deployment of encryption to guarantee
   confidentiality, authenticity, and integrity, in the face of
   pervasive surveillance [RFC7258].  Layering the most widely deployed
   encryption technology, Transport Layer Security (TLS), strictly atop
   TCP (i.e., via a TLS library such as OpenSSL that uses the sockets
   API) requires the encryption-layer handshake to happen after the
   transport-layer handshake, which increases connection setup latency
   on the order of one or two round-trip times, an unacceptable delay
   for many applications.  Integrating cryptographic state setup and
   maintenance into the path abstraction naturally complements efforts
   in new protocols (e.g. QUIC [I-D.ietf-quic-transport]) to mitigate
   this strict layering.

   To meet these challenges, we present the Post-Sockets Application
   Programming Interface (API), described in detail in this work.  Post
   is designed to be language, transport protocol, and architecture
   independent, allowing applications to be written to a common abstract
   interface, easily ported among different platforms, and used even in
   environments where transport protocol selection may be done
   dynamically, as proposed in the IETF's Transport Services working
   group.

   Post replaces the traditional SOCK_STREAM abstraction with a Message
   abstraction, which can be seen as a generalization of the Stream
   Control Transmission Protocol's [RFC4960] SOCK_SEQPACKET service.
   Messages are sent and received on Carriers, which logically group
   Messages for transmission and reception.  For backward compatibility,
   bidirectional byte stream protocols are represented as a pair of
   Messages, one in each direction, that can only be marked complete
   when the sending peer has finished transmitting data.

   Post replaces the notions of a socket address and connected socket
   with an Association with a remote endpoint via a set of Paths.
   Implementation and wire format for transport protocol(s) implementing
   the Post API are explicitly out of scope for this work; these
   abstractions need not map directly to implementation-level concepts,
   and indeed with various amounts of shimming and glue could be
   implemented with varying success atop any sufficiently flexible
   transport protocol.

   The key features of Post as compared with the existing sockets API
   are:

   o  Explicit Message orientation, with framing and atomicity
      guarantees for Message transmission.

   o  Asynchronous reception, allowing all receiver-side interactions to
      be event-driven.

   o  Explicit support for multistreaming and multipath transport
      protocols and network architectures.

   o  Long-lived Associations, whose lifetimes may not be bound to
      underlying transport connections.  This allows associations to
      cache state and cryptographic key material to enable fast
      resumption of communication, and for the implementation of the API
      to explicitly take care of connection establishment mechanics such
      as connection racing [RFC6555] and peer-to-peer rendezvous
      [RFC5245].

   o  Transport protocol stack independence, allowing applications to be
      written in terms of the semantics best for the application's own
      design, separate from the protocol(s) used on the wire to achieve
      them.  This enables applications written to a single API to make
      use of transport protocols in terms of the features they provide,
      as in [I-D.ietf-taps-transports].

   This work is the synthesis of many years of Internet transport
   protocol research and development.  It is inspired by concepts from
   the Stream Control Transmission Protocol (SCTP) [RFC4960], TCP Minion
   [I-D.iyengar-minion-protocol], and MinimaLT [MinimaLT], among other
   transport protocol modernization efforts.  We present Post as an
   illustration of what is possible with present developments in
   transport protocols when freed from the strictures of the current
   sockets API.  While much of the work for building parts of the
   protocols needed to implement Post is already ongoing in other IETF
   working groups (e.g. MPTCP, QUIC, TLS), we argue that an abstract
   programming interface unifying access to all these efforts is
   necessary to fully exploit their potential.

2.  Abstractions and Terminology

           +===============+
           |    Message    |
           +===============+
               |     ^        |                |
         send()|     |ready() |initiate()      |listen()
               V     |        V                V
       +=======================+         +============+
       |                       | accept()|            |
       |        Carrier        |<--------|  Listener  |
       |                       |         |            |
       +=======================+         +============+
        |1        |          n|                |             +=========+
        |         |           |1               |         +---|  Local  |
        | +===============+  +=======================+   |   +=========+
        | |               |  |                       |---+
        | | Configuration |--|      Association      |       +=========+
        | |               |  |                       |-------|  Remote |
        | +===============+  +=======================+       +=========+
        |                     |                1|  durable end-to-end
        +-----------+         |                 |  state via many paths,
                    |         |                 |  policies, and prefs
                   n|         |                n|
                   +===========+       +==========+
         ephemeral |           |       |          |
       transport & | Transient |-------|   Path   |  properties of
      crypto state |           |n     1|          |  address pair
                   +===========+       +==========+

        Figure 1: Abstractions and relationships in Post Sockets

   Post is based on a small set of abstractions, centered around a
   Message Carrier as the entry point for an application to the
   networking API.  The relationships among them are shown in
   Figure 1 and detailed in this section.

2.1.  Message Carrier

   A Message Carrier (or simply Carrier) is a transport protocol stack-
   independent interface for sending and receiving messages between an
   application and a remote endpoint; it is roughly analogous to a
   socket in the present sockets API.

   Sending a Message over a Carrier is driven by the application, while
   receipt is driven by the arrival of the last packet that allows the
   Message to be assembled, decrypted, and passed to the application.
   Receipt is therefore asynchronous; given the different models for
   asynchronous I/O and concurrency supported by different platforms, it
   may be implemented in any number of ways.
   The abstract API provides only a way for the application to register
   how it wants to handle incoming messages.

   All the Messages sent to a Carrier will be received on the
   corresponding Carrier at the remote endpoint, though not necessarily
   reliably or in order, depending on Message properties and the
   underlying transport protocol stack.

   A Carrier that is backed by current transport protocol stack state
   (such as a TCP connection; see Section 2.7) is said to be "active":
   messages can be sent and received over it.  A Carrier can also be
   "dormant": there is long-term state associated with it (via the
   underlying Association; see Section 2.3), and it may be able to be
   reactivated, but messages cannot be sent and received immediately.
   Carriers become dormant when the underlying transport protocol stack
   determines that an underlying connection has been lost and there is
   insufficient state in the Association to re-establish it (e.g., in
   the case of a server-side Carrier where the client's address has
   changed unexpectedly).  Passive close can be handled by the
   application via an event on the Carrier.  Attempting to use a Carrier
   after passive close results in an error.

   If supported by the underlying transport protocol stack, a Carrier
   may be forked: creating a new Carrier associated with a new Carrier
   at the same remote endpoint.  The semantics of the usage of multiple
   Carriers based on the same Association are application-specific.
   When a Carrier is forked, its corresponding Carrier at the remote
   endpoint receives a fork request, which it must accept in order to
   fully establish the new Carrier.  Multiple Carriers between endpoints
   are implemented differently by different transport protocol stacks,
   either using multiple separate transport-layer connections, or using
   multiple streams of multistreaming transport protocols.
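
   The active, dormant, and fork behaviour described above can be
   illustrated with a small in-memory model.  This is only a sketch
   under stated assumptions: the names State, CarrierError, Carrier,
   connect, transport_lost, and passive_close are hypothetical and are
   not the interface defined by this document, both endpoints live in
   one process, and delivery is modelled as appending to a list.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"     # backed by current transport state
    DORMANT = "dormant"   # only long-term Association state remains
    CLOSED = "closed"     # a passive close has been observed


class CarrierError(Exception):
    """Raised when a Carrier is used in a state that does not allow it."""


class Association:
    """Long-term state shared by the Carriers between one Local and Remote."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.carriers = []


class Carrier:
    def __init__(self, association):
        self.association = association
        self.state = State.ACTIVE
        self.peer = None                 # corresponding Carrier at the remote
        self.inbox = []                  # Messages delivered to this side
        self.accept_fork = lambda: True  # application policy for fork requests
        self.on_close = lambda: None     # handler for the passive-close event
        association.carriers.append(self)

    def _require_active(self, action):
        if self.state is not State.ACTIVE:
            raise CarrierError(f"cannot {action} on a {self.state.value} Carrier")

    def send(self, payload):
        self._require_active("send")
        self.peer.inbox.append(payload)

    def fork(self):
        """Return a new local Carrier, or None if the remote refuses the fork."""
        self._require_active("fork")
        if not self.peer.accept_fork():
            return None
        local, _remote = connect(self.association, self.peer.association)
        return local

    def transport_lost(self):
        # Assumption: the stack could not re-establish the connection.
        self.state = State.DORMANT

    def passive_close(self):
        self.state = State.CLOSED
        self.on_close()


def connect(local_association, remote_association):
    """Create a linked pair of active Carriers (stands in for an active open)."""
    near, far = Carrier(local_association), Carrier(remote_association)
    near.peer, far.peer = far, near
    return near, far
```

   In this model a forked Carrier is registered on the same Association
   as its parent, and a refused fork request leaves the Association
   unchanged.  A real transport protocol stack might back each forked
   Carrier with a separate connection or with a stream of a
   multistreaming protocol.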

   To exchange messages with a given remote endpoint, an application may
   initiate a Carrier given its remote (see Section 2.4) and local (see
   Section 2.5) identities; this is equivalent to an active open.
   There are four special cases of Carriers, as well, supporting
   different initiation and interaction patterns, described in the
   list below.

   o  Listener: A Listener is a special case of Message Carrier which
      only responds to requests to create a new Carrier from a remote
      endpoint, analogous to a server or listening socket in the present
      sockets API.  Instead of being bound to a specific remote
      endpoint, it is bound only to a local identity; however, its
      interface for accepting fork requests is identical to that for
      fully fledged Carriers.

   o  Source: A Source is a special case of Message Carrier over which
      messages can only be sent, intended for unidirectional
      applications such as multicast transmitters.  Sources cannot be
      forked, and need not accept forks.

   o  Sink: A Sink is a special case of Message Carrier over which
      messages can only be received, intended for unidirectional
      applications such as multicast receivers.  Sinks cannot be forked,
      and need not accept forks.

   o  Responder: A Responder is a special case of Message Carrier which
      may receive messages from many remote sources, for cases in which
      an application will only ever send Messages in reply back to the
      source from which a Message was received.  This is a common
      implementation pattern for servers in client-server applications.
      A Responder's receiver gets a Message, as well as a Source to send
      replies to.  Responders cannot be forked, and need not accept
      forks.

2.2.  Message

   A Message is the unit of communication between applications.
   Messages can represent relatively small structures, such as requests
   in a request/response protocol such as HTTP; relatively large
   structures, such as files of arbitrary size in a filesystem; and
   structures of indeterminate length, such as a stream of bytes in a
   protocol like TCP.

   In the general case, there is no mapping between a Message and
   packets sent by the underlying protocol stack on the wire: the
   transport protocol may freely segment messages and/or combine
   messages into packets.  However, a message may be marked as
   immediate, which will cause it to be sent in a single packet when
   possible.

   Content may be sent and received either as Complete or Partial
   Messages.  Dealing with Complete Messages should be preferred for
   simplicity whenever possible based on the underlying protocol.  It is
   always possible to send Complete Messages, but only protocols that
   have a fixed maximum message length may allow clients to receive
   Messages using an API that guarantees Complete Messages.  Sending and
   receiving Partial Messages (that is, a Message whose content spans
   multiple calls or callbacks) is always possible.

   To send a Message, either Complete or Partial, the Message content is
   passed into the Carrier, and the client provides a set of callbacks
   to know when the Message was delivered or acknowledged.  The client
   of the API may use the callbacks to pace the sending of Messages.

   To receive a Message, the client of the API schedules a completion to
   be called when a Complete or Partial Message is available.  If the
   client is willing to accept Partial Messages, it can specify the
   minimum incomplete Message length it is willing to receive at once,
   and the maximum number of bytes it is willing to receive at once.  If
   the client wants Complete Messages, there are no values to tune.
   The scheduling of the receive completion indicates to the Carrier
   that there is a desire to receive bytes, effectively creating a "pull
   model" in which backpressure may be applied if the client is not
   receiving Messages or Partial Messages quickly enough to match the
   peer's sending rate.  The Carrier may have some minimal buffer of
   incoming Messages ready for the client to read to reduce latency.

   When receiving a Complete Message, the entire content of the Message
   must be delivered at once, and the Message is not delivered at all if
   the full Message is not received.  This implies that both the sending
   and receiving endpoint, whether in the application or the carrier,
   must guarantee storage for the full size of a Message.

   Partial Messages may be sent or received in several stages, with a
   handle representing the total Message being associated with each
   portion of the content.  Each call to send or receive also indicates
   whether or not the Message is now complete.  This approach is
   necessary whenever the size of the Message does not have a known
   bound, or the size is too large to process and hold in memory.
   Protocols that only present a concept of byte streams represent their
   data as single Messages with unknown bounds.  In the case of TCP, the
   client will receive a single Message in pieces using the Partial
   Message API, and that Message will only be marked as complete when
   the peer has sent a FIN.

   Messages are sent over and received from Message Carriers (see
   Section 2.1).

   On sending, Messages have properties that allow the application to
   specify its requirements with respect to reliability, ordering,
   priority, idempotence, and immediacy; these are described in detail
   below.  Messages may also have arbitrary properties which provide
   additional information to the underlying transport protocol stack on
   how they should be handled, in a protocol-specific way.
   These stacks may also deliver or set properties on received messages,
   but in the general case a received Message contains only a sequence
   of ordered bytes.  Message properties include:

   o  Lifetime and Partial Reliability: A Message may have a "lifetime"
      - a wall clock duration before which the Message must be available
      to the application layer at the remote end.  If a lifetime cannot
      be met, the Message is discarded as soon as possible.  Messages
      without lifetimes are sent reliably if supported by the transport
      protocol stack.  Lifetimes are also used to prioritize Message
      delivery.

      There is no guarantee that a Message will not be delivered after
      the end of its lifetime; for example, a Message delivered over a
      strictly reliable transport will be delivered regardless of its
      lifetime.  Depending on the transport protocol stack used to
      transmit the message, these lifetimes may also be signalled to
      path elements by the underlying transport, so that path elements
      that realize a lifetime cannot be met can discard frames
      containing the Messages instead of forwarding them.

   o  Priority: Messages have a "niceness" - a priority among other
      messages sent over the same Carrier in an unbounded hierarchy most
      naturally represented as a non-negative integer.  By default,
      Messages are in niceness class 0, or highest priority.  Niceness
      class 1 Messages will yield to niceness class 0 Messages sent over
      the same Carrier, class 2 to class 1, and so on.  Niceness may be
      translated to a priority signal for exposure to path elements
      (e.g. DSCP code point) to allow prioritization along the path as
      well as at the sender and receiver.  This inversion of normal
      schemes for expressing priority has a convenient property:
      priority increases as both niceness and lifetime decrease.
      A Message may have both a niceness and a lifetime - Messages with
      higher niceness classes will yield to lower classes if resource
      constraints mean only one can meet the lifetime.

   o  Dependence: A Message may have "antecedents" - other Messages on
      which it depends, which must be delivered before it (the
      "successor") is delivered.  The sending transport uses deadlines,
      niceness, and antecedents, along with information about the
      properties of the Paths available, to determine when to send which
      Message down which Path.

   o  Idempotence: A sending application may mark a Message as
      "idempotent" to signal to the underlying transport protocol stack
      that its application semantics make it safe to send in situations
      that may cause it to be received more than once (e.g., for 0-RTT
      session resumption as in TCP Fast Open, TLS 1.3, and QUIC).

   o  Immediacy: A sending application may mark a Message as "immediate"
      to signal to the underlying transport protocol stack that its
      application semantics require it to be placed in a single packet,
      on its own, instead of waiting to be combined with other messages
      or parts thereof (e.g., for media transports and interactive
      sessions with small messages).

   Senders may also be asynchronously notified of three events on
   Messages they have sent: that the Message has been transmitted, that
   the Message has been acknowledged by the receiver, or that the
   Message has expired before transmission/acknowledgement.  Not all
   transport protocol stacks will support all of these events.

2.3.  Association

   An Association contains the long-term state necessary to support
   communications between a Local (see Section 2.5) and a Remote (see
   Section 2.4) endpoint, such as trust model information, including
   pinned public keys or anchor certificates, cryptographic session
   resumption parameters, or rendezvous information.
   It uses information from the Configuration (see Section 2.6) to
   constrain the selection of transport protocols and local interfaces
   to create Transients (see Section 2.7) to carry Messages; and
   information about the paths through the network available between
   them (see Section 2.8).

   All Carriers are bound to an Association.  New Carriers will reuse an
   Association if they can be carried from the same Local to the same
   Remote over the same Paths; this re-use of an Association may imply
   the creation of a new Transient.

   Associations may exist and be created without a Carrier.  This may be
   done if peer cryptographic state such as a pre-shared key is
   established out-of-band.  Thus, Associations may be created without
   the need to send application data to a peer, that is, without a
   Carrier.  Associations are mutable.  Association state may expire
   over time, after which it is removed from the Association, and
   Transients may export cryptographic state to store in an Association
   as needed.  Moreover, this state may be exported directly into the
   Association or modified before insertion.  This may be needed to
   diversify ephemeral Transient keying material from the longer-term
   Association keying material.

   A primary use of Association state is to allow new Associations and
   their derived Carriers to be quickly created without performing in-
   band cryptographic handshakes.  See [I-D.kuehlewind-taps-crypto-sep]
   for more details about this separation.

2.4.  Remote

   A Remote represents information required to establish and maintain a
   connection with the far end of an Association: name(s), address(es),
   and transport protocol parameters that can be used to establish a
   Transient; transport protocols to use; trust model information,
   inherited from the relevant Association, used to identify the remote
   on connection establishment; and so on.
   Each Association is associated with a single Remote, either
   explicitly by the application (when created by the initiation of a
   Carrier) or by a Listener (when created by forking a Carrier on
   passive open).

   A Remote may be resolved, which results in zero or more Remotes with
   more specific information.  For example, an application may want to
   establish a connection to a website identified by a URL
   https://www.example.com.  This URL would be wrapped in a Remote and
   passed to a call to initiate a Carrier.  The first pass resolution
   might parse the URL, decomposing it into a name, a transport port,
   and a transport protocol to try connecting with.  A second pass
   resolution would then look up network-layer addresses associated with
   that name through DNS, and store any certificates available from
   DANE.  Once a Remote has been resolved to the point that a transport
   protocol stack can use it to create a Transient, it is considered
   fully resolved.

2.5.  Local

   A Local represents all the information about the local endpoint
   necessary to establish an Association or a Listener.  It encapsulates
   the Provisioning Domain (PvD) of a single interface in the multiple
   provisioning domain architecture [RFC7556], and adds information
   about the service endpoint (transport protocol port), and, per
   [I-D.pauly-taps-transport-security], cryptographic identities
   (certificates and associated private keys) bound to this endpoint.

2.6.  Configuration

   A Configuration encapsulates an application's preferences around Path
   selection and protocol options.

   Each Association has exactly one Configuration, and all Carriers
   belonging to that Association share the same Configuration.

   The application cannot modify the Configuration for a Carrier or
   Association once it is set.  If a new set of options needs to be
   used, then the application needs a new Carrier or Association
   instance.
   This is necessary to ensure that a single Carrier can
   consistently track the Paths and protocol options it uses, since it
   is usually not possible to modify these properties without breaking
   connectivity.

   To influence Path selection, the application can configure a set of
   requirements, preferences, and restrictions concerning which Paths
   may be selected by the Association to use for creating Transients
   between a Local and a Remote.  For example, a Configuration can
   specify that the application prefers Wi-Fi access over LTE when
   roaming on a foreign LTE network, due to monetary cost to the user.

   The Association uses the Configuration's Path preferences as a key
   part of determining the Paths to use for its Transients.  The
   Configuration is provided as input when examining the complete list
   of available Paths on the system (to limit the list, or order the
   Paths by preference).  The system's policy will further restrict and
   modify the Path that is ultimately selected, using other aspects of
   the Configuration (protocol options and originating application) to
   select the most appropriate Path.

   To influence protocol selection and options, the Configuration
   contains one or more allowed Protocol Stack Configurations.  Each of
   these is comprised of application- and transport-layer protocols that
   may be used together to communicate to the Remote, along with any
   protocol-specific options.  For example, a Configuration could
   specify two alternate, but equivalent, protocol stacks: one using
   HTTP/2 over TLS over TCP, and the other using QUIC over UDP.
   Alternatively, the Configuration could specify two protocol stacks
   with the same protocols, but different protocol options: one using
   TLS with TLS 1.3 0-RTT enabled and TCP with TCP Fast-Open enabled,
   and one using TLS without 0-RTT and TCP without TCP Fast-Open.
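
   As a rough illustration of the two roles described above (limiting
   and ordering candidate Paths, and listing allowed protocol stacks),
   the following sketch models a Configuration as an immutable value.
   All names here (ProtocolStack, CandidatePath, order_paths, the
   interface labels, and the avoid_roaming flag) are assumptions made
   for this example and do not come from this document; system policy,
   which would further restrict the result, is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ProtocolStack:
    """One allowed Protocol Stack Configuration (hypothetical shape)."""
    protocols: Tuple[str, ...]               # listed top to bottom
    options: FrozenSet[str] = frozenset()    # protocol-specific options


@dataclass(frozen=True)
class CandidatePath:
    """Just enough of a Path to rank it; both fields are illustrative."""
    interface: str
    roaming: bool = False


@dataclass(frozen=True)
class Configuration:
    """Frozen, mirroring the rule that a set Configuration cannot change."""
    stacks: Tuple[ProtocolStack, ...]
    preferred_interfaces: Tuple[str, ...] = ()
    prohibited_interfaces: FrozenSet[str] = frozenset()
    avoid_roaming: bool = False

    def order_paths(self, available: List[CandidatePath]) -> List[CandidatePath]:
        """Drop prohibited Paths, then sort the rest by preference."""
        allowed = [p for p in available
                   if p.interface not in self.prohibited_interfaces]

        def rank(path: CandidatePath) -> Tuple[bool, int]:
            try:
                position = self.preferred_interfaces.index(path.interface)
            except ValueError:
                position = len(self.preferred_interfaces)
            # False sorts before True, so non-roaming Paths come first.
            return (self.avoid_roaming and path.roaming, position)

        return sorted(allowed, key=rank)


# The two alternate but equivalent stacks from the example in the text.
config = Configuration(
    stacks=(
        ProtocolStack(("HTTP/2", "TLS", "TCP")),
        ProtocolStack(("QUIC", "UDP")),
    ),
    preferred_interfaces=("wifi", "lte"),
    avoid_roaming=True,
)
```

   Because the dataclass is frozen, an application that wants different
   options has to build a new Configuration, which matches the
   requirement that a new Carrier or Association instance be used.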

   Protocol-specific options within the Configuration include trust
   settings and acceptable cryptographic algorithms to be used by
   security protocols. These may be configured for specific protocols
   to allow different settings for each (such as between TLS over TCP
   and TLS for use with QUIC), or set as default security settings on
   the Configuration to be used by any protocol that needs to evaluate
   trust. Trust settings may include certificate anchors and
   certificate pinning options.

2.7.  Transient

   A Transient represents a binding between a Carrier and the instance
   of the transport protocol stack that implements it. As an
   Association contains long-term state for communications between two
   endpoints, a Transient contains ephemeral state for a single
   transport protocol over one or more Paths at a given point in time.

   A Carrier may be served by multiple Transients at once, e.g. when
   implementing multipath communication such that the separate paths are
   exposed to the API by the underlying transport protocol stack. Each
   Transient serves only one Carrier, although multiple Transients may
   share the same underlying protocol stack; e.g. when multiplexing
   Carriers over streams in a multistreaming protocol.

   Transients are generally not exposed by the API to the application,
   though they may be accessible for debugging and logging purposes.

2.8.  Path

   A Path represents information about a single path through the network
   used by an Association, in terms of source and destination network
   and transport layer addresses within an addressing context, and the
   provisioning domain [RFC7556] of the local interface. This
   information may be learned through a resolution, discovery, or
   rendezvous process (e.g. DNS, ICE), by measurements taken by the
   transport protocol stack, or by some other path information discovery
   mechanism.
   It is used by the transport protocol stack to maintain
   and/or (re-)establish communications for the Association.

   The set of available properties is a function of the transport
   protocol stacks in use by an Association. However, the following
   core properties are generally useful for applications and transport
   layer protocols to choose among paths for specific Messages:

   o  Maximum Transmission Unit (MTU): the maximum size of a Message's
      payload (subtracting transport, network, and link layer overhead)
      which will likely fit into a single frame. Derived from signals
      sent by path elements, where available, and/or path MTU discovery
      processes run by the transport layer.

   o  Latency Expectation: expected one-way delay along the Path.
      Generally provided by inline measurements performed by the
      transport layer, as opposed to signaled by path elements.

   o  Loss Probability Expectation: expected probability of a loss of
      any given single frame along the Path. Generally provided by
      inline measurements performed by the transport layer, as opposed
      to signaled by path elements.

   o  Available Data Rate Expectation: expected maximum data rate along
      the Path. May be derived from passive measurements by the
      transport layer, or from signals from path elements.

   o  Reserved Data Rate: Committed, reserved data rate for the given
      Association along the Path. Requires a bandwidth reservation
      service in the underlying transport protocol stack.

   o  Path Element Membership: Identifiers for some or all nodes along
      the path, depending on the capabilities of the underlying network
      layer protocol to provide this.

   Path properties are generally read-only.
   MTU is a property of the
   underlying link-layer technology on each link in the path; latency,
   loss, and rate expectations are dynamic properties of the network
   configuration and network traffic conditions; path element membership
   is a function of network topology. In an explicitly multipath
   architecture, application and transport layer requirements can be met
   by having multiple paths with different properties to select from.
   Transport protocol stacks can also provide signaling to devices along
   the path, but this signaling is derived from information provided to
   the Message abstraction.

3.  Abstract Programming Interface

   We now turn to the design of an abstract programming interface to
   provide a simple interface to Post's abstractions, constrained by the
   following design principles:

   o  Flexibility is paramount. So is simplicity. Applications must be
      given as many controls and as much information as they may need,
      but they must be able to ignore controls and information
      irrelevant to their operation. This implies that the "default"
      interface must be no more complicated than BSD sockets, and must
      do something reasonable.

   o  Reception is an inherently asynchronous activity. While the API
      is designed to be as platform-independent as possible, one key
      insight it is based on is that a Message receiver's behavior in a
      packet-switched network is inherently asynchronous, driven by the
      receipt of packets, and that this asynchronicity must be reflected
      in the API. The actual implementation of receive and event
      handling will need to be aligned to the method a given platform
      provides for asynchronous I/O.

   o  A new API cannot be bound to a single transport protocol and
      expect wide deployment.
      As the API is transport-independent and
      may support runtime transport selection, it must impose the
      minimum possible set of constraints on its underlying transports,
      though some API features may require underlying transport features
      to work optimally. It must be possible to implement Post over
      vanilla TCP in the present Internet architecture.

   The API we design from these principles is centered around a Carrier,
   which can be created actively via initiate() or passively via a
   listen(); the latter creates a Listener from which new Carriers can
   be accept()ed. Messages may be created explicitly and passed to this
   Carrier, or implicitly through a simplified interface which uses
   default message properties (reliable transport without priority or
   deadline, which guarantees ordered delivery over a single Carrier
   when the underlying transport protocol stack supports it).

   For each connection between a Local and a Remote, a new Carrier is
   created, and it is destroyed when the connection is closed. However,
   a new Carrier may use an existing Association if present for the
   requested Local-Remote pair and permitted by the PolicyContext that
   can be provided at Carrier initiation. Further, the system-wide
   PolicyContext can contain more information that determines when to
   create or destroy Associations other than at Carrier initiation.
   E.g. an Association can be created at system start, based on the
   configured PolicyContext or also by a manual action of a single
   application, for Local-Remote pairs that are known to be likely used
   soon, and to pre-establish, e.g., cryptographic context as well as
   potentially collect current information about path capabilities.
   Every time an actual connection with a specific PSI is established
   between the Local and Remote, the Association learns new Path
   information and stores it. This information can be used when a new
   transient is created, e.g.
   to decide which PSI to use (to provide the
   highest probability for a successful connection attempt) or which
   PSIs to probe for (first). A Transient is created when an application
   actually sends a Message over a Carrier. As further explained below,
   this step can actually create multiple transients for probing or
   assign a new transient to an already active PSI, e.g. if
   multi-streaming is provided and supported for this kind of use on
   both sides.

3.1.  Example Connection Patterns

   Here, we illustrate the usage of the API for common connection
   patterns. Note that error handling is ignored in these illustrations
   for ease of reading.

3.1.1.  Client-Server

   Here's an example client-server application. The server echoes
   messages. The client sends a message and prints what it receives.

   The client in Figure 2 connects, sends a message, and sets up a
   receiver to print messages received in response. The carrier is
   inactive after the Initiate() call; the Send() call blocks until the
   carrier can be activated.

   // connect to a server given a remote
   func sayHello() {

       carrier := Initiate(local, remote)

       carrier.Send([]byte("Hello!"))
       carrier.Ready(func (msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           return false
       })
       carrier.Close()
   }

                         Figure 2: Example client

   The server in Figure 3 creates a Listener, which accepts Carriers and
   passes them to a server. The server echoes the content of each
   message it receives.

   // run a server for a specific carrier, echo all its messages
   func runMyServerOn(carrier Carrier) {
       carrier.Ready(func (msg InMessage) bool {
           carrier.Send(msg)
           return true // keep handling further messages
       })
   }

   // accept connections forever, spawn servers for them
   func acceptConnections() {
       listener := Listen(local)
       listener.Accept(func(carrier Carrier) bool {
           go runMyServerOn(carrier)
           return true
       })
   }

                         Figure 3: Example server

   The Responder allows the server to be significantly simplified, as
   shown in Figure 4.

   func echo(msg InMessage, reply Sink) {
       reply.Send(msg)
   }

   Respond(local, echo)

                        Figure 4: Example responder

3.1.2.  Client-Server with Happy Eyeballs and 0-RTT establishment

   The fundamental design of a client need not change at all for happy
   eyeballs [RFC6555] (selection of multiple potential protocol stacks
   through connection racing); this is handled by the Post Sockets
   implementation automatically. If this connection racing is to use
   0-RTT data (i.e., as provided by TCP Fast Open [RFC7413]), the client
   must mark the outgoing message as idempotent.

   // connect to a server given a remote and send some 0-RTT data
   func sayHelloQuickly() {

       carrier := Initiate(local, remote)

       carrier.SendMsg(
           OutMessage{Content: []byte("Hello!"), Idempotent: true},
           nil, nil, nil)
       carrier.Ready(func (msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           return false
       })
       carrier.Close()
   }

3.1.3.  Peer to Peer with Network Address Translation

   In the client-server examples shown above, the Remote given to the
   Initiate call refers to the name and port of the server to connect
   to. This need not be the case, however; a Remote may also refer to
   an identity and a rendezvous point, as in ICE [RFC5245]. Here, each
   peer does its own Initiate call simultaneously, and the result on
   each side is a Carrier attached to an appropriate Association.

3.1.4.  Multicast Receiver

   A multicast receiver is implemented using a Sink attached to a Local
   encapsulating a multicast address on which to receive multicast
   datagrams. The following example prints messages received on the
   multicast address forever.

   func receiveMulticast() {
       sink := NewSink(local)
       sink.Ready(func (msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           return true
       })
   }

3.1.5.  Association Bootstrapping

   Here, we show how Association state may be initialized without a
   carrier. The goal is to create a long-term Association from which
   Carriers may be derived and, if possible, used immediately. Per
   [I-D.pauly-taps-transport-security], a first step is to specify trust
   model constraints, such as pinned public keys and anchor
   certificates, which are needed to create Remote connections.

   We begin by creating shared security parameters that will be used
   later for creating a remote connection.

   // create security parameters with a set of trusted certificates
   func createParameters(trustedCerts []Certificate) Parameters {
       parameters := Parameters()
       parameters = parameters.SetTrustedCerts(trustedCerts)
       return parameters
   }

   Using these statically configured parameters, we now show how to
   create an Association between a Local and a Remote.

   // create an Association using shared parameters
   func createAssociation(local Local, remote Remote,
       parameters Parameters) Association {
       association := NewAssociation(local, remote, parameters)
       return association
   }

   We may also create an Association with a pre-shared key configured
   out-of-band.

   // create an Association using a pre-shared key
   func createAssociationWithPSK(local Local, remote Remote,
       parameters Parameters, preSharedKey []byte) Association {
       association := NewAssociation(local, remote, parameters)
       association = association.SetPreSharedKey(preSharedKey)
       return association
   }

   We now show how to create a Carrier from an existing, pre-configured
   Association. This Association may or may not contain shared
   cryptographic state between the Local and Remote, depending on how
   it was configured.

   // open a connection to a server using an existing Association and
   // send some data, which will be sent early if possible.
   func sayHelloWithAssociation(association Association) {
       carrier := association.Initiate()

       carrier.SendMsg(
           OutMessage{Content: []byte("Hello!"), Idempotent: true},
           nil, nil, nil)
       carrier.Ready(func (msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           return false
       })
       carrier.Close()
   }

3.2.  API Dynamics

   As Carriers provide the central entry point to Post, they are key to
   API dynamics. The lifecycle of a carrier is shown in Figure 5.
   Carriers are created by active openers by calling Initiate() given a
   Local and a Remote, and by passive openers by calling Listen() given
   a Local; the .Accept() method on the listener Carrier can then be
   used to create active carriers. By default, the underlying
   Association is automatically created and managed by the underlying
   API. This underlying Association can be accessed by the Carrier's
   .Association() method. Alternately, an association can be explicitly
   created using NewAssociation(), and a Carrier on the association may
   be accessed or initiated by calling the association's .Initiate()
   method.

   Once a Carrier has been created (via Initiate(),
   Association.Initiate(), NewSource(), NewSink(), or
   Listen()/Accept()), it may be used to send and receive Messages.
   The existence of a Carrier does not imply the existence of an active
   Transient or associated transport-layer connection; these may be
   created when the carrier is, or may be deferred, depending on the
   network environment, configuration, and protocol stacks available.

    Listen(local)     Initiate(local,remote)   NewSource(local,remote)
          |                     |                 or NewSink(local)
     [ Carrier  ]               |                    |
     [(listener)]               +--------------------+
          |                     V
     .Accept()-----------> [Carrier] -+----------> .Close()
                             |   ^    |  close     [ Carrier  ]
                             |   |    +- event ->  [ (closed) ]
                             |   |
                 .Association()  .Carriers()
                             |   .Initiate()
                             V   |
                          [Association]
                                ^
                                |
                    NewAssociation(local,remote)

               Figure 5: Carrier and Association Life Cycle

   Access to more detailed information is possible through accessors on
   Carriers and Associations, as shown in Figure 6. The set of
   currently active Transients can be accessed through the Carrier's
   .Transients() method. The active path(s) used by a Transient can be
   accessed through the Transient's .Paths() method, and the set of all
   paths for which properties are cached by an Association can be
   accessed through the Association's .Paths() method. The set of
   active carriers on an association can be accessed through the
   Association's .Carriers() method. Access to transients and paths is
   not necessary in normal operation; these accessors are provided
   primarily for logging and debugging purposes.

          [Carrier]---.Transients()--->[Transient]
           |     ^                          |
           |     |                          |
    .Association()  .Carriers()          .Paths()
           |        .Initiate()             |
           V         |                      V
         [Association]---.Paths()------>[Path]

              Figure 6: Accessors on Carriers and Associations

   Each Carrier has a .Send() method, by which Messages can be sent with
   given properties, and a .Ready() method, which supplies a callback
   for reading Messages from the remote side. .Send() is not available
   on Sinks, and .Ready() is not available on Sources.
   Carriers also provide .OnSent(), .OnAcked(), and .OnExpired() calls
   for binding default send event handlers to the Carrier, and
   .OnClosed() for handling passive close notifications.

                                  +---------[incoming]-----------+
                                  |         [Message ]           V
   [outgoing] ---> .Send() ---> [Carrier] <---- .Ready() <---- [Receiver]
   [Message ]                     |
                                  +--- .OnSent()
                                  +--- .OnAcked()
                                  +--- .OnExpired()
                                  +--- .OnClosed()

            Figure 7: Sending and Receiving Messages and Events

   An application may have a global Configuration, as well as more
   specific Configurations to apply to the establishment of a given
   Association or Carrier. These Configurations are optional arguments
   to the Association and Carrier creation calls.

   In order to initiate a connection with a remote endpoint, a user of
   Post Sockets must start from a Remote (see Section 2.4). A Remote
   encapsulates identifying information about a remote endpoint at a
   specific level of resolution. A new Remote can be wrapped around
   some identifying information via the NewRemote() call. A Remote
   has a .Resolve() method, which can be iteratively invoked to increase
   the level of resolution; a call to Resolve on a given Remote may
   result in one to many Remotes, as shown in Figure 8. Remotes at any
   level of resolution may be passed to Post Sockets calls; each call
   will continue resolution to the point necessary to establish or
   resume a Carrier.

                             +----------------------------+
                           n |                            | 1
   NewRemote(identifiers) ---+--->[Remote] --.Resolve()---+

                 Figure 8: Recursive resolution of Remotes

   Information about the local endpoint is also necessary to establish
   an Association, whether explicitly or implicitly through the creation
   of a Carrier or Listener. This is passed in the form of a Local (see
   Section 2.5).
   A Local is created with a NewLocal() call, which takes
   a Configuration (including certificates to present and secret keys
   associated with them) and identifying information (interface(s) and
   port(s) to use).

4.  Implementation Considerations

   Here we discuss an incomplete list of API implementation
   considerations that have arisen with experimentation with prototype
   implementations of Post.

4.1.  Protocol Stack Instance (PSI)

   A PSI encapsulates an arbitrary stack of protocols (e.g., TCP over
   IPv6, SCTP over DTLS over UDP over IPv4). PSIs provide the bridge
   between the interface (Carrier) plus the current state (Transients)
   and the implementation of a given set of transport services
   [I-D.ietf-taps-transports].

   A given implementation makes one or more possible protocol stacks
   available to its applications. Selection and configuration among
   multiple PSIs is based on system-level or application policies, as
   well as on network conditions in the provisioning domain in which a
   connection is made.
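
   As a rough illustration of the selection step just described, the
   following sketch filters a list of candidate protocol stacks by a
   policy and by the current network conditions. It is not taken from
   any Post Sockets implementation: the types Stack, Policy, and
   Conditions, the function usableStacks, and the single "require
   encryption" rule are all assumptions chosen to keep the example
   small.

```go
package main

import "fmt"

// Stack is a hypothetical description of one protocol stack that an
// implementation could instantiate as a PSI.
type Stack struct {
	Name      string
	IPVersion int  // 4 or 6
	Encrypted bool // true if the stack includes a security protocol
}

// Policy stands in for system-level or application policy.
type Policy struct {
	RequireEncryption bool
}

// Conditions stands in for what the provisioning domain currently offers.
type Conditions struct {
	IPv4, IPv6 bool
}

// usableStacks filters candidates by policy and by network conditions,
// keeping the caller's preference order.
func usableStacks(candidates []Stack, p Policy, c Conditions) []Stack {
	var out []Stack
	for _, s := range candidates {
		if p.RequireEncryption && !s.Encrypted {
			continue // ruled out by policy
		}
		if (s.IPVersion == 4 && !c.IPv4) || (s.IPVersion == 6 && !c.IPv6) {
			continue // not usable in this provisioning domain
		}
		out = append(out, s)
	}
	return out
}

// exampleCandidates lists three illustrative stacks in preference order.
func exampleCandidates() []Stack {
	return []Stack{
		{Name: "TLS over TCP over IPv6", IPVersion: 6, Encrypted: true},
		{Name: "TLS over TCP over IPv4", IPVersion: 4, Encrypted: true},
		{Name: "TCP over IPv4", IPVersion: 4, Encrypted: false},
	}
}

func main() {
	policy := Policy{RequireEncryption: true}
	v4Only := Conditions{IPv4: true}
	for _, s := range usableStacks(exampleCandidates(), policy, v4Only) {
		fmt.Println(s.Name)
	}
}
```

   A real implementation would weigh many more inputs, such as cached
   Path properties; the sketch only shows policy and network conditions
   acting as two independent filters.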

   +=========+   +=========+  +==========+         +==========+
   | Carrier |   | Carrier |  | Carrier  |         | Carrier  |
   +=========+   +=========+  +==========+         +==========+
        |             |            |                    |
   +=========+   +=========+  +==========+         +==========+
   |Transient|   |Transient|  |Transient |         |Transient |
   +=========+   +=========+  +==========+         +==========+
        |                \     /                    /        \
   +=========+         +=========+          +=========+   +=========+
   |   PSI   |         |   PSI   |          |   PSI   |   |   PSI   |
   +===+-----++        +===+-----++         +===+-----++ ++-----+===+
       |TLS   |            |SCTP  |             |TLS   | |   TLS|
       |TCP   |            |DTLS  |             |TCP   | |   TCP|
       |IPv6  |            |UDP   |             |IPv6  | |  IPv4|
       |802.3 |            |IPv6  |             |802.11| |802.11|
       +------+            |802.3 |             +------+ +------+
                           +------+
   (a) Transient     (b) Carrier multiplexing   (c) Multiple candidates
       bound to PSI      over a multi-streaming     racing during session
                         transport protocol         establishment

                Figure 9: Example Protocol Stack Instances

   For example, Figure 9(a) shows a TLS over TCP stack, usable on most
   network connections. Protocols are layered to ensure that the PSI
   provides all the transport services required by the application. A
   single PSI may be bound to multiple Carriers, as shown in
   Figure 9(b): a multi-streaming transport protocol like QUIC or SCTP
   can support one carrier per stream. Where multi-streaming transport
   is not available, these carriers could be serviced by different PSIs
   on different flows. On the other hand, multiple PSIs are bound to a
   single transient during establishment, as shown in Figure 9(c).
   Here, the losing PSI in a happy-eyeballs race will be terminated, and
   the carrier will continue using the winning PSI.

4.2.  Message Framing, Parsing, and Serialization

   While some transports expose a byte stream abstraction, most higher
   level protocols impose some structure onto that byte stream.
   That is, the higher level protocol operates in terms of messages, or
   protocol data units (PDUs), rather than using unstructured sequences
   of bytes, with each message being processed in turn. Protocols are
   specified in terms of state machines acting on semantic messages,
   with parsing the byte stream into messages being a necessary
   annoyance, rather than a semantic concern. Accordingly, Post Sockets
   exposes a message-based API to applications as the primary
   abstraction. Protocols that deal only in byte streams, such as TCP,
   represent their data in each direction as a single, long message.
   When framing protocols are placed on top of byte streams, the
   messages used in the API represent the framed messages within the
   stream.

   There are other benefits of providing a message-oriented API beyond
   framing PDUs that Post Sockets should provide when supported by the
   underlying transport. These include:

   o  the ability to associate deadlines with messages, for transports
      that care about timing;

   o  the ability to provide control of reliability, choosing what
      messages to retransmit in the event of packet loss, and how best
      to make use of the data that arrived;

   o  the ability to manage dependencies between messages, when some
      messages may not be delivered due to either packet loss or missing
      a deadline, in particular the ability to avoid (re-)sending data
      that relies on a previous transmission that was never received.

   All require explicit message boundaries, and application-level
   framing of messages, to be effective. Once a message is passed to
   Post Sockets, it cannot be cancelled or paused, but prioritization
   as well as lifetime and retransmission management will provide the
   protocol stack with all needed information to send the messages as
   quickly as possible without blocking transmission unnecessarily.
   Post Sockets provides this by handling Messages with known identity
   (sequence numbers, in the simple case), lifetimes, niceness, and
   antecedents.

   Transport protocols such as SCTP provide a message-oriented API that
   has similar features to those we describe. Other transports, such as
   TCP, do not. To support a message-oriented API, while still being
   compatible with stream-based transport protocols, Post Sockets must
   provide APIs for parsing and serialising messages that understand the
   protocol data. That is, we push message parsing and serialisation
   down into the Post Sockets stack, allowing applications to send and
   receive strongly typed data objects (e.g., a receive call on an HTTP
   Message Carrier should return an object representing the HTTP
   response, with pre-parsed status code, headers, and any message body,
   rather than returning a byte array that the application has to parse
   itself). This is backwards compatible with existing protocols and
   APIs, since the wire format of messages does not change, but gives a
   Post Sockets stack additional information to allow it to make better
   use of modern transport services.

   The Post Sockets approach is therefore to raise the semantic level of
   the transport API: applications should send and receive messages in
   the form of meaningful, strongly typed, protocol data. Parsing and
   serialising such messages should be a re-usable function of the
   protocol stack instance, not the application. This is well-suited to
   implementation in modern systems languages, such as Swift, Go, Rust,
   or C++, but can also be implemented with some loss of type safety in
   C.

4.3.  Message Size Limitations

   Ideally, Messages can be of infinite size.
   However, protocol stacks
   and protocol stack implementations may impose their own limits on
   message sizing; for example, SCTP [RFC4960] and TLS
   [I-D.ietf-tls-tls13] impose record size limitations of 64kB and 16kB,
   respectively. Message sizes may also be limited by the available
   buffer at the receiver, since a Message must be fully assembled by
   the transport layer before it can be passed on to the application
   layer. Since not every transport protocol stack implements the
   signaling necessary to negotiate or expose message size limitations,
   these may need to be defined out of band, and are probably best
   exposed through the Configuration.

   A truly infinite message service - e.g. large file transfer where
   both endpoints have committed persistent storage to the message - is
   probably best realized as a layer above Post Sockets, and may be
   added as a new type of Message Carrier to a future revision of this
   document.

4.4.  Back-pressure

   Regardless of how asynchronous reception is implemented, it is
   important for an application to be able to apply receiver
   back-pressure, to allow the protocol stack to perform receiver flow
   control. Depending on how asynchronous I/O works in the platform,
   this could be implemented by having a maximum number of concurrent
   receive callbacks, or by bounding the maximum number of outstanding,
   unread bytes at any given time, for example.

4.5.  Associations, Transients, Racing, and Rendezvous

   As the network has evolved, even the simple act of establishing a
   connection has become increasingly complex. Clients now regularly
   race multiple connections, for example over IPv4 and IPv6, to
   determine which protocol to use. The choice of outgoing interface
   has also become more important, with differential reachability and
   performance from multiple interfaces.
   Name resolution can also give
   different outcomes depending on the interface the query was issued
   from. Finally, but often most significantly, NAT traversal, relay
   discovery, and path state maintenance messages are an essential part
   of connection establishment, especially for peer-to-peer
   applications.

   Post Sockets accordingly breaks communication establishment down into
   multiple phases:

   o  Gathering Locals

      The set of possible Locals is gathered. In the simple case, this
      merely enumerates the local interfaces and protocols, and
      allocates ephemeral source ports for transients. For example, a
      system that has WiFi and Ethernet and supports IPv4 and IPv6 might
      gather four candidate locals (IPv4 on Ethernet, IPv6 on Ethernet,
      IPv4 on WiFi, and IPv6 on WiFi) that can form the source for a
      transient.

      If NAT traversal is required, the process of gathering locals
      becomes broadly equivalent to the ICE candidate gathering phase
      [RFC5245]. The endpoint determines its server reflexive locals
      (i.e., the translated address of a local, on the other side of a
      NAT) and relayed locals (e.g., via a TURN server or other relay),
      for each interface and network protocol. These are added to the
      set of candidate locals for this association.

      Gathering locals is primarily an endpoint local operation,
      although it might involve exchanges with a STUN server to derive
      server reflexive locals, or with a TURN server or other relay to
      derive relayed locals. It does not involve communication with the
      remote.

   o  Resolving the Remote

      The remote is typically a name that needs to be resolved into a
      set of possible addresses that can be used for communication.
      Resolving the remote is the process of recursively performing such
      name lookups, until fully resolved, to return the set of
      candidates for the remote of this association.

      How this is done will depend on the type of the Remote, and can
      also be specific to each local. A common case is when the Remote
      is a DNS name, in which case it is resolved to give a set of IPv4
      and IPv6 addresses representing that name. Some types of remote
      might require more complex resolution. Resolving the remote for a
      peer-to-peer connection might involve communication with a
      rendezvous server, which in turn contacts the peer to gain consent
      to communicate and retrieve its set of candidate locals, which are
      returned and form the candidate remote addresses for contacting
      that peer.

      Resolving the remote is _not_ a local operation. It will involve
      a directory service, and can require communication with the remote
      to rendezvous and exchange peer addresses. This can expose some
      or all of the candidate locals to the remote.

   o  Establishing Transients

      The set of candidate locals and the set of candidate remotes are
      paired, to derive a priority ordered set of Candidate Paths that
      can potentially be used to establish a connection.

      Then, communication is attempted over each candidate path, in
      priority order. If there are multiple candidates with the same
      priority, then transient establishment proceeds simultaneously and
      uses the transient that wins the race to be established.
      Otherwise, transient establishment is sequential, paced at a rate
      that should not congest the network. Depending on the chosen
      transport, this phase might involve racing TCP connections to a
      server over IPv4 and IPv6 [RFC6555], or it could involve a STUN
      exchange to establish peer-to-peer UDP connectivity [RFC5245], or
      some other means.

   o  Confirming and Maintaining Transients

      Once connectivity has been established, unused resources can be
      released and the chosen path can be confirmed.
      This is primarily
      required when establishing peer-to-peer connectivity, where
      connections supporting relayed locals that were not required can
      be closed, and where an associated signalling operation might be
      needed to inform middleboxes and proxies of the chosen path.
      Keep-alive messages may also be sent, as appropriate, to ensure
      NAT and firewall state is maintained, so the transient remains
      operational.

   By encapsulating these four phases of communication establishment
   into the PSI, Post Sockets aims to simplify application development.
   It can provide reusable implementations of connection racing for TCP,
   to enable happy eyeballs, that will be automatically used by all TCP
   clients, for example. With appropriate callbacks to drive the
   rendezvous signalling as part of resolving the remote, we believe a
   generic ICE implementation ought also to be possible. This procedure
   can even be repeated fully or partially during a connection to enable
   seamless hand-over and mobility within the network stack.

5.  Acknowledgments

   Many thanks to Laurent Chuat and Jason Lee at the Network Security
   Group at ETH Zurich for contributions to the initial design of Post
   Sockets. Thanks to Joe Hildebrand, Martin Thomson, and Michael Welzl
   for their feedback, as well as the attendees of the Post Sockets
   workshop in February 2017 in Zurich for the discussions, which have
   improved the design described herein.

   This work is partially supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

6.  References

6.1.  Normative References

   [I-D.ietf-taps-transports]
              Fairhurst, G., Trammell, B., and M.
              Kuehlewind, "Services provided by IETF transport
              protocols and congestion control mechanisms",
              draft-ietf-taps-transports-14 (work in progress),
              December 2016.

6.2. Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-07 (work
              in progress), October 2017.

   [I-D.ietf-tls-tls13]
              Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", draft-ietf-tls-tls13-21 (work in progress),
              July 2017.

   [I-D.iyengar-minion-protocol]
              Jana, J., Cheshire, S., and J. Graessley, "Minion - Wire
              Protocol", draft-iyengar-minion-protocol-02 (work in
              progress), October 2013.

   [I-D.kuehlewind-taps-crypto-sep]
              Kuehlewind, M., Pauly, T., and C. Wood, "Separating Crypto
              Negotiation and Communication",
              draft-kuehlewind-taps-crypto-sep-00 (work in progress),
              July 2017.

   [I-D.pauly-taps-transport-security]
              Pauly, T. and C. Wood, "A Survey of Transport Security
              Protocols", draft-pauly-taps-transport-security-00 (work
              in progress), July 2017.

   [I-D.trammell-plus-abstract-mech]
              Trammell, B., "Abstract Mechanisms for a Cooperative Path
              Layer under Endpoint Control",
              draft-trammell-plus-abstract-mech-00 (work in progress),
              September 2016.

   [I-D.trammell-plus-statefulness]
              Kuehlewind, M., Trammell, B., and J. Hildebrand,
              "Transport-Independent Path Layer State Management",
              draft-trammell-plus-statefulness-03 (work in progress),
              March 2017.

   [MinimaLT]
              Petullo, W., Zhang, X., Solworth, J., Bernstein, D., and
              T. Lange, "MinimaLT, Minimal-latency Networking Through
              Better Security", May 2013.

   [NEAT]     Grinnemo, K-J., Jones, T., Fairhurst, G., Ros, D.,
              Brunstrom, A., and P. Hurtig, "Towards a Flexible Internet
              Transport Layer Architecture", June 2016.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              DOI 10.17487/RFC5245, April 2010.

   [RFC6555]  Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
              Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555,
              April 2012.

   [RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
              of Named Entities (DANE) Transport Layer Security (TLS)
              Protocol: TLSA", RFC 6698, DOI 10.17487/RFC6698,
              August 2012.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258,
              May 2014.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7556]  Anipko, D., Ed., "Multiple Provisioning Domain
              Architecture", RFC 7556, DOI 10.17487/RFC7556, June 2015.

Appendix A. Open Issues

   This document is under active development; a list of current open
   issues is available at
   https://github.com/mami-project/draft-trammell-post-sockets/issues

Authors' Addresses

   Brian Trammell
   ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: ietf@trammell.ch

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Tommy Pauly
   Apple Inc.
   1 Infinite Loop
   Cupertino, California 95014
   United States of America

   Email: tpauly@apple.com

   Mirja Kuehlewind
   ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: mirja.kuehlewind@tik.ee.ethz.ch

   Chris Wood
   Apple Inc.
   1 Infinite Loop
   Cupertino, California 95014
   United States of America

   Email: cawood@apple.com