TAPS Working Group                                           B. Trammell
Internet-Draft                                                ETH Zurich
Intended status: Informational                                C. Perkins
Expires: September 9, 2017                         University of Glasgow
                                                                T. Pauly
                                                              Apple Inc.
                                                           M. Kuehlewind
                                                              ETH Zurich
                                                          March 08, 2017


Post Sockets, An Abstract Programming Interface for the Transport Layer
                  draft-trammell-taps-post-sockets-00

Abstract

   This document describes Post Sockets, an asynchronous abstract
   programming interface for the atomic transmission of messages in an
   inherently multipath environment. Post replaces connections with
   long-lived associations between endpoints, with the possibility to
   cache cryptographic state in order to reduce amortized connection
   latency. We present this abstract interface as an illustration of
   what is possible with present developments in transport protocols
   when freed from the strictures of the current sockets API.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Abstractions and Terminology
     2.1. Message Carrier
       2.1.1. Listener
       2.1.2. Source
       2.1.3. Sink
       2.1.4. Responder
       2.1.5. Stream
     2.2. Message
       2.2.1. Lifetime and Partial Reliability
       2.2.2. Priority
       2.2.3. Dependence
       2.2.4. Idempotence
       2.2.5. Immediacy
       2.2.6. Additional Events
     2.3. Association
     2.4. Remote
     2.5. Local
     2.6. Transient
     2.7. Path
     2.8. Policy Context
   3. Abstract Programming Interface
     3.1. Example Connection Patterns
       3.1.1. Client-Server
       3.1.2. Client-Server with Happy Eyeballs and 0-RTT establishment
       3.1.3. Peer to Peer with Network Address Translation
       3.1.4. Multicast Receiver
     3.2. Implementation Considerations
       3.2.1. Message Framing and Deframing
       3.2.2. Message Size Limitations
       3.2.3. Backpressure
   4. Acknowledgments
   5. References
     5.1. Normative References
     5.2. Informative References
   Appendix A. API sketch in Golang
   Authors' Addresses

1. Introduction

   The BSD Unix Sockets API's SOCK_STREAM abstraction was a revolution
   in simplicity: by bringing network sockets into the UNIX programming
   model, it allowed anyone who knew how to write programs that dealt
   with sequential-access files to also write network applications. It
   would not be an overstatement to say that this simple API is the
   reason the Internet won the protocol wars of the 1980s. SOCK_STREAM
   is tied to the Transmission Control Protocol (TCP), specified in 1981
   [RFC0793]. TCP has scaled remarkably well over the past three and a
   half decades, but its total ubiquity has hidden an uncomfortable
   fact: the network is not really a file, and stream abstractions are
   too simplistic for many modern application programming models.

   In the meantime, the nature of Internet access and the variety of
   Internet transport protocols are evolving. The challenges that new
   protocols and access paradigms present to the sockets API, and to
   programming models based on it, inspire the design elements of a new
   approach.

   Many end-user devices are connected to the Internet via multiple
   interfaces, which suggests it is time to promote the paths by which
   two endpoints are connected to each other to a first-order object.
   While implicit multipath communication is available for these
   multihomed nodes in the present Internet architecture with the
   Multipath TCP extension (MPTCP) [RFC6824], MPTCP was specifically
   designed to hide multipath communication from the application for
   purposes of compatibility. Since many multihomed nodes are connected
   to the Internet through access paths with widely different properties
   with respect to bandwidth, latency and cost, adding explicit path
   control to MPTCP's API would be useful in many situations.
   Applications also need control over cooperation with path elements
   via mechanisms such as that proposed by the Path Layer UDP Substrate
   (PLUS) effort (see [I-D.trammell-plus-statefulness] and
   [I-D.trammell-plus-abstract-mech]).

   Another trend straining the traditional layering of the transport
   stack associated with the SOCK_STREAM interface is the widespread
   interest in ubiquitous deployment of encryption to guarantee
   confidentiality, authenticity, and integrity, in the face of
   pervasive surveillance [RFC7258]. Layering the most widely deployed
   encryption technology, Transport Layer Security (TLS), strictly atop
   TCP (i.e., via a TLS library such as OpenSSL that uses the sockets
   API) requires the encryption-layer handshake to happen after the
   transport-layer handshake, which increases connection setup latency
   on the order of one or two round-trip times, an unacceptable delay
   for many applications. Integrating cryptographic state setup and
   maintenance into the path abstraction naturally complements efforts
   in new protocols (e.g. QUIC [I-D.ietf-quic-transport]) to mitigate
   this strict layering.

   To meet these challenges, we present the Post Sockets Application
   Programming Interface (API), described in detail in this work. Post
   is designed to be language, transport protocol, and architecture
   independent, allowing applications to be written to a common abstract
   interface, easily ported among different platforms, and used even in
   environments where transport protocol selection may be done
   dynamically, as proposed in the IETF's Transport Services working
   group.

   Post replaces the traditional SOCK_STREAM abstraction with a Message
   abstraction, which can be seen as a generalization of the Stream
   Control Transmission Protocol's [RFC4960] SOCK_SEQPACKET service.
   Messages are sent and received on Carriers, which logically group
   Messages for transmission and reception. For backward compatibility,
   these Carriers can also be opened as Streams, presenting a file-like
   interface to the network as with SOCK_STREAM.

   Post replaces the notions of a socket address and connected socket
   with an Association with a remote endpoint via a set of Paths.
   Implementation and wire format for transport protocol(s) implementing
   the Post API are explicitly out of scope for this work; these
   abstractions need not map directly to implementation-level concepts,
   and indeed with various amounts of shimming and glue could be
   implemented with varying success atop any sufficiently flexible
   transport protocol.

   The key features of Post as compared with the existing sockets API
   are:

   o Explicit Message orientation, with framing and atomicity
     guarantees for Message transmission.

   o Asynchronous reception, allowing all receiver-side interactions to
     be event-driven.

   o Explicit support for multistreaming and multipath transport
     protocols and network architectures.

   o Long-lived Associations, whose lifetimes may not be bound to
     underlying transport connections. This allows associations to
     cache state and cryptographic key material to enable fast
     resumption of communication, and for the implementation of the API
     to explicitly take care of connection establishment mechanics such
     as connection racing [RFC6555] and peer-to-peer rendezvous
     [RFC5245].

   o Transport protocol stack independence, allowing applications to be
     written in terms of the semantics best for the application's own
     design, separate from the protocol(s) used on the wire to achieve
     them. This enables applications written to a single API to make
     use of transport protocols in terms of the features they provide,
     as in [I-D.ietf-taps-transports].

   This work is the synthesis of many years of Internet transport
   protocol research and development. It is inspired by concepts from
   the Stream Control Transmission Protocol (SCTP) [RFC4960], TCP Minion
   [I-D.iyengar-minion-protocol], and MinimaLT [MinimaLT], among other
   transport protocol modernization efforts. We present Post Sockets as
   an illustration of what is possible with present developments in
   transport protocols when freed from the strictures of the current
   sockets API. While much of the work for building parts of the
   protocols needed to implement Post is already ongoing in other IETF
   working groups (e.g. MPTCP, QUIC, TLS), we argue that an abstract
   programming interface unifying access to all these efforts is
   necessary to fully exploit their potential.

2. Abstractions and Terminology

            +===============+
            |    Message    |
            +===============+
                |       ^     initiate()          listen()
             send()  ready()      |                  |
                V       |         V                  V
            +======================+  accept() +============+
            |                      |<---+------|            |
            |       Carrier        |    |      |  Listener  |
            |                      |----+      |            |
            +======================+           +============+
                   |          |                 |
                   |          |                 |
                   |     +=======================+
                   |     |                       | durable end-to-end
                   |     |      Association      | state via many paths/
                   |     |                       | policies and prefs
                   |     +=======================+
                   |               |        |
                   |               |        |
                   |       +=========+ +=========+
                   |       |  Local  | | Remote  |
                   |       +=========+ +=========+
                   |               |        |
             +===========+        +==========+
ephemeral    |           |        |          |
transport &  | Transient |------->|   Path   | properties of
crypto state |           |        |          | address pair
             +===========+        +==========+

        Figure 1: Abstractions and relationships in Post Sockets

   Post is based on a small set of abstractions, centered around a
   Message Carrier as the entry point for an application to the
   networking API. The relationships among them are shown in
   Figure 1 and detailed in this section.

2.1. Message Carrier

   A Message Carrier (or simply Carrier) is a transport protocol
   stack-independent interface for sending and receiving messages
   between an application and a remote endpoint; it is roughly analogous
   to a socket in the present sockets API.

   Sending a Message over a Carrier is driven by the application, while
   receipt is driven by the arrival of the last packet that allows the
   Message to be assembled, decrypted, and passed to the application.

   Receipt is therefore asynchronous; given the different models for
   asynchronous I/O and concurrency supported by different platforms, it
   may be implemented in any number of ways. The abstract API provides
   only a way for the application to register how it wants to handle
   incoming messages.

   All the Messages sent to a Message Carrier will be received on the
   corresponding Message Carrier at the remote endpoint, though not
   necessarily reliably or in order, depending on Message properties and
   the underlying transport protocol stack.

   A Message Carrier that is backed by current transport protocol stack
   state (such as a TCP connection; see Section 2.6) is said to be
   "active": messages can be sent and received over it. A Message
   Carrier can also be "dormant": there is long-term state associated
   with it (via the underlying Association; see Section 2.3), and it may
   be possible to reactivate it, but messages cannot be sent and
   received immediately.

   If supported by the underlying transport protocol stack, a Message
   Carrier may be forked, creating a new Message Carrier associated with
   a new Message Carrier at the same remote endpoint. The semantics of
   the usage of multiple Message Carriers based on the same Association
   are application-specific.
   When a Message Carrier is forked, its
   corresponding Message Carrier at the remote endpoint receives a fork
   request, which it must accept in order to fully establish the new
   carrier. Multiple Message Carriers between endpoints are implemented
   differently by different transport protocol stacks, either using
   multiple separate transport-layer connections, or using multiple
   streams of multistreaming transport protocols.

   To exchange messages with a given remote endpoint, an application may
   initiate a Message Carrier given its remote (see Section 2.4) and
   local (see Section 2.5) identities; this is equivalent to an
   active open. There are five special cases of Message Carriers, as
   well, supporting different initiation and interaction patterns,
   defined in the subsections below.

2.1.1. Listener

   A Listener is a special case of Message Carrier which only responds
   to requests to create a new Carrier from a remote endpoint, analogous
   to a server or listening socket in the present sockets API. Instead
   of being bound to a specific remote endpoint, it is bound only to a
   local identity; however, its interface for accepting fork requests is
   identical to that for fully fledged Message Carriers.

2.1.2. Source

   A Source is a special case of Message Carrier over which messages can
   only be sent, intended for unidirectional applications such as
   multicast transmitters. Sources cannot be forked, and need not
   accept forks.

2.1.3. Sink

   A Sink is a special case of Message Carrier over which messages can
   only be received, intended for unidirectional applications such as
   multicast receivers. Sinks cannot be forked, and need not accept
   forks.

2.1.4. Responder

   A Responder is a special case of Message Carrier which may receive
   messages from many remote sources, for cases in which an application
   will only ever send Messages in reply back to the source from which a
   Message was received. This is a common implementation pattern for
   servers in client-server applications. A Responder's receiver gets a
   Message, as well as a Source to send replies to. Responders cannot
   be forked, and need not accept forks.

2.1.5. Stream

   A Message Carrier may be irreversibly morphed into a Stream, in order
   to provide a strictly ordered, reliable service as with SOCK_STREAM.
   Morphing a Message Carrier into a Stream should return a "file-like
   object" as appropriate for the platform implementing the API.
   Typically, both ends of a communication using a stream service will
   morph their respective Message Carriers independently before sending
   any Messages.

   Writing a byte to a Stream will cause it to be received by the
   remote, in order, or will cause an error condition and termination of
   the stream if the byte cannot be delivered. Due to the strong
   sequential dependence on a stream, streams must always be reliable
   and ordered. A Message Carrier may only be morphed to a Stream if it
   uses a transport protocol stack that provides reliable, ordered
   service, and only before it is used to send a Message.

2.2. Message

   A Message is an atomic unit of communication between applications. A
   Message that cannot be delivered in its entirety within the
   constraints of the network connectivity and the requirements of the
   application is not delivered at all.

   Messages can represent both relatively small structures, such as
   requests in a request/response protocol such as HTTP; as well as
   relatively large structures, such as files of arbitrary size in a
   filesystem.

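   As an illustration of the atomicity property described above, the
   following Go sketch shows one way a receiving transport might buffer
   the pieces of a Message and hand it to the application only once it
   is complete. The fragment and reassembler types, and all of their
   fields and methods, are hypothetical names introduced for this
   example; they are not part of the Post API or of any wire format.

```go
package main

import (
	"bytes"
	"fmt"
)

// fragment is a hypothetical unit carrying one piece of a Message.
type fragment struct {
	msgID uint32 // identifies the Message this piece belongs to
	index int    // position of this piece within the Message
	total int    // number of pieces the Message was split into
	data  []byte
}

// partial tracks the pieces of one Message received so far.
type partial struct {
	parts [][]byte
	seen  []bool
	count int
}

// reassembler delivers a Message only once every piece has arrived.
type reassembler struct {
	pending map[uint32]*partial
	deliver func(msg []byte)
}

func newReassembler(deliver func(msg []byte)) *reassembler {
	return &reassembler{pending: make(map[uint32]*partial), deliver: deliver}
}

// receive stores one fragment and delivers the Message if it is now complete.
func (r *reassembler) receive(f fragment) {
	if f.total <= 0 || f.index < 0 || f.index >= f.total {
		return // malformed fragment: ignore
	}
	p, ok := r.pending[f.msgID]
	if !ok {
		p = &partial{parts: make([][]byte, f.total), seen: make([]bool, f.total)}
		r.pending[f.msgID] = p
	}
	if f.index >= len(p.parts) || p.seen[f.index] {
		return // inconsistent or duplicate fragment: ignore
	}
	p.parts[f.index] = f.data
	p.seen[f.index] = true
	p.count++
	if p.count == len(p.parts) {
		delete(r.pending, f.msgID)
		r.deliver(bytes.Join(p.parts, nil))
	}
}

// abandon discards an incomplete Message; the application never sees it.
func (r *reassembler) abandon(msgID uint32) {
	delete(r.pending, msgID)
}

func main() {
	r := newReassembler(func(msg []byte) { fmt.Printf("delivered: %q\n", msg) })

	// Message 1 arrives out of order and is delivered once complete.
	r.receive(fragment{msgID: 1, index: 1, total: 2, data: []byte("world")})
	r.receive(fragment{msgID: 1, index: 0, total: 2, data: []byte("hello ")})

	// Message 2 loses a piece and is dropped without partial delivery.
	r.receive(fragment{msgID: 2, index: 0, total: 2, data: []byte("lost")})
	r.abandon(2)
	fmt.Println("pending:", len(r.pending))
}
```

   In this sketch the application callback is never invoked for
   Message 2: an incomplete Message is dropped as a whole rather than
   delivered in part, which is the behavior the atomicity guarantee
   asks for.
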
   In the general case, there is no mapping between a Message and
   packets sent by the underlying protocol stack on the wire: the
   transport protocol may freely segment messages and/or combine
   messages into packets. However, a message may be marked as
   immediate, which will cause it to be sent in a single packet, if it
   will fit.

   This implies that both the sending and receiving endpoints, whether
   in the application layer or the transport layer, must guarantee
   storage for the full size of a Message.

   Messages are sent over and received from Message Carriers (see
   Section 2.1).

   On sending, Messages have properties that allow the application to
   specify its requirements with respect to reliability, ordering,
   priority, idempotence, and immediacy; these are described in detail
   below. Messages may also have arbitrary properties which provide
   additional information to the underlying transport protocol stack on
   how they should be handled, in a protocol-specific way. These stacks
   may also deliver or set properties on received messages, but in the
   general case a received message contains only a sequence of ordered
   bytes.

2.2.1. Lifetime and Partial Reliability

   A Message may have a "lifetime" - a wallclock duration before which
   the Message must be available to the application layer at the remote
   end. If a lifetime cannot be met, the Message is discarded as soon
   as possible. Messages without lifetimes are sent reliably if
   supported by the transport protocol stack. Lifetimes are also used
   to prioritize Message delivery.

   There is no guarantee that a Message will not be delivered after the
   end of its lifetime; for example, a Message delivered over a strictly
   reliable transport will be delivered regardless of its lifetime.
   Depending on the transport protocol stack used to transmit the
   message, these lifetimes may also be signaled to path elements by the
   underlying transport, so that path elements that realize a lifetime
   cannot be met can discard frames containing the Messages instead of
   forwarding them.

2.2.2. Priority

   Messages have a "niceness" - a priority among other messages sent
   over the same Message Carrier in an unbounded hierarchy most
   naturally represented as a non-negative integer. By default,
   Messages are in niceness class 0, or highest priority. Niceness
   class 1 Messages will yield to niceness class 0 Messages sent over
   the same Carrier, class 2 to class 1, and so on. Niceness may be
   translated to a priority signal for exposure to path elements (e.g.
   DSCP codepoint) to allow prioritization along the path as well as at
   the sender and receiver. This inversion of normal schemes for
   expressing priority has a convenient property: priority increases as
   both niceness and lifetime decrease. A Message may have both a
   niceness and a lifetime - Messages with higher niceness classes will
   yield to lower classes if resource constraints mean only one can meet
   the lifetime.

2.2.3. Dependence

   A Message may have "antecedents" - other Messages on which it
   depends, which must be delivered before it (the "successor") is
   delivered. The sending transport uses deadlines, niceness, and
   antecedents, along with information about the properties of the Paths
   available, to determine when to send which Message down which Path.

2.2.4. Idempotence

   A sending application may mark a Message as "idempotent" to signal to
   the underlying transport protocol stack that its application
   semantics make it safe to send in situations that may cause it to be
   received more than once (e.g., for 0-RTT session resumption as in TCP
   Fast Open, TLS 1.3, and QUIC).

2.2.5. Immediacy

   A sending application may mark a Message as "immediate" to signal to
   the underlying transport protocol stack that its application
   semantics require it to be placed in a single packet, on its own,
   instead of waiting to be combined with other messages or parts
   thereof (e.g., for media transports and interactive sessions with
   small messages).

2.2.6. Additional Events

   Senders may also be asynchronously notified of three events on
   Messages they have sent: that the Message has been transmitted, that
   the Message has been acknowledged by the receiver, or that the
   Message has expired before transmission/acknowledgment. Not all
   transport protocol stacks will support all of these events.

2.3. Association

   An Association contains the long-term state necessary to support
   communications between a Local (see Section 2.5) and a Remote (see
   Section 2.4) endpoint, such as cryptographic session resumption
   parameters or rendezvous information; information about the policies
   constraining the selection of transport protocols and local
   interfaces to create Transients (see Section 2.6) to carry Messages;
   and information about the paths through the network available
   between them (see Section 2.7).

   All Message Carriers are bound to an Association. New Message
   Carriers will reuse an Association if they can be carried from the
   same Local to the same Remote over the same Paths; this re-use of an
   Association may imply the creation of a new Transient.

2.4. Remote

   A Remote represents information required to establish and maintain a
   connection with the far end of an Association: name(s), address(es),
   and transport protocol parameters that can be used to establish a
   Transient; transport protocols to use; information about public keys
   or certificate authorities used to identify the remote on connection
   establishment; and so on.
   Each Association is associated with a
   single Remote, either explicitly by the application (when created by
   the initiation of a Message Carrier) or by a Listener (when created
   by forking a Message Carrier on passive open).

   A Remote may be resolved, which results in zero or more Remotes with
   more specific information. For example, an application may want to
   establish a connection to a website identified by a URL
   https://www.example.com. This URL would be wrapped in a Remote and
   passed to a call to initiate a Message Carrier. The first pass
   resolution might parse the URL, decomposing it into a name, a
   transport port, and a transport protocol to try connecting with. A
   second pass resolution would then look up network-layer addresses
   associated with that name through DNS, and store any certificates
   available from DANE. Once a Remote has been resolved to the point
   that a transport protocol stack can use it to create a Transient, it
   is considered fully resolved.

2.5. Local

   A Local represents all the information about the local endpoint
   necessary to establish an Association or a Listener: interface, port,
   and transport protocol stack information, as well as certificates and
   associated private keys to use to identify this endpoint.

2.6. Transient

   A Transient represents a binding between a Message Carrier and the
   instance of the transport protocol stack that implements it. As an
   Association contains long-term state for communications between two
   endpoints, a Transient contains ephemeral state for a single
   transport protocol over a single Path at a given point in time.

   A Message Carrier may be served by multiple Transients at once, e.g.
   when implementing multipath communication such that the separate
   paths are exposed to the API by the underlying transport protocol
   stack.
   Each Transient serves only one Message Carrier, although
   multiple Transients may share the same underlying protocol stack;
   e.g. when multiplexing Carriers over streams in a multistreaming
   protocol.

   Transients are generally not exposed by the API to the application,
   though they may be accessible for debugging and logging purposes.

2.7. Path

   A Path represents information about a single path through the network
   used by an Association, in terms of source and destination network
   and transport layer addresses within an addressing context, and the
   provisioning domain [RFC7556] of the local interface. This
   information may be learned through a resolution, discovery, or
   rendezvous process (e.g. DNS, ICE), by measurements taken by the
   transport protocol stack, or by some other path information discovery
   mechanism. It is used by the transport protocol stack to maintain
   and/or (re-)establish communications for the Association.

   The set of available properties is a function of the transport
   protocol stacks in use by an association. However, the following
   core properties are generally useful for applications and transport
   layer protocols to choose among paths for specific Messages:

   o Maximum Transmission Unit (MTU): the maximum size of a Message's
     payload (subtracting transport, network, and link layer overhead)
     which will likely fit into a single frame. Derived from signals
     sent by path elements, where available, and/or path MTU discovery
     processes run by the transport layer.

   o Latency Expectation: expected one-way delay along the Path.
     Generally provided by inline measurements performed by the
     transport layer, as opposed to signaled by path elements.

   o Loss Probability Expectation: expected probability of a loss of
     any given single frame along the Path.
     Generally provided by inline measurements performed by the
     transport layer, as opposed to signaled by path elements.

   o Available Data Rate Expectation: expected maximum data rate along
     the Path. May be derived from passive measurements by the
     transport layer, or from signals from path elements.

   o Reserved Data Rate: Committed, reserved data rate for the given
     Association along the Path. Requires a bandwidth reservation
     service in the underlying transport protocol stack.

   o Path Element Membership: Identifiers for some or all nodes along
     the path, depending on the capabilities of the underlying network
     layer protocol to provide this.

   Path properties are generally read-only. MTU is a property of the
   underlying link-layer technology on each link in the path; latency,
   loss, and rate expectations are dynamic properties of the network
   configuration and network traffic conditions; path element membership
   is a function of network topology. In an explicitly multipath
   architecture, application and transport layer requirements can be met
   by having multiple paths with different properties to select from.
   Transport protocol stacks can also provide signaling to devices along
   the path, but this signaling is derived from information provided to
   the Message abstraction.

2.8. Policy Context

   A Local and a Remote are not necessarily enough to establish a
   Message Carrier between two endpoints. For instance, an application
   may require or prefer certain transport features (see
   [I-D.ietf-taps-transports]) in the transport protocol stacks used by
   the Transients underlying the Carrier; it may also prefer Paths over
   one interface to those over another (e.g. WiFi access over LTE when
   roaming on a foreign LTE network, due to cost). These policies are
   expressed in a Policy Context bound to an Association. Multiple
   policy contexts may be active at once; e.g.
   a system Policy Context expressing administrative preferences about
   interface and protocol selection, and an application Policy Context
   expressing transport feature information. The expression of Policy
   Contexts and the resolution of conflicts among them are currently
   implementation-specific; note that these are equivalent to the Policy
   API in the NEAT architecture [NEAT].

3. Abstract Programming Interface

   We now turn to the design of an abstract programming interface to
   provide a simple interface to Post's abstractions, constrained by the
   following design principles:

   o  Flexibility is paramount. So is simplicity. Applications must be
      given as many controls and as much information as they may need,
      but they must be able to ignore controls and information
      irrelevant to their operation. This implies that the "default"
      interface must be no more complicated than BSD sockets, and must
      do something reasonable.

   o  Reception is an inherently asynchronous activity. While the API
      is designed to be as platform-independent as possible, one key
      insight it is based on is that a Message receiver's behavior in a
      packet-switched network is inherently asynchronous, driven by the
      receipt of packets, and that this asynchronicity must be reflected
      in the API. The actual implementation of receive and event
      handling will need to be aligned to the method a given platform
      provides for asynchronous I/O.

   o  A new API cannot be bound to a single transport protocol and
      expect wide deployment. As the API is transport-independent and
      may support runtime transport selection, it must impose the
      minimum possible set of constraints on its underlying transports,
      though some API features may require underlying transport features
      to work optimally. It must be possible to implement Post over
      vanilla TCP in the present Internet architecture.

   The API we design from these principles is centered around a Carrier,
   which can be created actively via initiate() or passively via a
   listen(); the latter creates a Listener from which new Carriers can
   be accept()ed. Messages may be created explicitly and passed to this
   Carrier, or implicitly through a simplified interface which uses
   default message properties (reliable transport without priority or
   deadline, which guarantees ordered delivery over a single Carrier
   when the underlying transport protocol stack supports it).

   The current state of API development is illustrated as a set of
   interfaces and function prototypes in the Go programming language in
   Appendix A; future revisions of this document will give a more
   abstract specification of the API as development completes.

3.1. Example Connection Patterns

   Here, we illustrate the usage of the API outlined in Appendix A for
   common connection patterns. Note that error handling is ignored in
   these illustrations for ease of reading.

3.1.1. Client-Server

   Here's an example client-server application. The server echoes
   messages. The client sends a message and prints what it receives.

   The client in Figure 2 connects, sends a message, and sets up a
   receiver to print messages received in response. The carrier is
   inactive after the Initiate() call; the Send() call blocks until the
   carrier can be activated.

   // connect to a server given a remote
   func sayHello() {

       // errors are discarded for ease of reading
       carrier, _ := Initiate(local, remote)

       carrier.Send([]byte("Hello!"))
       carrier.Ready(func(msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           // stop receiving after the first message
           return false
       })
       carrier.Close()
   }

                         Figure 2: Example client

   The server in Figure 3 creates a Listener, which accepts Carriers and
   passes them to a server. The server echoes the content of each
   message it receives.

   // run a server for a specific carrier, echo all its messages
   func runMyServerOn(carrier Carrier) {
       carrier.Ready(func(msg InMessage) bool {
           carrier.Send(msg)
           // keep receiving until the carrier is closed
           return true
       })
   }

   // accept connections forever, spawn servers for them
   func acceptConnections() {
       // errors are discarded for ease of reading
       listener, _ := Listen(local)
       listener.Accept(func(carrier Carrier) bool {
           go runMyServerOn(carrier)
           return true
       })
   }

                         Figure 3: Example server

   The Responder allows the server to be significantly simplified, as
   shown in Figure 4.

   func echo(msg InMessage, reply Sink) bool {
       reply.Send(msg)
       // keep responding to further messages
       return true
   }

   Respond(local, echo)

                        Figure 4: Example responder

3.1.2. Client-Server with Happy Eyeballs and 0-RTT establishment

   The fundamental design of a client need not change at all for happy
   eyeballs [RFC6555] (selection of multiple potential protocol stacks
   through connection racing); this is handled by the Post Sockets
   implementation automatically. If this connection racing is to use
   0-RTT data (e.g., as provided by TCP Fast Open [RFC7413]), the client
   must mark the outgoing message as idempotent.

   // connect to a server given a remote
   func sayHelloQuickly() {

       // errors are discarded for ease of reading
       carrier, _ := Initiate(local, remote)

       // mark the message idempotent so it may be sent as 0-RTT data
       carrier.Sendmsg(&OutMessage{Content: []byte("Hello!"),
           Idempotent: true}, nil, nil, nil)
       carrier.Ready(func(msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           return false
       })
       carrier.Close()
   }

3.1.3. Peer to Peer with Network Address Translation

   In the client-server examples shown above, the Remote given to the
   Initiate call refers to the name and port of the server to connect
   to. This need not be the case, however; a Remote may also refer to
   an identity and a rendezvous point, as in ICE [RFC5245]. Here, each
   peer does its own Initiate call simultaneously, and the result on
   each side is a Carrier attached to an appropriate Association.

3.1.4.
Multicast Receiver

   A multicast receiver is implemented using a Sink attached to a Local
   encapsulating a multicast address on which to receive multicast
   datagrams. The following example prints messages received on the
   multicast address forever.

   func receiveMulticast() {
       // errors are discarded for ease of reading
       sink, _ := NewSink(local)
       sink.Ready(func(msg InMessage) bool {
           fmt.Println(string([]byte(msg)))
           // keep receiving forever
           return true
       })
   }

3.2. Implementation Considerations

   Here we discuss an incomplete list of API implementation
   considerations that have arisen from experimentation with the
   prototype in Appendix A.

3.2.1. Message Framing and Deframing

   An obvious goal of Post Sockets is interoperability with non-Post
   Sockets endpoints: a Post Sockets endpoint using a given protocol
   stack must be able to communicate with another endpoint using the
   same protocol stack, but not using Post Sockets. This implies that
   the underlying transport protocol stack must support object framing,
   in order to delimit Messages carried by protocol stacks that are not
   themselves message-oriented.

   Another goal of Post Sockets is to work over unmodified TCP. We
   could simply define a Message Carrier over TCP to support only stream
   morphing, but this would fall far short of our goal of transport
   independence. Another approach is to recognize that almost every
   protocol using TCP already has its own message delimiters, and to
   allow the receiver of a Message to provide a deframing primitive to
   the API. Experimentation with the best way to achieve this within
   Post Sockets is underway.

3.2.2. Message Size Limitations

   Ideally, Messages can be of infinite size. However, protocol stacks
   and protocol stack implementations may impose their own limits on
   message sizing; for example, SCTP [RFC4960] and TLS
   [I-D.ietf-tls-tls13] impose record size limitations of 64kB and 16kB,
   respectively.
   Message sizes may also be limited by the available buffer at the
   receiver, since a Message must be fully assembled by the transport
   layer before it can be passed on to the application layer. Since not
   every transport protocol stack implements the signaling necessary to
   negotiate or expose message size limitations, these are currently
   configured out of band, and are probably best exposed through the
   policy context.

   A truly infinite message service - e.g. large file transfer where
   both endpoints have committed persistent storage to the message - is
   probably best realized as a layer above Post Sockets, and may be
   added as a new type of Message Carrier to a future revision of this
   document.

3.2.3. Backpressure

   Regardless of how asynchronous reception is implemented, it is
   important for an application to be able to apply receiver
   backpressure, to allow the protocol stack to perform receiver flow
   control. Depending on how asynchronous I/O works in the platform,
   this could be implemented by having a maximum number of concurrent
   receive callbacks, for example.

4. Acknowledgments

   Many thanks to Laurent Chuat and Jason Lee at the Network Security
   Group at ETH Zurich for contributions to the initial design of Post
   Sockets. Thanks to Joe Hildebrand, Martin Thomson, and Michael Welzl
   for their feedback, as well as the attendees of the Post Sockets
   workshop in February 2017 in Zurich for the discussions, which have
   improved the design described herein.

   This work is partially supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

5. References

5.1.
Normative References

   [I-D.ietf-taps-transports]
              Fairhurst, G., Trammell, B., and M. Kuehlewind, "Services
              provided by IETF transport protocols and congestion
              control mechanisms", draft-ietf-taps-transports-14 (work
              in progress), December 2016.

5.2. Informative References

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-01 (work
              in progress), January 2017.

   [I-D.ietf-tls-tls13]
              Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", draft-ietf-tls-tls13-18 (work in progress),
              October 2016.

   [I-D.iyengar-minion-protocol]
              Jana, J., Cheshire, S., and J. Graessley, "Minion - Wire
              Protocol", draft-iyengar-minion-protocol-02 (work in
              progress), October 2013.

   [I-D.trammell-plus-abstract-mech]
              Trammell, B., "Abstract Mechanisms for a Cooperative Path
              Layer under Endpoint Control",
              draft-trammell-plus-abstract-mech-00 (work in progress),
              September 2016.

   [I-D.trammell-plus-statefulness]
              Kuehlewind, M., Trammell, B., and J. Hildebrand,
              "Transport-Independent Path Layer State Management",
              draft-trammell-plus-statefulness-02 (work in progress),
              December 2016.

   [MinimaLT] Petullo, W., Zhang, X., Solworth, J., Bernstein, D., and
              T. Lange, "MinimaLT, Minimal-latency Networking Through
              Better Security", May 2013.

   [NEAT]     Grinnemo, K-J., Jones, T., Fairhurst, G., Ros, D.,
              Brunstrom, A., and P. Hurtig, "Towards a Flexible Internet
              Transport Layer Architecture", June 2016.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              DOI 10.17487/RFC5245, April 2010.

   [RFC6555]  Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
              Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April
              2012.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7556]  Anipko, D., Ed., "Multiple Provisioning Domain
              Architecture", RFC 7556, DOI 10.17487/RFC7556, June 2015.

Appendix A. API sketch in Golang

   The following sketch is a snapshot of an API currently under
   development in Go, available at
   https://github.com/mami-project/postsocket. The details of the API
   are still under development; once the API definition stabilizes, this
   will be expanded into prose in a future revision of this draft.

   // The interface to path information is TBD
   type Path interface{}

   // An association encapsulates an endpoint pair and the set of paths
   // between them.
   type Association interface {
       Local() Local
       Remote() Remote
       Paths() []Path
   }

   // A message together with the metadata needed to send it
   type OutMessage struct {
       // The content of this message, as a byte array
       Content []byte
       // The niceness of this message. 0 is highest priority.
       Niceness uint
       // The lifetime of this message. After this duration, the message
       // may expire.
       Lifetime time.Duration
       // Pointers to messages that must be sent before this one.
       Antecedent []*OutMessage
       // True if the message is safe to send such that it may be
       // received multiple times (i.e. for 0-RTT).
       Idempotent bool
   }

   // A message received from a stream
   type InMessage []byte

   // A Carrier is a transport protocol stack-independent interface for
   // sending and receiving messages between an application and a remote
   // endpoint; it is roughly analogous to a socket in the present
   // sockets API.
   type Carrier interface {
       // Send a byte array on this Carrier as a message with default
       // metadata and no notifications.
       Send(buf []byte) error

       // Send a message on this Carrier. The optional onSent function
       // will be called when the protocol stack instance has sent the
       // message. The optional onAcked function will be called when the
       // receiver has acknowledged the message. The optional onExpired
       // function will be called if the message's lifetime expired
       // before the message could be sent. If the Carrier is not
       // active, attempt to activate the Carrier before sending.
       Sendmsg(msg *OutMessage, onSent func(), onAcked func(),
           onExpired func()) error

       // Signal that an application is ready to receive messages via a
       // given callback. Messages will be given to the callback until
       // it returns false, or until the Carrier is closed.
       Ready(receive func(InMessage) bool) error

       // Retrieve the Association over which this Carrier is running.
       Association() *Association

       // Retrieve the active Transients over which this Carrier is
       // running, if active.
       Transients() []Transient

       // Determine whether the Carrier is currently active
       IsActive() bool

       // Ensure that the Carrier is active and ready to send and receive
       // messages. Attempts to bring up at least one Transient.
       Activate(isActive func()) error

       // Terminate the Carrier
       Close()

       // Mutate to a file-like object
       AsStream() io.ReadWriteCloser

       // Attempt to fork a new Carrier for communicating with the same
       // Remote
       Fork() (Carrier, error)

       // Signal that an application is ready to accept forks via a given
       // callback. Forked carriers will be given to the callback until
       // it returns false or until the Carrier is closed.
       Accept(accept func(Carrier) bool) error
   }

   // Initiate a Carrier from a given Local to a given Remote. Returns a
   // new Carrier, which may be bound to an existing or a new
   // Association. The initiated Carrier is not yet active.
   func Initiate(local Local, remote Remote) (Carrier, error)

   type Listener interface {
       // Signal that an application is ready to accept new Carriers via
       // a given callback. Accept will terminate when the callback
       // returns false, or when the Listener is closed.
       Accept(accept func(Carrier) bool) error

       // Terminate this Listener
       Close()
   }

   // Create a Listener on a given Local. New Carriers are passed to the
   // callback given to the Listener's Accept() until that callback
   // returns false or the Listener is closed.
   func Listen(local Local) (Listener, error)

   // A Source is a unidirectional, send-only Carrier.
   type Source interface {
       // Send a byte array on this Source as a message with default
       // metadata and no notifications.
       Send(buf []byte) error

       // Send a message on this Source. The optional onSent function
       // will be called when the protocol stack instance has sent the
       // message. The optional onAcked function will be called when the
       // receiver has acknowledged the message. The optional onExpired
       // function will be called if the message's lifetime expired
       // before the message could be sent. If the Source is not active,
       // attempt to activate the Source before sending.
       Sendmsg(msg *OutMessage, onSent func(), onAcked func(),
           onExpired func()) error

       // Retrieve the Association over which this Source is running.
       Association() *Association

       // Determine whether the Source is currently active
       IsActive() bool

       // Ensure that the Source is active and ready to send messages.
       // Attempts to bring up at least one Transient.
       Activate() error

       // Terminate the Source
       Close()
   }

   // Initiate a Source from a given Local to a given Remote. Returns a
   // new Source, which may be bound to an existing or a new Association.
   // The initiated Source is not yet active.
   func NewSource(local Local, remote Remote) (Source, error)

   // A Sink is a unidirectional, receive-only Carrier, bound only to a
   // Local.
   type Sink interface {
       // Signal that an application is ready to receive messages via a
       // given callback. Messages will be given to the callback until
       // it returns false, or until the Sink is closed.
       Ready(receive func(InMessage) bool) error

       // Retrieve the Association over which this Sink is running.
       Association() *Association

       // Terminate the Sink
       Close()
   }

   // Initiate a Sink on a given Local. Returns a new Sink, which may be
   // bound to an existing or a new Association.
   func NewSink(local Local) (Sink, error)

   // Initiate a Responder on a given Local. For each incoming Message,
   // calls the respond function with the Message and a Sink to send
   // replies to. Calls the respond function until it returns false,
   // then terminates.
   func Respond(local Local,
       respond func(msg InMessage, reply Sink) bool) error

   // A local identity
   type Local struct {
       // A string identifying an interface or set of interfaces to
       // accept messages and new carriers on.
       Interface string
       // A transport layer port
       Port int
       // A set of zero or more end entity certificates, together with
       // private keys, to identify this application with.
       Certificates []tls.Certificate
   }

   // Encapsulate a remote identity. The contents of a Remote are highly
   // dependent on its level of resolution; some examples are below.
   type Remote interface {
       // Resolve this Remote Identity to a
       Resolve() ([]RemoteIdentity, error)
       // Returns True if the Remote is completely resolved; i.e., cannot be resol
       Complete() bool
   }

   // Remote consisting of a URL
   type URLRemote struct {
       URL string
   }

   // Remote encapsulating a name and port number
   type NamedEndpointRemote struct {
       Hostname string
       Port     int
   }

   // Remote encapsulating an IP address and port number
   type IPEndpointRemote struct {
       Address net.IP
       Port    int
   }

   // Remote encapsulating an IP address and port number, and a set of
   // presented certificates
   type IPEndpointCertRemote struct {
       Address      net.IP
       Port         int
       Certificates []tls.Certificate
   }

Authors' Addresses

   Brian Trammell
   ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: ietf@trammell.ch

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Tommy Pauly
   Apple Inc.
   1 Infinite Loop
   Cupertino, California 95014
   United States of America

   Email: tpauly@apple.com

   Mirja Kuehlewind
   ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Email: mirja.kuehlewind@tik.ee.ethz.ch