idnits 2.17.1 draft-gettys-webmux-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. 
== No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 999 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 416 instances of too long lines in the document, the longest one being 8 characters in excess of 72. ** There are 11 instances of lines with control characters in the document. ** The abstract seems to contain references ([15], [3], [16], [17], [5], [6], [7], [8], [9], [10], [28], [11], [12], [13], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 1, 1998) is 9399 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 465 looks like a reference -- Missing reference section? '28' on line 413 looks like a reference -- Missing reference section? '9' on line 125 looks like a reference -- Missing reference section? '10' on line 125 looks like a reference -- Missing reference section? '6' on line 154 looks like a reference -- Missing reference section? '11' on line 154 looks like a reference -- Missing reference section? '5' on line 171 looks like a reference -- Missing reference section? '8' on line 172 looks like a reference -- Missing reference section? '12' on line 180 looks like a reference -- Missing reference section? '15' on line 196 looks like a reference -- Missing reference section? '16' on line 196 looks like a reference -- Missing reference section? '13' on line 197 looks like a reference -- Missing reference section? '7' on line 224 looks like a reference -- Missing reference section? '3' on line 304 looks like a reference -- Missing reference section? '17' on line 428 looks like a reference -- Missing reference section? '20' on line 526 looks like a reference -- Missing reference section? '21' on line 622 looks like a reference -- Missing reference section? '22' on line 686 looks like a reference Summary: 10 errors (**), 0 flaws (~~), 2 warnings (==), 21 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET DRAFT Jim Gettys, Compaq Computer Corporation 2 draft-gettys-webmux-00.txt Henrik Frystyk Nielsen, W3C, M.I.T 3 Expires January 1, 1999 August 1, 1998 5 The WebMUX Protocol 7 Status of This Document 9 This document is an Internet-Draft. 
Internet-Drafts are working documents of the 10 Internet Engineering Task Force (IETF), its areas, and its working groups. Note 11 that other groups may also distribute working documents as Internet-Drafts. 13 Internet-Drafts are draft documents valid for a maximum of six months and may be 14 updated, replaced, or obsoleted by other documents at any time. It is 15 inappropriate to use Internet-Drafts as reference material or to cite them other 16 than as "work in progress." 18 To view the entire list of current Internet-Drafts, please check the 19 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories 20 on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it 21 (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or 22 ftp.isi.edu (US West Coast). 24 This document describes an experimental design for a multiplexing transport, 25 intended for, but not restricted to, use with the Web. WebMUX has been 26 implemented as part of the HTTP/NG project. Use of this protocol is EXPERIMENTAL 27 at this time and the protocol may change. In particular, transition strategies 28 to use of WebMUX have not been definitively worked out. You have been warned! 30 Distribution of this document is unlimited. Please send comments to the HTTP-NG 31 mailing list at . Discussions are archived at 32 "http://lists.w3.org/Archives/Public/www-http-ng-comments/". 34 Please read the "HTTP-NG Short- and Longterm Goals Document" [1] for a 35 discussion of goals and requirements of a potential new generation of the HTTP 36 protocol and how we intend to evaluate these goals. 38 General information about the Project as well as new draft revisions, related 39 discussions, and background information is linked from 40 "http://www.w3.org/Protocols/HTTP-NG/". 42 Note: Since Internet-Drafts are subject to frequent change, you are advised to 43 reference the Internet Draft directory.
This work is part of the W3C HTTP/NG 44 Activity (for current status, see http://www.w3.org/Protocols/HTTP-NG/Activity). 46 Abstract 48 This document defines the experimental multiplexing protocol referred to as 49 "WebMUX". WebMUX is a session management protocol separating the underlying 50 transport from the upper level application protocols. It provides a lightweight 51 communication channel to the application layer by multiplexing data streams on 52 top of a reliable stream oriented transport. By supporting coexistence of 53 multiple application level protocols (e.g. HTTP and HTTP/NG), WebMUX should ease 54 transitions to future Web protocols, and communications of client applets using 55 private protocols with servers over the same TCP connection as the HTTP 56 conversation. 58 WebMUX is intended for, but by no means restricted to, transport of Web related 59 protocols; the name has been chosen to reduce confusion with other existing 60 multiplexing protocols. 62 This document is part of a suite of documents describing the HTTP-NG design and 63 prototype implementation: 64 * HTTP-NG Short- and Longterm Goals, ID 65 * HTTP-NG Architectural Model, ID 66 * HTTP-NG Wire Protocol, ID 67 * The Classic Web Interfaces in HTTP-NG, ID 68 * Description of the HTTP-NG Testbed, ID 70 Changes from Previous Version 71 * Changed name from SMUX to WebMUX to reduce confusion with SNMP related 72 protocol. 73 * Split protocol ID address space to allow an address space for servers to 74 use to identify protocols outside of the control of this document. 75 * Elaborated endpoint usage. 76 * Prepared to meet IETF ID standards. 77 * Added acknowledgements section. 78 * Some reorganization of the document 80 ------------------------------------------------------ 82 Contents 83 1. The WebMUX Protocol 84 2. Status of This Document 85 3. Abstract 86 1. Changes from Previous Version 87 4. Contents 88 5. Introduction 89 1. Goals 90 6. WebMUX Protocol Operation 91 1. Key Words 92 2.
Deadlock Scenario 93 3. Deadlock Avoidance 94 4. Operation and Implementation Considerations 95 5. WebMUX Header 96 6. Alignment 97 7. Long Fragments 98 8. Atoms 99 9. Protocol ID's 100 10. Session ID Allocation 101 11. Session Establishment 102 12. Graceful Release 103 13. Disgraceful Release 104 14. Message Boundaries 105 15. Flow Control 106 16. End Points 107 17. Control Messages 108 7. Security Considerations 109 8. Remaining Issues for Discussion 110 9. Comparison with SCP (TMP) 111 10. Closed Issues from Discussion and Email 112 11. Acknowledgments 113 12. References 114 13. Author's Addresses 116 ------------------------------------------------------ 118 Introduction 120 The Internet is suffering from the effects of the HTTP/1.0 protocol, which was 121 designed without an understanding of the underlying TCP [1] transport protocol. 122 HTTP/1.0 opens a TCP connection for each URI [28] retrieved (at a cost of both 123 packets and round trip times (RTTs)), and then closes the TCP connection. For 124 small HTTP requests, these TCP connections have poor performance due to TCP slow 125 start [9] [10] as well as the round trips required to open and close each TCP 126 connection. 128 There are (at least) three reasons why multiple simultaneous TCP connections 129 have come into widespread use on the Internet despite the apparent 130 inefficiencies: 131 1. A client using multiple TCP connections gains a significant advantage in 132 perceived performance by the end-user, as it allows for early retrieval of 133 metadata (e.g. size) of embedded objects in a page. This allows a client 134 to format a page sooner without suffering annoying reformatting of the 135 page. Clients which open multiple TCP connections in parallel to the same 136 server, however, could cause self congestion on heavily congested links, 137 since packets generated by TCP opens and closes are not themselves 138 congestion controlled. 139 2.
The additional TCP opens cause performance problems in the network, but a 140 client that opens multiple TCP connections simultaneously to the same 141 server may also receive an "unfair" bandwidth advantage in the network 142 relative to clients that use a single TCP connection. This problem is not 143 solvable at the application level; only the network itself can enforce 144 such "fairness". 145 3. To keep low bandwidth/high latency links busy (e.g. dialup lines), more 146 than one TCP connection has been necessary since slow start may cause the 147 line to be partially idle. 149 The "Keep-Alive" extension to HTTP/1.0 is a form of persistent TCP connections 150 but does not work through HTTP/1.0 proxies and does not take pipelining of 151 requests into account. Instead a revised version of persistent TCP connections 152 was introduced in HTTP/1.1 as the default mode of operation. 154 HTTP/1.1 [6] persistent connections and pipelining [11] will reduce network 155 traffic and the amount of TCP overhead caused by opening and closing TCP 156 connections. However, the serialized behavior of HTTP/1.1 pipelining does not 157 adequately support simultaneous rendering of inlined objects - part of most Web 158 pages today; nor does it provide suitable fairness between protocol flows, or 159 allow for graceful abortion of HTTP transactions without closing the TCP 160 connection (quite common in HTTP operation). 162 Persistent connections and pipelining, however, do not fully address the 163 rendering nor the fairness problems described above. A "hack" solution is 164 possible using HTTP range requests; however, this approach does not, for 165 example, allow a server to send just the metadata contained in embedded object 166 before sending the object itself, nor does it solve the TCP connection abort 167 problem. 
169 Current TCP implementations do not share congestion information across multiple 170 simultaneous TCP connections between two peers, which increases the overhead of 171 opening new TCP connections. We expect that Transactional TCP [5] and sharing of 172 congestion information in TCP control blocks [8] will improve TCP performance by 173 using fewer RTTs and giving better congestion behavior, making it more suitable for HTTP 174 transactions. 176 The solution to these problems requires two actions; neither by itself will 177 entirely discourage opening multiple TCP connections to the same server from a 178 client. 179 * Internet service providers should enable the Random Early Detection (RED) 180 [12] or other active congestion control algorithms in their routers to 181 ensure bandwidth fairness to clients when the network is congested. RED 182 also addresses queue length problems observed in routers today. 183 * A multiplexing protocol should be developed and deployed for use with HTTP 184 (and eventually other protocols), so that multiple objects from a web 185 server can be fetched approximately simultaneously over a single TCP 186 connection, and so that the metadata of objects can be sent to clients without 187 other metadata waiting for the rest of the first object requested. 189 This document describes such an experimental multiplexing protocol. It is 190 designed to multiplex a TCP connection underneath HTTP so that HTTP itself does 191 not have to change, and to allow coexistence of multiple protocols (e.g. HTTP and 192 HTTP/NG), which will ease transitions to future Web protocols, and 193 communications of client applets using private protocols with servers over the 194 same TCP connection as the HTTP conversation. 196 Ideas for this design come from Simon Spero's SCP [15] [16] description and 197 from experience with the X Window System's protocol design [13].
199 Goals 201 We believe WebMUX meets the following goals we believe necessary for the use of 202 a multiplexing protocol for the Web: 203 * Unconfirmed service without negotiation or round trips to the server 204 * simple design 205 * high performance 206 * deadlock-free, by a credit based flow control scheme. 207 * allow multiple protocols to be multiplexed over same TCP connection 208 * allow connections to be established in either direction (enabling 209 callbacks to the session initiator). 210 * ability to build a full function socket interface above this protocol. 211 * low overhead 212 * preserves alignment in the data stream, so that it is easy to use with 213 protocols that marshal their data in a binary form. 215 ------------------------------------------------------ 217 WebMUX Protocol Operation 219 Key Words 221 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 222 "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be 223 interpreted as described in RFC 2119 [7]. 225 Deadlock Scenario 227 Multiplexing multiple sessions over a single transport TCP connection introduces 228 a potential deadlock that WebMUX is designed to avoid. 230 Here is an example of potential deadlock: 231 * Presume that each session is being handled by an independent thread and 232 that memory available to the WebMUX implementation is limited (for 233 example, on a thin client on a meter reader). 234 * For the purposes of this example, presume the thin client has 50K bytes of 235 buffer available to its WebMUX implementation, and cannot get more. 236 * The sender of data decides to send, as part of a session request (SYN 237 message), 100K bytes of initial data. There are no other senders, so all 238 of the data gets transmitted. But the thread to deal with the message is 239 blocked, and cannot make progress. 
240 * Unless WebMUX can buffer all 100K (or 1 meg, or pick your favorite 241 numbers), any other session's data would be blocked behind this initial 242 transmission until and unless WebMUX can read and buffer the data 243 someplace (and since it has no buffer available, the deadlock occurs). 244 Many similar (but possibly harder to explain) deadlocks are possible. 246 This example shows that deadlock is possible; to avoid it, WebMUX must be able to buffer 247 data independently of the consumers of the data. It must also have some way to 248 throttle sessions where the consumer of the data is not responsive in the 249 multiplexing layer (in this example, prevent the transmission of more than 50 250 Kbytes of data). Note that this deadlock is independent of the size of any 251 multiplexing fragment, but strictly dependent on availability of buffer space in 252 WebMUX for a particular session. 254 Deadlock Avoidance 256 In WebMUX, the receiver makes a promise (sends a credit) to the transmitter that 257 a certain amount of buffer space is available (or at least that it will consume 258 the bytes, if not buffer them, e.g. a real time audio protocol where the data is 259 disposed of), and the transmitter promises not to send more data than the 260 receiver has promised (no more than the credit). If these promises are met, then 261 WebMUX will not deadlock. The AddCredit control message is used to add credit 262 to a session. 264 A WebMUX implementation MUST maintain and adhere to the credit system or it can 265 deadlock. Implementations on systems with large amounts of memory (e.g. VM 266 systems) may be quite different from ones on thin clients with limited, 267 non-virtual memory. It is reasonable on a VM system to hand out credits freely 268 (analogous to the virtual socket buffering found in TCP implementations); but 269 your implementation must be careful to test its credit mechanisms so that they 270 will interoperate with limited memory systems.
Credit control messages MAY be 271 sent on sessions that are not active. 273 Each session has an initial credit (initial_default_credit) of 16 KB; 274 the SetDefaultCredit control message can set this initial credit to 275 something larger than the default. 277 Operation and Implementation Considerations 279 A transmitter MUST NOT transmit more data in a fragment than the available 280 credit on the session (or it could deadlock). 282 A WebMUX implementation MUST split streams into fragments when transmitting 283 them. The fragment size can be controlled using the SetMSS control message. 284 The max_fragment_size, a variable which is maintained on (currently) a per 285 transport TCP connection basis, determines the largest possible fragment a 286 sender should ever send to a receiver. This determines the maximum latency 287 introduced by a WebMUX layer above and beyond the inherent TCP latencies (socket 288 buffering on both sender and receiver and the delay-bandwidth product amount of 289 data that could be in flight at any given instant). A client on a low bandwidth 290 link, or with limited memory buffering, might decide to set the 291 max_fragment_size down to control latency and buffer space required. 293 If max_fragment_size is set to zero, the transmitter is left to determine the 294 fragment size and MAY take into account application protocol knowledge (e.g. a 295 WebMUX implementation for HTTP might send fragments of the metadata of embedded 296 objects, or the next phase of a progressive image format, which only it knows). 297 An implementation SHOULD honor the max_fragment_size as it transmits data, if it 298 has been set by the receiver. 300 A WebMUX implementation that does not have explicit knowledge or experience of 301 good fragment sizes might use these guidelines as a starting point: 302 * The path_MTU of the TCP connection, minus the size of the TCP and IP 303 headers (remember that IPv6 may have longer headers!)
and 8 bytes for a 304 WebMUX header, if this information is available [3]. 305 * The MSS of the TCP connection, if the path_MTU is not available 306 * In either case, you probably want to subtract 8 bytes to make sure a 307 WebMUX header can be added without forcing another TCP segment. 309 This would result in fragmentation roughly similar to TCP segmentation over 310 multiple TCP connections. 312 An implementation should round robin between sessions with data to send in some 313 fashion to avoid starving sessions, or allowing a single thread to monopolize 314 the TCP connection. Exact details of such behavior are left to the 315 implementation. To achieve highest bandwidth and lowest overhead WebMUX 316 behavior, credits should be handed out in reasonably large chunks. TCP 317 implementations typically send an ack message on every other packet, and it is 318 very hard to arrange to piggyback acks on data segments in implementations. 319 Therefore, for WebMUX to have reasonably low overhead, credits should be handed 320 out in some significant multiple (4 or more times larger) of the ~3000 bytes 321 represented by two packets on an Ethernet. The outstanding credit balance across 322 active sessions will also have to be larger than the bandwidth/delay product of 323 the TCP connection if WebMUX is not to become a limit on TCP transport 324 performance. 326 Both of these arguments indicate that outstanding credits in many 327 implementations should be 10K bytes or more. Implementations SHOULD piggyback 328 credit messages on data packets where possible, to avoid unneeded packets on the 329 wire. A careful implementation in which both ends of the TCP connection are 330 regularly sending some payload should be able to avoid sending extra packets on 331 the network. 333 If necessary, we could add in a future version fragmentation control messages to 334 do some bandwidth allocation, but for now, we are not bothering.
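As a rough illustration of the credit and fragment-size rules in this section, the following C sketch picks the size of the next fragment for one session and keeps the credit balance up to date. It is a minimal sketch under stated assumptions, not part of the protocol: the structure, function, and constant names (mux_tx_session, mux_guideline_fragment, mux_next_fragment, mux_consume_credit, mux_add_credit, MUX_HEADER_ALLOWANCE) are hypothetical, and the case in which an AddCredit value of zero means "no limit" is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; these names are not defined by the draft. */
#define MUX_INITIAL_DEFAULT_CREDIT (16u * 1024u) /* 16 KB initial credit */
#define MUX_HEADER_ALLOWANCE 8u /* bytes set aside for a WebMUX header */

/* Hypothetical per-session transmitter state. */
struct mux_tx_session {
    uint32_t credit; /* bytes the receiver has promised to accept */
};

/* Starting-point fragment size: path MTU minus IP and TCP header sizes,
 * minus room for a WebMUX header. When only the MSS is known, pass it
 * as path_mtu with both header sizes set to zero. */
static uint32_t mux_guideline_fragment(uint32_t path_mtu, uint32_t ip_hdr,
                                       uint32_t tcp_hdr)
{
    uint32_t overhead = ip_hdr + tcp_hdr + MUX_HEADER_ALLOWANCE;
    return path_mtu > overhead ? path_mtu - overhead : 0u;
}

/* Size of the next fragment for a session with `pending` bytes queued.
 * A non-zero max_fragment_size (set by the receiver) is honored;
 * zero leaves the choice to the transmitter's `preferred` size.
 * The result never exceeds the session's outstanding credit. */
static uint32_t mux_next_fragment(const struct mux_tx_session *s,
                                  uint32_t pending,
                                  uint32_t max_fragment_size,
                                  uint32_t preferred)
{
    uint32_t cap = max_fragment_size != 0u ? max_fragment_size : preferred;
    uint32_t n = pending;
    if (cap != 0u && n > cap)
        n = cap;
    if (n > s->credit)
        n = s->credit;
    return n;
}

/* Record that `sent` bytes of payload were transmitted on the session. */
static void mux_consume_credit(struct mux_tx_session *s, uint32_t sent)
{
    assert(sent <= s->credit); /* a fragment must fit in the credit */
    s->credit -= sent;
}

/* Apply the credit granted by a received AddCredit control message. */
static void mux_add_credit(struct mux_tx_session *s, uint32_t granted)
{
    s->credit += granted;
}
```

A transmitter built along these lines would call mux_next_fragment for each session in round-robin order and skip sessions for which it returns zero until more credit arrives.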
336 WebMUX Header 338 WebMUX headers are always in big endian byte order. 339 If people want, we could expand out the union below on a control message type 340 basis (e.g. the way the C bindings to X events were written out...). For this 341 draft, I'm not doing so.
342 #define MUX_CONTROL 0x00800000
343 #define MUX_SYN 0x00400000
344 #define MUX_FIN 0x00200000
345 #define MUX_RST 0x00100000
346 #define MUX_PUSH 0x00080000
347 #define MUX_SESSION 0xFF000000
348 #define MUX_LONG_LENGTH 0x00040000
349 #define MUX_LENGTH 0x0003FFFF
351 typedef unsigned int flagbit;
352 struct w3mux_hdr {
353     union {
354         struct {
355             unsigned int session_id : 8;
356             flagbit control : 1;
357             flagbit syn : 1;
358             flagbit fin : 1;
359             flagbit rst : 1;
360             flagbit push : 1;
361             flagbit long_length : 1;
362             unsigned int fragment_size : 18;
363             int long_fragment_size : 32;
364             /* only present if long_length is set */
365         } data_hdr;
366         struct {
367             unsigned int session_id : 8;
368             flagbit control : 1;
369             unsigned int control_code : 4;
370             flagbit long_length : 1;
371             unsigned int fragment_size : 18;
372             int long_fragment_size : 32;
373             /* only present if long_length is set */
374         } control_message;
375     } contents;
376 };
378 The fragment_size is always the size in bytes of the fragment, excluding the 379 WebMUX header and any padding. 381 Alignment 383 WebMUX headers are always (at least) 32 bit aligned. To find the next WebMUX 384 header, take the fragment_size, and round up to the next 32 bit boundary. 386 Transmitters MAY insert NoOp control messages to force 64 bit alignment of the 387 protocol stream. 389 Long Fragments 391 A WebMUX header with the long_length bit set must use the 32 bits following the 392 WebMUX header (the long_fragment_size field) for the value of the fragment_size 393 field, for whatever purpose the fragment_size field is being used.
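The header layout, alignment rule, and long-fragment rule above can be sketched in C as a small decoder working on the big endian wire bytes rather than on compiler-dependent bit-fields. This is only a sketch under assumptions: the names mux_decoded, mux_load_be32, mux_decode_header, and mux_padded_size are hypothetical; MUX_LONG_LENGTH is taken to be the single bit 0x00040000 that sits between the PUSH bit and the 18-bit length in the bit-field layout; the control_code is assumed to occupy the SYN, FIN, RST, and PUSH bit positions with SYN as its most significant bit; and the long_fragment_size is read as an unsigned value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MUX_SESSION     0xFF000000u
#define MUX_CONTROL     0x00800000u
#define MUX_SYN         0x00400000u
#define MUX_FIN         0x00200000u
#define MUX_RST         0x00100000u
#define MUX_PUSH        0x00080000u
#define MUX_LONG_LENGTH 0x00040000u /* assumed: the single long_length bit */
#define MUX_LENGTH      0x0003FFFFu

/* Hypothetical decoded view of one WebMUX header. */
struct mux_decoded {
    uint8_t  session_id;
    int      control;       /* 1 for a control message */
    unsigned control_code;  /* meaningful only when control is set */
    int      syn, fin, rst, push; /* meaningful only for data headers */
    int      long_length;
    uint32_t fragment_size; /* raw field; a SYN carries a protocol ID here */
    size_t   header_bytes;  /* 4, or 8 when long_length is set */
};

/* Read a 32-bit big endian value from a byte buffer. */
static uint32_t mux_load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the header at buf. Returns 0 on success, or -1 when len is
 * too short to hold the header (including the long length word). */
static int mux_decode_header(const unsigned char *buf, size_t len,
                             struct mux_decoded *out)
{
    uint32_t word;

    assert(buf != NULL && out != NULL);
    if (len < 4)
        return -1;
    word = mux_load_be32(buf);
    out->session_id = (uint8_t)((word & MUX_SESSION) >> 24);
    out->control = (word & MUX_CONTROL) != 0;
    out->control_code = (unsigned)((word >> 19) & 0xFu);
    out->syn = (word & MUX_SYN) != 0;
    out->fin = (word & MUX_FIN) != 0;
    out->rst = (word & MUX_RST) != 0;
    out->push = (word & MUX_PUSH) != 0;
    out->long_length = (word & MUX_LONG_LENGTH) != 0;
    out->fragment_size = word & MUX_LENGTH;
    out->header_bytes = 4;
    if (out->long_length) {
        if (len < 8)
            return -1;
        /* The following 32 bits replace the 18-bit fragment_size. */
        out->fragment_size = mux_load_be32(buf + 4);
        out->header_bytes = 8;
    }
    return 0;
}

/* Payload size rounded up to the next 32 bit boundary. */
static size_t mux_padded_size(uint32_t fragment_size)
{
    return ((size_t)fragment_size + 3u) & ~(size_t)3u;
}
```

For a data fragment, the next header would then start header_bytes + mux_padded_size(fragment_size) bytes after the current one. The caller still has to decide, from the SYN bit or the control_code, whether fragment_size is a length at all.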
395 Atoms 397 Atoms are integers that are used as short-hand names for strings, which are 398 defined using the InternAtom control message. Atoms are only used as protocol 399 ID's in this version of WebMUX, though they might be used for other purposes in 400 future versions. Since the atom might be redefined at any time, it is not safe 401 to use an atom unless you have defined it (i.e. you cannot use atoms defined by 402 the other end of a mux connection). Atoms are therefore not unique values, and 403 only make sense in the context of a particular direction of a particular mux 404 connection. This restriction is to avoid having to define some protocol for 405 deallocating atoms, with any round trip overhead that would likely imply. 407 Strings are defined to be UTF-8 encoded Unicode strings. (Note that an ASCII 408 string is valid UTF-8). The definition of the structure of these strings is outside 409 of the scope of this document, though we expect they will often be URI's, naming 410 a protocol or stack of protocols. Atoms always have values between 0x20000 and 411 0x200ff (a maximum of 256 atoms can be defined). 413 Strings used for protocol ID's MUST be URIs [28]. 415 Protocol ID's 417 The protocol used by a session is identified by a Protocol ID, which can either 418 be an IANA port number, or an atom. Protocol IDs serve two purposes: 419 1. To allow higher layers to stack protocols (e.g. HTTP on top of deflate 420 compression, on top of TCP). 421 2. To identify the protocol or protocol stack in use so that application 422 firewall relays can perform sanity checking and policy enforcement on the 423 multiplexed protocols. 425 Firewall proxies can presume that the bytes should conform to the protocol 426 identified by the Protocol ID. 427 * 0-0xFFFF: IANA-registered TCP protocols [17] 428 * 0x10000-0x1FFFF: IANA-registered UDP protocols [17] 429 * 0x20000-0x2FFFF: per-underlying-connection-defined MUX atoms. 430 The scheme name of the URI indicates the protocol family being used (e.g.
431 http, ftp, etc.). 432 * 0x30000-0x3FFFF: server-assigned protocol IDs 433 The assignment of these ID's is outside the scope of this protocol, and 434 may pose additional security hazards. 436 Session ID Allocation 438 Each session is allocated a session identifier. Session identifiers 0 and 439 1 are reserved for future use. Session IDs allocated by the initiator of the 440 transport TCP connection are even; those allocated by the receiver of the 441 transport connection are odd. Proxies that do not understand messages of reserved 442 Session ID's should forward them unchanged. A session identifier MUST only be 443 deallocated and potentially reused by new sessions when a session is fully 444 closed in both directions. 446 Session Establishment 448 To establish a new session, the initiating end sends a SYN message, allocating a 449 free session number out of its address space. A session is established by 450 setting the SYN bit in the first message sent on that session. The session is 451 specified by the session_id field. The fragment_size field is interpreted as the 452 protocol ID of the session, as discussed above. 454 The receiver MUST either open the reverse path of that session (send a SYN 455 message), or it MUST send a FIN message to indicate that the reverse path is not 456 going to be used further, or send a RST message to indicate an error. This 457 enables the initiator of a session to know when it is safe to reuse that session 458 ID. 460 Graceful Release 462 A session is ended by sending a fragment with the FIN bit set. Each end of a 463 WebMUX connection may be closed independently. 465 WebMUX uses a half-close mechanism like TCP [1] to close data flowing in each 466 direction in a session. After sending a FIN fragment, the sender MUST NOT send 467 any more payload in that direction. 469 Disgraceful Release 471 A session may be terminated by sending a message with the RST bit set. All 472 pending data for that session should be discarded.
"No such protocol" errors 473 detected by the receiver of a new session are signaled to the originator on 474 session creation by sending a message with the RST bit set. (Same as in TCP). 476 The payload of the fragment containing the RST bit contains a null terminated 477 string containing the URI of an error message (note that content negotiation 478 makes this message potentially multi-lingual), followed by a null terminated 479 UTF-8 string containing the reason for the reset (in case the URI is not 480 accessible). 482 Message Boundaries 484 A message boundary is marked by sending a message with the PUSH bit set. The 485 boundary is set between the last octet in this message, including that octet, 486 and the first byte of a subsequent message. This differs slightly from TCP, as 487 PUSH can be reliably used as a record mark. 489 Flow Control 491 Flow control is determined by the simple credit scheme described above, using 492 the AddCredit control message defined below. Fragments transmitted MUST NOT 493 exceed the outstanding credit for that session. The initial outstanding credit 494 for a session is 16 Kbytes. 496 End Points 498 One of the major design goals of WebMUX is to allow callbacks to objects in the 499 process that initiated the transport TCP connection without requiring additional 500 TCP connections (with the overhead in both machine resources and time that this 501 would cause, or the problems with TCP connection establishment through 502 firewalls). 504 The DefineEndpoint control message allows one to advertize that a particular 505 URI (or set of URI's) is reachable over the transport TCP connection. 507 A MUX protocol ID only identifies a MUX channel relative to a particular 508 "endpoint". The pair of completely identify a MUX 509 channel, without regard to IP address, TCP port, or other information. Endpoint 510 IDs are URI names for endpoints. Any endpoint may have multiple endpoint IDs.
We 511 do not place any further restrictions on the types of URIs that are used as 512 endpoint IDs. 514 A client connecting from a MUX endpoint A to a MUX channel on a different 515 endpoint B may send an ID for A to B via the DefineEndpoint control message. If 516 a client in endpoint B then needs to connect to a MUX channel in endpoint A, it 517 may do so by using the existing lower-level byte stream originated from endpoint 518 A. A connection initiator may send multiple DefineEndpoint control messages with 519 different endpoint IDs for the same endpoint. 521 Connection initiators may wish to control the disclosure of endpoint 522 information, both for security purposes and for optimal application timing, and 523 should be given reasonable 525 Whether this relative URI naming can be used depends upon the scheme of the URI 526 [20], which defines its structure. For example, a firewall proxy might advertize 527 just "http:" for the proxy, claiming it can be used to contact any HTTP protocol 528 object anywhere, or "http://foo.com/bar/" to indicate that any object below that 529 point in the URI space on the server foo.com may be reached by this TCP 530 connection. A client might advertize that "http://myhost.com/" is available via 531 this transport TCP connection. 533 Control Messages 535 The control bit of the WebMUX header is always set in a control message. Control 536 messages can be sent on any session, even sessions that are not (yet) open. The 537 control_code reuses the SYN, FIN, RST, and PUSH bits of the WebMUX header. The 538 control_code of the control message determines the control message type. Any 539 unused data in a control message must be ignored. 541 The revised version of WebMUX means that a session creation costs 4 bytes (a 542 control message with SYN set, and with the protocol ID in the message). 543 Therefore the first fragment of payload has a total overhead of 8 bytes. 
(This presumes an IANA-based protocol rather than a named protocol.) This is the same as the previous version, though it means two messages rather than one.

The individual control message types are listed below (code, name, direction; description):

0 InternAtom Both
   The session_id is used as the Atom to be defined (offset by 0x2000, so a value of 0 defines ID 0x2000). The fragment_size field is the length of the UTF-8 encoded string. The fragment itself contains the string to be interned. This allows the interning of 256 strings. (Is this enough?)
1 DefineEndpoint Both
   The session_id is ignored. The fragment_size is interpreted as the protocol ID, naming an endpoint actually available on this transport TCP connection. This enables a single transport TCP connection to be used for callbacks, or to advertise to the process on the other end of the transport TCP connection that a protocol endpoint can be reached.
2 SetMSS Both
   This sets a limit on fragment sizes below the outstanding credit limit. The session_id must be zero. The fragment_size field is used as max_fragment_size (the largest fragment that may be sent on any session on this transport TCP connection). A max_fragment_size of zero means there is no limit on the fragment size allowed for this session.
3 AddCredit R->T
   The session_id specifies the session. The fragment_size specifies the flow control credit granted (to be added to the current outstanding credit balance). A value of zero indicates no limit on how much data may be sent on this session.
4 SetDefaultCredit R->T
   The session_id must be zero. The fragment_size field is used to set the initial default credit limit for any incoming WebMUX connections over this transport TCP connection (i.e. it is shorthand for sending a series of AddCredit messages for each session ID).
5 NoOp Both
   This control message is defined to perform no function. Any data in the payload should be ignored.
6-15 - Undefined.
   Reserved for future use. Must be ignored if not understood, and forwarded by any proxies. The fragment_size is always used for the length of the control message, and any data for the control message will be in the payload of the control message (to allow proxies to forward future control messages).

------------------------------------------------------

Security Considerations

Advertising endpoints inappropriately might allow a client to connect to services that should be protected.

Using the protocol ID range 0x30000-0x3FFFF for server-assigned protocol IDs may prevent a firewall proxy from having enough information to safely proxy protocols of those types. Firewall proxy implementers should not blindly forward protocols in this range.

Firewall proxies implementing WebMUX should enforce appropriate policies for protocols being multiplexed over WebMUX, in a fashion similar to the policies imposed for native protocols.

Clearly, any security consideration for a protocol is likely to still apply when that protocol is multiplexed via WebMUX.

------------------------------------------------------

Remaining Issues for Discussion

When can WebMUX be used?

* What are the appropriate strategies for determining if the WebMUX protocol can be used?
* Name server hack?
* UPGRADE in HTTP?
* Remember that previous UPGRADE to use WebMUX worked?
* Should there be a more compact open message?

------------------------------------------------------

Comparison with SCP (TMP)

Note that TIP (Transaction Internet Protocol) [21] defines a version of SCP called TMP.

Goals:

* Unconfirmed service without negotiation.
* SCP allows data to be sent with the session establishment; the recipient does not confirm successful mux connection establishment, but may reject unsuccessful attempts. This simplifies the design of the protocol, and removes the latency required for a confirmed operation.
* Simple design.
* Performance where critical.

There are five issues that make SCP (TMP) inadequate for our use:

* SCP can deadlock unless unlimited amounts of memory are available.
* It has no provision for multiplexing multiple protocols over the same transport TCP connection, essential for graceful transition without dependency on the currently incomplete NG design, and to allow other uses which could use the same multiplexed connection (e.g. applet communication with servlets).
* SCP's 8 byte overhead is not reasonable most of the time. WebMUX uses four bytes in the default case. The design below permits an 8 byte header if you care to preserve 64 bit alignment at the cost of bytes. In practice, there seem to be few data formats or architectures that actually require more than 32 bit alignment.
* Without some form of flow control, infinite buffering in clients (receivers) would be required.
* Alignment is preserved in the data stream. This allows compact, high speed (un)marshalling code in implementations of binary protocols, without extra data copies, which in such protocols can be significant overhead.
* SCP SYN in Version 2 requires a second message, which costs a round trip.

So far, WebMUX is similar to SCP. There are some important differences:

* Deadlock-free (we believe), by a credit based flow control scheme.
* Allows multiple protocols to be multiplexed over the same TCP connection (not available in SCP).
* Lower overhead than SCP, while preserving data alignment (very important for binary protocol marshaling code).
* Ability to build a full function socket interface above this protocol.
* WebMUX avoids the SYN round trip of SCP V2 by allocating session IDs in independent address spaces. This also avoids many of the state transitions of SCP, simplifying the protocol greatly.
* SCP has 2^24 sessions, which seems highly excessive, and reserves 1024 of them for future use.

------------------------------------------------------

Closed Issues from Discussion and Mail

Some of the comments below allude to previous versions of the specification, and may not make sense in the context of the current version. This section will likely be eliminated in future versions, but may answer some questions that arise when reading this document.

Flow control: priority vs. credit schemes

Henrik and I have convinced ourselves there are fundamental differences between a priority scheme and the credit scheme in this draft. They interact quite differently with TCP, and priority schemes have no way to limit the total amount of data being transmitted, though priority schemes are better matched to what the Web wants. We've decided, at least for now, to defer any priority schemes to higher level protocols.

Stacking Protocols and Transports (Stacks)

ILU [22] style protocol stacks are a GOOD THING. There have been too many worries about the birthday problem for people to be comfortable with Bill Janssen's hashing schemes (see Henrik Frystyk Nielsen and Robert Thau's mail on this topic). We tried putting this directly in WebMUX in a previous version, and experience shows that it didn't really help an implementer (in particular, Bill Janssen while implementing ILU). This version has just the name of the protocol, and it is left to others to implement any stacking (e.g. ILU).

We believe the name of the protocol is necessary if WebMUX is ever to be used with firewalls. Application level firewall relays need the protocol information to sanity check the protocol being relayed. Application level relays are considered much more secure than just punching holes in the firewall for particular protocol families (which small organizations often find sufficient), as the relay can sanity check the protocol stream and enable better policy decisions (for example, to forbid certain datatypes in HTTP from transiting a firewall). Large organizations and large targets typically only run application level proxies.

Byte Usage

Wasting bytes in general, and in particular at TCP connection establishment, must be avoided in a multiplexing transport. There are several reasons for this:

* If the initial segment is too long, a network round trip will be lost to TCP slow start, so bytes near the beginning of a conversation MAY BE much more precious than bytes later in the conversation, once slow start overhead has been paid. If the first segment is too long, you fall off a cliff.
* This directly affects user-perceived response; no cleverness in later packing and batching of requests can get the time back; each delay goes directly to perceived latency when a user talks to the server for the first time.

So there is more than the usual tension between generality and performance.

Performance analysis

The threshold of human perception is about 30 milliseconds; a delay much longer than this is noticed by the user. At 14.4 Kbaud, one byte uncompressed costs 0.55 milliseconds (ignoring modem latencies). On an airplane via telephone today, you get a munificent 4800 baud, which is 3X slower. Cellular modems transmitting data (CDPD), as I understand it, will give us around 20 Kbaud, when deployed.

So basic multiplexing at 4 bytes of overhead costs ~2 milliseconds on common modems.
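
The per-byte figures above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: "baud" is treated as bits per second, each byte is counted as 8 bits with no start/stop bits, and the function names are hypothetical.

```python
def ms_per_byte(bits_per_second, bits_per_byte=8):
    """Milliseconds needed to send one byte on a link of the given speed.

    Assumes 8 bits per byte and no framing bits or modem latency.
    """
    return 1000.0 * bits_per_byte / bits_per_second


def overhead_ms(num_bytes, bits_per_second):
    """Milliseconds spent sending num_bytes of protocol overhead."""
    return num_bytes * ms_per_byte(bits_per_second)


if __name__ == "__main__":
    # One byte at 14.4 Kbaud: about 0.56 ms (quoted as .55 in the text).
    print(round(ms_per_byte(14400), 2))
    # A 4-byte header at 14.4 Kbaud: about 2.2 ms.
    print(round(overhead_ms(4, 14400), 1))
    # The same header at 4800 baud takes three times as long.
    print(round(overhead_ms(4, 4800), 1))
```
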
This means basic overhead is small vs. human perception for most low speed situations, a good position to be in.

On WebMUX connection open, with the above protocol we send 4 bytes in the setup message, and then must open a session, requiring at least 8 bytes more. 12 bytes == 7 milliseconds at 14.4K. Not 64 bit aligned, and 4 bytes cost on the order of 2 milliseconds. Ugh... Maybe a setup message isn't a good idea; other uses (e.g. security) can be dealt with by a control message.

Multiple protocols over one WebMUX

We want to WebMUX multiple protocols simultaneously over the same transport TCP connection, so we need to know what protocol is in use with each session, so the demultiplexer can hand the data to the right recipient (e.g. SUNRPC and DCE RPC simultaneously).

There are two obvious ways I can see to do this:

a) Send a control message when a session is first used, indicating the protocol.
   Disadvantage: costs probably 8 bytes to do so (4 bytes of WebMUX overhead, and a 4 byte message), and destroys potential 64 bit alignment.
b) If SYN is set indicating a new session, then steal the mux_length field to indicate the protocol in use on that session.
   (Overhead: 4 bytes for the WebMUX header used just to establish the session.)

Opinions? Mine is that b) is better than a). Answer: b) is the adopted strategy.

Priority...

For a given stream, priority will affect which session is handled when multiplexing data; sending the priority on every block is unneeded, and would waste bytes. There is one case in which priority might be useful: at an intermediate proxy relaying sessions (and maybe remultiplexing them).

If so, it should be sent only when sessions are established or changed. Changes can be handled by a control message. Opinions?

A priority field can be hacked into the length field with the protocol field using b) above.
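
Option b) can be illustrated with a small header-packing sketch. The field widths (18-bit length, 8-bit session, four flag bits, one long-length bit, one control bit) and the little-endian byte order are taken from this document; the position of each field inside the 32-bit word, the constant names, and the helper functions are assumptions made for illustration and are not the normative layout.

```python
import struct

# Hypothetical bit positions. Only the field widths come from the text.
LENGTH_MASK = 0x0003FFFF    # 18-bit length field (protocol ID when SYN is set)
LONG_LENGTH = 0x00040000    # 1 long-length bit
PUSH = 0x00080000
RST = 0x00100000
FIN = 0x00200000
SYN = 0x00400000
CONTROL = 0x00800000        # 1 control bit
FLAG_MASK = LONG_LENGTH | PUSH | RST | FIN | SYN | CONTROL
SESSION_SHIFT = 24          # 8-bit session ID assumed to sit in the top byte


def pack_header(session_id, flags, length_field):
    """Pack one 4-byte header in little-endian byte order."""
    if not 0 <= session_id <= 0xFF:
        raise ValueError("session_id must fit in 8 bits")
    if flags & ~FLAG_MASK:
        raise ValueError("unknown flag bits")
    if not 0 <= length_field <= LENGTH_MASK:
        raise ValueError("length field must fit in 18 bits")
    word = (session_id << SESSION_SHIFT) | flags | length_field
    return struct.pack("<I", word)


def unpack_header(data):
    """Return (session_id, flags, length_field) from a 4-byte header."""
    (word,) = struct.unpack("<I", data[:4])
    return word >> SESSION_SHIFT, word & FLAG_MASK, word & LENGTH_MASK


def open_session(session_id, protocol_id):
    """Option b): a SYN header whose length field carries the protocol ID."""
    return pack_header(session_id, SYN, protocol_id)
```

Opening session 3 for protocol ID 80 with `open_session(3, 80)` yields a single 4-byte header, which is the session-establishment cost that option b) describes.
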

So the question is: is it important to send priority at all in this WebMUX protocol? Or should priority control, if needed, be a control message?

Answer: Not in this protocol. It opens Pandora's box with remultiplexers, which could be subject to denial of service attacks.

Setup message

Is any setup message needed? I don't think it is, and initial bytes are precious (see performance discussion above), and it complicates trivial use. If we move the byte order flag to the WebMUX header, and use control messages if other information needs to be sent, we can dispense with it, and the layer is simpler. This is my current position, and unless someone objects with reasons, I'll nuke it in the next version of this document.

Answer: Not needed. Nuked.

Byte order flags

While higher layer protocols using host dependent byte order can be a performance win (when sending larger objects such as arrays of data), the overhead at this layer isn't much, and may not be worth bothering with. Worst case (naive code) would be four memory reads and three shifts of overhead per payload. Smart code is one load and appropriate shifts etc.

Opinions? I'm still leaning toward swapping bytes here, but there are other examples of byte load and shift (particularly slow on Alpha, but not much of an issue on other systems).

Answer: Not sufficient performance gain at WebMUX level to be worth doing. Defined as little-endian (LE) byte order for WebMUX headers.

Error handling

There are several error conditions, probably best reported via control messages from the server:

* No such protocol. Some sort of serial number should be reported, I suppose; this serial number can be implicit as in X.
* Bad message.
* Some combinations of flag bits are not legal.
* Priority if it exists?

Any others? Any twists to worry about?
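
For the "No such protocol" case in the list above, this document specifies an RST payload made of a null-terminated URI followed by a null-terminated UTF-8 reason string. A receiver might split that payload roughly as follows. This is a minimal sketch: the function name is hypothetical, and decoding the URI as ASCII is an assumption, since the text does not name an encoding for it.

```python
from typing import Tuple


def parse_rst_payload(payload: bytes) -> Tuple[str, str]:
    """Split an RST payload into (error_uri, reason).

    Expects two null-terminated strings back to back; raises ValueError
    if either terminator is missing.
    """
    uri_raw, sep, rest = payload.partition(b"\x00")
    if not sep:
        raise ValueError("missing terminator after error URI")
    reason_raw, sep, _ = rest.partition(b"\x00")
    if not sep:
        raise ValueError("missing terminator after reason string")
    # ASCII for the URI is an assumption; the reason is UTF-8 per the text.
    return uri_raw.decode("ascii"), reason_raw.decode("utf-8")
```

The reason string gives the originator something to show even when the error URI cannot be fetched, which is why both fields are returned.
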

Answer: The only error that can occur is no such protocol, given no priority in the base protocol. There may still be some unresolved issues here around the "Christmas Tree" message (all bits turned on).

Length Field

Any reason to believe that the 32 bit length field for a single payload is inadequate? I don't think so, and I live on an Alpha.

Answer: 32 bit extended length field for a single fragment is sufficient.

Compression

Does there need to be a bit saying the payload is compressed to avoid explosion of protocol types?

Answer: Yes; introduction of a control message to allow specification of transport stacks achieves this.

Stacks

I think that we should be able to multiplex any TCP, UDP, or IP protocol. Internet protocol numbers are 8 bit fields.

So we need 16 bits for TCP, one bit to distinguish TCP and UDP, and one bit more we can use for IP protocol numbers and address space we can allocate privately. This argues for an 18 bit length field to allow for this reuse.

* 18 bit length field
* 8 bit session field
* 4 control bits
* 1 long length bit

The last bit is used to define control messages, which reuse the syn, fin, rst, and push bits as a control_code to define the control message. There are escapes, both by undefined control codes, and by the reservation of two sessions for further use if there need to be further extensions. The spec above reflects this.

Alignment

Back to alignment. If we demand 4 byte alignment, for all requests that do not end up naturally aligned, we waste bytes. Two bytes are wasted on average. At 14.4 Kbaud the overhead for protocols that do not pad up would on average be 6 bytes or ~3 ms, rather than 4 bytes or ~2 ms (presuming even distributions of length). Note that this DOES NOT affect initial request latency (time to get first URL), and is therefore less critical than elsewhere.
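
The padding cost discussed above can be checked with a short calculation. This is a sketch with hypothetical function names; it assumes payload lengths are spread evenly over the four possible remainders, as the text presumes. Under that assumption the exact mean is 1.5 pad bytes, which the text rounds up to two.

```python
def pad_len(length, align=4):
    """Number of pad bytes needed to round length up to a multiple of align."""
    return -length % align


def average_pad(align=4):
    """Mean pad bytes when every remainder 0..align-1 is equally likely."""
    return sum(pad_len(r, align) for r in range(align)) / align


if __name__ == "__main__":
    for n in (5, 6, 7, 8):
        print(n, pad_len(n))   # 5 -> 3, 6 -> 2, 7 -> 1, 8 -> 0
    print(average_pad())       # 1.5
```
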

I have one related worry; it can sometimes be painful to get padding bytes at the end of a buffer; I've heard of people losing by having data right up to the end of a page, so implementations are living slightly dangerously if they presume they can send the padding bytes by sending the 1, 2 or 3 bytes after the buffer (rather than an independent write to the OS for padding bytes).

Alternatively, the buffer alignment requirement can be satisfied by implementations remembering how many pad bytes have to be sent, and adjusting the beginning address of the subsequent write by that many bytes before the buffer where the WebMUX header has been put. Am I being unnecessarily paranoid?

Opinion: I believe alignment of fragments in general is a GOOD THING, and will simplify both the WebMUX transport and protocols at higher levels if they can make this presumption in their implementations. So I believe this overhead is worth the cost; if you want to do better and save these bytes, then start building an application specific compression scheme. If not, please make your case.

Control bits

Are the four bits defined in Simon's flags field what we need? Are there any others?

Answer: no. More bits than we need. Current protocol doesn't use as many. I've ended back at the original bits specified, rather than the smaller set suggested by Bill Janssen. This enables full emulation of all the details of a socket interface, which would not otherwise be possible. See details around TCP and socket handling, discussed in books like "TCP/IP Illustrated," by W. Richard Stevens.

Am I all wet?

Opinion: I believe that we should do this.

Control Messages

Question: do we want/need a short control message? Right now, the out for extensibility is control messages sent in the reserved (and as yet unspecified) control session. This requires a minimum of 8 bytes on the wire.
We could steal the last available bit, and allow for a 4 byte short control message that would have 18 bits of payload.

Opinion: Flow control needs it; protocol/transport stacks need it. Document above now defines some control messages.

Simplicity of default Behavior

The above specification allows for someone who just wants to WebMUX a single protocol to entirely ignore protocol IDs.

------------------------------------------------------

Acknowledgements

Contributors include (at least): Bill Janssen, Mike Spreitzer, Robert Thau, Larry Masinter, Paul Leach, Paul Bennett, Rich Salz, Simon Spero, Mark Handey, Anselm Baird-Smith, and Wan-Teh Chang. Our apologies to anyone we've missed.

------------------------------------------------------

References

1. J. Postel, "Transmission Control Protocol", RFC 793, Network Information Center, SRI International, September 1981.
2. J. Postel, "TCP and IP bake off", RFC 1025, September 1987.
3. J. Mogul, S. Deering, "Path MTU Discovery", RFC 1191, DECWRL, Stanford University, November 1990.
4. T. Berners-Lee, "Universal Resource Identifiers in WWW. A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630, CERN, June 1994.
5. R. Braden, "T/TCP -- TCP Extensions for Transactions: Functional Specification", RFC 1644, USC/ISI, July 1994.
4. R. Fielding, "Relative Uniform Resource Locators", RFC 1808, UC Irvine, June 1995.
5. T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, W3C/MIT, UC Irvine, W3C/MIT, May 1996.
6. R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, U.C. Irvine, DEC W3C/MIT, DEC, W3C/MIT, W3C/MIT, January 1997.
7. S.
Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, Harvard University, March 1997.
8. J. Touch, "TCP Control Block Interdependence", RFC 2140, April 1997.
9. W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.
10. V. Jacobson, "Congestion Avoidance and Control", Proceedings of SIGCOMM '88.
11. H. Frystyk Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and PNG", Proceedings of SIGCOMM '97.
12. S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Trans. on Networking, vol. 1, no. 4, Aug. 1993.
13. R. W. Scheifler, J. Gettys, "The X Window System", ACM Transactions on Graphics # 63, Special Issue on User Interface Software, 5(2):79-109 (1986).
14. V. Paxson, "Growth Trends in Wide-Area TCP Connections", IEEE Network, Vol. 8 No. 4, pp. 8-17, July 1994.
15. S. Spero, "Session Control Protocol, Version 1.0".
16. S. Spero, "Session Control Protocol, Version 2.0".
17. Keywords and Port numbers are maintained by IANA in the port-numbers registry.
18. Keywords and Protocol numbers are maintained by IANA in the protocol-numbers registry.
19. W. Richard Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley, 1994.
20. T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax and Semantics", Work in Progress of the IETF, November 1997.
21. J. Lyon, K. Evans, J. Klein, "Transaction Internet Protocol Version 2.0", Work in Progress of the Transaction Internet Protocol Working Group, November 1997.
22. B. Janssen, M. Spreitzer, "Inter-Language Unification"; in particular see the manual section on Protocols and Transports.

------------------------------------------------------

Authors' Addresses

* James Gettys
  MIT Laboratory for Computer Science
  545 Technology Square
  Cambridge, MA 02139, USA
  Fax: +1 (617) 258-8682
  Email: jg@pa.dec.com
* Henrik Frystyk Nielsen
  W3C/MIT Laboratory for Computer Science
  545 Technology Square
  Cambridge, MA 02139, USA
  Fax: +1 (617) 258-8682
  Email: frystyk@w3.org

------------------------------------------------------

@(#) $Id: WD-mux.html,v 1.4 1998/08/03 18:36:32 frystyk Exp $