idnits 2.17.1 

draft-gettys-webmux-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 999 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 416 instances of too long lines in the document, the longest
     one being 8 characters in excess of 72.

  ** There are 11 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([15], [3], [16], [17], [5],
     [6], [7], [8], [9], [10], [28], [11], [12], [13], [1]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 1, 1998) is 9399 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 465 looks like a reference

  -- Missing reference section? '28' on line 413 looks like a reference

  -- Missing reference section? '9' on line 125 looks like a reference

  -- Missing reference section? '10' on line 125 looks like a reference

  -- Missing reference section? '6' on line 154 looks like a reference

  -- Missing reference section? '11' on line 154 looks like a reference

  -- Missing reference section? '5' on line 171 looks like a reference

  -- Missing reference section? '8' on line 172 looks like a reference

  -- Missing reference section? '12' on line 180 looks like a reference

  -- Missing reference section? '15' on line 196 looks like a reference

  -- Missing reference section? '16' on line 196 looks like a reference

  -- Missing reference section? '13' on line 197 looks like a reference

  -- Missing reference section? '7' on line 224 looks like a reference

  -- Missing reference section? '3' on line 304 looks like a reference

  -- Missing reference section? '17' on line 428 looks like a reference

  -- Missing reference section? '20' on line 526 looks like a reference

  -- Missing reference section? '21' on line 622 looks like a reference

  -- Missing reference section? '22' on line 686 looks like a reference


     Summary: 10 errors (**), 0 flaws (~~), 2 warnings (==), 21 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET DRAFT                     Jim Gettys, Compaq Computer Corporation
2	draft-gettys-webmux-00.txt              Henrik Frystyk Nielsen, W3C, M.I.T
3	Expires January 1, 1999                                     August 1, 1998

5	The WebMUX Protocol

7	Status of This Document

9	This document is an Internet-Draft. Internet-Drafts are working documents of the
10	Internet Engineering Task Force (IETF), its areas, and its working groups. Note
11	that other groups may also distribute working documents as Internet-Drafts.

13	Internet-Drafts are draft documents valid for a maximum of six months and may be
14	updated, replaced, or obsoleted by other documents at any time. It is
15	inappropriate to use Internet-Drafts as reference material or to cite them other
16	than as "work in progress."

18	To view the entire list of current Internet-Drafts, please check the
19	"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories
20	on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it
21	(Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
22	ftp.isi.edu (US West Coast).

24	This document describes an experimental design for a multiplexing transport,
25	intended for, but not restricted to, use with the Web. WebMUX has been
26	implemented as part of the HTTP/NG project. Use of this protocol is EXPERIMENTAL
27	at this time and the protocol may change. In particular, transition strategies
28	to use of WebMUX have not been definitively worked out. You have been warned!

30	Distribution of this document is unlimited. Please send comments to the HTTP-NG
31	mailing list at <www-http-ng-comments@w3.org>. Discussions are archived at
32	"http://lists.w3.org/Archives/Public/www-http-ng-comments/".

34	Please read the "HTTP-NG Short- and Longterm Goals Document" [1] for a
35	discussion of goals and requirements of a potential new generation of the HTTP
36	protocol and how we intend to evaluate these goals.

38	General information about the Project as well as new draft revisions, related
39	discussions, and background information is linked from
40	"http://www.w3.org/Protocols/HTTP-NG/".

42	Note: Since internet drafts are subject to frequent change, you are advised to
43	reference the Internet Draft directory. This work is part of the W3C HTTP/NG
44	Activity (for current status, see http://www.w3.org/Protocols/HTTP-NG/Activity).

46	Abstract

48	This document defines the experimental multiplexing protocol referred to as
49	"WebMUX". WebMUX is a session management protocol separating the underlying
50	transport from the upper level application protocols. It provides a lightweight
51	communication channel to the application layer by multiplexing data streams on
52	top of a reliable stream oriented transport. By supporting coexistence of
53	multiple application level protocols (e.g. HTTP and HTTP/NG), WebMUX should ease
54	transitions to future Web protocols and let client applets that use private
55	protocols communicate with servers over the same TCP connection as the HTTP
56	conversation.

58	WebMUX is intended for, but by no means restricted to, transport of Web related
59	protocols; the name has been chosen to reduce confusion with other existing
60	multiplexing protocols.

62	This document is part of a suite of documents describing the HTTP-NG design and
63	prototype implementation:
64	    * HTTP-NG Short- and Longterm Goals, ID
65	    * HTTP-NG Architectural Model, ID
66	    * HTTP-NG Wire Protocol, ID
67	    * The Classic Web Interfaces in HTTP-NG, ID
68	    * Description of the HTTP-NG Testbed, ID

70	Changes from Previous Version
71	    * Changed name from SMUX to WebMUX to reduce confusion with SNMP related
72	      protocol.
73	    * Split protocol ID address space to allow an address space for servers to
74	      use to identify protocols outside of the control of this document.
75	    * Elaborated endpoint usage.
76	    * Prepared to meet IETF ID standards.
77	    * Added acknowledgements section.
78	    * Some reorganization of the document.

80		------------------------------------------------------

82	Contents
83	    1. The WebMUX Protocol
84	    2. Status of This Document
85	    3. Abstract
86	          1. Changes from Previous Version
87	    4. Contents
88	    5. Introduction
89	          1. Goals
90	    6. WebMUX Protocol Operation
91	          1. Key Words
92	          2. Deadlock Scenario
93	          3. Deadlock Avoidance
94	          4. Operation and Implementation Considerations
95	          5. WebMUX Header
96	          6. Alignment
97	          7. Long Fragments
98	          8. Atoms
99	          9. Protocol ID's
100	          10. Session ID Allocation
101	          11. Session Establishment
102	          12. Graceful Release
103	          13. Disgraceful Release
104	          14. Message Boundaries
105	          15. Flow Control
106	          16. End Points
107	          17. Control Messages
108	    7. Security Considerations
109	    8. Remaining Issues for Discussion
110	    9. Comparison with SCP (TMP)
111	    10. Closed Issues from Discussion and Email
112	    11. Acknowledgments
113	    12. References
114	    13. Author's Addresses

116		------------------------------------------------------

118	Introduction

120	The Internet is suffering from the effects of the HTTP/1.0 protocol, which was
121	designed without an understanding of the underlying TCP [1] transport protocol.
122	HTTP/1.0 opens a TCP connection for each URI [28] retrieved (at a cost of both
123	packets and round trip times (RTTs)), and then closes the TCP connection. For
124	small HTTP requests, these TCP connections have poor performance due to TCP slow
125	start [9] [10] as well as the round trips required to open and close each TCP
126	connection.

128	There are (at least) three reasons why multiple simultaneous TCP connections
129	have come into widespread use on the Internet despite the apparent
130	inefficiencies:
131	    1. A client using multiple TCP connections gains a significant advantage in
132	      perceived performance by the end-user, as it allows for early retrieval of
133	      metadata (e.g. size) of embedded objects in a page. This allows a client
134	      to format a page sooner without suffering annoying reformatting of the
135	      page. Clients that open multiple TCP connections in parallel to the same
136	      server, however, could cause self-congestion on heavily congested links,
137	      since packets generated by TCP opens and closes are not themselves
138	      congestion controlled.
139	    2. The additional TCP opens cause performance problems in the network, but a
140	      client that opens multiple TCP connections simultaneously to the same
141	      server may also receive an "unfair" bandwidth advantage in the network
142	      relative to clients that use a single TCP connection. This problem is not
143	      solvable at the application level; only the network itself can enforce
144	      such "fairness".
145	    3. To keep low bandwidth/high latency links busy (e.g. dialup lines), more
146	      than one TCP connection has been necessary since slow start may cause the
147	      line to be partially idle.

149	The "Keep-Alive" extension to HTTP/1.0 is a form of persistent TCP connections
150	but does not work through HTTP/1.0 proxies and does not take pipelining of
151	requests into account. Instead a revised version of persistent TCP connections
152	was introduced in HTTP/1.1 as the default mode of operation.

154	HTTP/1.1 [6] persistent connections and pipelining [11] will reduce network
155	traffic and the amount of TCP overhead caused by opening and closing TCP
156	connections. However, the serialized behavior of HTTP/1.1 pipelining does not
157	adequately support simultaneous rendering of inlined objects (part of most Web
158	pages today), nor does it provide suitable fairness between protocol flows or
159	allow HTTP transactions to be aborted gracefully without closing the TCP
160	connection (quite common in HTTP operation).

162	Persistent connections and pipelining, however, do not fully address either the
163	rendering or the fairness problems described above. A "hack" solution is
164	possible using HTTP range requests; however, this approach does not, for
165	example, allow a server to send just the metadata contained in an embedded
166	object before sending the object itself, nor does it solve the TCP connection
167	abort problem.

169	Current TCP implementations do not share congestion information across multiple
170	simultaneous TCP connections between two peers, which increases the overhead of
171	opening new TCP connections. We expect that Transactional TCP [5] and sharing of
172	congestion information in TCP control blocks [8] will improve TCP performance by
173	using fewer RTTs and better congestion behavior, making it more suitable for HTTP
174	transactions.

176	The solution to these problems requires two actions; neither by itself will
177	entirely discourage clients from opening multiple TCP connections to the same
178	server.
179	    * Internet service providers should enable the Random Early Detection (RED)
180	      [12] or other active congestion control algorithms in their routers to
181	      ensure bandwidth fairness to clients when the network is congested. RED
182	      also addresses queue length problems observed in routers today.
183	    * Development and deployment of a multiplexing protocol for use with HTTP
184	      (and eventually other protocols), so that multiple objects from a web
185	      server can be fetched approximately simultaneously over a single TCP
186	      connection, and so that the metadata of each object can be sent to the
187	      client without waiting for the rest of the first object requested.

189	This document describes such an experimental multiplexing protocol. It is
190	designed to multiplex a TCP connection underneath HTTP so that HTTP itself does
191	not have to change, and to allow coexistence of multiple protocols (e.g. HTTP
192	and HTTP/NG). This will ease transitions to future Web protocols and let client
193	applets that use private protocols communicate with servers over the same TCP
194	connection as the HTTP conversation.

196	Ideas from this design come from Simon Spero's SCP [15] [16] description and
197	from experience with the X Window System's protocol design [13].

199	Goals

201	We believe WebMUX meets the following goals, which we consider necessary for
202	a multiplexing protocol for the Web:
203	    * Unconfirmed service without negotiation or round trips to the server
204	    * simple design
205	    * high performance
206	    * deadlock-free, by a credit based flow control scheme.
207	    * allow multiple protocols to be multiplexed over the same TCP connection
208	    * allow connections to be established in either direction (enabling
209	      callbacks to the session initiator).
210	    * ability to build a full function socket interface above this protocol.
211	    * low overhead
212	    * preserves alignment in the data stream, so that it is easy to use with
213	      protocols that marshal their data in a binary form.

215		------------------------------------------------------

217	WebMUX Protocol Operation

219	Key Words

221	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
222	"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
223	interpreted as described in RFC 2119 [7].

225	Deadlock Scenario

227	Multiplexing multiple sessions over a single transport TCP connection introduces
228	a potential deadlock that WebMUX is designed to avoid.

230	Here is an example of potential deadlock:
231	    * Presume that each session is being handled by an independent thread and
232	      that memory available to the WebMUX implementation is limited (for
233	      example, on a thin client such as a meter reader).
234	    * For the purposes of this example, presume the thin client has 50K bytes of
235	      buffer available to its WebMUX implementation, and cannot get more.
236	    * The sender of data decides to send, as part of a session request (SYN
237	      message), 100K bytes of initial data. There are no other senders, so all
238	      of the data gets transmitted. But the thread to deal with the message is
239	      blocked, and cannot make progress.
240	    * Unless WebMUX can buffer all 100K (or 1 meg, or pick your favorite
241	      numbers), any other session's data would be blocked behind this initial
242	      transmission until and unless WebMUX can read and buffer the data
243	      someplace (and since it has no buffer available, the deadlock occurs).
244	      Many similar (but possibly harder to explain) deadlocks are possible.

246	This example points out that deadlock is possible: WebMUX must be able to buffer
247	data independently of the consumers of the data. It must also have some way to
248	throttle sessions where the consumer of the data is not responsive in the
249	multiplexing layer (in this example, prevent the transmission of more than 50
250	Kbytes of data). Note that this deadlock is independent of the size of any
251	multiplexing fragment, but strictly dependent on availability of buffer space in
252	WebMUX for a particular session.

254	Deadlock Avoidance

256	In WebMUX, the receiver makes a promise (sends a credit) to the transmitter that
257	a certain amount of buffer space is available (or at least that it will consume
258	the bytes, if not buffer them, e.g. a real time audio protocol where the data is
259	disposed of), and the transmitter promises not to send more data than the
260	receiver has promised (no more than the credit). If these promises are met, then
261	WebMUX will not deadlock. The AddCredit control message is used to add a credit
262	to a session.

264	A WebMUX implementation MUST maintain and adhere to the credit system or it can
265	deadlock. Implementations on systems with large amounts of memory (e.g. VM
266	systems) may be quite different from ones on thin clients with limited,
267	non-virtual memory. It is reasonable on a VM system to hand out credits freely
268	(analogous to the virtual socket buffering found in TCP implementations), but
269	such an implementation must test its credit mechanisms carefully so that they
270	will interoperate with limited memory systems. Credit control messages MAY be
271	sent on sessions that are not active.

273	Sessions have an initial credit size (initial_default_credit) of 16 KB on each
274	session; the SetDefaultCredit control message can set this initial credit to
275	something larger than the default.
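The credit rules above can be illustrated with a minimal C sketch (C being the language of this draft's own header definition). The struct and function names are hypothetical, not part of WebMUX; the encoding of the AddCredit and SetDefaultCredit control messages is not shown, and saturating at UINT32_MAX is this sketch's own choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INITIAL_DEFAULT_CREDIT (16u * 1024u) /* 16 KB, as stated above */

/* Hypothetical per-session credit bookkeeping (names are illustrative). */
struct mux_credit {
    uint32_t send_credit; /* bytes the peer has promised we may send */
    uint32_t recv_credit; /* bytes we have promised the peer may send */
};

/* Add without wrapping around: credits saturate at UINT32_MAX. */
static uint32_t sat_add(uint32_t a, uint32_t b)
{
    return (b > UINT32_MAX - a) ? UINT32_MAX : a + b;
}

/* default_credit is 16 KB unless SetDefaultCredit raised it. */
static void credit_init(struct mux_credit *c, uint32_t default_credit)
{
    assert(default_credit >= INITIAL_DEFAULT_CREDIT);
    c->send_credit = default_credit;
    c->recv_credit = default_credit;
}

/* Sender side: the peer sent AddCredit(bytes). */
static void credit_on_add(struct mux_credit *c, uint32_t bytes)
{
    c->send_credit = sat_add(c->send_credit, bytes);
}

/* Sender side: debit and return true only if `len` bytes fit in the
   credit; a transmitter MUST NOT send more than it has been promised. */
static bool credit_try_send(struct mux_credit *c, uint32_t len)
{
    if (len > c->send_credit)
        return false;
    c->send_credit -= len;
    return true;
}

/* Receiver side: a fragment beyond our promise is a protocol violation. */
static bool credit_on_receive(struct mux_credit *c, uint32_t len)
{
    if (len > c->recv_credit)
        return false;
    c->recv_credit -= len;
    return true;
}

/* Receiver side: after buffering or consuming data, promise `bytes` more
   and send the same amount to the peer in an AddCredit message. */
static void credit_grant(struct mux_credit *c, uint32_t bytes)
{
    c->recv_credit = sat_add(c->recv_credit, bytes);
}
```

A sender checks credit_try_send() before emitting each fragment. A receiver calls credit_grant() as its buffers drain, so the peer is never promised more space than actually exists.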

277	Operation and Implementation Considerations

279	A transmitter MUST NOT transmit more data in a fragment than the available
280	credit on the session (or it could deadlock).

282	A WebMUX implementation MUST split streams into fragments when transmitting
283	them. The fragment size can be controlled using the SetMSS control message.
284	The max_fragment_size, a variable which is maintained on (currently) a per
285	transport TCP connection basis, determines the largest possible fragment a
286	sender should ever send to a receiver. This determines the maximum latency
287	introduced by a WebMUX layer above and beyond the inherent TCP latencies (socket
288	buffering on both sender and receiver and the delay-bandwidth product amount of
289	data that could be in flight at any given instant). A client on a low bandwidth
290	link, or with limited memory buffering might decide to set the
291	max_fragment_size down to control latency and buffer space required.

293	If max_fragment_size is set to zero, the transmitter is left to determine the
294	fragment size and MAY take into account application protocol knowledge that
295	only it has (e.g. a WebMUX implementation for HTTP might send fragments of the
296	metadata of embedded objects, or the next phase of a progressive image format).
297	An implementation SHOULD honor the max_fragment_size as it transmits data, if it
298	has been set by the receiver.
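The interaction between credit and max_fragment_size can be sketched as follows. This is an illustration under stated assumptions: next_fragment_len() and its parameters are hypothetical names, and treating a zero `preferred` value as "no limit of the transmitter's own" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes go into the next fragment.
   pending:           bytes the application has queued on this session
   credit:            outstanding credit for the session
   max_fragment_size: value set by the receiver via SetMSS (0 = unset)
   preferred:         the transmitter's own choice when max_fragment_size
                      is 0, e.g. from application knowledge (0 = none) */
static uint32_t next_fragment_len(uint32_t pending, uint32_t credit,
                                  uint32_t max_fragment_size,
                                  uint32_t preferred)
{
    uint32_t len = pending;
    if (len > credit)                 /* MUST NOT exceed the credit */
        len = credit;
    uint32_t cap = max_fragment_size ? max_fragment_size : preferred;
    if (cap != 0 && len > cap)        /* SHOULD honor max_fragment_size */
        len = cap;
    return len;
}
```

The credit check comes first because it is a MUST, while honoring max_fragment_size is only a SHOULD.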

300	A WebMUX implementation that does not have explicit knowledge or experience of
301	good fragment sizes might use these guidelines as a starting point:
302	    * The path_MTU of the TCP connection, minus the size of the TCP and IP
303	      headers (remember that IPv6 may have longer headers!) and 8 bytes for a
304	      WebMUX header, if this information is available [3].
305	    * The MSS of the TCP connection, if the path_MTU is not available.
306	    * In either case, you probably want to subtract 8 bytes to make sure a
307	      WebMUX header can be added without forcing another TCP segment.
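As a worked example of these guidelines, the hypothetical helper below treats the 8 bytes in the first and third guidelines as a single allowance for the WebMUX header. That is a reading of the text, not something it states outright. The 40- and 60-byte figures are the IPv4+TCP and IPv6+TCP header sizes without options.

```c
#include <assert.h>
#include <stdint.h>

/* Room left for a WebMUX header (4 bytes, or 8 with long_length). */
#define WEBMUX_HDR_ALLOWANCE 8u

/* Hypothetical helper. ip_tcp_hdr is the IP plus TCP header size:
   40 bytes for IPv4 and 60 for IPv6 when no options are present.
   path_mtu == 0 means the path MTU is unknown, so the MSS is used. */
static uint32_t guess_fragment_size(uint32_t path_mtu, uint32_t ip_tcp_hdr,
                                    uint32_t mss)
{
    uint32_t segment_payload = (path_mtu > ip_tcp_hdr)
                                   ? path_mtu - ip_tcp_hdr
                                   : mss;
    return segment_payload > WEBMUX_HDR_ALLOWANCE
               ? segment_payload - WEBMUX_HDR_ALLOWANCE
               : 0;
}
```

With a 1500-byte Ethernet path MTU over IPv4, this yields 1452 bytes per fragment.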

309	This would result in fragmentation roughly similar to TCP segmentation over
310	multiple TCP connections.

312	An implementation should round robin between sessions with data to send in some
313	fashion, to avoid starving sessions or letting a single thread monopolize
314	the TCP connection. Exact details of such behavior are left to the
315	implementation. To achieve highest bandwidth and lowest overhead WebMUX
316	behavior, credits should be handed out in reasonably large chunks. TCP
317	implementations typically send an ack message on every other packet, and it is
318	very hard to arrange to piggyback acks on data segments in implementations.
319	Therefore, for WebMUX to have reasonably low overhead, credits should be handed
320	out in some significant multiple (4 or more times) of the ~3000 bytes
321	represented by two packets on an Ethernet. The outstanding credit balance across
322	active sessions will also have to be larger than the bandwidth/delay product of
323	the TCP connection if WebMUX is not to become a limit on TCP transport
324	performance.
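The draft leaves the scheduling policy to the implementation, so the following is only one possible round-robin sketch. The names rr_session and rr_next() are hypothetical. A session is skipped if it has nothing queued or no credit left, so one blocked session cannot stall the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-session send state. */
struct rr_session {
    uint32_t pending; /* bytes queued by the application */
    uint32_t credit;  /* bytes the peer currently allows us to send */
};

/* Starting just after *cursor, return the index of the next session
   with both queued data and credit, or -1 if none can send. On success
   *cursor is moved to that session so the next call starts after it. */
static int rr_next(const struct rr_session *s, size_t n, size_t *cursor)
{
    for (size_t i = 1; i <= n; i++) {
        size_t idx = (*cursor + i) % n;
        if (s[idx].pending > 0 && s[idx].credit > 0) {
            *cursor = idx;
            return (int)idx;
        }
    }
    return -1;
}
```

After sending one fragment from the chosen session, the caller invokes rr_next() again, so each ready session gets a turn.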

326	Both of these arguments indicate that outstanding credits in many
327	implementations should be 10K bytes or more. Implementations SHOULD piggyback
328	credit messages on data packets where possible, to avoid unneeded packets on the
329	wire. A careful implementation in which both ends of the TCP connection are
330	regularly sending some payload should be able to avoid sending extra packets on
331	the network.

333	If necessary, we could add in a future version fragmentation control messages to
334	do some bandwidth allocation, but for now, we are not bothering.

336	WebMUX Header

338	WebMUX headers are always in big endian byte order.
339	If people want, we could expand out the union below on a control message type
340	basis (e.g. the way the C bindings to X events were written out...). For this
341	draft, I'm not doing so.
342	 #define MUX_CONTROL       0x00800000  /* bit 23 */
343	 #define MUX_SYN           0x00400000  /* bit 22 */
344	 #define MUX_FIN           0x00200000  /* bit 21 */
345	 #define MUX_RST           0x00100000  /* bit 20 */
346	 #define MUX_PUSH          0x00080000  /* bit 19 */
347	 #define MUX_SESSION       0xFF000000  /* bits 31-24: session_id */
348	 #define MUX_LONG_LENGTH   0x00040000  /* bit 18 */
349	 #define MUX_LENGTH        0x0003FFFF  /* bits 17-0: fragment_size */

351	 typedef unsigned int flagbit;
352	 struct w3mux_hdr {   /* MSB first; bit-field order is compiler-defined */
353	     union {
354	        struct {
355	            unsigned int session_id : 8;
356	            flagbit control : 1;
357	            flagbit syn : 1;
358	            flagbit fin : 1;
359	            flagbit rst : 1;
360	            flagbit push : 1;
361	            flagbit long_length : 1;
362	            unsigned int fragment_size : 18;
363	            int long_fragment_size : 32;
364	                 /* only present if long_length is set */
365	        } data_hdr;
366	        struct {
367	            unsigned int session_id : 8;
368	            flagbit control : 1;
369	            unsigned int control_code : 4;   /* overlays syn..push */
370	            flagbit long_length : 1;
371	            unsigned int fragment_size : 18;
372	            int long_fragment_size : 32;
373	                 /* only present if long_length is set */
374	        } control_message;
375	     } contents;
376	 };

378	The fragment_size is always the size in bytes of the fragment, excluding the
379	WebMUX header and any padding.
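C bit-field layout is implementation-defined, so a portable implementation may prefer to build and parse the header word with masks rather than the struct. The sketch below assumes 32-bit header words in big endian order, as stated above, with long_length at bit 18 (0x00040000) to match the bit-field layout. mux_encode(), mux_decode(), and mux_control_code() are hypothetical helpers. The control_code position (bits 22-19) is derived from the control_message layout rather than from a mask in the draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MUX_CONTROL      0x00800000u
#define MUX_SYN          0x00400000u
#define MUX_FIN          0x00200000u
#define MUX_RST          0x00100000u
#define MUX_PUSH         0x00080000u
#define MUX_SESSION      0xFF000000u
#define MUX_LONG_LENGTH  0x00040000u
#define MUX_LENGTH       0x0003FFFFu
#define MUX_FLAG_BITS    ((uint32_t)~(MUX_SESSION | MUX_LENGTH)) /* 23-18 */

struct mux_fields {
    uint8_t  session_id;
    uint32_t flags;         /* any of MUX_CONTROL .. MUX_LONG_LENGTH */
    uint32_t fragment_size; /* the 18-bit field */
};

/* Assemble byte by byte so host byte order does not matter. */
static uint32_t load_be32(const uint8_t p[4])
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t p[4], uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static struct mux_fields mux_decode(const uint8_t p[4])
{
    uint32_t w = load_be32(p);
    struct mux_fields f;
    f.session_id = (uint8_t)(w >> 24);
    f.flags = w & MUX_FLAG_BITS;
    f.fragment_size = w & MUX_LENGTH;
    return f;
}

/* Returns false if fragment_size needs the long_length form. */
static bool mux_encode(uint8_t p[4], uint8_t session_id, uint32_t flags,
                       uint32_t fragment_size)
{
    if (fragment_size > MUX_LENGTH)
        return false;
    store_be32(p, ((uint32_t)session_id << 24) | (flags & MUX_FLAG_BITS) |
                      fragment_size);
    return true;
}

/* In a control message, bits 22-19 hold control_code instead of the
   syn/fin/rst/push flags. */
static unsigned mux_control_code(const struct mux_fields *f)
{
    return (unsigned)((f->flags >> 19) & 0xFu);
}
```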

381	Alignment

383	WebMUX headers are always (at least) 32 bit aligned. To find the next WebMUX
384	header, take the fragment_size, and round up to the next 32 bit boundary.

386	Transmitters MAY insert NoOp control messages to force 64 bit alignment of the
387	protocol stream.
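The alignment rule reduces to a small offset computation. This sketch is illustrative and its function names are hypothetical. noop_needed_for_64bit() assumes that a NoOp control message is a bare 4-byte header, which this section does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round n up to the next multiple of 4 (a 32-bit boundary). */
static uint64_t round_up4(uint64_t n)
{
    return (n + 3u) & ~(uint64_t)3u;
}

/* Offset of the next WebMUX header in the byte stream. hdr_off is the
   (32-bit aligned) offset of the current header and hdr_len its length:
   4 bytes, or 8 when long_length is set. */
static uint64_t next_header_offset(uint64_t hdr_off, uint32_t hdr_len,
                                   uint64_t fragment_size)
{
    assert(hdr_off % 4u == 0);
    return hdr_off + hdr_len + round_up4(fragment_size);
}

/* Assumption: a NoOp is a bare 4-byte header, so one NoOp restores
   64-bit alignment whenever the stream offset is 4 mod 8. */
static bool noop_needed_for_64bit(uint64_t off)
{
    return off % 8u == 4u;
}
```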

389	Long Fragments

391	A WebMUX header with the long_length bit set must use the 32 bits following the
392	WebMUX header (the long_fragment_size field) for the value of the fragment_size
393	field, for whatever purpose the fragment_size field serves.
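A receiver must therefore check the long_length bit before it knows how long the header is. In this hypothetical sketch, mux_fragment_size() returns the number of header bytes consumed. It reads long_fragment_size as unsigned, although the draft declares it as a signed int; that choice is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MUX_LONG_LENGTH 0x00040000u
#define MUX_LENGTH      0x0003FFFFu

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reads the effective fragment_size from `avail` buffered bytes.
   Returns the header length consumed (4, or 8 with long_length),
   or 0 if more bytes must arrive first. */
static size_t mux_fragment_size(const uint8_t *p, size_t avail,
                                uint32_t *fragment_size)
{
    if (avail < 4)
        return 0;
    uint32_t w = load_be32(p);
    if (!(w & MUX_LONG_LENGTH)) {
        *fragment_size = w & MUX_LENGTH;
        return 4;
    }
    if (avail < 8)
        return 0;
    *fragment_size = load_be32(p + 4); /* replaces the 18-bit field */
    return 8;
}
```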

395	Atoms

397	Atoms are integers that are used as short-hand names for strings, which are
398	defined using the InternAtom control message. Atoms are only used as protocol
399	ID's in this version of WebMUX, though they might be used for other purposes in
400	future versions. Since the atom might be redefined at any time, it is not safe
401	to use an atom unless you have defined it (i.e. you cannot use atoms defined by
402	the other end of a mux connection). Atoms are therefore not unique values, and
403	only make sense in the context of a particular direction of a particular mux
404	connection. This restriction is to avoid having to define some protocol for
405	deallocating atoms, with any round trip overhead that would likely imply.

407	Strings are defined to be UTF-8 encoded Unicode strings. (Note that an ASCII
408	string is valid UTF-8.) The definition of the structure of these strings is
409	outside the scope of this document, though we expect they will often be URIs
410	naming a protocol or stack of protocols. Atoms always have values between
411	0x20000 and 0x200ff (a maximum of 256 atoms can be defined).

413	Strings used for protocol IDs MUST be URIs [28].
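Atoms are only meaningful to the side that defined them, so each end can keep a private table of the atoms it has interned. The sketch below is hypothetical: struct atom_table and atom_for() are illustrative names. It never redefines an atom and stores string pointers rather than copies; both are simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ATOM_BASE  0x20000u
#define ATOM_COUNT 256u     /* atoms run from 0x20000 to 0x200ff */

/* Atoms *we* have defined on one direction of one mux connection.
   The caller must keep the URI strings alive. */
struct atom_table {
    const char *names[ATOM_COUNT];
};

static bool is_atom(uint32_t id)
{
    return id >= ATOM_BASE && id < ATOM_BASE + ATOM_COUNT;
}

/* Finds or allocates the atom for `uri`. Returns false if all 256 atoms
   are taken. When *needs_intern is true, an InternAtom control message
   defining the atom must be sent before the atom is used. */
static bool atom_for(struct atom_table *t, const char *uri,
                     uint32_t *atom, bool *needs_intern)
{
    int free_slot = -1;
    for (unsigned i = 0; i < ATOM_COUNT; i++) {
        if (t->names[i] == NULL) {
            if (free_slot < 0)
                free_slot = (int)i;
        } else if (strcmp(t->names[i], uri) == 0) {
            *atom = ATOM_BASE + i;
            *needs_intern = false;
            return true;
        }
    }
    if (free_slot < 0)
        return false;
    t->names[free_slot] = uri;
    *atom = ATOM_BASE + (uint32_t)free_slot;
    *needs_intern = true;
    return true;
}
```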

415	Protocol ID's

417	The protocol used by a session is identified by a Protocol ID, which can either
418	be an IANA port number, or an atom. Protocol IDs serve two purposes:
419	    1. To allow higher layers to stack protocols (e.g. HTTP on top of deflate
420	      compression, on top of TCP).
421	    2. To identify the protocol or protocol stack in use so that application
422	      firewall relays can perform sanity checking and policy enforcement on the
423	      multiplexed protocols.

425	Firewall proxies can presume that the bytes should conform to the protocol
426	identified by the Protocol ID. Protocol IDs are allocated as follows:
427	    * 0-0xFFFF: IANA-registered TCP protocols [17]
428	    * 0x10000-0x1FFFF: IANA-registered UDP protocols [17]
429	    * 0x20000-0x2FFFF: per-underlying-connection-defined MUX atoms.
430	      The scheme name of the URI indicates the protocol family being used (e.g.
431	      http, ftp, etc.).
432	    * 0x30000-0x3FFFF: server-assigned protocol IDs
433	      The assignment of these IDs is outside the scope of this protocol, and
434	      may pose additional security hazards.
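A relay or receiver could classify a Protocol ID by range as sketched below. The enum and function names are hypothetical. Reading the UDP range as 0x10000 plus the UDP port number is an assumption about how that range is laid out.

```c
#include <assert.h>
#include <stdint.h>

enum proto_id_kind {
    PID_TCP_PORT, /* 0-0xFFFF: IANA-registered TCP protocols */
    PID_UDP_PORT, /* 0x10000-0x1FFFF: IANA UDP (assumed 0x10000 + port) */
    PID_ATOM,     /* 0x20000-0x2FFFF: atoms defined via InternAtom */
    PID_SERVER,   /* 0x30000-0x3FFFF: server-assigned */
    PID_INVALID   /* outside every range above */
};

static enum proto_id_kind classify_protocol_id(uint32_t id)
{
    if (id <= 0xFFFFu)
        return PID_TCP_PORT;
    if (id <= 0x1FFFFu)
        return PID_UDP_PORT;
    if (id <= 0x2FFFFu)
        return PID_ATOM;
    if (id <= 0x3FFFFu)
        return PID_SERVER;
    return PID_INVALID;
}
```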

436	Session ID Allocation

438	Each session is allocated a session identifier. Session identifiers 0 and 1
439	are reserved for future use. Session IDs allocated by the initiator of the
440	transport TCP connection are even; those allocated by the receiver of the
441	transport connection are odd. Proxies that do not understand messages on
442	reserved Session IDs should forward them unchanged. A session identifier MUST
443	only be deallocated and potentially reused by new sessions when a session is
444	fully closed in both directions.
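The even/odd split lets both ends allocate session IDs without coordination. In this hypothetical sketch, the names are illustrative and IDs 0 and 1 are treated as the reserved ones. An ID is freed only once the session is closed in both directions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocator for the 8-bit session_id space. */
struct sid_alloc {
    bool initiator;     /* true if we opened the transport connection */
    bool in_use[256];
};

/* Returns a free even ID (initiator) or odd ID (acceptor), skipping the
   reserved IDs 0 and 1, or -1 if every ID of our parity is in use. */
static int sid_allocate(struct sid_alloc *a)
{
    for (unsigned id = a->initiator ? 2u : 3u; id < 256u; id += 2u) {
        if (!a->in_use[id]) {
            a->in_use[id] = true;
            return (int)id;
        }
    }
    return -1;
}

/* Call only once the session is closed in both directions. */
static void sid_release(struct sid_alloc *a, uint8_t id)
{
    a->in_use[id] = false;
}
```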

446	Session Establishment

448	To establish a new session, the initiating end sends a SYN message, allocating a
449	free session number out of its address space. A session is established by
450	setting the SYN bit in the first message sent on that session. The session is
451	specified by the session_id field. The fragment_size field is interpreted as the
452	protocol ID of the session, as discussed above.

454	The receiver MUST either open the reverse path of that session (send a SYN
455	message), send a FIN message to indicate that the reverse path is not going to
456	be used, or send a RST message to indicate an error. This
457	enables the initiator of a session to know when it is safe to reuse that session
458	ID.
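For a SYN, the 18-bit fragment_size field carries a protocol ID, and every allocated Protocol ID range (0 to 0x3FFFF) fits in 18 bits. The sketch below builds such a header word and checks whether a reply answers it. The function names are hypothetical, and the framing of any initial data sent with a SYN is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MUX_CONTROL 0x00800000u
#define MUX_SYN     0x00400000u
#define MUX_FIN     0x00200000u
#define MUX_RST     0x00100000u
#define MUX_LENGTH  0x0003FFFFu

/* SYN header word: session_id in the top byte, SYN set, and the protocol
   ID in the fragment_size field. Fails if the ID does not fit. */
static bool mux_syn_word(uint8_t session_id, uint32_t protocol_id,
                         uint32_t *word)
{
    if (protocol_id > MUX_LENGTH)
        return false;
    *word = ((uint32_t)session_id << 24) | MUX_SYN | protocol_id;
    return true;
}

/* The receiver answers with SYN, FIN, or RST on a data header; in a
   control message those bits hold control_code, so they do not count. */
static bool is_syn_reply(uint32_t word)
{
    return !(word & MUX_CONTROL) &&
           (word & (MUX_SYN | MUX_FIN | MUX_RST)) != 0;
}
```

For example, opening session 4 for HTTP over its IANA TCP port 80 gives the header word 0x04400050.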

460	Graceful Release

462	A session is ended by sending a fragment with the FIN bit set. Each end of a
463	WebMUX connection may be closed independently.

465	WebMUX uses a half-close mechanism like TCP [1] to close data flowing in each
466	direction in a session. After sending a FIN fragment, the sender MUST NOT send
467	any more payload in that direction.
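The half-close rule amounts to tracking two independent flags per session. A minimal hypothetical sketch, with illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-session close state. */
struct half_close {
    bool sent_fin; /* our sending direction is closed */
    bool recv_fin; /* the peer's sending direction is closed */
};

/* After sending FIN, no more payload may be sent in that direction. */
static bool may_send_payload(const struct half_close *h)
{
    return !h->sent_fin;
}

/* The peer may keep sending after we have closed, and vice versa. */
static bool may_receive_payload(const struct half_close *h)
{
    return !h->recv_fin;
}

/* Only a session closed in both directions may have its ID reused. */
static bool fully_closed(const struct half_close *h)
{
    return h->sent_fin && h->recv_fin;
}
```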

469	Disgraceful Release

471	A session may be terminated by sending a message with the RST bit set. All
472	pending data for that session should be discarded. "No such protocol" errors
473	detected by the receiver of a new session are signaled to the originator on
474	session creation by sending a message with the RST bit set. (Same as in TCP).

476	The payload of the fragment containing the RST bit consists of a null terminated
477	string containing the URI of an error message (note that content negotiation
478	makes this message potentially multi-lingual), followed by a null terminated
479	UTF-8 string containing the reason for the reset (in case the URI is not
480	accessible).
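The RST payload can be built as two consecutive NUL-terminated strings. In this sketch, build_rst_payload() is a hypothetical name, and the choice of error URI is left to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes "<error URI>\0<UTF-8 reason>\0" into buf. Returns the number of
   bytes written (the fragment_size to use), or 0 if cap is too small. */
static size_t build_rst_payload(char *buf, size_t cap,
                                const char *error_uri, const char *reason)
{
    size_t ulen = strlen(error_uri) + 1; /* include the NUL */
    size_t rlen = strlen(reason) + 1;
    if (ulen + rlen > cap)
        return 0;
    memcpy(buf, error_uri, ulen);
    memcpy(buf + ulen, reason, rlen);
    return ulen + rlen;
}
```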

482	Message Boundaries

484	A message boundary is marked by sending a message with the PUSH bit set. The
485	boundary is set between the last octet in this message, including that octet,
486	and the first byte of a subsequent message. This differs slightly from TCP, as
487	PUSH can be reliably used as a record mark.
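Because PUSH is a reliable record mark, a receiver can reassemble messages by accumulating fragment payloads until a fragment with PUSH set arrives. The following hypothetical sketch uses a fixed-size buffer; the names and the 1024-byte limit are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-session message assembler. */
struct record {
    unsigned char data[1024];
    size_t len;
};

/* Appends one fragment's payload. Returns 1 when PUSH completes a
   message (the boundary follows this fragment's last octet), 0 when
   more fragments are needed, and -1 if the message exceeds the buffer.
   The caller resets len to 0 after consuming a complete message. */
static int record_add(struct record *r, const void *frag, size_t n,
                      bool push)
{
    if (n > sizeof r->data - r->len)
        return -1;
    memcpy(r->data + r->len, frag, n);
    r->len += n;
    return push ? 1 : 0;
}
```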

489	Flow Control

491	Flow control uses the simple credit scheme described above, through the
492	AddCredit control message defined below. Transmitted fragments MUST NOT
493	exceed the outstanding credit for that session. The initial outstanding credit
494	for a session is 16 KB.

496	End Points

498	One of the major design goals of WebMUX is to allow callbacks to objects in the
499	process that initiated the transport TCP connection without requiring additional
500	TCP connections (with the overhead in both machine resources and time that this
501	would cause, or the problems with TCP connection establishment through
502	firewalls).

504	The DefineEndpoint control message allows a party to advertise that a
505	particular URI (or set of URIs) is reachable over the transport TCP connection.

507	A MUX protocol ID only identifies a MUX channel relative to a particular
508	"endpoint". The pair <endpoint><protocol ID> completely identifies a MUX
509	channel, without regard to IP address, TCP port, or other information. Endpoint
510	IDs are URI names for endpoints. Any endpoint may have multiple endpoint IDs. We
511	do not place any further restrictions on the types of URIs that are used as
512	endpoint IDs.

514	A client connecting from a MUX endpoint A to a MUX channel on a different
515	endpoint B may send an ID for A to B via the DefineEndpoint control message. If
516	a client in endpoint B then needs to connect to a MUX channel in endpoint A, it
517	may do so by using the existing lower-level byte stream originated from endpoint
518	A. A connection initiator may send multiple DefineEndpoint control messages with
519	different endpoint IDs for the same endpoint.

521	Connection initiators may wish to control the disclosure of endpoint
522	information, both for security purposes and for optimal application timing, and
523	should be given reasonable

525	Whether this relative URI naming can be used depends upon the scheme of the URI
526	[20], which defines its structure. For example, a firewall proxy might advertise
527	just "http:" for the proxy, claiming it can be used to contact any HTTP protocol
528	object anywhere, or "http://foo.com/bar/" to indicate that any object below that
529	point in the URI space on the server foo.com may be reached by this TCP
530	connection. A client might advertise that "http://myhost.com/" is available via
531	this transport TCP connection.
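A sketch of the receiving side's endpoint table, assuming a plain string-prefix comparison. This is a simplification. As noted above, the URI scheme defines what such relative naming means, so a real implementation would compare parsed, normalized URIs. The class name is hypothetical.

```python
from __future__ import annotations


class EndpointTable:
    """URI prefixes the peer has advertised via DefineEndpoint."""

    def __init__(self) -> None:
        self._prefixes: list[str] = []

    def define(self, uri: str) -> None:
        self._prefixes.append(uri)

    def reachable(self, target: str) -> bool:
        # Plain string-prefix test; real code would apply the scheme's rules.
        return any(target.startswith(p) for p in self._prefixes)
```

A bare prefix such as "http://foo.com" would also match "http://foo.community/". That is one reason to advertise prefixes that end in a slash, as in "http://foo.com/bar/".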

533	Control Messages

535	The control bit of the WebMUX header is always set in a control message. Control
536	messages can be sent on any session, even sessions that are not (yet) open. The
537	control_code reuses the SYN, FIN, RST, and PUSH bits of the WebMUX header. The
538	control_code of the control message determines the control message type. Any
539	unused data in a control message must be ignored.

541	In this revision of WebMUX, creating a session costs 4 bytes (a control message
542	with SYN set, carrying the protocol ID in the message). The first payload
543	fragment therefore has a total overhead of 8 bytes: 4 for session creation and
544	4 for its own header (presuming an IANA-registered protocol rather than a named
545	protocol). This matches the previous version, though it takes two messages
546	rather than one.
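The 8-byte figure can be checked with a small encoder. This draft gives the field widths (an 8 bit session field, an 18 bit length field, four flag bits, a control bit and a long length bit) and little-endian byte order, but not the bit positions. The positions below are therefore an assumption.

The sketch also models session creation as a header with SYN set whose length field carries the protocol ID, as in option b) of the closed issues. Only SYN is set on that header; whether the control bit accompanies it does not change the byte count. Alignment padding is omitted.

```python
import struct

# Hypothetical bit positions within the 32-bit little-endian header word.
SESSION_SHIFT = 24
CONTROL = 1 << 23
SYN = 1 << 22
FIN = 1 << 21
RST = 1 << 20
PUSH = 1 << 19
LONG_LENGTH = 1 << 18
LENGTH_MASK = (1 << 18) - 1
FLAG_MASK = CONTROL | SYN | FIN | RST | PUSH | LONG_LENGTH


def encode_header(session_id: int, flags: int, fragment_size: int) -> bytes:
    """Pack one 4-byte header in little-endian order."""
    if not 0 <= session_id <= 0xFF:
        raise ValueError("session_id must fit in 8 bits")
    if not 0 <= fragment_size <= LENGTH_MASK:
        raise ValueError("fragment_size must fit in 18 bits")
    word = (session_id << SESSION_SHIFT) | (flags & FLAG_MASK) | fragment_size
    return struct.pack("<I", word)


def decode_header(data: bytes) -> tuple:
    """Return (session_id, flags, fragment_size) from the first 4 bytes."""
    (word,) = struct.unpack_from("<I", data)
    return word >> SESSION_SHIFT, word & FLAG_MASK, word & LENGTH_MASK


def open_session(session_id: int, protocol_id: int, payload: bytes) -> bytes:
    """Session-creation header, then the first data fragment."""
    syn = encode_header(session_id, SYN, protocol_id)
    return syn + encode_header(session_id, 0, len(payload)) + payload
```

Under these assumptions, open_session(3, 80, b"abcd") returns 12 bytes: 8 bytes of overhead plus the 4-byte payload.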

548	The individual control message types are listed below (code Name direction;
549	description):
550	   0 InternAtom Both
551	      The session_id is used as the Atom to be defined (offset by 0x2000), so a
552	      session_id of 0 defines ID 0x2000. The fragment_size field is the length
553	      of the UTF-8 encoded string. The fragment itself contains the string to be
554	      interned. This allows up to 256 strings to be interned (is this enough?).
555	   1 DefineEndpoint Both
556	      The session_id is ignored. The fragment_size is interpreted as the
557	      protocol ID, naming an endpoint actually available on this transport TCP
558	      connection. This enables a single transport TCP connection to be used for
559	      callbacks, or to advertise to the process on the other end of the
560	      transport TCP connection that a protocol endpoint can be reached.
561	   2 SetMSS Both
562	      This sets a limit on fragment sizes below the outstanding credit limit.
563	      The session_id must be zero. The fragment_size field is used as
564	      max_fragment_size (the largest fragment that may be sent on any session
565	      on this transport TCP connection). A max_fragment_size of zero means there
566	      is no limit on the fragment size allowed on this connection.
567	   3 AddCredit R->T
568	      The session_id specifies the session. The fragment_size specifies the flow
569	      control credit granted (to be added to the current outstanding credit
570	      balance). A value of zero indicates no limit on how much data may be sent
571	      on this session.
572	   4 SetDefaultCredit R->T
573	      The session_id must be zero. The fragment_size field is used to set the
574	      initial default credit limit for any incoming WebMUX sessions over this
575	      transport TCP connection (i.e., it is shorthand for sending an AddCredit
576	      message for each session ID).
577	   5 NoOp Both
578	      This control message is defined to perform no function. Any data in the
579	      payload should be ignored.
580	   6-15 - Undefined.
581	      Reserved for future use. Must be ignored if not understood, and forwarded
582	      by any proxies. The fragment_size is always used for the length of the
583	      control message, and any data for the control message will be in the
584	      payload of the control message (to allow proxies to be able to forward
585	      future control messages).
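A sketch of how a receiver might dispatch the control codes listed above. The state fields are hypothetical bookkeeping, and the sketch simplifies in several ways:

- AddCredit records only the credit granted on top of each session's initial credit, with None meaning unlimited.
- The must-be-zero session_id checks are omitted.
- Codes 6-15 are recorded and otherwise ignored; a proxy would instead forward them.

```python
from __future__ import annotations

ATOM_BASE = 0x2000  # InternAtom: session_id 0 defines atom 0x2000


class ControlState:
    """Receiver-side effect of control codes 0-5; codes 6-15 are ignored."""

    def __init__(self) -> None:
        self.atoms: dict[int, str] = {}
        self.endpoints: list[int] = []  # protocol IDs from DefineEndpoint
        self.max_fragment_size = 0      # SetMSS; 0 means no limit
        self.extra_credit: dict[int, int | None] = {}  # None means unlimited
        self.default_credit: int | None = None
        self.ignored: list[int] = []

    def handle(self, code: int, session_id: int, fragment_size: int,
               payload: bytes = b"") -> None:
        if code == 0:    # InternAtom: fragment_size is the string length
            text = payload[:fragment_size].decode("utf-8")
            self.atoms[ATOM_BASE + session_id] = text
        elif code == 1:  # DefineEndpoint: fragment_size is a protocol ID
            self.endpoints.append(fragment_size)
        elif code == 2:  # SetMSS
            self.max_fragment_size = fragment_size
        elif code == 3:  # AddCredit: zero lifts the limit
            current = self.extra_credit.get(session_id, 0)
            if fragment_size == 0 or current is None:
                self.extra_credit[session_id] = None
            else:
                self.extra_credit[session_id] = current + fragment_size
        elif code == 4:  # SetDefaultCredit
            self.default_credit = fragment_size
        elif code == 5:  # NoOp
            pass
        else:            # 6-15: must be ignored if not understood
            self.ignored.append(code)
```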

587		------------------------------------------------------

589	Security Considerations

591	Advertising endpoints inappropriately might allow a client to connect to
592	services that should be protected.

594	Using the protocol ID range 0x30000-0x3FFFF for server-assigned protocol IDs may
595	prevent a firewall proxy from having enough information to safely proxy
596	protocols of those types. Firewall proxy implementers should not blindly forward
597	protocols in this range.

599	Firewall proxies implementing WebMUX should enforce appropriate policies for
600	protocols being multiplexed over WebMUX, in a fashion similar to the policies
601	imposed for native protocols.

603	Clearly, any security consideration for a protocol is likely to still apply to
604	its use when being multiplexed via WebMUX.

606		------------------------------------------------------

608	Remaining Issues for Discussion

610	When can WebMUX be used???
611	    * What are the appropriate strategies for determining if the WebMUX protocol
612	      can be used?
613	    * Name server hack?
614	    * UPGRADE in HTTP?
615	    * Remember that previous UPGRADE to use WebMUX worked?
616	    * Should there be a more compact open message?

618		------------------------------------------------------

620	Comparison with SCP (TMP)

622	Note that TIP (Transaction Internet Protocol) [21] defines a version of SCP
623	called TMP.

625	Goals:
626	    * Unconfirmed service without negotiation.
627	    * SCP allows data to be sent with the session establishment; the recipient
628	      does not confirm successful mux connection establishment, but may reject
629	      unsuccessful attempts. This simplifies the design of the protocol, and
630	      removes the latency required for a confirmed operation.
631	    * simple design
632	    * performance where critical

634	There are six issues that make SCP (TMP) inadequate for our use:
635	    * SCP can deadlock, unless unlimited amounts of memory are available.
636	    * it has no provision for multiplexing multiple protocols over the same
637	      transport TCP connection, essential for graceful transition without
638	      dependency on the currently incomplete NG design, and to allow other uses
639	      which could use the same multiplexed connection (e.g. applet communication
640	      with serverlets).
641	    * SCP's 8 byte overhead is not reasonable most of the time. WebMUX uses four
642	      bytes in the default case. This design permits an 8 byte header if you
643	      care to preserve 64 bit alignment at the cost of bytes. In practice, there
644	      seem to be few data formats or architectures that actually require more
645	      than 32 bit alignment.
646	    * Without some form of flow control, infinite buffering in clients
647	      (receivers) would be required.
648	    * Alignment is preserved in the data stream. This allows compact, high speed
649	      (un)marshalling code in implementations of binary protocols, without extra
650	      data copies, which in such protocols can be significant overhead.
651	    * SCP SYN in Version 2 requires a second message, which costs a round trip.

653	So far, WebMUX is similar to SCP. There are some important differences:
654	    * deadlock-free (we believe), by a credit based flow control scheme.
655	    * allow multiple protocols to be multiplexed over same TCP connection (not
656	      available in SCP).
657	    * lower overhead than SCP, while preserving data alignment (very important
658	      for binary protocol marshaling code)
659	    * ability to build a full function socket interface above this protocol.
660	    * WebMUX avoids the SYN round trip of SCP V2 by allocating session IDs in
661	      independent address spaces. This also avoids many of the state
662	      transitions of SCP, simplifying the protocol greatly.
663	    * SCP has 2^24 sessions, which seems highly excessive, and reserves 1024 of
664	      them for future use.

666		------------------------------------------------------

668	Closed Issues from Discussion and Mail

670	Some of the comments below allude to previous versions of the specification, and
671	may not make sense in the context of the current version. This section will
672	likely be eliminated in future versions, but may answer some questions that
673	arise when reading this document.

675	Flow control: priority vs. credit schemes

677	Henrik and I have convinced ourselves there are fundamental differences between
678	a priority scheme and the credit scheme in this draft. They interact quite
679	differently with TCP, and priority schemes have no way to limit the total amount
680	of data being transmitted, though priority schemes are better matched to what
681	the Web wants. We've decided, at least for now, to defer any priority schemes to
682	higher level protocols.

684	Stacking Protocols and Transports (Stacks)

686	ILU [22] style protocol stacks are a GOOD THING. There have been too many
687	worries about the birthday problem for people to be comfortable with Bill
688	Janssen's hashing schemes (see Henrik Frystyk Nielsen and Robert Thau's mail on
689	this topic). We tried putting this directly in WebMUX in a previous version, and
690	experience shows that it didn't really help an implementer (in particular, Bill
691	Janssen while implementing ILU). This version has just the name of the protocol,
692	and it is left to others to implement any stacking (e.g. ILU).

694	We believe the name of the protocol is necessary, if WebMUX is ever to be used
695	with firewalls. Application level firewall relays need the protocol information
696	to sanity check the protocol being relayed. Application level relays are
697	considered much more secure than just punching holes in the firewall for
698	particular protocol families, which small organizations often find sufficient,
699	as the relay can sanity check the protocol stream and enable better policy
700	decisions (for example, to forbid certain datatypes in HTTP to transit a
701	firewall). Large organizations and large targets typically only run application
702	level proxies.

704	Byte Usage

706	Wasting bytes in general, and in particular at TCP connection establishment, for
707	a multiplexing transport must be avoided. There are several reasons for this:
708	    * if the initial segment is too long, a network round trip will be lost to
709	      TCP slow start, so bytes near the beginning of a conversation may be much
710	      more precious than bytes later in the conversation, once slow start
711	      overhead has been paid. If the first segment is too long, you fall off a
712	      cliff.
713	    * Directly affects user perceived response; no cleverness of later packing
714	      and batching of requests can get the time back; each byte goes directly to
715	      perceived latency when a user talks to the server for the first time.

717	So there is more than the usual tension between generality and performance.

718	Performance analysis

720	Human perception is about 30 milliseconds: if latency is much more than this,
721	the user perceives delay. At 14.4K baud, one uncompressed byte costs 0.55 milliseconds
722	(ignoring modem latencies). On an airplane via telephone today, you get a
723	munificent 4800 baud, which is 3X slower. Cellular modems transmitting data
724	(CDPD), as I understand it, will give us around 20Kbaud, when deployed.

726	So basic multiplexing at 4 bytes of overhead costs about 2 milliseconds on common modems.
727	This means basic overhead is small vs. human perception, for most low speed
728	situations, a good position to be in.

730	On WebMUX connection open, with the above protocol we send 4 bytes in the setup
731	message, and then must open a session, requiring at least 8 bytes more. The 12
732	bytes cost about 7 milliseconds at 14.4K, are not 64 bit aligned, and each 4
733	bytes costs on the order of 2 milliseconds. Ugh... Maybe a setup message isn't a
734	good idea; other uses (e.g. security) can be dealt with by a control message.
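The arithmetic above can be reproduced as follows. The sketch assumes that 1 baud carries 1 bit and that a byte is 8 bits, which gives about 0.55 ms per byte. Asynchronous modem framing with start and stop bits would raise this to 10 bits per byte. The function names are hypothetical.

```python
def ms_per_byte(baud: float, bits_per_byte: int = 8) -> float:
    """Milliseconds to send one uncompressed byte, ignoring modem latency.

    Assumes 1 baud = 1 bit/s; start/stop framing would make bits_per_byte 10.
    """
    return 1000.0 * bits_per_byte / baud


def cost_ms(nbytes: int, baud: float = 14400) -> float:
    """Time to send nbytes at the given line speed."""
    return nbytes * ms_per_byte(baud)


print(f"{ms_per_byte(14400):.2f} ms per byte at 14.4K")  # 0.56
print(f"{cost_ms(4):.1f} ms for a 4-byte header")         # 2.2
print(f"{cost_ms(12):.1f} ms for 12 bytes")               # 6.7
print(f"{ms_per_byte(4800) / ms_per_byte(14400):.0f}x slower at 4800 baud")  # 3
```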

736	Multiple protocols over one WebMUX

738	We want to WebMUX multiple protocols simultaneously over the same transport TCP
739	connection, so we need to know what protocol is in use with each session, so the
740	demultiplexor can hand the data to the right handler (e.g. SUNRPC and DCE RPC
741	simultaneously).

743	There are two obvious ways I can see to do this:
744	   a) Send a control message when a session is first used, indicating the
745	   protocol.
746	      Disadvantage: costs probably 8 bytes to do so (4 WebMUX overhead, and 4
747	      byte message), and destroys potential 64 bit alignment.
748	   b) If SYN is set, indicating a new session, steal the mux_length field to
749	   indicate the protocol in use on that session.
750	      (overhead; 4 bytes for the WebMUX header used just to establish the
751	      session.)

753	Opinions? Mine is that b) is better than a). Answer: b) is the adopted strategy.

755	Priority...

757	For a given stream, priority will affect which session is handled when
758	multiplexing data; sending the priority on every block is unneeded, and would
759	waste bytes. There is one case in which priority might be useful: at an
760	intermediate proxy relaying sessions (and maybe remultiplexing them).

762	If so, it should be sent only when sessions are established or changed. Changes
763	can be handled by a control message. Opinions?

765	A priority field can be hacked into the length field with the protocol field
766	using b) above.

768	So the question is: is it important to send priority at all in this WebMUX
769	protocol? Or should priority control, if needed, be a control message?
770	(control message).

772	Answer: Not in this protocol. Opens Pandora's box with remultiplexors, which
773	could have denial of service attacks.

775	Setup message

777	Is any setup message needed? I don't think it is: initial bytes are
778	precious (see performance discussion above), and it complicates trivial use. If
779	we move the byte order flag to the WebMUX header, and use control messages if
780	other information needs to be sent, we can dispense with it, and the layer is
781	simpler. This is my current position, and unless someone objects with reasons,
782	I'll nuke it in the next version of this document.

784	Answer: Not needed. Nuked.

786	Byte order flags

788	While higher layer protocols using host dependent byte order can be a performance
789	win (when sending larger objects such as arrays of data), the overhead at
790	this layer isn't much, and may not be worth bothering with. Worst case (naive
791	code) would be four memory reads and three shifts of overhead per payload. Smart
792	code is one load and appropriate shifts, etc.

794	Opinions? I'm still leaning toward swapping bytes here, but there are other
795	examples of byte load and shift (particularly slow on Alpha, but not much of an
796	issue on other systems).

798	Answer: Not sufficient performance gain at WebMUX level to be worth doing.
799	Little-endian (LE) byte order is defined for WebMUX headers.

801	Error handling

803	There are several error conditions, probably best reported via control messages
804	from server:
805	    * No such protocol. Some sort of serial number should be reported, I
806	      suppose; this serial number can be implicit, as in X.
807	    * Bad message.
808	    * Some combinations of flag bits are not legal.
809	    * Priority if it exists?

811	Any others? Any twists to worry about?

813	Answer: Only error that can occur is no such protocol, given no priority in the
814	base protocol. There may still be some unresolved issues here around the "Christmas Tree"
815	message (all bits turned on).

817	Length Field

819	Any reason to believe that the 32 bit length field for a single payload is
820	inadequate? I don't think so, and I live on an Alpha.

822	Answer: 32 bit extended length field for a single fragment is sufficient.

824	Compression

826	Does there need to be a bit saying the payload is compressed to avoid explosion
827	of protocol types?

829	Answer: Yes; introduction of control message to allow specification of transport
830	stacks achieves this.

832	Stacks

834	I think that we should be able to multiplex any TCP, UDP, or IP protocol.
835	Internet protocol numbers are 8 bit fields.

837	So we need 16 bits for TCP, one bit to distinguish TCP and UDP, and one bit more
838	we can use for IP protocol numbers and address space we can allocate privately.
839	This argues for an 18 bit length field to allow for this reuse, giving a header of
840	an 18 bit length field, an 8 bit session field, 4 control bits, and 1 long length bit.

842	The last bit is used to define control messages, which reuse the SYN, FIN, RST,
843	and PUSH bits as a control_code to define the control message. There are
844	escapes, both by undefined control codes, and by the reservation of two sessions
845	for further use if there needs to be further extensions. The spec above reflects
846	this.
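A sketch of packing such an 18 bit protocol ID. The assignment of the top two bits (TCP port, UDP port, IP protocol number, privately allocated) is an assumption made for illustration. Placing the private space at 0x30000-0x3FFFF lines up with the server-assigned range named under Security Considerations. The names are hypothetical.

```python
from __future__ import annotations

# Hypothetical meaning of the two high bits of an 18-bit protocol ID.
TCP_PORT, UDP_PORT, IP_PROTOCOL, PRIVATE = range(4)


def make_protocol_id(kind: int, number: int) -> int:
    """Combine a 2-bit kind and a 16-bit number into an 18-bit protocol ID."""
    if not 0 <= kind <= 3 or not 0 <= number <= 0xFFFF:
        raise ValueError("kind must fit in 2 bits and number in 16 bits")
    if kind == IP_PROTOCOL and number > 0xFF:
        raise ValueError("IP protocol numbers are 8-bit fields")
    return (kind << 16) | number


def split_protocol_id(protocol_id: int) -> tuple[int, int]:
    """Inverse of make_protocol_id: return (kind, number)."""
    return protocol_id >> 16, protocol_id & 0xFFFF
```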

848	Alignment

850	Back to alignment. If we demand 4 byte alignment, we waste bytes on every request
851	that does not end up naturally aligned: 1.5 bytes on average, or roughly two. At
852	14.4Kbaud the mean overhead for protocols that do not pad up would be about 5.5
853	bytes (~3 ms) rather than 4 bytes (~2 ms), presuming lengths are evenly
854	distributed. Note that this DOES NOT affect initial request latency (time to get
855	the first URL), and is therefore less critical than elsewhere.
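The padding arithmetic, as a sketch. The pad needed after a fragment of n bytes is (-n) mod 4, and averaging over the four residues gives 1.5 bytes. At 14.4K baud and 8 bits per byte, 5.5 bytes come to about 3 ms. The function name is hypothetical.

```python
def pad_bytes(length: int, alignment: int = 4) -> int:
    """Bytes needed after `length` bytes to reach the next alignment boundary."""
    return -length % alignment


# Mean padding if fragment lengths are evenly distributed modulo 4.
mean_pad = sum(pad_bytes(n) for n in range(4)) / 4  # (0 + 3 + 2 + 1) / 4 = 1.5
ms_per_byte = 1000 * 8 / 14400                      # 14.4K baud, 8 bits per byte
print(f"{(4 + mean_pad) * ms_per_byte:.1f} ms vs {4 * ms_per_byte:.1f} ms")  # 3.1 ms vs 2.2 ms
```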

857	I have one related worry; it can sometimes be painful to get padding bytes at
858	the end of a buffer; I've heard of people losing by having data right up to the
859	end of a page, so implementations are living slightly dangerously if they
860	presume they can send the padding bytes by sending the 1, 2 or 3 bytes after the
861	buffer (rather than an independent write to the OS for padding bytes).

863	Alternatively, the buffer alignment requirement can be satisfied by
864	implementations remembering how many pad bytes have to be sent, and adjusting
865	the beginning address of the subsequent write by that many bytes before the
866	buffer where the WebMUX header has been put. Am I being unnecessarily paranoid?
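The deferred-padding approach just described can be sketched as a writer that remembers how many pad bytes it owes. It emits them at the front of the next write, so it never reads past the end of the caller's buffer. The class is hypothetical, and a bytearray stands in for the operating system write call.

```python
class PaddingWriter:
    """Owed pad bytes are emitted ahead of the next header, in one write."""

    def __init__(self, alignment: int = 4) -> None:
        self.alignment = alignment
        self.owed = 0             # pad bytes still owed by the last fragment
        self.out = bytearray()    # stands in for the OS write

    def write_fragment(self, header: bytes, payload: bytes) -> None:
        # One write: owed padding, then this fragment's header and payload.
        self.out += b"\0" * self.owed + header + payload
        self.owed = -(len(header) + len(payload)) % self.alignment

    def flush_padding(self) -> None:
        """Emit any padding still owed, e.g. before going idle."""
        self.out += b"\0" * self.owed
        self.owed = 0
```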

868	Opinion: I believe alignment of fragments in general is a GOOD THING, and will
869	simplify both the WebMUX transport and protocols at higher levels if they can
870	make this presumption in their implementations. So I believe this overhead is
871	worth the cost; if you want to do better and save these bytes, then start
872	building an application specific compression scheme. If not, please make your
873	case.

875	Control bits

877	Are the four bits defined in Simon's flags field what we need? Are there any
878	others?

880	Answer: no. More bits than we need. Current protocol doesn't use as many. I've
881	ended back at the original bits specified, rather than the smaller set suggested
882	by Bill Janssen. This enables full emulation of all the details of a socket
883	interface, which would not otherwise be possible. See details around TCP and
884	socket handling, discussed in books like "TCP/IP Illustrated," by W. Richard
885	Stevens.

887	Am I all wet?

889	Opinion: I believe that we should do this.

891	Control Messages

893	Question: do we want/need a short control message? Right now, the way out for
894	extensibility is control messages sent in the reserved (and as yet unspecified)
895	control session. This requires a minimum of 8 bytes on the wire. We could
896	steal the last available bit, and allow for a 4 byte short control message, that
897	would have 18 bits of payload.

899	Opinion: Flow control needs it; protocol/transport stacks need it. Document
900	above now defines some control messages.

902	Simplicity of default Behavior

904	The above specification allows for someone who just wants to WebMUX a single
905	protocol to entirely ignore protocol IDs.

907		------------------------------------------------------

909	Acknowledgements

911	Contributors include (at least): Bill Janssen, Mike Spreitzer, Robert Thau,
912	Larry Masinter, Paul Leach, Paul Bennett, Rich Salz, Simon Spero, Mark Handley,
913	Anselm Baird-Smith, and Wan-Teh Chang. Our apologies to anyone we've missed.

915		------------------------------------------------------

917	References
918	    1. J. Postel, "Transmission Control Protocol", RFC 793, Network Information
919	      Center, SRI International, September 1981
920	    2. J. Postel, "TCP and IP bake off", RFC 1025, September 1987
921	    3. J. Mogul, S. Deering, "Path MTU Discovery", RFC 1191, DECWRL, Stanford
922	      University, November 1990
923	    4. T. Berners-Lee, "Universal Resource Identifiers in WWW. A Unifying Syntax
924	      for the Expression of Names and Addresses of Objects on the Network as
925	      used in the World-Wide Web", RFC 1630, CERN, June 1994.
926	    5. R. Braden, "T/TCP -- TCP Extensions for Transactions: Functional
927	      Specification", RFC 1644, USC/ISI, July 1994
928	    4. R. Fielding, "Relative Uniform Resource Locators", RFC 1808, UC Irvine,
929	      June 1995.
930	    5. T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol --
931	      HTTP/1.0", RFC 1945, W3C/MIT, UC Irvine, W3C/MIT, May 1996
932	    6. R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee,
933	      "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, U.C. Irvine, DEC
934	      W3C/MIT, DEC, W3C/MIT, W3C/MIT, January 1997
935	    7. S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels",
936	      RFC 2119, Harvard University, March 1997
937	    8. J. Touch, "TCP Control Block Interdependence", RFC 2140, April 1997
938	    9. W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and
939	      Fast Recovery Algorithms", RFC 2001, January 1997
940	    10. V. Jacobson, "Congestion Avoidance and Control", Proceedings of SIGCOMM
941	      '88
942	    11. H. Frystyk Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W.
943	      Lie, and C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and
944	      PNG", Proceedings of SIGCOMM '97
945	    12. S. Floyd and V. Jacobson, "Random Early Detection Gateways for
946	      Congestion Avoidance", IEEE/ACM Trans. on Networking, vol. 1, no. 4, Aug.
947	      1993.
948	    13. R. W. Scheifler, J. Gettys, "The X Window System", ACM Transactions on
949	      Graphics # 63, Special Issue on User Interface Software, 5(2):79-109
950	      (1986).
951	    14. V. Paxson, "Growth Trends in Wide-Area TCP Connections" IEEE Network,
952	      Vol. 8 No. 4, pp. 8-17, July 1994
953	    15. S. Spero, "Session Control Protocol, Version 1.0"
954	    16. S. Spero, "Session Control Protocol, Version 2.0"
955	    17. Keywords and Port numbers are maintained by IANA in the port-numbers
956	      registry.
957	    18. Keywords and Protocol numbers are maintained by IANA in the
958	      protocol-numbers registry.
959	    19. W. Richard Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley, 1994
960	    20. Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
961	      Identifiers (URI): Generic Syntax and Semantics," Work in Progress of the
962	      IETF, November, 1997.
963	    21. J. Lyon, K. Evans, J. Klein, "Transaction Internet Protocol Version
964	      2.0," Work in Progress of the Transaction Internet Protocol Working Group,
965	      November, 1997.
966	    22. B. Janssen, M. Spreitzer, "Inter-Language Unification"; in particular
967	      see the manual section on Protocols and Transports.

969		------------------------------------------------------

971	Authors' Addresses
972	    * James Gettys
973	      MIT Laboratory for Computer Science
974	      545 Technology Square
975	      Cambridge, MA 02139, USA
976	      Fax: +1 (617) 258-8682
977	      Email: jg@pa.dec.com
978	    * Henrik Frystyk Nielsen
979	      W3C/MIT Laboratory for Computer Science
980	      545 Technology Square
981	      Cambridge, MA 02139, USA
982	      Fax: +1 (617) 258-8682
983	      Email: frystyk@w3.org

985		------------------------------------------------------

987	    @(#) $Id: WD-mux.html,v 1.4 1998/08/03 18:36:32 frystyk Exp $