idnits 2.17.1 

draft-ietf-rohc-sigcomp-algorithm-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-24) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 20
     longer pages, the longest (page 2) being 59 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([DEFLATE], [LZW]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC-2026' is mentioned on line 17, but not defined

  == Missing Reference: 'PAGE 1' is mentioned on line 44, but not defined

  == Missing Reference: 'PAGE 2' is mentioned on line 90, but not defined

  == Missing Reference: 'RFC-2119' is mentioned on line 106, but not defined

  == Missing Reference: 'PAGE 3' is mentioned on line 144, but not defined

  == Missing Reference: 'PAGE 4' is mentioned on line 200, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 322

  -- Looks like a reference, but probably isn't: '1' on line 336

  -- Looks like a reference, but probably isn't: '2' on line 234

  == Missing Reference: 'PAGE 5' is mentioned on line 255, but not defined

  == Missing Reference: 'PAGE 6' is mentioned on line 311, but not defined

  == Missing Reference: 'PAGE 7' is mentioned on line 367, but not defined

  == Missing Reference: 'PAGE 8' is mentioned on line 423, but not defined

  == Missing Reference: 'PAGE 9' is mentioned on line 479, but not defined

  == Missing Reference: 'PAGE 10' is mentioned on line 535, but not defined

  == Missing Reference: 'PAGE 11' is mentioned on line 588, but not defined

  == Missing Reference: 'PAGE 12' is mentioned on line 643, but not defined

  == Missing Reference: 'PAGE 13' is mentioned on line 698, but not defined

  == Missing Reference: 'PAGE 14' is mentioned on line 750, but not defined

  == Missing Reference: 'PAGE 15' is mentioned on line 794, but not defined

  == Missing Reference: 'PAGE 16' is mentioned on line 847, but not defined

  == Missing Reference: 'PAGE 17' is mentioned on line 902, but not defined

  == Missing Reference: 'PAGE 18' is mentioned on line 958, but not defined

  == Missing Reference: 'PAGE 19' is mentioned on line 1014, but not defined

  == Missing Reference: 'PAGE 20' is mentioned on line 1069, but not defined

  == Missing Reference: 'PAGE 21' is mentioned on line 1103, but not defined

  ** Downref: Normative reference to an Informational RFC: RFC 1951 (ref.
     'DEFLATE')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'V-J'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'LZW'

  ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC
     4960)

  ** Obsolete normative reference: RFC 2543 (ref. 'SIP') (Obsoleted by RFC
     3261, RFC 3262, RFC 3263, RFC 3264, RFC 3265)


     Summary: 7 errors (**), 0 flaws (~~), 25 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                  Richard Price, Siemens/Roke Manor
3	INTERNET-DRAFT                           Jonathan Rosenberg, dynamicsoft
4	Expires: May 2002                    Abigail Surtees, Siemens/Roke Manor
5	                                         Mark A West, Siemens/Roke Manor
6	                                     Lawrence Conroy, Siemens/Roke Manor

8	                                                       14 November, 2001

10	                     Universal Decompression Algorithm
11	                <draft-ietf-rohc-sigcomp-algorithm-00.txt>

13	Status of this Memo

15	   This document is an Internet-Draft and is in full conformance with
16	   all provisions of Section 10 of [RFC-2026].

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups. Note that other
20	   groups may also distribute working documents as Internet-Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time. It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This document is a submission to the IETF ROHC WG. Comments should be
34	   directed to the mailing list of ROHC, rohc@cdt.luth.se.

36	Abstract

38	   This specification defines a "universal decompressor" capable of
39	   interoperating with a wide range of compression algorithms. Using the
40	   basic techniques of Huffman compression and LZ77-style string
41	   substitution, the decompressor can be configured to understand the
42	   output of many well-known compressors including [DEFLATE] and [LZW].

44	Price et al.                                                  [PAGE 1]
45	Revision history

47	   Changes from <draft-ietf-rohc-sigcomp-01.txt>

49	     New COPY-LITERAL and COPY-OFFSET tokens added to reduce complexity
50	     for LZ77-based algorithms.

52	     COMPARE token modified to allow it to be used without a SWITCH
53	     token.

55	     HUFFMAN token modified to require fewer parameters.

57	     Support added for literal parameters (see Section 4.3).

59	     Example token sets added for decompression of LZ77, [DEFLATE] and
60	     [LZW] compressed messages.

62	Table of contents

64	   1.  Introduction.................................................3
65	   2.  Terminology..................................................3
66	   3.  Requirements on underlying transport protocol................3
67	   4.  Description of the universal decompressor....................4
68	   4.1.  Structure of universal decompressor dictionary.............5
69	   4.2.  Important entries in the byte buffer.......................5
70	   4.3.  Token parameters...........................................5
71	   4.4.  Decompressor actions upon receiving a compressed message...6
72	   4.5.  Decompression failure......................................7
73	   5.  Library of tokens............................................8
74	   5.1.  COPY.......................................................9
75	   5.2.  ADD / SUBTRACT.............................................10
76	   5.3.  LSHIFT / RSHIFT............................................10
77	   5.4.  COMPARE....................................................11
78	   5.5.  SWITCH.....................................................11
79	   5.6.  CALL / RETURN..............................................11
80	   5.7.  HUFFMAN....................................................12
81	   6.  Security considerations......................................14
82	   7.  References...................................................15
83	   8.  Authors' addresses...........................................15
84	   A.  Example sets of tokens.......................................16
85	   A.1.  Mnemonic language..........................................16
86	   A.2.  Example token set for simple LZ77 decompression............17
87	   A.3.  Example token set for DEFLATE decompression................17
88	   A.4.  Example token set for LZW decompression....................20

90	Price et al.                                                  [PAGE 2]
91	1.  Introduction

93	   This draft introduces the concept of a "universal decompressor".

95	   The goal of the document is to standardize a decompressor capable of
96	   interoperating with a wide range of compression algorithms.
97	   Consequently this draft describes the decompressor operation only,
98	   i.e. the actions which the decompressor takes upon receiving a
99	   certain instruction from the compressor.

101	2.  Terminology

103	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
104	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
105	   document are to be interpreted as described in [RFC-2119].

107	   Byte buffer

109	     The universal decompressor maintains a byte buffer containing any
110	     previously received text strings that might be useful for future
111	     compression.

113	   Token

115	     A token is an instruction transmitted from the compressor to the
116	     decompressor.

118	3.  Requirements on underlying transport protocol

120	   The universal decompressor takes as its input a sequence of
121	   compressed messages, which are processed and then outputted as a
122	   sequence of uncompressed messages. This chapter lists the
123	   requirements on the transport protocol used to carry compressed
124	   messages to the universal decompressor.

126	   Note that the universal decompressor outputs an uncompressed message
127	   when it encounters an explicit end-of-message character.
128	   Consequently there is no need for one uncompressed message to
129	   correspond to exactly one compressed message. Two or more compressed
130	   messages can be sent to reconstruct a single uncompressed message,
131	   which is very useful for segmenting compressed messages that are
132	   larger than the MTU of the transport protocol.

134	   Conversely, one compressed message can be sent to reconstruct
135	   several uncompressed messages. In particular, messages can be
136	   successfully decompressed even when the transport protocol provides
137	   data as a byte stream with no framing.

139	   Note however that the universal decompressor can make use of
140	   previously received messages to improve the overall compression
141	   ratio (see [V-J] for an example of a compression algorithm which
142	   uses previously received messages in this manner).

144	Price et al.                                                  [PAGE 3]
145	   Consequently, the main requirement on the underlying protocol is
146	   that it MUST ensure that the contents of the decompressor byte
147	   buffer are as expected by the next message to be decompressed.

149	   This requirement can be supported in a number of ways. If the
150	   underlying transport protocol is reliable (for example TCP or
151	   [SCTP]) then the compressed messages are provided to the universal
152	   decompressor in the correct order and free from bit errors. In this
153	   case the byte buffer is automatically updated correctly between
154	   messages.

156	   If the underlying transport protocol is unreliable (for example UDP)
157	   then the byte buffer MUST be reset to a known state between
158	   compressed messages. This ensures that messages are not compressed
159	   relative to text that has not yet been received by the universal
160	   decompressor.

162	   The "known state" of the byte buffer can be negotiated a-priori: for
163	   example a set of tokens can be downloaded from the compressor to the
164	   decompressor when the universal decompressor is being set up. All
165	   messages are then compressed relative to these tokens. Alternatively
166	   the transport protocol can indicate to the decompressor how the byte
167	   buffer should be set up before each message is decompressed.

169	4.  Description of the universal decompressor

171	   An important feature of the universal decompressor is that it can
172	   interoperate with a wide range of compression algorithms. The
173	   precise method for compression is left as an implementation
174	   decision, and in fact the standard decompressor can interoperate
175	   with any of the following classes of algorithm:

177	   *    Generic text compressor (for example [DEFLATE] or a similar
178	        algorithm).

180	   *    Protocol-aware compressor offering excellent performance for
181	        one type of text message (for example the text messages
182	        generated by [SIP]).

184	   *    Hybrid compressor with similar performance to [DEFLATE] for
185	        generic text strings and superior performance for certain types
186	        of text message.

188	   The choice of which tokens to send to the decompressor is left as a
189	   local implementation decision at the compressor. The only
190	   requirement is that of transparency, i.e. the compressor MUST NOT
191	   send tokens which cause the decompressor to incorrectly decompress a
192	   given message.

194	   Note however that it is perfectly acceptable for the compressor to
195	   send tokens which update the byte buffer at the decompressor, but
196	   which cause no decompressed message to be outputted. Indeed, this is
197	   a useful technique for pre-populating the dictionary with well-known
198	   text strings.

200	Price et al.                                                  [PAGE 4]
201	4.1.  Structure of universal decompressor dictionary

203	   The universal decompressor dictionary consists of a simple byte
204	   buffer designed to hold the current uncompressed message, the
205	   current compressed message, and any other previously received text
206	   strings that might be useful for future compression.

208	   The size buffer_size of the byte buffer can be negotiated by an
209	   externally defined mechanism (e.g. by the underlying protocol used
210	   to transport the compressed messages). Entries in the byte buffer
211	   are referred to as buffer[n] where 0 =< n < buffer_size.

213	   As all of the tokens currently use 2-byte indices into the byte
214	   buffer, the maximum size of the buffer is 64K.

216	4.2.  Important entries in the byte buffer

218	   The first few bytes in the byte buffer are used to store some
219	   important 2-byte integers. These integers are given the following
220	   names:

222	   Position in buffer:          Name:

224	        0 - 1                   first_token
225	        2 - 3                   uncompressed_start
226	        4 - 5                   uncompressed_end
227	        6 - 7                   circular_buffer
228	        8 - 9                   compressed_start
229	        10 - 11                 compressed_end
230	        12 - 13                 compressed_pointer
231	        14 - 15                 stack_free
232	        16 - 17                 stack[0]
233	        18 - 19                 stack[1]
234	        20 - 21                 stack[2]
235	           :                        :

237	   The MSBs of the integer are always stored before the LSBs. So, for
238	   example, the MSBs of first_token are stored in buffer[0] whilst the
239	   LSBs are stored in buffer[1].

241	   The use of each integer is described in the following sections of
242	   the draft.

244	4.3.  Token parameters

246	   Each of the tokens is followed by 0 or more bytes containing the
247	   parameters required by the token. At present all parameters are
248	   stored as 2-byte integers with MSBs stored in the byte preceding the
249	   LSBs in the byte buffer.

251	   The most significant bit of the 2-byte integer has a special
252	   meaning: it is used to determine whether the parameter is a literal
253	   value or an index pointing to a literal value.

255	Price et al.                                                  [PAGE 5]
256	   If the most significant bit is 0 then the 2-byte integer is
257	   interpreted as a literal value from 0 to 32767.

259	   If the most significant bit is 1 then the 2-byte integer is
260	   interpreted as an index from 0 to 32767 (the most significant bit
261	   itself is ignored). This index points to the location in the buffer
262	   containing the literal value of the parameter (MSBs stored in
263	   buffer[index] and LSBs stored in buffer[index + 1]). If the index
264	   references a location beyond the size of the byte buffer then a bad
265	   compressed message has been received and decompression failure
266	   occurs (see Section 4.5.).

268	4.4.  Decompressor actions upon receiving a compressed message

270	   When the universal decompressor is initialized all entries in the
271	   byte buffer are set to 0. Upon receiving a compressed message, the
272	   decompressor strips off the underlying protocol header and then
273	   performs the following actions:

275	   1.)   The message is copied directly into the byte buffer beginning
276	   at the byte specified in compressed_start.

278	   The underlying protocol MUST NOT pass a compressed message of more
279	   than 1460 bytes to the universal decompressor. If a larger
280	   compressed message is received, the underlying protocol passes only
281	   the first 1460 bytes to the decompressor, and provides additional
282	   working memory to store the bytes that are not currently being
283	   decompressed. The remainder of the compressed message is passed to
284	   the decompressor in blocks of 1460 bytes (or less if it is the last
285	   block in the compressed message).

287	   The decompressor MUST NOT concatenate two messages to form a single
288	   compressed message. This is because compressed messages are
289	   typically padded with trailing zero bits so that they are a whole
290	   number of bytes long. Concatenating two messages would cause these
291	   padding bits to be incorrectly interpreted as compressed data.

293	   Note that the buffer is circular, so once a byte is copied into
294	   buffer[buffer_size - 1], the next byte is copied into
295	   buffer[circular_buffer]. The parameter circular_buffer (see Section
296	   4.2) can be set to prevent the first part of the buffer from being
297	   overwritten by new messages. Typically this area of the buffer is
298	   used to hold important tokens and text strings that should be kept
299	   from one compressed message to the next. If circular_buffer lies
300	   beyond the size of the byte buffer then decompression failure occurs
301	   (see Section 4.5).

303	   After the message has been copied into the byte buffer, the position
304	   of the last byte in the compressed message is copied into
305	   compressed_end. Also, the value in compressed_start is copied into
306	   compressed_pointer.

308	   2.)   Next, the tokens contained within the byte buffer are executed
309	   beginning at the byte specified in first_token. The tokens are

311	Price et al.                                                  [PAGE 6]
312	   executed consecutively unless indicated explicitly (for example when
313	   the decompressor encounters a SWITCH token). If the next token to be
314	   executed lies outside the byte buffer then decompression failure
315	   occurs (see Section 4.5).

317	   3.)   The decompressor stops token execution when it reaches
318	   buffer[0] or buffer[1]. Depending on which buffer entry is reached,
319	   the following actions are then taken:

321	   If the decompressor reaches buffer[0], instead of executing the
322	   token contained within buffer[0] it stops token execution and
323	   outputs the uncompressed message. The location of the uncompressed
324	   message is from uncompressed_start up to, but not including
325	   uncompressed_end. After the uncompressed message has been outputted,
326	   the value in uncompressed_end is then copied into
327	   uncompressed_start. If uncompressed_start = uncompressed_end then no
328	   uncompressed message is outputted.

330	   As stated before the buffer is circular, so once a byte is copied
331	   from buffer[buffer_size - 1], the next byte is copied from
332	   buffer[circular_buffer]. If either uncompressed_start or
333	   uncompressed_end lie outside the circular buffer then decompression
334	   failure occurs.

336	   If buffer[1] is reached then token execution stops but no
337	   uncompressed message is outputted.

339	   When the next compressed message becomes available, the universal
340	   decompressor continues at Step 1.) above. Note that if the
341	   underlying transport protocol does not provide reliable, in-order
342	   message delivery then the contents of the byte buffer MUST be reset
343	   to the state expected by the next compressed message. This state can
344	   be negotiated a-priori, or the transport protocol can indicate to
345	   the decompressor how the byte buffer should be set up before the
346	   next message is decompressed.

348	4.5.  Decompression failure

350	   If the compressed messages received by the decompressor are
351	   corrupted (either accidentally or maliciously) then one of three
352	   possibilities might occur:

354	   *    A decompressed message is outputted that is incorrect.

356	   *    A token is encountered that cannot be processed successfully by
357	        the decompressor (for example a RETURN token when no CALL token
358	        has previously been encountered).

360	   *    The decompressor never finishes decompressing a message.

362	   To counter the first possibility the underlying protocol SHOULD
363	   include a checksum to verify either the compressed message or the
364	   uncompressed message. If a message fails the checksum then
365	   "decompression failure" has occurred. The decompressor does not

367	Price et al.                                                  [PAGE 7]
368	   output an uncompressed message, and ignores any future compressed
369	   message until the byte buffer is reset.

371	   If a token is encountered that cannot be successfully processed then
372	   decompression failure occurs automatically.

374	   To counter the third possibility, decompression failure SHOULD also
375	   occur after a certain number of tokens have been processed for a
376	   given compressed message. The maximum number of tokens to process is
377	   currently left as an implementation decision (but might in future be
378	   negotiated).

380	5.  Library of tokens

382	   The universal decompressor currently understands twelve types of
383	   token, chosen to support the widest possible range of compression
384	   algorithms with the minimum possible overhead.

386	   All tokens are stored as a single byte to indicate the token type,
387	   followed by 0 or more bytes containing the parameters required by
388	   the token. At present all parameters are 2-byte integers with MSBs
389	   stored before LSBs. For example, the COPY token is followed by three
390	   parameters as shown below:

392	   COPY (position, length, destination)

394	   In the byte buffer a COPY token is stored as the following 7 bytes:

396	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
397	   |0 0 0 0 0 0 0 0| Position MSB  | Position LSB  |  Length MSB   |
398	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

400	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401	   |  Length LSB   |Destination MSB|Destination LSB|
402	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

404	   Twelve token types are currently available to the universal
405	   decompressor. The following table lists the different token types
406	   and the byte values used to transmit the tokens to the decompressor:

408	   Token type:                  Corresponding byte value:

410	   COPY                                     0
411	   COPY-LITERAL                             1
412	   COPY-OFFSET                              2
413	   ADD                                      3
414	   SUBTRACT                                 4
415	   LSHIFT                                   5
416	   RSHIFT                                   6
417	   COMPARE                                  7
418	   SWITCH                                   8
419	   CALL                                     9
420	   RETURN                                   10
421	   HUFFMAN                                  11

423	Price et al.                                                  [PAGE 8]
424	   Each token is explained in more detail below:

426	5.1.  COPY

428	   The COPY token instructs the decompressor to copy a string of bytes
429	   from one part of the byte buffer to another.

431	   A COPY token is stored in the byte buffer as 7 consecutive bytes as
432	   follows:

434	   COPY (position, length, destination)

436	   As with all tokens currently defined, the COPY token translates into
437	   a single byte for the token type followed by 2 bytes for each of the
438	   parameters.

440	   The meaning of the three parameters is explained below:

442	   Position:    2-byte integer indicating the location of the first
443	                byte in the string to be copied.

445	   Length:      2-byte integer indicating the number of bytes to be
446	                copied.

448	   Destination: 2-byte integer indicating the location to which the
449	                first byte in the string will be copied.

451	   Note that the copying function is performed on a byte-by-byte basis,
452	   with the position parameter indicating the first byte to be copied.
453	   In particular, some of the later bytes to be copied may themselves
454	   have been written into the byte buffer by the COPY token currently
455	   being executed.

457	   Equally, it is possible for a COPY token to overwrite itself or its
458	   parameters. If this occurs then the COPY token MUST continue to
459	   execute as if the parameters were still in place in the byte buffer.

461	   If the source or destination of the next byte to be copied is larger
462	   than (buffer_size - 1) then sufficient multiples of (buffer_size -
463	   circular_buffer) are subtracted until it is not. So for example,
464	   once a byte is copied into buffer[buffer_size - 1] the next byte is
465	   copied into buffer[circular_buffer]. If circular_buffer equals or
466	   exceeds buffer_size then a bad compressed message has been received
467	   and decompression failure occurs (see Section 4.5).

469	   A modified version of the COPY token is given below:

471	   COPY-LITERAL (position, length, destination)

473	   The COPY-LITERAL token is identical to COPY except that after
474	   copying, the destination parameter is replaced with the value
475	   (destination + length). If the destination parameter is a literal
476	   value then it is updated directly. If the destination parameter is
477	   an index then the literal value it references is updated instead.

479	Price et al.                                                  [PAGE 9]
480	   As above, if (destination + length) is larger than (buffer_size - 1)
481	   then sufficient multiples of (buffer_size - circular_buffer) are
482	   subtracted until it is not.

484	   A further version of the COPY-LITERAL token is given below:

486	   COPY-OFFSET (offset, length, destination)

488	   The COPY-OFFSET token is identical to COPY-LITERAL except that an
489	   offset parameter is given instead of a position parameter. The two
490	   parameters are related by position = (destination - offset).

492	   If (destination - offset) does not lie between circular_buffer and
493	   (buffer_size - 1) inclusive then sufficient multiples of
494	   (buffer_size - circular_buffer) are added or subtracted until it
495	   does.

497	5.2.  ADD / SUBTRACT

499	   The ADD token instructs the decompressor to add two 2-byte integers
500	   (addition performed modulo 2^16) and to store the result in the
501	   location of the first parameter.

503	   The format of the ADD token is given below:

505	   ADD (parameter_1, parameter_2)

507	   Note that as per Section 4.3, depending on how the MSB is set the
508	   parameters can be interpreted as literal values or indices to
509	   literal values. If parameter_1 is a literal value then it is
510	   overwritten with the result of the addition. If parameter_1 is an
511	   index to a literal value then the literal value is overwritten (not
512	   parameter_1 itself).

514	   The SUBTRACT token is the same as the ADD token except that the
515	   second integer is subtracted from the first (subtraction performed
516	   modulo 2^16):

518	   SUBTRACT (parameter_1, parameter_2)

520	5.3.  LSHIFT / RSHIFT

522	   The LSHIFT token instructs the decompressor to left shift a 2-byte
523	   value by the specified number of bits:

525	   LSHIFT (parameter, no_of_bits)

527	   The value to be shifted is stored in parameter, and the number of
528	   bits to shift is stored in no_of_bits. As usual both can be stored
529	   either as a literal value or as an index to a literal value.

531	   If the value of no_of_bits does not lie between 0 and 15 inclusive
532	   then a bad compressed message has been received and decompression
533	   failure occurs.

535	Price et al.                                                 [PAGE 10]
536	   The RSHIFT token is the same as the LSHIFT token except that the
537	   bits are right shifted:

539	   RSHIFT (parameter, no_of_bits)

541	5.4.  COMPARE

543	   The COMPARE token instructs the decompressor to compare two 2-byte
544	   values and then to jump to one of three specified indices depending
545	   on the result.

547	   COMPARE (parameter_1, parameter_2, index_1, index_2, index_3)

549	   If the parameter_1 < parameter_2 then the decompressor continues
550	   token execution at the byte position specified by index 1. If
551	   parameter_1 = parameter_2 then it jumps to index_2. If parameter_1 >
552	   parameter_2 then it jumps to index_3.

554	   If an index is specified which is beyond the size of the byte
555	   buffer, a bad compressed message has been received and decompression
556	   failure occurs.

558	5.5.  SWITCH

560	   The SWITCH token performs a conditional jump based on the value of
561	   its first parameter.

563	   SWITCH (j, index_0, index_1, ... , index_n-1)

565	   When a SWITCH token is encountered the decompressor reads the value
566	   of j. It then continues token execution at the byte position
567	   specified by index j.

569	   If j specifies an index which is beyond the size of the byte buffer,
570	   a bad compressed message has been received and decompression failure
571	   occurs.

573	5.6.  CALL / RETURN

575	   The CALL and RETURN tokens provide support for compression
576	   algorithms with a nested structure.

578	   CALL (index)

580	   RETURN

582	   When the decompressor reaches a CALL token, it finds the byte
583	   position of the token immediately following the CALL token and
584	   copies this 2-byte integer into stack[stack_free] ready for later
585	   retrieval. It then increases stack_free by 1 and continues token
586	   execution at the byte position specified by index.

588	Price et al.                                                 [PAGE 11]
589	   When the decompressor reaches a RETURN token it decreases stack_free
590	   by 1, and then continues token execution at the byte position stored
591	   in stack[stack_free].

593	   If stack_free ever becomes more than buffer_size - 1 or less than 0
594	   then a bad compressed message has been received and decompression
595	   failure occurs (see Section 4.5).

597	5.7.  HUFFMAN

599	   The HUFFMAN token maps a shorthand Huffman code onto its
600	   uncompressed equivalent.

602	   The format of a HUFFMAN token is as follows:

604	   HUFFMAN (position, bit_offset, destination, n, bits_1, uncomp_1,
605	   code_1, bits_2, uncomp_2, code_2, ... , bits_n-1, uncomp_n-1,
606	   code_n-1, bits_n, uncomp_n)

608	   The HUFFMAN token is followed by four mandatory parameters plus n
609	   additional sets of parameters. Every set contains three parameters
610	   except the nth set, which contains two. The nth set of parameters
611	   omits the parameter code_n, as the decompressor can always work out
612	   what the correct value of code_n should be.

614	   The meaning of the four mandatory parameters is explained below:

616	   Position:    Indicates the byte location of the Huffman code to be
617	                decompressed.

619	   Bit Offset:  Indicates the bit offset at which the Huffman code
620	                begins within the byte specified above.

622	   Destination: Indicates the location to which the uncompressed value
623	                will be copied.

625	   n:           Indicates the number of additional sets of parameters
626	                that follow. If n = 0 then the HUFFMAN token is ignored
627	                by the decompressor.

629	   The remaining parameters specify the actual Huffman codes and their
630	   uncompressed equivalents. Each set of 3 parameters specifies a block
631	   of Huffman codes with the same length and (when treated as integers)
632	   with values that increase by 1 for each additional Huffman code. The
633	   precise meaning of the parameters is given below:

635	   bits_j:      Indicates the additional length in bits of the Huffman
636	                codes in block j, compared to the Huffman codes in
637	                block j-1. Note that the total length of the Huffman
638	                codes in set j is (bits_1 + bits_2 + ... + bits_j).

640	   uncomp_j:    Indicates the uncompressed value of the first Huffman
641	                code in set j.

643	Price et al.                                                 [PAGE 12]
644	   code_j:      Indicates the compressed value (when read as an
645	                integer) of the last Huffman code in set j.

647	   Note that the compressed value of the first Huffman code in set j is
648	   not specified explicitly, but instead is taken to be 0 when j = 1 or
649	   (2^bits_j) x (1 + code_j-1) when j > 1. This simply means that the
650	   Huffman code is 1 greater than the largest Huffman code in set j-1,
651	   but padded with enough zeroes to give it the correct length in bits.

653	   For example, suppose that bits_1 = 2, uncomp_1 = 15 and code_1 = 1.
654	   This defines a set of 2 Huffman codes (00 and 01) with corresponding
655	   values 15 and 16.

657	   Suppose also that bits_2 = 1, uncomp_2 = 4 and code_2 = 7. This
658	   defines an additional set of 4 Huffman codes (100, 101, 110 and 111)
659	   with corresponding values 4, 5, 6 and 7.

661	   Huffman code:                Uncompressed value:

663	   00                                   15
664	   01                                   16
665	   100                                   4
666	   101                                   5
667	   110                                   6
668	   111                                   7

670	   The motivation for downloading Huffman codes to the decompressor in
671	   this form is that it is very easy to convert a compressed Huffman
672	   code into its uncompressed equivalent. This can be achieved by
673	   taking the following steps:

675	   1.)   Set j = 1, set code_0 = 65535 and set code_n = 65535.

677	   2.)   Read a total of (bits_1 + bits_2 + ... + bits_j) bits starting
678	   from the specified position and bit_offset. Interpret these bits as
679	   an integer H, with the first bit to be read as the MSB and the last
680	   bit to be read as the LSB.

682	   3.)   If (H > code_j) then set j = j + 1 and goto 2.

684	   4.)   Output (H + uncomp_j - (2^bits_j) x (1 + code_j-1)), with the
685	   arithmetic operations calculated modulo 2^16.

687	   Note that as the HUFFMAN token reads individual bits from within a
688	   byte, to avoid ambiguity it is necessary to define the order in
689	   which these bits are read. This draft specifies that a bit offset of
690	   0 indicates the least significant bit of a byte, whilst a bit offset
691	   of 7 indicates the most significant bit.

693	   For example, suppose that an 8-bit Huffman code begins at byte
694	   position 0 and bit offset 2. In this case the 8 bits of the Huffman
695	   code can be found in the following locations (Bit 0 is the first bit
696	   in the Huffman code and Bit 7 is the last bit):

698	Price et al.                                                 [PAGE 13]
699	    MSB         LSB MSB         LSB

701	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
702	   |5 4 3 2 1 0    |            7 6|
703	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

705	        Byte 0          Byte 1

707	   If the parameter bit offset does not take a value between 0 and 7
708	   inclusive then a bad compressed message has been received and
709	   decompression failure occurs (see Section 4.5.). Decompression
710	   failure also occurs if a total of more than 16 bits are read from
711	   the specified position and bit offset.

713	6.  Security considerations

715	   Note that this initial version raises the issues, without detailing
716	   a specific solution to resolve them.

718	   Introduction:

720	   In effect, a compressed message is a program; the tokens are
721	   instructions that are executed by the decompressor. This presents
722	   particular risks to the operation of the node on which the
723	   decompressor is running. This is the case for any compression
724	   algorithm, and so affects the operation of a node using this one.

726	   Transport Mechanism Requirements:

728	   The algorithm itself has no security issues, but does place
729	   requirements on any transport mechanism used to deliver the
730	   messages.

732	   If such requirements are not met, then the operation of the system
733	   can be exposed to attack ranging from indirect reduction in service,
734	   direct denial of service, and modification of subsequent messages.

736	   An initial list of requirements on the transport mechanism is:

738	   - messages should be delivered reliably to avoid corruption - for
739	   some uses of this algorithm, this corruption may have longer lasting
740	   effects.

742	   - messages should be not be duplicated - to ensure that operations
743	   implied by those messages are executed once only.

745	   - potentially, the message source should be identified and/or
746	   validated - to restrict possible insinuation of "attack" messages by
747	   third parties and to allow "blacklisting" of individual sources that
748	   "behave badly".

750	Price et al.                                                 [PAGE 14]
751	7.  References

753	   [DEFLATE]    "DEFLATE Compressed Data Format Specification version
754	                1.3", RFC 1951, P. Deutsch, May 1996

756	   [V-J]        "Compressing TCP/IP Headers for Low-Speed Serial
757	                Links", V. Jacobson, Internet Engineering Task Force,
758	                February 1990

760	   [LZW]        "LZW Data Compression", Mark Nelson, Dr. Dobb's
761	                Journal, October 1989

763	   [SCTP]       "Stream Control Transmission Protocol", Stewart et al,
764	                RFC 2960, Internet Engineering Task Force, October 2000

766	   [SIP]        "SIP: Session Initiation Protocol", Handley et al,
767	                RFC 2543, Internet Engineering Task Force, March 1999

769	8.  Authors' addresses

771	   Richard Price        Tel: +44 1794 833681
772	   Email:               richard.price@roke.co.uk

774	   Abigail Surtees      Tel: +44 1794 833131
775	   Email:               abigail.surtees@roke.co.uk

777	   Mark A West          Tel: +44 1794 833311
778	   Email:               mark.a.west@roke.co.uk

780	   Lawrence Conroy      Tel: +44 1794 833666
781	   Email:               lwc@roke.co.uk

783	   Roke Manor Research Ltd
784	   Romsey, Hants, SO51 0ZN
785	   United Kingdom

787	   Jonathan Rosenberg
788	   dynamicsoft
789	   72 Eagle Rock Avenue
790	   First Floor
791	   East Hanover, NJ 07936
792	   Email: jdrosen@dynamicsoft.com

794	Price et al.                                                 [PAGE 15]
795	Appendix A.  Example sets of tokens

797	   This appendix gives three example sets of tokens which can be
798	   downloaded to the universal decompressor. The first set of tokens
799	   can be used to decompress a very simple variant of LZ77, which
800	   illustrates how the different token types are intended to be used.

802	   The next two examples show how the universal decompressor can be
803	   configured to understand the output of existing compression
804	   algorthms. One set of tokens allows the decompressor to understand
805	   [DEFLATE] compressed messages, whilst the other set allows the
806	   decompressor to understand [LZW] compressed messages.

808	A.1.  Mnemonic language

810	   Presenting the example set of tokens as a set of bytes would not be
811	   very informative, so they are written instead using a simple
812	   mnemonic language. Most importantly, the language allows the
813	   parameters of a token to be specified as text references rather than
814	   as 2-byte integers.

816	   The mnemonic language is translated into bytes as follows:

818	   Tokens:     Token names are given in capitals. Replace each
819	               name with the corresponding 1-byte value as per
820	               Chapter 5.

822	   Labels:     Label names are given as a colon followed by lowercase
823	               text. They are deleted when converting the mnemonics to
824	               bytes.

826	   References: These are indicated by a label name without the
827	               colon. Replace each reference with a 2-byte value
828	               specifying the location of the byte immediately
829	               following the label. Moreover if the reference is
830	               preceded by a "$" symbol then it is interpreted as an
831	               pointer rather than a literal value, and hence the MSB
832	               of the 2-byte value is set to 1.

834	   .short:     Each decimal number following ".short" is interpreted as
835	               a 2-byte integer with MSBs preceding LSBs. Unless
836	               otherwise specified, this is the default.

838	   .byte:      Each decimal number following ".byte" is interpreted as
839	               a single byte.

841	   Whitespace: All whitespace, brackets and commas just delimit the
842	               tokens. Delete.

844	   Comments:   These are indicated by a semicolon and continue
845	               to the end of the line. Delete.

847	Price et al.                                                 [PAGE 16]
848	A.2.  Example token set for simple LZ77 decompression

850	   The first example gives the tokens required to support a very simple
851	   LZ77-based algorithm. The decompressor is instructed to interpret a
852	   compressed message as a set of 4-byte characters, where each
853	   character contains a 2-byte position integer followed by a 2-byte
854	   length integer. Taken together these integers point to a previously
855	   received text string in the byte buffer, which is then copied to the
856	   end of the uncompressed message.

858	   Since the compressor can only send references to strings already
859	   present in the byte buffer, before the first message is decompressed
860	   the buffer must be initialized with a static dictionary containing
861	   the 256 ASCII characters.

863	   :first_token            unpack_static_dictionary
864	   :uncompressed_start     1792
865	   :uncompressed_end       2048
866	   :circular_buffer        2048
867	   :compressed_start       compressed_message_start
868	   :compressed_end         0
869	   :compressed_pointer     0

871	   :bit_offset             0
872	   :position               0
873	   :length                 0
874	   :token_start            next_character

876	   :unpack_static_dictionary

878	   COPY-LITERAL (17, 1, $uncompressed_start)
879	   ADD ($position, 1)
880	   COMPARE ($position, 256, unpack_static_dictionary, continue, 0)

882	   ; The above tokens are used to initialize the static dictionary.

884	   :continue

886	   COPY (token_start, 2, first_token)

888	   :next_character

890	   HUFFMAN ($compressed_pointer, $bit_offset, position, 1, 16, 0)
891	   HUFFMAN ($compressed_pointer, $bit_offset, length, 1, 16, 0)
892	   COPY-LITERAL ($position, $length, $uncompressed_end)
893	   COMPARE ($compressed_pointer, $compressed_end, next_character, 0, 0)

895	   :compressed_message_start

897	A.3.  Example token set for DEFLATE decompression

899	   This example gives the tokens required to understand [DEFLATE]
900	   compressed messages as defined in RFC 1951. Note that the example

902	Price et al.                                                 [PAGE 17]
903	   only covers static Huffman codes (Block Type 01); dynamic Huffman
904	   codes will be added in a future version.

906	   Note also that the order in which the HUFFMAN token decompresses
907	   bits within a byte differs from that specified in RFC 1951. An
908	   alternative version of the HUFFMAN token may be introduced to
909	   support the exact bit order of [DEFLATE] in future (although this is
910	   not an urgent issue, as the order in which bits are decompressed
911	   does not affect the overall compression ratio).

913	   :first_token            next_character
914	   :uncompressed_start     2048
915	   :uncompressed_end       2048
916	   :circular_buffer        2048
917	   :compressed_start       compressed_message_start
918	   :compressed_end         0
919	   :compressed_pointer     0

921	   :bit_offset             0
922	   :index                  0
923	   :extra_length_bits      0
924	   :length_value           0
925	   :extra_distance_bits    0
926	   :distance_value         0

928	   :next_character

930	   HUFFMAN ($compressed_pointer, $bit_offset, index, 4, 7, 16428, 23,
931	            1, 0, 191, 0, 16452, 199, 1, 144)

933	   ; The above HUFFMAN token decompresses a Huffman code from the
934	   ; length/literal alphabet.

936	   COMPARE ($index, 16428, literal, 0, length)

938	   ; The COMPARE token is used to determine whether the decoded
939	   ; character should be interpreted as a length value, a literal value
940	   ; or an end-of-block character. In the latter case the decompressor
941	   ; stops and outputs the uncompressed message.

943	   :literal

945	   COPY-LITERAL (17, 1, $uncompressed_end)
946	   COMPARE ($compressed_pointer, $compressed_end, next_character, 1, 1)

948	   ; If the decoded character is to be interpreted as a literal value
949	   ; then it is copied directly to the uncompressed message. The
950	   ; decompressor then checks whether more compressed characters are
951	   ; available. If it has run out then it pauses and waits for some
952	   ; more.

954	   :length

956	   LSHIFT ($index, 2)

958	Price et al.                                                 [PAGE 18]
959	   COPY ($index, 4, extra_length_bits)

961	   ; If the decoded character is to be interpreted as a length value
962	   ; then as specified in [DEFLATE], the decompressor must first add a
963	   ; certain number of additional bits from the compressed message to
964	   ; this length value.

966	   HUFFMAN ($compressed_pointer, $bit_offset, extra_length_bits, 1,
967	            $extra_length_bits, 0)

969	   ; The above HUFFMAN token obtains the correct number of additional
970	   ; length bits from the compressed message.

972	   ADD ($length_value, $extra_length_bits)

974	   :distance

976	   HUFFMAN ($compressed_pointer, $bit_offset, index, 1, 5, 74)

978	   ; In [DEFLATE] a length code is always followed by a distance code.
979	   ; The above HUFFMAN token decompresses this character.

981	   LSHIFT ($index, 2)
982	   COPY ($index, 4, extra_distance_bits)
983	   HUFFMAN ($compressed_pointer, $bit_offset, extra_distance_bits, 1,
984	            $extra_distance_bits, 0)

986	   ; The above HUFFMAN token obtains the correct number of additional
987	   ; distance bits from the compressed message.

989	   ADD ($distance_value, $extra_distance_bits)
990	   COPY-OFFSET ($distance_value, $length_value, $uncompressed_end)

992	   ; The text string specified by the length and distance codes is then
993	   ; copied to the end of the uncompressed message.

995	   COMPARE ($compressed_pointer, $compressed_end, next_character, 1, 1)

997	   .byte 0 0 0 .short    ; 3 bytes worth of padding

999	   :length_table

1001	   0       3       0       4       0       5
1002	   0       6       0       7       0       8
1003	   0       9       0       10      1       11
1004	   1       13      1       15      1       17
1005	   2       19      2       23      2       27
1006	   2       31      3       35      3       43
1007	   3       51      3       59      4       67
1008	   4       83      4       99      4       115
1009	   5       131     5       163     5       195
1010	   5       227     0       258

1012	   ; This is the length table from Page 11 of [DEFLATE].

1014	Price et al.                                                 [PAGE 19]
1015	   :distance_table

1017	   0       1       0       2       0       3
1018	   0       4       1       5       1       7
1019	   2       9       2       13      3       17
1020	   3       25      4       33      4       49
1021	   5       65      5       97      6       129
1022	   6       193     7       257     7       385
1023	   8       513     8       767     9       1025
1024	   9       1537    10      2049    10      3073
1025	   11      4097    11      6145    12      8193
1026	   12      12289   13      16385   13      24577

1028	   ; This is the distance table from Page 11 of [DEFLATE].

1030	   :compressed_message_start

1032	   ; The compressed messages are stored in the byte buffer following
1033	   ; the above set of tokens.

1035	A.4.  Example token set for LZW decompression

1037	   This example gives the tokens required to understand [LZW]
1038	   compressed messages. This particular variant of LZW uses a codebook
1039	   containing up to 1024 entries, and so each compressed character is a
1040	   10-bit value referencing one of the entries in the codebook.

1042	   :first_token            unpack_static_dictionary
1043	   :uncompressed_start     5888
1044	   :uncompressed_end       1
1045	   :circular_buffer        6144
1046	   :compressed_start       compressed_message_start
1047	   :compressed_end         0
1048	   :compressed_pointer     0

1050	   :bit_offset             0
1051	   :index                  0
1052	   :position_value         0
1053	   :length_value           0
1054	   :codebook_next          2048
1055	   :token_start            next_character

1057	   :unpack_static_dictionary

1059	   COPY-LITERAL (uncompressed_start, 4, $codebook_next)
1060	   COPY-LITERAL (17, 1, $uncompressed_start)
1061	   ADD ($index, 1)
1062	   COMPARE ($index, 256, unpack_static_dictionary, continue, 0)

1064	   ; The above tokens are used to initialize the first 256 entries in
1065	   ; the LZW codebook with single ASCII characters.

1067	   :continue

1069	Price et al.                                                 [PAGE 20]
1070	   COPY (circular_buffer, 2, uncompressed_end)
1071	   COPY (token_start, 2, first_token)

1073	   :next_character

1075	   HUFFMAN ($compressed_pointer, $bit_offset, index, 1, 10, 512)

1077	   ; The above HUFFMAN token extracts 10 bits from the compressed
1078	   ; message.

1080	   LSHIFT ($index, 2)
1081	   COPY ($index, 4, position_value)

1083	   ; The contents of the corresponding codebook entry is retrieved by
1084	   ; the above COPY token.

1086	   COPY-LITERAL (uncompressed_end, 2, $codebook_next)
1087	   COPY-LITERAL ($position_value, $length_value, $uncompressed_end)

1089	   ; The above COPY-LITERAL token appends the selected text string to
1090	   ; the end of the uncompressed message.

1092	   ADD ($length_value, 1)
1093	   COPY-LITERAL (length_value, 2, $codebook_next)

1095	   ; As per the LZW algorithm, a new entry is added to the codebook
1096	   ; containing the text string copied to the end of the uncompressed
1097	   ; message, plus the next character in the uncompressed message.

1099	   COMPARE ($compressed_pointer, $compressed_end, next_character, 0, 0)

1101	   :compressed_message_start

1103	Price et al.                                                 [PAGE 21]