idnits 2.17.1 

draft-ietf-rohc-sigcomp-udvm-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 42
     longer pages, the longest (page 2) being 59 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([DEFLATE]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 328 has weird spacing: '...v  data    v  ...'

  == Line 830 has weird spacing: '...nnnnnnn   memo...'

  == Line 850 has weird spacing: '...nnnnnnn   memo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'PAGE 1' is mentioned on line 43, but not defined

  == Missing Reference: 'PAGE 2' is mentioned on line 96, but not defined

  == Missing Reference: 'PAGE 3' is mentioned on line 151, but not defined

  == Missing Reference: 'PAGE 4' is mentioned on line 207, but not defined

  == Missing Reference: 'PAGE 5' is mentioned on line 255, but not defined

  == Missing Reference: 'X' is mentioned on line 805, but not defined

  == Missing Reference: 'PAGE 6' is mentioned on line 310, but not defined

  == Missing Reference: 'PAGE 7' is mentioned on line 365, but not defined

  == Missing Reference: 'PAGE 8' is mentioned on line 421, but not defined

  == Missing Reference: 'PAGE 9' is mentioned on line 476, but not defined

  == Missing Reference: 'PAGE 10' is mentioned on line 532, but not defined

  == Missing Reference: 'PAGE 11' is mentioned on line 587, but not defined

  == Missing Reference: 'PAGE 12' is mentioned on line 640, but not defined

  == Missing Reference: 'PAGE 13' is mentioned on line 693, but not defined

  == Missing Reference: 'PAGE 14' is mentioned on line 748, but not defined

  == Missing Reference: 'PAGE 15' is mentioned on line 804, but not defined

  == Missing Reference: 'N' is mentioned on line 850, but not defined

  == Missing Reference: 'PAGE 16' is mentioned on line 860, but not defined

  == Missing Reference: 'PAGE 17' is mentioned on line 914, but not defined

  == Missing Reference: 'PAGE 18' is mentioned on line 968, but not defined

  == Missing Reference: 'PAGE 19' is mentioned on line 1024, but not defined

  == Missing Reference: 'PAGE 20' is mentioned on line 1079, but not defined

  == Missing Reference: 'PAGE 21' is mentioned on line 1134, but not defined

  == Missing Reference: 'PAGE 22' is mentioned on line 1186, but not defined

  == Missing Reference: 'PAGE 23' is mentioned on line 1240, but not defined

  == Missing Reference: 'PAGE 24' is mentioned on line 1293, but not defined

  == Missing Reference: 'PAGE 25' is mentioned on line 1349, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1370

  -- Looks like a reference, but probably isn't: '1' on line 1371

  -- Looks like a reference, but probably isn't: '2' on line 1372

  == Missing Reference: 'PAGE 26' is mentioned on line 1404, but not defined

  == Missing Reference: 'PAGE 27' is mentioned on line 1460, but not defined

  == Missing Reference: 'PAGE 28' is mentioned on line 1516, but not defined

  == Missing Reference: 'PAGE 29' is mentioned on line 1569, but not defined

  == Missing Reference: 'PAGE 30' is mentioned on line 1625, but not defined

  == Missing Reference: 'PAGE 31' is mentioned on line 1680, but not defined

  == Missing Reference: 'PAGE 32' is mentioned on line 1735, but not defined

  == Missing Reference: 'PAGE 33' is mentioned on line 1790, but not defined

  == Missing Reference: 'PAGE 34' is mentioned on line 1845, but not defined

  == Missing Reference: 'PAGE 35' is mentioned on line 1900, but not defined

  == Missing Reference: 'PAGE 36' is mentioned on line 1955, but not defined

  == Missing Reference: 'PAGE 37' is mentioned on line 1988, but not defined

  == Missing Reference: 'PAGE 38' is mentioned on line 2044, but not defined

  == Missing Reference: 'PAGE 39' is mentioned on line 2099, but not defined

  == Missing Reference: 'PAGE 40' is mentioned on line 2155, but not defined

  == Missing Reference: 'PAGE 41' is mentioned on line 2208, but not defined

  == Missing Reference: 'PAGE 42' is mentioned on line 2264, but not defined

  == Missing Reference: 'PAGE 43' is mentioned on line 2285, but not defined

  ** Downref: Normative reference to an Informational RFC: RFC 1951 (ref.
     'DEFLATE')

  ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC
     4960)

  ** Obsolete normative reference: RFC 2543 (ref. 'SIP') (Obsoleted by RFC
     3261, RFC 3262, RFC 3263, RFC 3264, RFC 3265)

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')


     Summary: 8 errors (**), 0 flaws (~~), 50 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                  Richard Price, Siemens/Roke Manor
3	INTERNET-DRAFT                           Jonathan Rosenberg, dynamicsoft
4	Expires: July 2002                       Carsten Bormann, TZI/Uni Bremen
5	                                                      H. Hannu, Ericsson
6	                                                           Z. Liu, Nokia

8	                                                        28 January, 2002

10	               Universal Decompressor Virtual Machine (UDVM)
11	                   <draft-ietf-rohc-sigcomp-udvm-00.txt>

13	Status of this Memo

15	   This document is an Internet-Draft and is in full conformance with
16	   all provisions of Section 10 of [RFC-2026].

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups. Note that other
20	   groups may also distribute working documents as Internet-Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time. It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This document is a submission to the IETF ROHC WG. Comments should be
34	   directed to the mailing list of ROHC, rohc@cdt.luth.se.

36	Abstract

38	   This draft defines a "Universal Decompressor Virtual Machine"
39	   optimized for the task of running decompression algorithms. The UDVM
40	   can be configured to understand the output of many well-known
41	   compressors such as [DEFLATE].

43	Price et al.                                                  [PAGE 1]
44	Revision history

46	   Changes from <draft-ietf-rohc-sigcomp-algorithm-00.txt>

48	     State creation mechanism modified to use MD5 hash for improved
49	     security

51	     Support added for streaming compressed data over TCP

53	     Memory format modified to allow compilation of UDVM code

55	     Additional instructions added for bit manipulation etc.

57	     Feedback mechanism added for bidirectional UDVM operation

59	Table of contents

61	   1.  Introduction.................................................3
62	   2.  Terminology..................................................3
63	   3.  Description of the UDVM architecture.........................5
64	   3.1.  UDVM architecture..........................................5
65	   3.2.  Requirements on application................................7
66	   3.3.  Requirements on transport mechanism........................9
67	   3.4.  Requirements on compressor.................................10
68	   3.5.  Application-defined parameters.............................11
69	   4.  Overview of the UDVM.........................................14
70	   4.1.  UDVM memory allocation.....................................14
71	   4.2.  Well-known variables.......................................15
72	   4.3.  Instruction parameters.....................................15
73	   4.4.  Byte copying...............................................16
74	   5.  Decompressing a compressed message...........................17
75	   5.1.  Invoking the UDVM..........................................17
76	   5.2.  Successful decompression...................................19
77	   5.3.  Decompression failure......................................20
78	   6.  UDVM instruction set.........................................21
79	   6.1.  Bit manipulation instructions..............................22
80	   6.2.  Arithmetic instructions....................................23
81	   6.3.  Memory management instructions.............................23
82	   6.4.  Program flow instructions..................................25
83	   6.5.  I/O instructions...........................................27
84	   7.  Feedback information.........................................31
85	   7.1.  UDVM version...............................................33
86	   7.2.  Memory size and CPU cycles.................................33
87	   7.3.  State identifiers..........................................34
88	   8.  Security considerations......................................34
89	   9.  Acknowledgements.............................................36
90	   10. References...................................................36
91	   11. Authors' addresses...........................................36
92	   Appendix A.  Mnemonic language...................................38
93	   Appendix B.  Example application-defined parameters..............40
94	   Appendix C.  Example decompression algorithms....................42

96	Price et al.                                                  [PAGE 2]
97	1.  Introduction

99	   This draft defines a "Universal Decompressor Virtual Machine" (UDVM).
100	   The UDVM is a virtual machine much like the Java Virtual Machine but
101	   with a key difference: it is designed solely for the purpose of
102	   running decompression algorithms.

104	   The motivation for creating the UDVM is to provide unlimited
105	   flexibility when choosing how to compress a given item of data.
106	   Rather than picking one of a small number of pre-negotiated
107	   compression algorithms, the implementer has the freedom to select an
108	   algorithm of their choice. The compressed data is then combined with
109	   a set of UDVM instructions that allow the original data to be
110	   extracted, and the result is outputted as UDVM bytecode.

112	   Since the UDVM is optimized specifically for running decompression
113	   algorithms, the code size of a typical algorithm is small (often sub
114	   100 bytes). Moreover the UDVM approach does not add significant extra
115	   processing or memory requirements compared to running a fixed pre-
116	   programmed decompression algorithm.

118	2.  Terminology

120	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
121	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
122	   document are to be interpreted as described in [RFC-2119].

124	   Virtual machine

126	     A machine architecture designed to be implemented in software
127	     (although silicon implementations are of course possible).

129	   Universal Decompressor Virtual Machine (UDVM)

131	     The virtual machine described in this draft. The UDVM is designed
132	     specifically for the task of running decompression algorithms.

134	   Bytecode

136	     Machine code that can be executed by a virtual machine. UDVM
137	     bytecode is a combination of UDVM instructions and compressed data.

139	   Application

141	     Entity which invokes the UDVM. The application is also responsible
142	     for supplying the compressed data to the UDVM and making use of the
143	     uncompressed data.

145	   Transport mechanism

147	     Mechanism for passing data between two instances of an application.
148	     The UDVM is designed to work in conjunction with a wide range of
149	     transport mechanisms including TCP, UDP and [SCTP].

151	Price et al.                                                  [PAGE 3]
152	   Message-oriented transport mechanism

154	     A transport mechanism that carries data as a set of distinct,
155	     bounded messages.

157	   Stream-oriented transport mechanism

159	     A transport mechanism that carries data as a continuous stream
160	     with no message boundaries. In this case, the UDVM reserves a
161	     specific character to delimit messages in the compressed stream.

163	   Compressor

165	     Entity which converts application data into compressed data that
166	     can be reconstructed by the UDVM.

168	   Application-defined parameters

170	     Parameters that must be agreed upon by the application invoking the
171	     compressor and the application invoking the UDVM. Depending on the
172	     application these parameters might be fixed a-priori or negotiated.

174	   Per-message compression

176	     Compression that does not reference data from previous messages.
177	     The UDVM can decompress a message of this type using only the
178	     application-defined parameters and the data in the message itself.

180	   Dynamic compression

182	     Compression relative to messages sent prior to the current
183	     compressed message. The UDVM stores and retrieves this data using
184	     the secure state reference mechanism.

186	   State

188	     Information which is saved by the UDVM and retrieved for the
189	     decompression of subsequent messages. For security reasons, state
190	     can only be saved with the permission of the application and can
191	     only be retrieved using an [MD5] hash of the state.

193	   State identifier

195	     A 16-byte value used to access an item of stored state information.
196	     (for security it is the first n bytes of an [MD5] hash of the state
197	     to be accessed). The minimum acceptable value of n is fixed for
198	     security purposes, but implementers can choose higher values of n.

200	   CPU cycles

202	     A measure of the amount of "CPU power" required to execute a UDVM
203	     instruction (the simplest UDVM instructions require a single CPU
204	     cycle). An upper limit is placed on the number of cycles that can
205	     be used to decompress each bit in a compressed message.

207	Price et al.                                                  [PAGE 4]
208	3.  Description of the UDVM architecture

210	   This chapter describes the overall UDVM architecture including the
211	   interfaces between the UDVM and its environment. The requirements on
212	   the entities external to the UDVM are also given.

214	   In the architecture the UDVM is considered to provide a decompression
215	   service for a certain application. The application invokes the UDVM,
216	   and is responsible for supplying compressed data to the UDVM and
217	   making use of the corresponding uncompressed data.

219	   In general the UDVM can offer a decompression service to a wide range
220	   of applications. The principal motivation for developing the UDVM has
221	   been the compression of application-layer protocols, in particular
222	   text-based signaling protocols such as [SIP]. The UDVM architecture
223	   is designed to operate securely and to provide a high compression
224	   ratio for this case.

226	   Note however that the UDVM can be used in any situation provided that
227	   the requirements detailed in this chapter are satisfied by the
228	   application and the transport mechanism.

230	   The following sections describe the overall UDVM architecture and the
231	   requirements on entities external to the UDVM such as the transport
232	   mechanism and the compressor.

234	3.1.  UDVM architecture

236	   The UDVM architecture includes the following basic entities, each of
237	   which is defined in subsequent sections of the document:

239	   UDVM
240	   Application (including state)
241	   Transport mechanism
242	   Compressor

244	   Two variants of the architecture are available, depending on whether
245	   the transport mechanism offers unidirectional or bidirectional data
246	   transport. The unidirectional architecture can be considered to be a
247	   special case of the bidirectional version.

249	   Note that the UDVM itself does not need to know which architecture
250	   has been chosen, because its operation is identical for both cases.

252	   If bidirectional data transport is unavailable or undesirable for any
253	   other reason, then the UDVM architecture is illustrated in Figure 1.

255	Price et al.                                                  [PAGE 5]
256	       +--------------------+                 +--------------------+
257	       |     Compressor     |                 |        UDVM        |
258	       +--------------------+                 +--------------------+
259	            ^    Appl. ^  |                     ^  |  Appl.   ^
260	            |    data  |  |                     |  v  data    |
261	       +----|-------------|-+                 +-|-------------|----+
262	       |    |             | |                 | |            [X]   |
263	       |    |   Appl. 1   v | Compressed data | |    Appl. 2  |    |
264	       |    v             +---------->----------+             v    |
265	       |   +-------+        |                 |        +-------+   |
266	       |   | State |        | Data transport  |        | State |   |
267	       |   +-------+        |                 |        +-------+   |
268	       |                    |                 |                    |
269	       |                    |                 |                    |
270	       |                    |                 |                    |
271	       +--------------------+                 +--------------------+

273	       Figure 1: UDVM architecture for unidirectional data transport

275	   In the unidirectional case the UDVM has two 2-way interfaces to the
276	   application. The first interface passes compressed data from the
277	   application to the UDVM, and provides the corresponding uncompressed
278	   data in return. The second interface allows the UDVM to request the
279	   creation of state (information that may improve the compression ratio
280	   of subsequent messages), and to access previously stored state.

282	   Note that both of these interfaces can be provided as extensions to
283	   an existing application (e.g. a SIP client) or as a "shim" layer
284	   between a compression-unaware client and the UDVM. In the latter
285	   case, the term "application" refers to the combined client and shim
286	   layer.

288	   The [X] symbol denotes that the application has a veto over the
289	   corresponding interface. In this case the application has veto over
290	   the state interface and can refuse state creation requests if it
291	   considers them to be inappropriate. See Section 3.2.2 for further
292	   details.

294	   Note that although the UDVM architecture only shows one compression
295	   entity, it is possible for the UDVM to decompress messages from
296	   multiple compressors at different physical locations in a network.
297	   The UDVM architecture is designed to prevent data from one compressor
298	   interfering with data from a different compressor. A consequence of
299	   this design choice is that it is difficult for a malicious user to
300	   disrupt UDVM operation by inserting false compressed messages on the
301	   transport mechanism.

303	   If the transport mechanism exchanges data in both directions then the
304	   architecture of Figure 2 can also be used. In this case, two
305	   instances of the application communicate using a bidirectional
306	   transport mechanism. Both instances of the application invoke a
307	   compressor to compress their data, and a UDVM to retrieve the
308	   uncompressed data sent by the remote application.

310	Price et al.                                                  [PAGE 6]
311	       +--------------------+                 +--------------------+
312	       |    Compressor 1    |                 |       UDVM 2       |
313	       +--------------------+                 +--------------------+
314	         ^  ^    Appl. ^  |                     ^  |  Appl.   ^  |
315	         |  |    data  |  |                     |  v  data    |  v
316	       +-|--|-------------|-+                 +-|-------------|--|-+
317	       | |  |             | | Compressed data | |            [X][X]|
318	       | |  |   Appl. 1   v |    plus acks    | |   Appl. 2   |  | |
319	       | |  v             +---------->----------+             v  | |
320	       | A +-------+        |                 |        +-------+ A |
321	       | c | State |        | Data transport  |        | State | c |
322	       | k +-------+        |                 |        +-------+ k |
323	       | s  ^             +----------<----------+             ^  s |
324	       | |  |             | | Compressed data | ^             |  | |
325	       |[X][X]            | |    plus acks    | |             |  | |
326	       +-|--|-------------|-+                 +-|-------------|--|-+
327	         ^  |    Appl. ^  |                     |  |  Appl.   |  |
328	         |  v    data  |  v                     |  v  data    v  v
329	       +--------------------+                 +--------------------+
330	       |       UDVM 1       |                 |    Compressor 2    |
331	       +--------------------+                 +--------------------+

333	       Figure 2: UDVM architecture for bidirectional data transport

335	   For a bidirectional transport mechanism an additional interface is
336	   provided from the UDVM, via the application, to the compressor on the
337	   reverse transport channel. This interface can be used to send
338	   feedback information from an application to the remote compressor.
339	   The path taken by feedback data between Application 2 and Compressor
340	   1 is as follows:

342	     Appl. 2 --> Compressor 2 --> UDVM 1 --> Appl. 1 --> Compressor 1

344	   This feedback information monitors the behavior of the UDVM,
345	   including whether data has been successfully decompressed, the amount
346	   of available memory etc. The compressor can make use of this
347	   information to improve the overall compression ratio.

349	   Note that it is an implementation decision whether to use the
350	   feedback channel or not, and compressors must operate successfully
351	   even if no feedback information is received.

353	3.2.  Requirements on application

355	   The application is the entity responsible for invoking the UDVM,
356	   supplying the UDVM with compressed data, and making use of the
357	   corresponding uncompressed data.

359	   Note that in order to use the UDVM decompression service an
360	   application (e.g. a SIP client) will require interfaces to the UDVM.
361	   These interfaces can be provided by extending the original client, or
362	   by providing a "shim" layer between a compression-unaware client and
363	   the UDVM.

365	Price et al.                                                  [PAGE 7]
366	   In addition, two instances of an application MUST agree on how to
367	   invoke the UDVM decompression service, and MUST fix or negotiate a
368	   common set of application-defined parameters (e.g.
369	   maximum_compressed_size) as per Section 3.5.

371	   Certain application-defined parameters can be modified on the fly
372	   using the state creation mechanism and the feedback mechanism. The
373	   application SHOULD additionally provide some external means of
374	   resetting or renegotiating these parameters (possibly by terminating
375	   the decompression service offered by the UDVM).

377	   The UDVM has a total of three interfaces to the environment: a two-
378	   way interface for exchanging compressed and uncompressed data, a two-
379	   way interface for storing and receiving state, and a one-way
380	   interface for forwarding feedback data. To protect against the
381	   malicious establishment of false state or false feedback data all of
382	   the UDVM interfaces pass through the application, and requests for
383	   state creation and feedback can be rejected if they are not
384	   accompanied by a valid uncompressed message.

386	   Each of the three interfaces is described in greater detail below:

388	3.2.1.  Compressed and uncompressed data

390	   The first interface supplies compressed data to the UDVM and
391	   retrieves the corresponding uncompressed data. Note that when the
392	   UDVM is invoked it does not receive any compressed data by default,
393	   but instead requests new data explicitly using a specific
394	   instruction. This means that the first part of a message can be
395	   decompressed without waiting for the entire message to arrive, which
396	   is especially useful over a stream-oriented transport such as TCP.

398	   Uncompressed data is also passed to the application using a specific
399	   instruction. It is an application decision whether to make use of the
400	   data immediately or to buffer and wait for a complete message to be
401	   successfully decompressed.

403	3.2.2.  Storing and retrieving state

405	   To provide security against the malicious insertion of false
406	   compressed data, the contents of the UDVM memory are reinitialized
407	   after each compressed message. This ensures that damaged compressed
408	   messages do not prevent the successful decompression of subsequent
409	   valid messages.

411	   Note however that the overall compression ratio is often
412	   significantly higher if messages can be compressed relative to the
413	   information stored in previous messages. For this reason it is
414	   possible for the UDVM to create "state" information for access when a
415	   later message is being decompressed.

417	   Both the creation and access of state are designed to be secure
418	   against malicious tampering with the compressed data. State can only
419	   be created when a complete message has been successfully

421	Price et al.                                                  [PAGE 8]
422	   decompressed, and the application can veto a state creation request
423	   based on the contents of the decompressed message. This is especially
424	   useful if the application has an authentication mechanism that can be
425	   applied to determine whether the uncompressed data is legitimate.

427	   Furthermore, the UDVM can only access previously created state
428	   information by providing an [MD5] hash of the state to be accessed.
429	   The advantage of using a secure hash to access state information is
430	   that it is very difficult to guess the correct hash value without
431	   complete knowledge of the state being accessed.

433	   Also note that state is not deleted when it is accessed. So even if a
434	   malicious user manages to access state information, subsequent
435	   messages compressed relative to this state can still be successfully
436	   decompressed. Instead, the application is responsible for deleting
437	   state information once it determines that the state will no longer be
438	   needed.

440	3.2.3.  Feedback information

442	   The final interface is only used when the transport mechanism is
443	   bidirectional. It provides feedback information from the UDVM to the
444	   compressor on the reverse channel, and can be used to improve the
445	   overall compression ratio.

447	   Note that the feedback information is forwarded via the application.
448	   Just as for the state interface above, the application can veto
449	   feedback information if it considers the corresponding decompressed
450	   message to be invalid.

452	   If the transport mechanism only provides one-way data transport then
453	   the feedback interface is considered to be null: any feedback
454	   information sent across the interface is simply discarded by the
455	   application.

457	3.3.  Requirements on transport mechanism

459	   The transport mechanism is the entity that passes data between two
460	   instances of an application. Since the motivation for developing the
461	   UDVM has been the compression of signaling protocols such as [SIP],
462	   the UDVM is designed to operate successfully over both stream-
463	   oriented protocols such as TCP and message-oriented protocols such as
464	   UDP.

466	   Note that the UDVM is not given direct access to the underlying
467	   transport mechanism; instead the compressed data is considered to
468	   first pass through the application. It is an application decision
469	   whether to pass all data from the transport mechanism directly to the
470	   UDVM or whether to mix compressed and uncompressed data (e.g. by
471	   restricting compressed data to a certain port).

473	   If the transport mechanism is message-oriented then the UDVM converts
474	   each compressed message into a corresponding uncompressed message. It

476	Price et al.                                                  [PAGE 9]
477	   is not possible for one compressed message to reconstruct multiple
478	   uncompressed messages.

480	   If the transport mechanism is stream-oriented then the UDVM simply
481	   converts a stream of compressed data into a stream of uncompressed
482	   data. However, when running over a stream-oriented transport such as
483	   TCP, applications often insert their own internal message delimiters
484	   into the data stream. As the message is compressed, it will not be
485	   possible to detect these delimiters in the compressed data stream.
486	   Therefore the UDVM provides a similar character that can be inserted
487	   into the compressed data stream to delimit messages (see Section 3.4
488	   for further details).

490	   No assumption is made about the reliability of the transport
491	   mechanism. The UDVM can operate successfully over unreliable
492	   transport mechanisms such as UDP as well as reliable transport
493	   mechanisms such as TCP.

495	   No assumption is made about the security of the transport mechanism.
496	   It may be possible for a malicious user to insert or modify data on
497	   the path between the compressor and the UDVM. In this case, the
498	   design goal of the UDVM is to avoid presenting additional security
499	   risks compared to simply transporting the application data
500	   uncompressed.

502	3.4.  Requirements on compressor

504	   An important feature of the UDVM is that it can decompress data
505	   generated by arbitrary compression algorithms. In particular this
506	   means that it is not necessary to standardize a compression algorithm
507	   for use with the UDVM; instead the choice can be left to the
508	   implementer.

510	   The overall requirement placed on the compressor is that of
511	   transparency, i.e. the compressor MUST NOT send instructions which
512	   cause the UDVM to incorrectly decompress a given message.

514	   The following more specific requirements are also placed on the
515	   compressor (they can be considered particular instances of the
516	   transparency requirement):

518	   *    Since feedback information is purely optional, the compressor
519	        MUST be able to operate successfully even if it receives no
520	        feedback data.

522	   *    It is RECOMMENDED that the compressor supply a CRC over the
523	        uncompressed message to ensure that successful decompression has
524	        occurred. A UDVM instruction is provided to verify this CRC.

526	   *    If the transport mechanism is message-oriented then the
527	        compressor MUST preserve the boundaries between messages.

529	   *    If the transport mechanism is stream-oriented but the
530	        application defines its own internal message boundaries, then

532	Price et al.                                                 [PAGE 10]
533	        the compressor SHOULD preserve the boundaries between messages
534	        by using the "end-of-message" character 0xFFFF reserved in UDVM
535	        bytecode.

537	   The reason for preserving the message boundaries over a stream-
538	   oriented transport is that damage to one compressed message does not
539	   affect the decompression of subsequent messages. Moreover, the
540	   application typically vetoes state creation and feedback requests on
541	   a per-message basis.

543	   Note that the UDVM also reserves the character 0xFF00 over a stream-
544	   oriented transport mechanism, and replaces every instance of 0xFF00
545	   with 0xFF before decompressing the data. This ensures that arbitrary
546	   compression algorithms can be used over a stream-oriented transport,
547	   provided that every instance of 0xFF in the compressed data stream is
548	   identified and replaced with 0xFF00. This "byte-stuffing" scheme
549	   prevents the compression algorithm from inserting a message delimiter
550	   into the data stream where one is not required.

552	3.4.1.  Types of compression algorithm

554	   Any of the following classes of compression algorithm may be useful
555	   depending on the type of application:

557	   *    Generic compressor (for example [DEFLATE] or a similar
558	        algorithm).

560	   *    Protocol-aware compressor offering excellent performance for
561	        one particular type of data (for example the text messages
562	        generated by [SIP]).

564	   *    Hybrid compressor with similar performance to [DEFLATE] for
565	        generic data and superior performance for certain types of data.

567	   Provided that the uncompressed data can be reconstructed at the UDVM
568	   using the available memory and CPU cycles, implementers have freedom
569	   to use a compression algorithm of their choice.

571	   Note that when using an "off-the-shelf" compression algorithm,
572	   bytecode for the corresponding decompressor will need to be made
573	   available at the UDVM. In general the decompressor bytecode is placed
574	   at the front of the first compressed message, unless the application
575	   offers the ability to download UDVM bytecode offline (in which case
576	   the UDVM memory will be initialized already containing a copy of the
577	   decompression algorithm).

579	3.5.  Application-defined parameters

581	   When an application invokes an instance of the UDVM, a number of
582	   parameters are provided by the application to control the UDVM memory
583	   size, maximum number of CPU cycles etc. The application invoking the
584	   UDVM and the application invoking the compressor MUST initially agree
585	   on a common set of values for these parameters.

587	Price et al.                                                 [PAGE 11]
588	   Note that if the transport mechanism is bidirectional then the
589	   application invoking the UDVM can use the reverse channel to indicate
590	   that additional memory or CPU cycles are available (compared to the
591	   values initially agreed by the application invoking the compressor).
592	   The compressor can then make use of these extra resources to improve
593	   the compression ratio.

595	   The feedback mechanism can also advertise that an upgraded version of
596	   the UDVM is available (e.g. offering additional UDVM instructions),
597	   provided that the upgraded version is backwards compatible with the
598	   basic version described in this document. See Chapter 7 for further
599	   details.

601	   Each parameter is described in greater detail below; example values
602	   for the parameters are listed in Appendix B.

604	   UDVM_version

606	     The UDVM_version parameter specifies the level of functionality
607	     available at the UDVM. The basic version of the UDVM (Version 0)
608	     is defined in this document.

610	   maximum_compressed_size

612	     The maximum_compressed_size parameter limits the size of one
613	     compressed message. Decompression failure occurs if a message
614	     larger than the specified value is provided.

616	   maximum_uncompressed_size

618	     The maximum_uncompressed_size parameter limits the size of one
619	     uncompressed message. Decompression failure occurs if a message
620	     larger than the specified value is provided.

622	   minimum_hash_size

624	     The minimum_hash_size parameter specifies the minimum size of the
625	     state identifier that can be used to reference state. This value
626	     needs to be sufficiently large to prevent malicious users from
627	     guessing a state identifier by brute force.

629	   overall_memory_size

631	     The overall_memory_size parameter specifies the total number of
632	     bytes in the UDVM memory.

634	   working_memory_start

636	     The working_memory_start parameter specifies the start of the UDVM
637	     memory area that can be modified. Memory addresses below this
638	     value are considered read-only by the UDVM.

640	Price et al.                                                 [PAGE 12]
641	   working_memory_end

643	     The working_memory_end parameter specifies the end of the UDVM
644	     memory area that can be modified. Memory addresses above this
645	     value are considered read-only by the UDVM.

647	   cycles_per_bit

649	     The cycles_per_bit parameter specifies the number of "CPU cycles"
650	     that can be used to decompress a single bit of data. One CPU cycle
651	     typically corresponds to a single UDVM instruction, although some
652	     of the high-level instructions may require additional cycles.

654	   cycles_per_message

656	     The cycles_per_message parameter specifies the number of additional
657	     CPU cycles made available at the start of a compressed message.
658	     These cycles can be useful when decompressing algorithms that
659	     download additional data on a per-message basis, for example a new
660	     set of Huffman codes as with [DEFLATE].

662	     The total number of "CPU cycles" available for each compressed
663	     message is specified by the following formula:

665	     total_cycles = message_size * cycles_per_bit + cycles_per_message

667	   first_instruction

669	     The first_instruction parameter specifies the memory address of the
670	     first instruction to be executed when the UDVM is initialized.

672	   Initial memory contents

674	     For each new compressed message the UDVM memory is reinitialized
675	     with contents defined by the application. For example, the
676	     application may be able to download UDVM bytecode for a
677	     decompression algorithm before the first compressed message
678	     arrives. In this case, for each new compressed message the UDVM
679	     memory is initialized already containing a copy of the
680	     decompression algorithm.

682	   Initial state

684	     As well as deciding the initial contents of the UDVM memory, the
685	     application can also store useful information in the form of state.
686	     This predefined state will typically contain optional data that can
687	     be used to improve the overall compression ratio, for example a
688	     well-known decompression algorithm or a dictionary of commonly used
689	     [SIP] phrases. Note that unlike state created on the fly by the
690	     UDVM, there is no need for the application-defined state to use an
691	     [MD5] hash as the state identifier.

693	Price et al.                                                 [PAGE 13]
694	4.  Overview of the UDVM

696	   This chapter describes some basic features of the UDVM, including the
697	   memory allocation, well-known variables and instruction parameters.

699	4.1.  UDVM memory allocation

701	   The memory available to the UDVM is partitioned into a number of
702	   sections, providing space for program code, variables and
703	   miscellaneous data:

705	                  <----- working_memory_size ------>

707	   | Fixed values | Variables | Miscellaneous data | Program code |
708	   +--------------+-----------+--------------------+--------------+

710	   <--------------------- overall_memory_size -------------------->

712	                  Figure 3: Memory allocation in the UDVM

714	   Recall that the amount of memory available to the UDVM is defined by
715	   the application-specific parameters overall_memory_size,
716	   working_memory_start and working_memory_end. Note that all of these
717	   parameters are initialized by the application, but can be
718	   renegotiated on the fly using the feedback mechanism of Chapter 7.

720	   The memory area from Address (working_memory_start) to Address
721	   (working_memory_end) inclusive can be used to store arbitrary data
722	   (variables, program code, Huffman codes etc.). UDVM instructions are
723	   allowed to read from or write to any address in this memory area.

725	   The first part of this memory area is typically used to store a
726	   number of 2-byte variables. UDVM instructions can reference these
727	   variables using a special instruction parameter as described in
728	   Section 4.3.

730	   The memory area from Address 0 to Address (working_memory_start - 1)
731	   and from Address (working_memory_end + 1) to Address
732	   (overall_memory_size - 1) inclusive is write-protected, so UDVM
733	   instructions can read from this memory area but cannot write to it.
734	   This memory area is intended for storing UDVM bytecode that can be
735	   compiled.

737	   Any attempt to read memory addresses beyond the overall memory size
738	   or to write to memory addresses outside the working memory area MUST
739	   cause a decompression failure (see Section 5.3).

741	   The first part of the write-protected UDVM memory is intended for
742	   storing variables whose values no longer need to be modified. The
743	   second part of the write-protected memory is intended for storing
744	   program code including UDVM instructions and their associated
745	   parameters. Note that if an instruction references a variable that
746	   has been write-protected, the compiled version of the instruction

748	Price et al.                                                 [PAGE 14]
749	   will typically run faster than if the referenced variable lies in the
750	   working memory area.

752	4.2.  Well-known variables

754	   The first few variables in the UDVM memory have special tasks, for
755	   example specifying the location of the stack used by the CALL and
756	   RETURN instructions. Each of these well-known variables is a 2-byte
757	   integer.

759	   The following list gives the name of each well-known variable and the
760	   memory address at which the variable can be found:

762	   Name:                    Starting memory address:

764	   byte_copy_left                       0
765	   byte_copy_right                      2
766	   stack_location                       4

768	   The MSBs of each variable are always stored before the LSBs. So, for
769	   example, the MSBs of stack_location are stored at Address 4 whilst
770	   the LSBs are stored at Address 5.

772	   The use of each well-known variable is described in the following
773	   sections of the draft.

775	4.3.  Instruction parameters

777	   Each of the UDVM instructions is followed by 0 or more bytes
778	   containing the parameters required by the instruction.

780	   To reduce the code size of a typical UDVM program, each parameter for
781	   a UDVM instruction is compressed using variable-length encoding. The
782	   aim is to store more common parameter values using fewer bits than
783	   rarely occurring values.

785	   Three different types of parameter are available: the literal, the
786	   reference and the multitype. The parameter types that follow each
787	   UDVM instruction are specified in Chapter 6.

789	   The UDVM bytecode for each parameter type is illustrated in Figure 4
790	   to Figure 6, together with the integer values represented by the
791	   bytecode.

793	   Note that the MSBs in the bytecode are illustrated as preceding the
794	   LSBs. Also, any string of bits marked with k consecutive "n"s is to
795	   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the
796	   MSBs of n illustrated as preceding the LSBs).

798	   The decoded integer value of the bytecode can be interpreted in two
799	   ways. In some cases it is taken to be the actual value of the
800	   parameter. In other cases it is taken to be a memory address at which
801	   the 2-byte parameter value can be found (MSBs found at the specified
802	   address, LSBs found at the following address). The latter case is

804	Price et al.                                                 [PAGE 15]
805	   denoted by memory[X] where X is the address and memory[X] is the 2-
806	   byte value starting at Address X.

808	   The simplest parameter type is the literal (#), which encodes a
809	   constant integer from 0 to 65535 inclusive. A literal parameter may
810	   require between 1 and 3 bytes depending on its value.

812	   Bytecode:                    Parameter value:          Range:

814	   0nnnnnnn                     N                         0 - 127
815	   10nnnnnn nnnnnnnn            N                         0 - 16383
816	   11000000 nnnnnnnn nnnnnnnn   N                         0 - 65535

818	              Figure 4: Bytecode for a literal (#) parameter

820	   The second parameter type is the reference ($), which is always used
821	   to access a 2-byte value located elsewhere in the UDVM memory. The
822	   bytecode for a reference parameter is decoded to be a constant
823	   integer from 0 to 65535 inclusive, which is interpreted as the memory
824	   address containing the actual value of the parameter.

826	   Bytecode:                    Parameter value:          Range:

828	   0nnnnnnn                     memory[2 * N]             0 - 254
829	   10nnnnnn nnnnnnnn            memory[2 * N]             0 - 32766
830	   11000000 nnnnnnnn nnnnnnnn   memory[N]                 0 - 65535

832	             Figure 5: Bytecode for a reference ($) parameter

834	   The third kind of parameter is the multitype (%), which can be used
835	   to encode both actual values and memory addresses. The multitype
836	   parameter also offers efficient encoding for small integer values
837	   (both positive and negative) and for powers of 2.

839	   Bytecode:                    Parameter value:          Range:

841	   00nnnnnn                     N                         0 - 63
842	   01nnnnnn                     memory[2 * N]             0 - 126
843	   1000011n                     2 ^ (N + 6)              64 - 128
844	   10001nnn                     2 ^ (N + 8)             256 - 32768
845	   111nnnnn                     N + 65504             65504 - 65535
846	   1001nnnn nnnnnnnn            N + 61440             61440 - 65535
847	   101nnnnn nnnnnnnn            N                         0 - 8191
848	   110nnnnn nnnnnnnn            memory[N]                 0 - 8191
849	   10000000 nnnnnnnn nnnnnnnn   N                         0 - 65535
850	   10000001 nnnnnnnn nnnnnnnn   memory[N]                 0 - 65535

852	             Figure 6: Bytecode for a multitype (%) parameter

854	4.4.  Byte copying

856	   A number of UDVM instructions require a string of bytes to be copied
857	   to and from areas of the UDVM memory. This section defines how the
858	   byte copying operation should be performed.

860	Price et al.                                                 [PAGE 16]
861	   In general, the string of bytes is copied in ascending order of
862	   memory address. So if a byte is copied from/to Address n then the
863	   next byte is copied from/to Address n + 1. As usual, if a byte is
864	   read from an address beyond the overall memory size or is written to
865	   an address outside the working memory area then decompression failure
866	   occurs.

868	   Note however that if a byte is copied from/to the memory address
869	   specified in byte_copy_right, the byte copy operation continues by
870	   copying the next byte from/to the memory address specified in
871	   byte_copy_left. This is useful for setting up a "circular buffer"
872	   within the UDVM memory.

874	   Note that the string of bytes is copied on a purely byte-by-byte
875	   basis. In particular, some of the later bytes to be copied may
876	   themselves have been written into the UDVM memory by the byte copying
877	   operation currently being performed.

879	   Equally, it is possible for a byte copying operation to overwrite the
880	   instruction that called the byte copy. If this occurs then the byte
881	   copying operation MUST be completed as if the original instruction
882	   were still in place in the UDVM memory (this also applies if
883	   byte_copy_left or byte_copy_right are overwritten).

885	5.  Decompressing a compressed message

887	   This chapter lists the steps involved in the decompression of a
888	   single compressed message.

890	5.1.  Invoking the UDVM

892	   Whenever the application receives a message to be decompressed, it
893	   invokes a new instance of the UDVM. The overall_memory_size and
894	   initial contents of the UDVM memory are initialized using the
895	   corresponding application-defined parameters. The following steps are
896	   then taken:

898	   1.)   The number of remaining CPU cycles is set equal to the
899	   application-defined parameter cycles_per_message.

901	   Notes:

903	   The amount of compressed data available to the UDVM is exactly one
904	   compressed message. If the transport mechanism is stream-oriented
905	   then the UDVM uses the reserved byte string 0xFFFF to delimit the
906	   compressed messages: the UDVM takes the data between a pair of
907	   neighboring reserved byte strings to be a single compressed message.
908	   The reserved byte string itself is not considered to be part of the
909	   compressed message.

911	   For a stream-oriented transport, the UDVM parses the compressed data
912	   stream for instances of 0xFF and takes the following actions:

914	Price et al.                                                 [PAGE 17]
915	   Occurs in data stream:               Action:

917	   0xFFFF                               Delimit compressed message
918	   0xFF00                               Replace with 0xFF
919	   0xFF01 - 0xFFFE                      Decompression failure

921	   The reserved character 0xFF00 is useful for byte stuffing (if a
922	   compression algorithm generates compressed data containing the
923	   character 0xFF then it should be replaced by the character 0xFF00 to
924	   avoid accidentally inserting a message delimiter into the compressed
925	   data stream).

927	   The compressed data is not provided to the UDVM by default. Instead,
928	   the UDVM requests compressed data using the INPUT instructions
929	   (useful when running over a stream-oriented transport since there is
930	   no need to wait for the entire compressed message before
931	   decompression can begin). Note that in particular, this means that
932	   the application MUST define the initial contents of the UDVM memory
933	   to contain at least one INPUT instruction. See Appendix B for an
934	   example of how the application might initialize the UDVM memory.

936	   The application MUST NOT make more than one compressed message
937	   available to a given instance of the UDVM. In particular, the
938	   application MUST NOT concatenate two messages to form a single
939	   compressed message. This is because compressed messages are typically
940	   padded with trailing zero bits so that they are a whole number of
941	   bytes long. Concatenating two messages would cause these padding bits
942	   to be incorrectly interpreted as compressed data.

944	   2.)   Next, the instructions contained within the UDVM memory are
945	   executed beginning at the address specified in first_instruction.

947	   Notes:

949	   The instructions are executed consecutively unless otherwise
950	   indicated (for example when the UDVM encounters a JUMP instruction).

952	   If the next instruction to be executed lies outside the available
953	   memory then decompression failure occurs (see Section 5.3).

955	   3.)   Each time an instruction is executed the number of available
956	   CPU cycles is decreased by the amount specified in Chapter 6.
957	   Additionally, if the UDVM requests n bits of compressed data (using
958	   one of the INPUT instructions) then the number of available CPU
959	   cycles is increased by n * cycles_per_bit.

961	   Notes:

963	   This means that the total number of CPU cycles available for
964	   processing a compressed message is given by the formula:

966	     total_cycles = cycles_per_message + message_size * cycles_per_bit

968	Price et al.                                                 [PAGE 18]
969	   The reason that this total is not allocated to the UDVM when it is
970	   invoked is that the UDVM can begin to decompress a message that has
971	   only been partially received. So the total message size may not be
972	   known when the UDVM is initialized.

974	   4.)   The UDVM stops executing instructions when it encounters an
975	   END-MESSAGE instruction or if decompression failure occurs.

977	   Notes:

979	   The UDVM passes uncompressed data to the application using the OUTPUT
980	   instruction. The OUTPUT instruction can be used to output a partially
981	   decompressed message; it is an application decision whether to use
982	   the data immediately or whether to buffer and wait until the entire
983	   message has been decompressed.

985	   The UDVM passes state creation and feedback requests to the
986	   application using the END-MESSAGE instruction. This means that it is
987	   only possible to make a state creation and a feedback request once
988	   the message has been decompressed, which is necessary since the
989	   application typically checks the validity of these requests based on
990	   the contents of the decompressed message.

992	5.2.  Successful decompression

994	   The END-MESSAGE instruction indicates that the compressed message has
995	   been successfully decompressed and passed to the application. Note
996	   that the actual uncompressed message is outputted beforehand using
997	   the OUTPUT instruction; this allows the UDVM to output each part of
998	   the message to the application as soon as it has been decompressed.

1000	   The END-MESSAGE instruction provides two additional pieces of
1001	   information to the application: the state creation request and the
1002	   feedback data. The state creation request mechanism is discussed
1003	   below; feedback information is discussed separately in Chapter 7.

1005	   The UDVM may optionally save part of its memory for retrieval by
1006	   later messages. However to prevent malicious storage of a large
1007	   amount of unnecessary state information, the application MUST give
1008	   permission before any state can be created. The application typically
1009	   makes a decision on whether state can be created based on the
1010	   contents of the decompressed message, particularly if the message
1011	   contains authentication data that can verify whether or not the
1012	   sender is legitimate.

1014	   The END-MESSAGE instruction requests the creation of state using the
1015	   parameters state start and state length, which together denote a byte
1016	   string state_value. Provided that the application gives permission,
1017	   state_value is byte copied from the UDVM memory (obeying the rules of
1018	   Section 4.4) and stored together with a 16-byte state identifier that
1019	   can be used to access the state by a later compressed message.

1021	   To provide security against malicious access, the identifier for any
1022	   item of state created by the UDVM is derived from the [MD5] hash of

1024	Price et al.                                                 [PAGE 19]
1025	   the state_value to be stored. The state identifier is constructed by
1026	   taking the 16-byte [MD5] hash and replacing all but the first
1027	   hash_length most significant bytes with zeroes. Note that if
1028	   hash_length is 16 then the unmodified [MD5] hash is the state
1029	   identifier. Decompression failure occurs if hash_length is less than
1030	   the application-defined parameter minimum_hash_size or greater than
1031	   16.

1033	   Each item of state stores the following information (accessed by the
1034	   state_identifier):

1036	   state_identifier
1037	   state start
1038	   state length
1039	   state_value
1040	   state_instruction

1042	   Note that state_start, state_length and state_instruction are all
1043	   parameters from the END-MESSAGE instruction, whereas state_identifier
1044	   and state_value are created as specified above.

1046	   If a state creation request is made with a state identifier that has
1047	   been used by existing state, then the request fails automatically.

1049	   This state can subsequently be accessed by using the STATE-REFERENCE
1050	   and STATE-EXECUTE instructions (by providing the correct state
1051	   identifier).

1053	5.3.  Decompression failure

1055	   If a compressed message given to the UDVM is corrupted (either
1056	   accidentally or maliciously) then the UDVM may terminate with a
1057	   decompression failure.

1059	   Reasons for decompression failure include the following:

1061	   *    A compressed or uncompressed message exceeds the maximum size
1062	        defined by the application.

1064	   *    The UDVM exceeds the available CPU cycles for decompressing a
1065	        message.

1067	   *    The UDVM attempts to read a memory address beyond the overall
1068	        memory size, or to write into a memory address outside the
1069	        working memory area.

1071	   *    An unknown instruction type is encountered.

1073	   *    An unknown parameter type is encountered.

1075	   *    An instruction is encountered that cannot be processed
1076	        successfully by the UDVM (for example a RETURN instruction when
1077	        no CALL instruction has previously been encountered).

1079	Price et al.                                                 [PAGE 20]
1080	   *    The UDVM attempts to access non-existent state.

1082	   *    A manual decompression failure is triggered using the
1083	        DECOMPRESSION-FAILURE instruction.

1085	   If a decompression failure occurs when decompressing a message then
1086	   the UDVM informs the application and takes no further action. It is
1087	   the responsibility of the application to decide how to cope with the
1088	   decompression failure. In general an application SHOULD discard the
1089	   compressed message and any decompressed data that has been outputted.

1091	6.  UDVM instruction set

1093	   The UDVM currently understands 28 instructions, chosen to support the
1094	   widest possible range of compression algorithms with the minimum
1095	   possible overhead.

1097	   Figure 7 lists the different instructions and the bytecode values
1098	   used to store the instructions at the UDVM. The cost of each
1099	   instruction in CPU cycles is also given:

1101	   Instruction:              Bytecode value:   Cost in CPU cycles:

1103	   DECOMPRESSION-FAILURE            0           1
1104	   AND                              1           1
1105	   OR                               2           1
1106	   NOT                              3           1
1107	   ADD                              4           1
1108	   SUBTRACT                         5           1
1109	   MULTIPLY                         6           1
1110	   DIVIDE                           7           1
1111	   LOAD                             8           1
1112	   MULTILOAD                        9           1 + n
1113	   WORKING-MEMORY                   10          1
1114	   COPY                             11          1 + length
1115	   COPY-LITERAL                     12          1 + length
1116	   COPY-OFFSET                      13          1 + length + offset
1117	   JUMP                             14          1
1118	   COMPARE                          15          1
1119	   CALL                             16          1
1120	   RETURN                           17          1
1121	   SWITCH                           18          1 + n
1122	   CRC                              19          1 + length
1123	   END-MESSAGE                      20          1 + state length
1124	   OUTPUT                           21          1 + output_length
1125	   NBO                              22          1
1126	   INPUT-BYTECODE                   23          1 + length
1127	   INPUT-FIXED                      24          1
1128	   INPUT-HUFFMAN                    25          1 + n
1129	   STATE-REFERENCE                  26          1 + state_length
1130	   STATE-EXECUTE                    27          1 + state length

1132	       Figure 7: UDVM instructions and corresponding bytecode values

1134	Price et al.                                                 [PAGE 21]
1135	   Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high-
1136	   level instructions may cost additional cycles depending on the value
1137	   of one of the instruction parameters.

1139	   The only exception when calculating the number of CPU cycles is that
1140	   the STATE-EXECUTE instruction takes (1 + state_length) cycles even
1141	   though it does not have a state_length parameter; instead the value
1142	   of state length is provided by the application as part of the state
1143	   being accessed.

1145	   All instructions are stored as a single byte to indicate the
1146	   instruction type, followed by 0 or more bytes containing the
1147	   parameters required by the instruction. The instruction specifies
1148	   which of the three parameter types of Section 4.3 is used in each
1149	   case. For example, the ADD instruction is followed by two parameters
1150	   as shown below:

1152	   ADD ($parameter_1, %parameter_2)

1154	   When converted into bytecode the number of bytes required by the ADD
1155	   instruction depends on the size of each parameter value, and whether
1156	   the second (multitype) parameter contains the parameter value itself
1157	   or a memory address where the actual value of the parameter can be
1158	   found.

1160	   The instruction set available for the UDVM offers a mix of low-level
1161	   and high-level instructions. The high-level instructions can all be
1162	   emulated using the low-level instructions provided, but given a
1163	   choice it is generally preferable to use a single instruction rather
1164	   than a large number of general-purpose instructions. The resulting
1165	   bytecode will be more compact (leading to a higher overall
1166	   compression ratio) and decompression will typically be faster because
1167	   the implementation of the compression-specific instructions can be
1168	   optimized for the UDVM.

1170	   Each instruction is explained in more detail below:

1172	6.1.  Bit manipulation instructions

1174	   The AND, OR and NOT instructions provide simple bit manipulation on
1175	   2-byte words.

1177	   AND ($parameter_1, %parameter_2)
1178	   OR ($parameter_1, %parameter_2)
1179	   NOT ($parameter_1)

1181	   After the operation is complete, the value of the first parameter is
1182	   overwritten with the result. Note that since this parameter is a
1183	   reference, the memory address specified by the parameter is always
1184	   overwritten and not the parameter itself.

1186	Price et al.                                                 [PAGE 22]
1187	6.2.  Arithmetic instructions

1189	   The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform
1190	   arithmetic on 2-byte words.

1192	   ADD ($parameter_1, %parameter_2)
1193	   SUBTRACT ($parameter_1, %parameter_2)
1194	   MULTIPLY ($parameter_1, %parameter_2)
1195	   DIVIDE ($parameter_1, %parameter_2)

1197	   After the operation is complete, the first parameter is overwritten
1198	   with the result.

1200	   Note that in all cases the arithmetic operation is performed modulo
1201	   2^16. So for example, subtracting 1 from 0 gives the result 65535.

1203	   For the SUBTRACT instruction the second parameter is subtracted from
1204	   the first. Similarly, for the DIVIDE instruction the first parameter
1205	   is divided by the second parameter. Note that if the second parameter
1206	   does not divide exactly into the first parameter then the remainder
1207	   is ignored.

1209	6.3.  Memory management instructions

1211	   The following instructions are used to manipulate the UDVM memory.
1212	   Bytes can be copied from one area of memory to another, and areas of
1213	   memory can be write-protected to make it easier for UDVM code to be
1214	   compiled.

1216	6.3.1.  LOAD

1218	   The LOAD instruction sets a 2-byte variable to a certain specified
1219	   value. The format of a LOAD instruction is as follows:

1221	   LOAD (%address, %value)

1223	   The first parameter specifies the starting address of the 2-byte
1224	   variable, whilst the second parameter specifies the value to be
1225	   loaded into this variable. As usual, MSBs are stored before LSBs in
1226	   the UDVM memory.

1228	6.3.2.  MULTILOAD

1230	   The MULTILOAD instruction sets a contiguous block of 2-byte variables
1231	   to specified values.

1233	   MULTILOAD (%address, #n, %value_0, ..., %value_n-1)

1235	   The first parameter specifies the starting address of the contiguous
1236	   variables, whilst the parameters value_0 through to value_n-1 specify
1237	   the values to load into these variables (in the same order as they
1238	   appear in the instruction).

1240	Price et al.                                                 [PAGE 23]
1241	6.3.3.  WORKING-MEMORY

1243	   The WORKING-MEMORY instruction is used to prevent part of the UDVM
1244	   memory from being modified. This can be very useful when offering
1245	   UDVM code for compilation.
1246	   WORKING-MEMORY (%memory_start, %memory_end)

1248	   The parameters memory_start and memory_end specify the new working
1249	   memory area for the UDVM. These parameters replace the application-
1250	   defined parameters working_memory_start and working_memory_end, but
1251	   only while the current message is being decompressed. When a new
1252	   instance of the UDVM is invoked the working memory area is set by the
1253	   original application-defined parameters.

1255	   If memory_end < memory_start, or if the parameters reference a memory
1256	   address beyond the overall UDVM memory size, then decompression
1257	   failure occurs.

1259	   After the WORKING-MEMORY instruction has been encountered, the only
1260	   way to write into UDVM memory within the protected region is to
1261	   cancel the protection using another WORKING-MEMORY instruction (or to
1262	   invoke a new instance of the UDVM).

1264	6.3.4.  COPY

1266	   The COPY instruction is used to copy a string of bytes from one part
1267	   of the UDVM memory to another.

1269	   COPY (%position, %length, %destination)

1271	   The position parameter specifies the memory address of the first byte
1272	   in the string to be copied, and the length parameter specifies the
1273	   number of bytes to be copied.

1275	   The destination parameter gives the address to which the first byte
1276	   in the string will be copied.

1278	   Note that byte copying is performed as per the rules of Section 4.4.

1280	6.3.5.  COPY-LITERAL

1282	   A modified version of the COPY instruction is given below:

1284	   COPY-LITERAL (%position, %length, $destination)

1286	   The COPY-LITERAL instruction behaves as a COPY instruction except
1287	   that after copying, the destination parameter is replaced with the
1288	   memory address immediately following the address to which the final
1289	   byte was copied. If the final byte was copied to the memory address
1290	   specified in byte_copy_right, the destination parameter is set to the
1291	   memory address specified in byte_copy_left.

1293	Price et al.                                                 [PAGE 24]
1294	6.3.6.  COPY-OFFSET

1296	   A further version of the COPY-LITERAL instruction is given below:

1298	   COPY-OFFSET (%offset, %length, $destination)

1300	   The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction
1301	   except that an offset parameter is given instead of a position
1302	   parameter.

1304	   To derive a suitable position parameter, starting at the memory
1305	   address specified by destination, the UDVM counts backwards a total
1306	   of offset memory addresses. If the memory address specified in
1307	   byte_copy_left is reached, the next memory address is taken to be
1308	   byte_copy_right.

1310	   The COPY-OFFSET instruction then behaves as a COPY-LITERAL
1311	   instruction, taking the position parameter to be the last memory
1312	   address reached in the above step.

1314	6.4.  Program flow instructions

1316	   The following instructions alter the flow of UDVM code. Each
1317	   instruction jumps to one of a number of memory addresses based on a
1318	   certain specified criterion. Note that all of the instructions give
1319	   the memory addresses in the form of deltas relative to the memory
1320	   address of the instruction. The actual memory address is calculated
1321	   as follows:

1323	   memory_address = (memory_address_of_instruction + delta) modulo 2^16

1325	   Note that certain I/O instructions (see Section 6.5) can also alter
1326	   program flow.

1328	6.4.1.  JUMP

1330	   The JUMP instruction moves program execution to the specified memory
1331	   address.

1333	   JUMP (%delta)

1335	   Note that if the address (specified as a delta from the address of
1336	   the JUMP instruction) lies beyond the overall UDVM memory size then
1337	   decompression failure occurs.

1339	6.4.2.  COMPARE

1341	   The COMPARE instruction compares two parameters and then jumps to one
1342	   of three specified memory addresses depending on the result.

1344	   COMPARE (%parameter_1, %parameter_2, %delta_1, %delta_2, %delta_3)

1346	   If parameter_1 < parameter_2 then the UDVM continues instruction
1347	   execution at the (relative) memory address specified by delta 1. If

1349	Price et al.                                                 [PAGE 25]
1350	   parameter_1 = parameter_2 then it jumps to the address specified by
1351	   delta_2. If parameter_1 > parameter_2 then it jumps to the address
1352	   specified by delta_3.

1354	6.4.3.  CALL and RETURN

1356	   The CALL and RETURN instructions provide support for compression
1357	   algorithms with a nested structure.

1359	   CALL (%delta)

1361	   RETURN

1363	   The CALL and RETURN instructions make use of a stack of 2-byte
1364	   variables stored at the memory address specified by the well-known
1365	   variable stack_location. The stack contains the following variables:

1367	   Name:                    Starting memory address:

1369	   stack_free                   stack_location
1370	   stack[0]                     stack_location + 2
1371	   stack[1]                     stack_location + 4
1372	   stack[2]                     stack_location + 6
1373	      :                                 :

1375	   The MSBs of these variables are stored before the LSBs in the UDVM
1376	   memory.

1378	   When the UDVM reaches a CALL instruction, it finds the memory address
1379	   of the instruction immediately following the CALL instruction and
1380	   copies this 2-byte value into stack[stack_free] ready for later
1381	   retrieval. It then increases stack_free by 1 and continues
1382	   instruction execution at the (relative) memory address specified by
1383	   the parameter.

1385	   When the UDVM reaches a RETURN instruction it decreases stack_free by
1386	   1, and then continues instruction execution at the byte position
1387	   stored in stack[stack_free].

1389	   If the variable stack_free is ever increased beyond 65535 or
1390	   decreased below 0 then a bad compressed message has been received and
1391	   decompression failure occurs (see Section 5.3).

1393	   Decompression failure also occurs if one of the above instructions is
1394	   encountered and the value of stack_location is smaller than 6 (this
1395	   prevents the stack from overwriting the well-known variables).

1397	6.4.4.  SWITCH

1399	   The SWITCH instruction performs a conditional jump based on the value
1400	   of one of its parameters.

1402	   SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1)

1404	Price et al.                                                 [PAGE 26]
1405	   When a SWITCH instruction is encountered the UDVM reads the value of
1406	   j. It then continues instruction execution at the (relative) address
1407	   specified by delta j.

1409	   If j specifies a value of n or more, a bad compressed message has
1410	   been received and decompression failure occurs.

1412	6.4.5.  CRC

1414	   The CRC instruction verifies a string of bytes using a 2-byte CRC.

1416	   CRC (%value, %position, %length, %delta)

1418	   The actual CRC calculation is performed using the generator
1419	   polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte
1420	   Frame Check Sequence (FCS) of [RFC-1662].

1422	   The position and length parameters define the string of bytes over
1423	   which the CRC is evaluated. Byte copying rules are enforced as per
1424	   Section 4.4.

1426	   Important note: Since a CRC calculation is always performed over a
1427	   bitstream, for interoperability it is necessary to define the order
1428	   in which bits are supplied within each individual byte. In this case
1429	   the MSBs of the byte MUST be supplied to the CRC calculation before
1430	   the LSBs.

1432	   The value parameter contains the expected integer value of the 2-byte
1433	   CRC. If the calculated CRC matches the expected value then the UDVM
1434	   continues at the following instruction. Otherwise the UDVM jumps to
1435	   the (relative) memory address specified by delta.

1437	6.5.  I/O instructions

1439	   The following instructions allow the UDVM to interface with its
1440	   environment. Note that in the current UDVM architecture all of the
1441	   interfaces pass through the application (which has a veto over any
1442	   information supplied to or from the UDVM).

1444	6.5.1.  END-MESSAGE

1446	   The END-MESSAGE instruction successfully terminates the UDVM and
1447	   passes feedback and state information to the application.

1449	   END-MESSAGE (%hash_length, %state_start, %state_length,
1450	   %state_instruction, %feedback_location)

1452	   The actions taken by the UDVM upon encountering the END-MESSAGE
1453	   instruction are described in Section 5.2.

1455	6.5.2.  DECOMPRESSION-FAILURE

1457	   The DECOMPRESSION-FAILURE instruction triggers a manual decompression
1458	   failure. This is useful if the UDVM program discovers that it cannot

1460	Price et al.                                                 [PAGE 27]
1461	   successfully decompress the message (e.g. by using the CRC
1462	   instruction).

1464	   This instruction has no parameters.

1466	6.5.3.  OUTPUT

1468	   The OUTPUT instruction provides successfully decompressed data to the
1469	   application.

1471	   OUTPUT (%output_start, %output_length)

1473	   The parameters define the starting memory address and length of the
1474	   byte string to be provided to the application. Note that the OUTPUT
1475	   instruction can be used to output a partially decompressed message;
1476	   each time the instruction is encountered it appends a byte string to
1477	   the end of the data previously passed to the application via the
1478	   OUTPUT instruction.

1480	   The string of data is byte copied from the UDVM memory obeying the
1481	   rules of Section 4.4.

1483	   Decompression failure occurs if the cumulative number of bytes
1484	   provided to the application exceeds the application-defined parameter
1485	   maximum_uncompressed_size.

1487	   Since there is technically a difference between outputting a 0-byte
1488	   decompressed message, and not outputting a decompressed message at
1489	   all, the OUTPUT instruction needs to distinguish between the two
1490	   cases. Thus, if the UDVM terminates before encountering an OUTPUT
1491	   instruction it is considered not to have outputted a decompressed
1492	   message. If it encounters one or more OUTPUT instructions, each of
1493	   which provides 0 bytes of data to the application, then it is
1494	   considered to have outputted a 0-byte decompressed message.

1496	6.5.4.  NBO

1498	   The NBO instruction modifies the order in which compressed bits are
1499	   passed to the UDVM.

1501	   As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual
1502	   bits from within a byte, to avoid ambiguity it is necessary to define
1503	   the order in which these bits are read. The default operation is to
1504	   read the MSBs before the LSBs, but if the NBO instruction is
1505	   encountered then the LSBs are read before the MSBs. Both cases are
1506	   illustrated below:

1508	    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB

1510	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1511	   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8|
1512	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1514	        Byte 0          Byte 1              Byte 0          Byte 1

1516	Price et al.                                                 [PAGE 28]
1517	           Default operation                 After NBO instruction

1519	   The NBO instruction can only be used before bitwise compressed data
1520	   is passed to the UDVM. Therefore, a decompression failure occurs if
1521	   it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN
1522	   instruction has been used.

1524	6.5.5.  INPUT-BYTECODE

1526	   The INPUT-BYTECODE instruction requests a certain number of bytes of
1527	   compressed data from the application.

1529	   INPUT-BYTECODE (%length, %destination, %delta)

1531	   The length parameter indicates the requested number of bytes of
1532	   compressed data, and the destination parameter specifies the starting
1533	   memory address to which they should be copied. Byte copying is
1534	   performed as per the rules of Section 4.4.

1536	   If the instruction requests data that lies beyond the end of the
1537	   compressed message, no data is returned. Instead the UDVM moves
1538	   program execution to the memory address specified by the formula
1539	   (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16.

1541	   The INPUT-BYTECODE instruction can only be used before bitwise
1542	   compressed data is passed to the UDVM. Therefore, a decompression
1543	   failure occurs if it is encountered after an INPUT-FIXED or an INPUT-
1544	   HUFFMAN instruction has been used.

1546	6.5.6.  INPUT-FIXED

1548	   The INPUT-FIXED instruction requests a certain number of bits of
1549	   compressed data from the application.

1551	   INPUT-FIXED (%length, %destination, %delta)

1553	   The length parameter indicates the requested number of bits. If this
1554	   parameter does not lie between 1 and 16 inclusive then a
1555	   decompression failure occurs.

1557	   The destination parameter specifies the memory address to which the
1558	   compressed data should be copied. Note that the requested bits are
1559	   interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under
1560	   default operation the MSBs of this integer are provided first, but if
1561	   an NBO instruction has been executed then the LSBs are provided
1562	   first.

1564	   If the instruction requests data that lies beyond the end of the
1565	   compressed message, no data is returned. Instead the UDVM moves
1566	   program execution to the memory address specified by the formula
1567	   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16.

1569	Price et al.                                                 [PAGE 29]
1570	6.5.7.  INPUT-HUFFMAN

1572	   The INPUT-HUFFMAN instruction requests a variable number of bits of
1573	   compressed data from the application. The instruction initially
1574	   requests a small number of bits and compares the result against a
1575	   certain criterion; if the criterion is not met then additional bits
1576	   are requested until the criterion is achieved.

1578	   The INPUT-HUFFMAN instruction is followed by three mandatory
1579	   parameters plus n additional sets of parameters. Every additional set
1580	   contains four parameters as shown below:

1582	   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1,
1583	   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n,
1584	   %upper_bound_n, %uncompressed_n)

1586	   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by
1587	   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then
1588	   decompression failure occurs.

1590	   In all other cases, the behavior of the INPUT-HUFFMAN instruction is
1591	   defined below:

1593	   1.)   Set j = 1.

1595	   2.)   Request an additional bits_j compressed bits. Interpret the
1596	   total (bits_1 + ... + bits_j) bits of compressed data requested so
1597	   far as an integer H, with the first bit to be supplied as the MSB and
1598	   the last bit to be supplied as the LSB (note that this is always the
1599	   case, independently of whether the NBO instruction has been used).

1601	   3.)   If data is requested that lies beyond the end of the compressed
1602	   message, terminate the INPUT-HUFFMAN instruction and move program
1603	   execution to the memory address specified by the formula
1604	   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16.

1606	   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j +
1607	   1. Then go back to Step 2, unless j > n in which case decompression
1608	   failure occurs.

1610	   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the
1611	   memory address specified by the destination parameter.

1613	6.5.8.  STATE-REFERENCE

1615	   The STATE-REFERENCE instruction retrieves some previously stored
1616	   state information.

1618	   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length,
1619	   %state_destination)

1621	   The id_start and id_length parameters specify the location of the
1622	   state identifier used to retrieve the state information. The state
1623	   identifier is always 16 bytes long; if id_length is less than 16 then

1625	Price et al.                                                 [PAGE 30]
1626	   the remaining least significant bytes of the identifier are padded
1627	   with zeroes.

1629	   Decompression failure occurs if id_length is greater than 16.
1630	   Decompression failure also occurs if no state information matching
1631	   the state identifier can be found.

1633	   Note that when accessing state information that has been previously
1634	   created by the UDVM, the state identifier is always taken from an
1635	   [MD5] hash of the state to be retrieved. However this is not
1636	   necessarily the case for application-defined state as per Section
1637	   3.5.

1639	   The state_start and state_length parameters define the starting byte
1640	   and number of bytes to copy from the state_value contained in the
1641	   identified item of state. If more state is requested than is actually
1642	   available then decompression failure occurs.

1644	   The state_destination parameter contains a UDVM memory address. The
1645	   requested state is byte copied to this memory address using the rules
1646	   of Section 4.4.

1648	6.5.9.  STATE-EXECUTE

1650	   The STATE-EXECUTE instruction retrieves and runs some previously
1651	   stored state information.

1653	   STATE-EXECUTE (%id_start, %id_length)

1655	   The id_start and id_length parameters function as per the STATE-
1656	   REFERENCE instruction.

1658	   STATE-EXECUTE is similar to STATE-REQUEST except that it does not
1659	   require the amount of state being requested or the proposed
1660	   destination for the state to be specified explicitly. Instead, it
1661	   simply puts the state back into the UDVM memory using the original
1662	   parameters from the END-MESSAGE instruction that created the state.

1664	   The entire state_value (all state length bytes of it) is byte copied
1665	   into the memory address specified by state start The UDVM then jumps
1666	   to the (absolute) memory address specified by state_instruction.

1668	   Note that state start, state length and state_instruction are all
1669	   stored together with state_value as part of an item of state
1670	   information.

1672	7.  Feedback information

1674	   If the transport mechanism offers bidirectional data transport then
1675	   the compression ratio can be improved by sending feedback
1676	   information. Since feedback data is optional, compressors must be
1677	   able to function correctly even if no feedback information is
1678	   provided.

1680	Price et al.                                                 [PAGE 31]
1681	   In the bidirectional UDVM architecture, suppose that Application 2
1682	   wishes to send feedback information to Compressor 1. The path taken
1683	   by the feedback information is as follows:

1685	     Appl. 2 --> Compressor 2 --> UDVM 1 --> Appl. 1 --> Compressor 1

1687	   The first hop along the path is between Application 2 and Compressor
1688	   2. If permitted by the application, Compressor 2 MAY be supplied with
1689	   some or all of the following items of data:

1691	   overall_memory_size
1692	   cycles_per_bit
1693	   cycles_per_message
1694	   id lengths and id values of successfully established state

1696	   Since the design of each compressor is left as an implementation
1697	   decision, there is no need to standardize the format in which this
1698	   data is provided to Compressor 2.

1700	   The second hop along the path is between Compressor 2 and UDVM 1. For
1701	   this step Compressor 2 transmits the feedback information to UDVM 1
1702	   across the same transport mechanism used to carry compressed data.
1703	   Typically this feedback information is piggybacked onto existing
1704	   compressed messages (standalone feedback messages are generally
1705	   vetoed by the application due to the lack of a corresponding
1706	   decompressed message).

1708	   Note that Compressor 2 can send the feedback information compressed
1709	   in order to reduce the total number of bits transmitted. Equally,
1710	   Compressor 2 may opt not to send feedback information at all.

1712	   If Compressor 2 chooses not to send feedback information then it sets
1713	   the feedback_location parameter in the END-MESSAGE instruction to 0.
1714	   Otherwise, it copies the following block of data to the memory of
1715	   UDVM 1 and places the starting memory address of this block in the
1716	   feedback_location parameter:

1718	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1719	   |       UDVM_version        |
1720	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1721	   |    overall_memory_size    |
1722	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1723	   |      cycles_per_bit       |
1724	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1725	   |    cycles_per_message     |
1726	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1727	   |S|            n            |
1728	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1729	   |        id_length 1        |
1730	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1731	   |                           |
1732	   :        id_value_1         :
1733	   |                           |

1735	Price et al.                                                 [PAGE 32]
1736	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1737	   |        id_length 2        |
1738	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1739	   |                           |
1740	   :        id_value_2         :
1741	   |                           |
1742	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1743	            :         :
1744	            :         :
1745	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1746	   |        id_length n        |
1747	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1748	   |                           |
1749	   :        id_value_n         :
1750	   |                           |
1751	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1753	   Each of the items of data is explained in greater detail below:

1755	7.1.  UDVM version

1757	   The first 2 bytes of feedback data specify whether only the basic
1758	   version of the UDVM is available, or whether an upgraded version of
1759	   the UDVM is available offering additional instructions, feedback data
1760	   etc.

1762	   The basic version of the UDVM is Version 0, which is the version
1763	   described in this document. Upgraded versions MUST be backwards-
1764	   compatible with the basic version in the following sense:

1766	   *    If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION-
1767	        FAILURE instructions when running on Version 0 of the UDVM, then
1768	        the upgraded version MUST run the bytecode in an identical
1769	        manner.

1771	   This condition ensures that all bytecode that is valid for Version 0
1772	   of the UDVM will continue to be valid for upgraded versions of the
1773	   UDVM. However, bytecode that is invalid on Version 0 of the UDVM
1774	   (i.e. bytecode that produces a decompression failure that is not
1775	   manually triggered) may become valid on upgraded versions.

1777	   Examples of how to upgrade the UDVM in a backwards-compatible manner
1778	   include: adding new UDVM instructions, adding more items of feedback
1779	   data etc.

1781	7.2.  Memory size and CPU cycles

1783	   The next 6 bytes of feedback data specify new values for the
1784	   application-defined parameters overall_memory_size, cycles_per_bit
1785	   and cycles_per_message. This allows Application 2 to inform
1786	   Compressor 1 that it has additional memory or processing power
1787	   available that could be used to improve the overall compression
1788	   ratio.

1790	Price et al.                                                 [PAGE 33]
1791	   Note that the feedback data can only be used to increase the amount
1792	   of resources available for Compressor 1 to use. If the feedback data
1793	   specifies a parameter value that is smaller than the value already
1794	   possessed by Compressor 1, the parameter keeps its original value
1795	   (i.e. the feedback data for this parameter is simply ignored).

1797	   The reason for this behavior is that if UDVM 2 is initialized with
1798	   more memory than expected by Compressor 1 then no problem arises, but
1799	   if UDVM 2 is initialized with less memory that expected by Compressor
1800	   1 then decompression failure may occur. Therefore, only allowing the
1801	   parameter values to increase means that the feedback mechanism is
1802	   robust against message loss or reordering on the feedback channel.

1804	   The parameters can only be restored to their original values if reset
1805	   or renegotiated by the application.

1807	7.3.  State identifiers

1809	   The variable n specifies the number of state identifiers to be
1810	   acknowledged.

1812	   Each state identifier is usually the first few bytes from an [MD5]
1813	   hash of the state being acknowledged. When a state identifier is
1814	   placed in the feedback information of UDVM 1, it is known by
1815	   Compressor 1 that the corresponding state has been successfully
1816	   established and can be referenced in future by using a STATE-
1817	   REFERENCE or a STATE-EXECUTE instruction. The feedback information
1818	   includes the length and value of each hash to be acknowledged.

1820	   Note that the MSB of n has a special meaning; if set to 1 then it
1821	   acknowledges the state that is currently being created by UDVM_1 via
1822	   the END-MESSAGE instruction. This saves having to transmit the
1823	   id_length and id_value explicitly on the feedback channel.

1825	8.  Security considerations

1827	   The following chapter identifies the potential security risks
1828	   associated with the overall UDVM architecture, and details the
1829	   proposed solution for each risk.

1831	   ** Avoid snooping into state of other users

1833	   State can only be accessed using a state identifier, which is a
1834	   (prefix of a) cryptographic hash of the state being referenced. This
1835	   implies that the referencing packet already needs knowledge about the
1836	   state. To enforce this, a minimum reference length of 48 bits is
1837	   RECOMMENDED for applications running over an unsecure transport
1838	   mechanism. This also minimises the probability of an accidental state
1839	   collision.

1841	   Generally, ways to obtain knowledge about the state identifier (e.g.,
1842	   passive attacks) will also easily provide knowledge about the state
1843	   referenced, so no new vulnerability results.

1845	Price et al.                                                 [PAGE 34]
1846	   The application needs to handle state identifiers with the same care
1847	   it would handle the state itself.

1849	   ** Avoid DoS vulnerabilities

1851	   *** Use of the UDVM as a tool in a DoS attack to another target

1853	   The UDVM cannot easily be used as an amplifier in a reflection
1854	   attack, as it only generates one decompressed message per incoming
1855	   compressed message. This packet is then handed to the application;
1856	   the utility as a reflection amplifier is therefore limited by the
1857	   utility of the application.

1859	   However, it must be noted that the UDVM can be used to generate
1860	   larger packets as input to the application than have to be sent from
1861	   the malicious sender; this therefore can send smaller packets (at a
1862	   lower bandwidth) than are delivered to the application. Depending on
1863	   the reflection characteristics of the application, this can be
1864	   considered a mild form of amplification. The application MUST limit
1865	   the number of packets reflected to a potential target - even if the
1866	   UDVM is used to generate a large amount of information from a small
1867	   incoming attack packet.

1869	   *** Attacking the UDVM as the DoS target by filling it with state

1871	   Excessive state can only be installed by a malicious sender (or a set
1872	   of malicious senders) with the consent of the application. The system
1873	   consisting of UDVM and application is thus approximately as
1874	   vulnerable as the application itself, unless it allows the
1875	   installation of state from a message where it would not have
1876	   installed state itself.

1878	   If this is desirable to increase the compression ratio, the effect
1879	   can be mitigated by adding feedback at the application level that
1880	   indicates whether the state was actually installed - this allows a
1881	   system under attack to gracefully degrade by no longer installing
1882	   compressor state that is not matched by application state.

1884	   *** Attacking the UDVM by faking state or making unauthorized changes
1885	   to state

1887	   State cannot be destroyed or changed by a malicious sender - it can
1888	   only add new state.

1890	   *** Attacking the UDVM by sending it looping code

1892	   The application sets an upper limit to the number of "CPU cycles"
1893	   that can be used per compressed message and per input bit in the
1894	   compressed message. The damage inflicted by sending packets with
1895	   looping code is therefore limited, although this may still be
1896	   substantial if a large number of CPU cycles are offered by the UDVM.
1897	   However, this would be true for any decompressor that can receive
1898	   packets from anywhere.

1900	Price et al.                                                 [PAGE 35]
1901	9.  Acknowledgements

1903	   Individual compression algorithms such as [DEFLATE] have been
1904	   important sources of ideas and knowledge.

1906	      Thanks to

1908	         Abigail Surtees (abigail.surtees@roke.co.uk)
1909	         Mark A West (mark.a.west@roke.co.uk)
1910	         Lawrence Conroy (lwc@roke.co.uk)
1911	         Christian Schmidt (christian.schmidt@icn.siemens.de)
1912	         Max Riegel (maximilian.riegel@icn.siemens.de)
1913	         Jan Christoffersson (jan.christoffersson@epl.ericsson.se)
1914	         Stefan Forsgren (stefan.forsgren@epl.ericsson.se)
1915	         Krister Svanbro (krister.svanbro@epl.ericsson.se)
1916	         Christopher Clanton (christopher.clanton@nokia.com)
1917	         Khiem Le (khiem.le@nokia.com)
1918	         Ka Cheong Leung (kacheong.leung@nokia.com)

1920	      for valuable input and review.

1922	10. References

1924	   [DEFLATE]    "DEFLATE Compressed Data Format Specification version
1925	                1.3", P. Deutsch, RFC 1951, Internet Engineering Task
1926	                Force, May 1996

1928	   [SCTP]       "Stream Control Transmission Protocol", Stewart et al,
1929	                RFC 2960, Internet Engineering Task Force, October 2000

1931	   [SIP]        "SIP: Session Initiation Protocol", Handley et al,
1932	                RFC 2543, Internet Engineering Task Force, March 1999

1934	   [MD5]        "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,
1935	                Internet Engineering Task Force, April 1992

1937	   [RFC-1662]   "PPP in HDLC-like Framing", Simpson et al, Internet
1938	                Engineering Task Force, July 1994

1940	   [RFC-2026]   "The Internet Standards Process - Revision 3", Scott
1941	                Bradner, Internet Engineering Task Force, October 1996

1943	   [RFC-2119]   "Key words for use in RFCs to Indicate Requirement
1944	                Levels", Scott Bradner, Internet Engineering Task Force,
1945	                March 1997

1947	11. Authors' addresses

1949	   Richard Price        Tel: +44 1794 833681
1950	   Email:               richard.price@roke.co.uk

1952	   Roke Manor Research Ltd
1953	   Romsey, Hants, SO51 0ZN

1955	Price et al.                                                 [PAGE 36]
1956	   United Kingdom

1958	   Jonathan Rosenberg
1959	   Email:               jdrosen@dynamicsoft.com

1961	   dynamicsoft
1962	   72 Eagle Rock Avenue
1963	   First Floor
1964	   East Hanover, NJ 07936

1966	   Carsten Bormann      Tel: +49 421 218 7024
1967	   Email:               cabo@tzi.org

1969	   Universitaet Bremen TZI
1970	   Postfach 330440
1971	   D-28334 Bremen, Germany

1973	   Hans Hannu           Tel: +46 920 20 21 84
1974	   Email:               hans.hannu@epl.ericsson.se

1976	   Box 920
1977	   Ericsson Erisoft AB
1978	   SE-971 28 Lulea, Sweden

1980	   Zhigang Liu          Tel: +1 972 894-5935
1981	   Email:               zhigang.liu@nokia.com

1983	   Nokia Research Center
1984	   6000 Connection Drive
1985	   Irving, TX 75039
1986	   USA

1988	Price et al.                                                 [PAGE 37]
1989	Appendix A.  Mnemonic language

1991	   Writing UDVM programs directly in bytecode would be a daunting task,
1992	   so a simple mnemonic language is provided to facilitate the creation
1993	   of new decompression algorithms. Most importantly, the language
1994	   allows the parameters of an instruction to be specified as text names
1995	   rather than as integer values.

1997	   If an instruction parameter is given as a text name, it should
1998	   correspond to exactly one instance of a label, a reserved memory
1999	   address or an externally defined keyword. A label is simply a text
2000	   name preceded by a colon, for example:

2002	   :loop
2003	   JUMP (loop)

2005	   For any parameters corresponding to a label, the integer value of the
2006	   parameter is calculated by the following formula:

2008	    parameter_value = (instruction_address - label_address) modulo 2^16

2010	   Note that the "label address" is simply the memory address of the
2011	   instruction immediately following the label. In particular, the above
2012	   example can be rewritten as JUMP (0).

2014	   A reserved memory address is specified using the "reserve" keyword
2015	   followed by a text_name and (optionally) an integer value. For
2016	   example:

2018	   reserve apples
2019	   reserve pears (8)
2020	   reserve bananas
2021	   LOAD (bananas, 5)

2023	   For any parameters corresponding to a reserved memory address, the
2024	   integer value of the parameter is the next free memory address that
2025	   has not yet been reserved. Starting at this address, the specified
2026	   number of bytes of memory are then reserved (if no value is given
2027	   then a total of 2 bytes is reserved).

2029	   The first instance of a "reserve" keyword begins reserving memory at
2030	   Address 6 (to avoid overwriting the three well-known variables of
2031	   Section 4.2). So the above example can be rewritten as LOAD (16, 5).

2033	   An externally defined keyword is specified outside of the mnemonic
2034	   language. All of the application-defined parameters are considered to
2035	   be externally defined keywords and can be referenced in the mnemonic
2036	   code (useful for adapting the code based on the available memory or
2037	   CPU cycles). The following additional keywords can also be used:

2039	   Keyword:                        Corresponding value:

2041	   byte_copy_left                           0
2042	   byte_copy_right                          2

2044	Price et al.                                                 [PAGE 38]
2045	   stack_location                           4
2046	   reserved_end                         See below
2047	   bytecode_length                      See below
2048	   total_length                         See below

2050	   The keyword reserved_end specifies the highest reserved memory
2051	   address for the entire mnemonic code (taking into account all the
2052	   occasions where memory is reserved).

2054	   The keyword bytecode_length specifies the total size of the bytecode
2055	   corresponding to the mnemonic code. Any instances of bytecode_length
2056	   are initially replaced with 3 bytes of zeroes, and then are filled in
2057	   after the remainder of the bytecode has been generated.

2059	   Similarly, the keyword total_length specifies the total amount of
2060	   memory required at the UDVM including bytecode and reserved memory
2061	   addresses.

2063	   A complete description of the mnemonic language and how it should be
2064	   translated into bytecode is given below:

2066	   Instructions:     Instruction names are given in capitals. Replace
2067	                     each name with the corresponding 1-byte value as
2068	                     per Chapter 6.

2070	   $:                When appended to the front of an instruction
2071	                     parameter then the parameter is a memory address
2072	                     rather than a direct value. This symbol is
2073	                     mandatory for reference parameters, optional for
2074	                     multitype parameters and disallowed for literals.

2076	   Integers:         Instruction parameters can be given in the form of
2077	                     decimal integers. They are converted into the
2078	                     shortest bytecode capable of representing the
2079	                     integer by the rules of Section 4.3.

2081	   Text references:  Instruction parameters can also be given in the
2082	                     form of lowercase names. These names should match
2083	                     exactly one label, reserved memory address or
2084	                     externally defined keyword as described above.

2086	   Labels:           Label names are given as a colon followed by
2087	                     lowercase text. They are deleted when converting
2088	                     the mnemonics to bytecode.

2090	   Reserved memory:  Memory addresses are reserved using the "reserve"
2091	                     keyword. The line containing the reserve keyword
2092	                     is deleted when converting to bytecode.

2094	   .LSB:             When appended to the end of a text name, the
2095	                     integer value corresponding to the name is
2096	                     increased by 1. This is useful for addressing the
2097	                     LSBs of a 2-byte variable.

2099	Price et al.                                                 [PAGE 39]
2100	   0b, 0d:           Bytecode values can be specified directly in
2101	                     binary or decimal via the appropriate prefix. The
2102	                     direct bytecode continues until a character occurs
2103	                     that is not an integer or whitespace.

2105	   Whitespace:       All whitespace (plus brackets and commas) just
2106	                     delimit the instructions. Delete.

2108	   Comments:         These are indicated by a semicolon and continue
2109	                     to the end of the line. Delete.

2111	   Once the mnemonic code has been converted into bytecode, it can be
2112	   executed by copying the bytecode into the UDVM memory beginning at
2113	   the first memory address that has not been reserved by an instance of
2114	   the "reserve" keyword. Program execution is assumed to begin at this
2115	   address.

2117	   Note that further to the rules outlined above, well-written mnemonic
2118	   code will also have the following properties:

2120	   *    Any instance of a memory address will be specified as a text
2121	        reference rather than an integer value. This ensures that the
2122	        mnemonic code is portable.

2124	   *    The mnemonic code will not write to any memory address except
2125	        those reserved by the "reserve" keyword. This ensures that the
2126	        code can be compiled.

2128	Appendix B.  Example application-defined parameters

2130	   This appendix gives some example values for each of the application-
2131	   defined parameters. These values are geared towards the compression
2132	   of a text-based protocol running over UDP or TCP, for example a
2133	   signaling protocol such as [SIP].

2135	   Note that all of the proposed values are fixed and not negotiated
2136	   between the two instances of the application invoking the compressor
2137	   and the UDVM. This is because it is possible for the application
2138	   invoking the UDVM to receive compressed messages from several
2139	   different applications, and it is difficult to determine which
2140	   message corresponds to which application. [SIP] does this using
2141	   "From:" and "To:" fields in the message itself, but these are not
2142	   visible until the message has been decompressed. It is simpler just
2143	   to fix a set of parameter for every instance of the application.

2145	   UDVM_version                         0
2146	   maximum_compressed_size              65535
2147	   maximum_uncompressed_size            65535
2148	   minimum_hash_size                    6
2149	   overall_memory_size                  8192
2150	   working_memory_start                 0
2151	   working_memory_end                   8191
2152	   cycles_per_bit                       20
2153	   cycles_per_message                   2000

2155	Price et al.                                                 [PAGE 40]
2156	   first_instruction                    26
2157	   Note that the parameters overall_memory_size, cycles_per_bit and
2158	   cycles_per_message can be increased on the fly using the feedback
2159	   mechanism of Chapter 7. This mechanism is designed to be function
2160	   correctly even when the application invoking the UDVM is sent
2161	   compressed messages from several different applications.

2163	   The initial contents of the UDVM memory also need to be defined. It
2164	   is not enough simply to initialize the memory containing all zeroes,
2165	   as the UDVM would be unable to input any compressed data. Instead,
2166	   for each new compressed message the memory should be initialized
2167	   containing a simple decompressor capable of extracting the first few
2168	   bytes of compressed data. These bytes can then be interpreted as UDVM
2169	   instructions for a more powerful decompression algorithm, a state
2170	   reference to retrieve a previously stored algorithm etc.

2172	   As an example, the following mnemonic code can be converted to
2173	   bytecode and pasted into the UDVM memory beginning at Address 26:

2175	   reserve length
2176	   reserve destination
2177	   reserve hash (16)

2179	   INPUT-BYTECODE (1, length, fail)
2180	   COMPARE (length, 16, retrieve_state, retrieve_state, new_code)

2182	   :retrieve_state

2184	   INPUT-BYTECODE ($length, hash, fail)
2185	   STATE-EXECUTE (hash, $length)

2187	   :new_code

2189	   INPUT-BYTECODE (2, destination, fail)
2190	   INPUT-BYTECODE ($length, $destination, fail)
2191	   SUBTRACT ($destination, execute_new_code)

2193	   :execute_new_code

2195	   JUMP ($destination)

2197	   :fail

2199	   DECOMPRESSION-FAILURE

2201	   The mnemonic code requests a single byte of compressed data, which is
2202	   considered to be a length from 0 to 255. Lengths from 0 to 16
2203	   inclusive are interpreted as the length of a hash value that is used
2204	   to retrieve and run bytecode previously stored as state. Lengths from
2205	   17 to 255 are interpreted as an amount of new UDVM bytecode to be
2206	   extracted from the start of the compressed data.

2208	Price et al.                                                 [PAGE 41]
2209	   Finally, the application can define initial state that is available
2210	   to the UDVM. Examples of application-defined state include common
2211	   decompression algorithms, dictionaries of common text phrases etc.

2213	Appendix C.  Example decompression algorithms

2215	   This appendix gives examples of decompression algorithms which can be
2216	   downloaded to the UDVM in the form of bytecode.

2218	C.1.  Example UDVM code for simple LZ77 decompression

2220	   The first example gives the code required to decompress data from a
2221	   very simple LZ77-based algorithm. The UDVM is instructed to interpret
2222	   a compressed message as a set of 4-byte characters, where each
2223	   character contains a 2-byte position integer followed by a 2-byte
2224	   length integer. Taken together these integers point to a previously
2225	   received text string in the UDVM memory, which is then copied to the
2226	   end of the uncompressed message.

2228	   Since the compressor can only send references to strings already
2229	   present in the UDVM memory, before the first message is decompressed
2230	   the memory must be initialized with a static dictionary containing
2231	   the 256 ASCII characters.

2233	   The algorithm write-protects the memory containing the UDVM
2234	   instructions used to decompress each character, so that they can
2235	   easily be compiled to improve the speed of decompression.

2237	   A 2-byte CRC over the uncompressed message is appended to the end of
2238	   the compressed message, to verify that correct decompression has
2239	   occurred. The algorithm also requests that the contents of the UDVM
2240	   memory be saved using the state request mechanism, so that it can be
2241	   retrieved by sending the appropriate 6-byte hash.

2243	   reserve byte_copy_left
2244	   reserve byte_copy_right
2245	   reserve uncompressed_start
2246	   reserve uncompressed_end
2247	   reserve uncompressed_length
2248	   reserve position
2249	   reserve length
2250	   reserve static_dictionary (256)
2251	   reserve circular_buffer (2048)

2253	   WORKING-MEMORY (uncompressed_start, reserved_end)
2254	   MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary,
2255	   circular_buffer, 0, 0, 0)

2257	   :unpack_static_dictionary

2259	   ; The following instructions initialize the static dictionary.

2261	   COPY-LITERAL (position.LSB, 1, $uncompressed_start)
2262	   ADD ($position, 1)

2264	Price et al.                                                 [PAGE 42]
2265	   COMPARE ($position, 256, unpack_static_dictionary, next_character, 0)

2267	   :next_character

2269	   INPUT-FIXED (16, position, fail)
2270	   INPUT-FIXED (16, length, end_of_message)
2271	   COPY-LITERAL ($position, $length, $uncompressed_end)
2272	   ADD ($uncompressed_length, $length)
2273	   JUMP (next_character)

2275	   :fail

2277	   DECOMPRESSION-FAILURE

2279	   :end_of_message

2281	   CRC ($position, $uncompressed_start, $uncompressed_length, fail)
2282	   OUTPUT ($uncompressed_start, $uncompressed_length)
2283	   END-MESSAGE (6, 0, total_length, next_character, 0)

2285	Price et al.                                                 [PAGE 43]