idnits 2.17.1 draft-ietf-rohc-sigcomp-udvm-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 42 longer pages, the longest (page 2) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([DEFLATE]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 328 has weird spacing: '...v data v ...' == Line 830 has weird spacing: '...nnnnnnn memo...' == Line 850 has weird spacing: '...nnnnnnn memo...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'PAGE 1' is mentioned on line 43, but not defined == Missing Reference: 'PAGE 2' is mentioned on line 96, but not defined == Missing Reference: 'PAGE 3' is mentioned on line 151, but not defined == Missing Reference: 'PAGE 4' is mentioned on line 207, but not defined == Missing Reference: 'PAGE 5' is mentioned on line 255, but not defined == Missing Reference: 'X' is mentioned on line 805, but not defined == Missing Reference: 'PAGE 6' is mentioned on line 310, but not defined == Missing Reference: 'PAGE 7' is mentioned on line 365, but not defined == Missing Reference: 'PAGE 8' is mentioned on line 421, but not defined == Missing Reference: 'PAGE 9' is mentioned on line 476, but not defined == Missing Reference: 'PAGE 10' is mentioned on line 532, but not defined == Missing Reference: 'PAGE 11' is mentioned on line 587, but not defined == Missing Reference: 'PAGE 12' is mentioned on line 640, but not defined == Missing Reference: 'PAGE 13' is mentioned on line 693, but not defined == Missing Reference: 'PAGE 14' is mentioned on line 748, but not defined == Missing Reference: 'PAGE 15' is mentioned on line 804, but not defined == Missing Reference: 'N' is mentioned on line 850, but not defined == Missing Reference: 'PAGE 16' is mentioned on line 860, but not defined == Missing Reference: 'PAGE 17' is mentioned on line 914, but not defined == Missing Reference: 'PAGE 18' is mentioned on line 968, but not defined == Missing Reference: 'PAGE 19' is mentioned on line 1024, but not defined == Missing Reference: 'PAGE 20' is mentioned on line 1079, but not defined == Missing Reference: 'PAGE 21' is mentioned on line 1134, but not defined == Missing Reference: 'PAGE 22' is mentioned on line 1186, but not defined == Missing Reference: 'PAGE 23' is mentioned on line 1240, but not defined == Missing Reference: 'PAGE 24' is mentioned on line 1293, but not defined == Missing Reference: 'PAGE 25' is mentioned on line 1349, but not defined -- Looks like a reference, but probably isn't: '0' on line 1370 -- Looks like a reference, but probably isn't: '1' on line 1371 -- Looks like a reference, but probably isn't: '2' on line 1372 == Missing Reference: 'PAGE 26' is mentioned on line 1404, but not defined == Missing Reference: 'PAGE 27' is mentioned on line 1460, but not defined == Missing Reference: 'PAGE 28' is mentioned on line 1516, but not defined == Missing Reference: 'PAGE 29' is mentioned on line 1569, but not defined == Missing Reference: 'PAGE 30' is mentioned on line 1625, but not defined == Missing Reference: 'PAGE 31' is mentioned on line 1680, but not defined == Missing Reference: 'PAGE 32' is mentioned on line 1735, but not defined == Missing Reference: 'PAGE 33' is mentioned on line 1790, but not defined == Missing Reference: 'PAGE 34' is mentioned on line 1845, but not defined == Missing Reference: 'PAGE 35' is mentioned on line 1900, but not defined == Missing Reference: 'PAGE 36' is mentioned on line 1955, but not defined == Missing Reference: 'PAGE 37' is mentioned on line 1988, but not defined == Missing Reference: 'PAGE 38' is mentioned on line 2044, but not defined == Missing Reference: 'PAGE 39' is mentioned on line 2099, but not defined == Missing Reference: 'PAGE 40' is mentioned on line 2155, but not defined == Missing Reference: 'PAGE 41' is mentioned on line 2208, but not defined == Missing Reference: 'PAGE 42' is mentioned on line 2264, but not defined == Missing Reference: 'PAGE 43' is mentioned on line 2285, but not defined ** Downref: Normative reference to an Informational RFC: RFC 1951 (ref. 'DEFLATE') ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC 4960) ** Obsolete normative reference: RFC 2543 (ref. 'SIP') (Obsoleted by RFC 3261, RFC 3262, RFC 3263, RFC 3264, RFC 3265) ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. 'MD5') Summary: 8 errors (**), 0 flaws (~~), 50 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Richard Price, Siemens/Roke Manor 3 INTERNET-DRAFT Jonathan Rosenberg, dynamicsoft 4 Expires: July 2002 Carsten Bormann, TZI/Uni Bremen 5 H. Hannu, Ericsson 6 Z. Liu, Nokia 8 28 January, 2002 10 Universal Decompressor Virtual Machine (UDVM) 11 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of [RFC-2026]. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that other 20 groups may also distribute working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This document is a submission to the IETF ROHC WG. Comments should be 34 directed to the mailing list of ROHC, rohc@cdt.luth.se. 36 Abstract 38 This draft defines a "Universal Decompressor Virtual Machine" 39 optimized for the task of running decompression algorithms. The UDVM 40 can be configured to understand the output of many well-known 41 compressors such as [DEFLATE]. 43 Price et al. [PAGE 1] 44 Revision history 46 Changes from 48 State creation mechanism modified to use MD5 hash for improved 49 security 51 Support added for streaming compressed data over TCP 53 Memory format modified to allow compilation of UDVM code 55 Additional instructions added for bit manipulation etc. 57 Feedback mechanism added for bidirectional UDVM operation 59 Table of contents 61 1. Introduction.................................................3 62 2. Terminology..................................................3 63 3. Description of the UDVM architecture.........................5 64 3.1. UDVM architecture..........................................5 65 3.2. Requirements on application................................7 66 3.3. Requirements on transport mechanism........................9 67 3.4. Requirements on compressor.................................10 68 3.5. Application-defined parameters.............................11 69 4. Overview of the UDVM.........................................14 70 4.1. UDVM memory allocation.....................................14 71 4.2. Well-known variables.......................................15 72 4.3. Instruction parameters.....................................15 73 4.4. Byte copying...............................................16 74 5. Decompressing a compressed message...........................17 75 5.1. Invoking the UDVM..........................................17 76 5.2. Successful decompression...................................19 77 5.3. Decompression failure......................................20 78 6. UDVM instruction set.........................................21 79 6.1. Bit manipulation instructions..............................22 80 6.2. Arithmetic instructions....................................23 81 6.3. Memory management instructions.............................23 82 6.4. Program flow instructions..................................25 83 6.5. I/O instructions...........................................27 84 7. Feedback information.........................................31 85 7.1. UDVM version...............................................33 86 7.2. Memory size and CPU cycles.................................33 87 7.3. State identifiers..........................................34 88 8. Security considerations......................................34 89 9. Acknowledgements.............................................36 90 10. References...................................................36 91 11. Authors' addresses...........................................36 92 Appendix A. Mnemonic language...................................38 93 Appendix B. Example application-defined parameters..............40 94 Appendix C. Example decompression algorithms....................42 96 Price et al. [PAGE 2] 97 1. Introduction 99 This draft defines a "Universal Decompressor Virtual Machine" (UDVM). 100 The UDVM is a virtual machine much like the Java Virtual Machine but 101 with a key difference: it is designed solely for the purpose of 102 running decompression algorithms. 104 The motivation for creating the UDVM is to provide unlimited 105 flexibility when choosing how to compress a given item of data. 106 Rather than picking one of a small number of pre-negotiated 107 compression algorithms, the implementer has the freedom to select an 108 algorithm of their choice. The compressed data is then combined with 109 a set of UDVM instructions that allow the original data to be 110 extracted, and the result is outputted as UDVM bytecode. 112 Since the UDVM is optimized specifically for running decompression 113 algorithms, the code size of a typical algorithm is small (often sub 114 100 bytes). Moreover the UDVM approach does not add significant extra 115 processing or memory requirements compared to running a fixed pre- 116 programmed decompression algorithm. 118 2. Terminology 120 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 121 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 122 document are to be interpreted as described in [RFC-2119]. 124 Virtual machine 126 A machine architecture designed to be implemented in software 127 (although silicon implementations are of course possible). 129 Universal Decompressor Virtual Machine (UDVM) 131 The virtual machine described in this draft. The UDVM is designed 132 specifically for the task of running decompression algorithms. 134 Bytecode 136 Machine code that can be executed by a virtual machine. UDVM 137 bytecode is a combination of UDVM instructions and compressed data. 139 Application 141 Entity which invokes the UDVM. The application is also responsible 142 for supplying the compressed data to the UDVM and making use of the 143 uncompressed data. 145 Transport mechanism 147 Mechanism for passing data between two instances of an application. 148 The UDVM is designed to work in conjunction with a wide range of 149 transport mechanisms including TCP, UDP and [SCTP]. 151 Price et al. [PAGE 3] 152 Message-oriented transport mechanism 154 A transport mechanism that carries data as a set of distinct, 155 bounded messages. 157 Stream-oriented transport mechanism 159 A transport mechanism that carries data as a continuous stream 160 with no message boundaries. In this case, the UDVM reserves a 161 specific character to delimit messages in the compressed stream. 163 Compressor 165 Entity which converts application data into compressed data that 166 can be reconstructed by the UDVM. 168 Application-defined parameters 170 Parameters that must be agreed upon by the application invoking the 171 compressor and the application invoking the UDVM. Depending on the 172 application these parameters might be fixed a-priori or negotiated. 174 Per-message compression 176 Compression that does not reference data from previous messages. 177 The UDVM can decompress a message of this type using only the 178 application-defined parameters and the data in the message itself. 180 Dynamic compression 182 Compression relative to messages sent prior to the current 183 compressed message. The UDVM stores and retrieves this data using 184 the secure state reference mechanism. 186 State 188 Information which is saved by the UDVM and retrieved for the 189 decompression of subsequent messages. For security reasons, state 190 can only be saved with the permission of the application and can 191 only be retrieved using an [MD5] hash of the state. 193 State identifier 195 A 16-byte value used to access an item of stored state information. 196 (for security it is the first n bytes of an [MD5] hash of the state 197 to be accessed). The minimum acceptable value of n is fixed for 198 security purposes, but implementers can choose higher values of n. 200 CPU cycles 202 A measure of the amount of "CPU power" required to execute a UDVM 203 instruction (the simplest UDVM instructions require a single CPU 204 cycle). An upper limit is placed on the number of cycles that can 205 be used to decompress each bit in a compressed message. 207 Price et al. [PAGE 4] 208 3. Description of the UDVM architecture 210 This chapter describes the overall UDVM architecture including the 211 interfaces between the UDVM and its environment. The requirements on 212 the entities external to the UDVM are also given. 214 In the architecture the UDVM is considered to provide a decompression 215 service for a certain application. The application invokes the UDVM, 216 and is responsible for supplying compressed data to the UDVM and 217 making use of the corresponding uncompressed data. 219 In general the UDVM can offer a decompression service to a wide range 220 of applications. The principal motivation for developing the UDVM has 221 been the compression of application-layer protocols, in particular 222 text-based signaling protocols such as [SIP]. The UDVM architecture 223 is designed to operate securely and to provide a high compression 224 ratio for this case. 226 Note however that the UDVM can be used in any situation provided that 227 the requirements detailed in this chapter are satisfied by the 228 application and the transport mechanism. 230 The following sections describe the overall UDVM architecture and the 231 requirements on entities external to the UDVM such as the transport 232 mechanism and the compressor. 234 3.1. UDVM architecture 236 The UDVM architecture includes the following basic entities, each of 237 which is defined in subsequent sections of the document: 239 UDVM 240 Application (including state) 241 Transport mechanism 242 Compressor 244 Two variants of the architecture are available, depending on whether 245 the transport mechanism offers unidirectional or bidirectional data 246 transport. The unidirectional architecture can be considered to be a 247 special case of the bidirectional version. 249 Note that the UDVM itself does not need to know which architecture 250 has been chosen, because its operation is identical for both cases. 252 If bidirectional data transport is unavailable or undesirable for any 253 other reason, then the UDVM architecture is illustrated in Figure 1. 255 Price et al. [PAGE 5] 256 +--------------------+ +--------------------+ 257 | Compressor | | UDVM | 258 +--------------------+ +--------------------+ 259 ^ Appl. ^ | ^ | Appl. ^ 260 | data | | | v data | 261 +----|-------------|-+ +-|-------------|----+ 262 | | | | | | [X] | 263 | | Appl. 1 v | Compressed data | | Appl. 2 | | 264 | v +---------->----------+ v | 265 | +-------+ | | +-------+ | 266 | | State | | Data transport | | State | | 267 | +-------+ | | +-------+ | 268 | | | | 269 | | | | 270 | | | | 271 +--------------------+ +--------------------+ 273 Figure 1: UDVM architecture for unidirectional data transport 275 In the unidirectional case the UDVM has two 2-way interfaces to the 276 application. The first interface passes compressed data from the 277 application to the UDVM, and provides the corresponding uncompressed 278 data in return. The second interface allows the UDVM to request the 279 creation of state (information that may improve the compression ratio 280 of subsequent messages), and to access previously stored state. 282 Note that both of these interfaces can be provided as extensions to 283 an existing application (e.g. a SIP client) or as a "shim" layer 284 between a compression-unaware client and the UDVM. In the latter 285 case, the term "application" refers to the combined client and shim 286 layer. 288 The [X] symbol denotes that the application has a veto over the 289 corresponding interface. In this case the application has veto over 290 the state interface and can refuse state creation requests if it 291 considers them to be inappropriate. See Section 3.2.2 for further 292 details. 294 Note that although the UDVM architecture only shows one compression 295 entity, it is possible for the UDVM to decompress messages from 296 multiple compressors at different physical locations in a network. 297 The UDVM architecture is designed to prevent data from one compressor 298 interfering with data from a different compressor. A consequence of 299 this design choice is that it is difficult for a malicious user to 300 disrupt UDVM operation by inserting false compressed messages on the 301 transport mechanism. 303 If the transport mechanism exchanges data in both directions then the 304 architecture of Figure 2 can also be used. In this case, two 305 instances of the application communicate using a bidirectional 306 transport mechanism. Both instances of the application invoke a 307 compressor to compress their data, and a UDVM to retrieve the 308 uncompressed data sent by the remote application. 310 Price et al. [PAGE 6] 311 +--------------------+ +--------------------+ 312 | Compressor 1 | | UDVM 2 | 313 +--------------------+ +--------------------+ 314 ^ ^ Appl. ^ | ^ | Appl. ^ | 315 | | data | | | v data | v 316 +-|--|-------------|-+ +-|-------------|--|-+ 317 | | | | | Compressed data | | [X][X]| 318 | | | Appl. 1 v | plus acks | | Appl. 2 | | | 319 | | v +---------->----------+ v | | 320 | A +-------+ | | +-------+ A | 321 | c | State | | Data transport | | State | c | 322 | k +-------+ | | +-------+ k | 323 | s ^ +----------<----------+ ^ s | 324 | | | | | Compressed data | ^ | | | 325 |[X][X] | | plus acks | | | | | 326 +-|--|-------------|-+ +-|-------------|--|-+ 327 ^ | Appl. ^ | | | Appl. | | 328 | v data | v | v data v v 329 +--------------------+ +--------------------+ 330 | UDVM 1 | | Compressor 2 | 331 +--------------------+ +--------------------+ 333 Figure 2: UDVM architecture for bidirectional data transport 335 For a bidirectional transport mechanism an additional interface is 336 provided from the UDVM, via the application, to the compressor on the 337 reverse transport channel. This interface can be used to send 338 feedback information from an application to the remote compressor. 339 The path taken by feedback data between Application 2 and Compressor 340 1 is as follows: 342 Appl. 2 --> Compressor 2 --> UDVM 1 --> Appl. 1 --> Compressor 1 344 This feedback information monitors the behavior of the UDVM, 345 including whether data has been successfully decompressed, the amount 346 of available memory etc. The compressor can make use of this 347 information to improve the overall compression ratio. 349 Note that it is an implementation decision whether to use the 350 feedback channel or not, and compressors must operate successfully 351 even if no feedback information is received. 353 3.2. Requirements on application 355 The application is the entity responsible for invoking the UDVM, 356 supplying the UDVM with compressed data, and making use of the 357 corresponding uncompressed data. 359 Note that in order to use the UDVM decompression service an 360 application (e.g. a SIP client) will require interfaces to the UDVM. 361 These interfaces can be provided by extending the original client, or 362 by providing a "shim" layer between a compression-unaware client and 363 the UDVM. 365 Price et al. [PAGE 7] 366 In addition, two instances of an application MUST agree on how to 367 invoke the UDVM decompression service, and MUST fix or negotiate a 368 common set of application-defined parameters (e.g. 369 maximum_compressed_size) as per Section 3.5. 371 Certain application-defined parameters can be modified on the fly 372 using the state creation mechanism and the feedback mechanism. The 373 application SHOULD additionally provide some external means of 374 resetting or renegotiating these parameters (possibly by terminating 375 the decompression service offered by the UDVM). 377 The UDVM has a total of three interfaces to the environment: a two- 378 way interface for exchanging compressed and uncompressed data, a two- 379 way interface for storing and receiving state, and a one-way 380 interface for forwarding feedback data. To protect against the 381 malicious establishment of false state or false feedback data all of 382 the UDVM interfaces pass through the application, and requests for 383 state creation and feedback can be rejected if they are not 384 accompanied by a valid uncompressed message. 386 Each of the three interfaces is described in greater detail below: 388 3.2.1. Compressed and uncompressed data 390 The first interface supplies compressed data to the UDVM and 391 retrieves the corresponding uncompressed data. Note that when the 392 UDVM is invoked it does not receive any compressed data by default, 393 but instead requests new data explicitly using a specific 394 instruction. This means that the first part of a message can be 395 decompressed without waiting for the entire message to arrive, which 396 is especially useful over a stream-oriented transport such as TCP. 398 Uncompressed data is also passed to the application using a specific 399 instruction. It is an application decision whether to make use of the 400 data immediately or to buffer and wait for a complete message to be 401 successfully decompressed. 403 3.2.2. Storing and retrieving state 405 To provide security against the malicious insertion of false 406 compressed data, the contents of the UDVM memory are reinitialized 407 after each compressed message. This ensures that damaged compressed 408 messages do not prevent the successful decompression of subsequent 409 valid messages. 411 Note however that the overall compression ratio is often 412 significantly higher if messages can be compressed relative to the 413 information stored in previous messages. For this reason it is 414 possible for the UDVM to create "state" information for access when a 415 later message is being decompressed. 417 Both the creation and access of state are designed to be secure 418 against malicious tampering with the compressed data. State can only 419 be created when a complete message has been successfully 421 Price et al. [PAGE 8] 422 decompressed, and the application can veto a state creation request 423 based on the contents of the decompressed message. This is especially 424 useful if the application has an authentication mechanism that can be 425 applied to determine whether the uncompressed data is legitimate. 427 Furthermore, the UDVM can only access previously created state 428 information by providing an [MD5] hash of the state to be accessed. 429 The advantage of using a secure hash to access state information is 430 that it is very difficult to guess the correct hash value without 431 complete knowledge of the state being accessed. 433 Also note that state is not deleted when it is accessed. So even if a 434 malicious user manages to access state information, subsequent 435 messages compressed relative to this state can still be successfully 436 decompressed. Instead, the application is responsible for deleting 437 state information once it determines that the state will no longer be 438 needed. 440 3.2.3. Feedback information 442 The final interface is only used when the transport mechanism is 443 bidirectional. It provides feedback information from the UDVM to the 444 compressor on the reverse channel, and can be used to improve the 445 overall compression ratio. 447 Note that the feedback information is forwarded via the application. 448 Just as for the state interface above, the application can veto 449 feedback information if it considers the corresponding decompressed 450 message to be invalid. 452 If the transport mechanism only provides one-way data transport then 453 the feedback interface is considered to be null: any feedback 454 information sent across the interface is simply discarded by the 455 application. 457 3.3. Requirements on transport mechanism 459 The transport mechanism is the entity that passes data between two 460 instances of an application. Since the motivation for developing the 461 UDVM has been the compression of signaling protocols such as [SIP], 462 the UDVM is designed to operate successfully over both stream- 463 oriented protocols such as TCP and message-oriented protocols such as 464 UDP. 466 Note that the UDVM is not given direct access to the underlying 467 transport mechanism; instead the compressed data is considered to 468 first pass through the application. It is an application decision 469 whether to pass all data from the transport mechanism directly to the 470 UDVM or whether to mix compressed and uncompressed data (e.g. by 471 restricting compressed data to a certain port). 473 If the transport mechanism is message-oriented then the UDVM converts 474 each compressed message into a corresponding uncompressed message. It 476 Price et al. [PAGE 9] 477 is not possible for one compressed message to reconstruct multiple 478 uncompressed messages. 480 If the transport mechanism is stream-oriented then the UDVM simply 481 converts a stream of compressed data into a stream of uncompressed 482 data. However, when running over a stream-oriented transport such as 483 TCP, applications often insert their own internal message delimiters 484 into the data stream. As the message is compressed, it will not be 485 possible to detect these delimiters in the compressed data stream. 486 Therefore the UDVM provides a similar character that can be inserted 487 into the compressed data stream to delimit messages (see Section 3.4 488 for further details). 490 No assumption is made about the reliability of the transport 491 mechanism. The UDVM can operate successfully over unreliable 492 transport mechanisms such as UDP as well as reliable transport 493 mechanisms such as TCP. 495 No assumption is made about the security of the transport mechanism. 496 It may be possible for a malicious user to insert or modify data on 497 the path between the compressor and the UDVM. In this case, the 498 design goal of the UDVM is to avoid presenting additional security 499 risks compared to simply transporting the application data 500 uncompressed. 502 3.4. Requirements on compressor 504 An important feature of the UDVM is that it can decompress data 505 generated by arbitrary compression algorithms. In particular this 506 means that it is not necessary to standardize a compression algorithm 507 for use with the UDVM; instead the choice can be left to the 508 implementer. 510 The overall requirement placed on the compressor is that of 511 transparency, i.e. the compressor MUST NOT send instructions which 512 cause the UDVM to incorrectly decompress a given message. 514 The following more specific requirements are also placed on the 515 compressor (they can be considered particular instances of the 516 transparency requirement): 518 * Since feedback information is purely optional, the compressor 519 MUST be able to operate successfully even if it receives no 520 feedback data. 522 * It is RECOMMENDED that the compressor supply a CRC over the 523 uncompressed message to ensure that successful decompression has 524 occurred. A UDVM instruction is provided to verify this CRC. 526 * If the transport mechanism is message-oriented then the 527 compressor MUST preserve the boundaries between messages. 529 * If the transport mechanism is stream-oriented but the 530 application defines its own internal message boundaries, then 532 Price et al. [PAGE 10] 533 the compressor SHOULD preserve the boundaries between messages 534 by using the "end-of-message" character 0xFFFF reserved in UDVM 535 bytecode. 537 The reason for preserving the message boundaries over a stream- 538 oriented transport is that damage to one compressed message does not 539 affect the decompression of subsequent messages. Moreover, the 540 application typically vetoes state creation and feedback requests on 541 a per-message basis. 543 Note that the UDVM also reserves the character 0xFF00 over a stream- 544 oriented transport mechanism, and replaces every instance of 0xFF00 545 with 0xFF before decompressing the data. This ensures that arbitrary 546 compression algorithms can be used over a stream-oriented transport, 547 provided that every instance of 0xFF in the compressed data stream is 548 identified and replaced with 0xFF00. This "byte-stuffing" scheme 549 prevents the compression algorithm from inserting a message delimiter 550 into the data stream where one is not required. 552 3.4.1. Types of compression algorithm 554 Any of the following classes of compression algorithm may be useful 555 depending on the type of application: 557 * Generic compressor (for example [DEFLATE] or a similar 558 algorithm). 560 * Protocol-aware compressor offering excellent performance for 561 one particular type of data (for example the text messages 562 generated by [SIP]). 564 * Hybrid compressor with similar performance to [DEFLATE] for 565 generic data and superior performance for certain types of data. 567 Provided that the uncompressed data can be reconstructed at the UDVM 568 using the available memory and CPU cycles, implementers have freedom 569 to use a compression algorithm of their choice. 571 Note that when using an "off-the-shelf" compression algorithm, 572 bytecode for the corresponding decompressor will need to be made 573 available at the UDVM. In general the decompressor bytecode is placed 574 at the front of the first compressed message, unless the application 575 offers the ability to download UDVM bytecode offline (in which case 576 the UDVM memory will be initialized already containing a copy of the 577 decompression algorithm). 579 3.5. Application-defined parameters 581 When an application invokes an instance of the UDVM, a number of 582 parameters are provided by the application to control the UDVM memory 583 size, maximum number of CPU cycles etc. The application invoking the 584 UDVM and the application invoking the compressor MUST initially agree 585 on a common set of values for these parameters. 587 Price et al. [PAGE 11] 588 Note that if the transport mechanism is bidirectional then the 589 application invoking the UDVM can use the reverse channel to indicate 590 that additional memory or CPU cycles are available (compared to the 591 values initially agreed by the application invoking the compressor). 592 The compressor can then make use of these extra resources to improve 593 the compression ratio. 595 The feedback mechanism can also advertise that an upgraded version of 596 the UDVM is available (e.g. offering additional UDVM instructions), 597 provided that the upgraded version is backwards compatible with the 598 basic version described in this document. See Chapter 7 for further 599 details. 601 Each parameter is described in greater detail below; example values 602 for the parameters are listed in Appendix B. 604 UDVM_version 606 The UDVM_version parameter specifies the level of functionality 607 available at the UDVM. The basic version of the UDVM (Version 0) 608 is defined in this document. 610 maximum_compressed_size 612 The maximum_compressed_size parameter limits the size of one 613 compressed message. Decompression failure occurs if a message 614 larger than the specified value is provided. 616 maximum_uncompressed_size 618 The maximum_uncompressed_size parameter limits the size of one 619 uncompressed message. Decompression failure occurs if a message 620 larger than the specified value is provided. 622 minimum_hash_size 624 The minimum_hash_size parameter specifies the minimum size of the 625 state identifier that can be used to reference state. This value 626 needs to be sufficiently large to prevent malicious users from 627 guessing a state identifier by brute force. 629 overall_memory_size 631 The overall_memory_size parameter specifies the total number of 632 bytes in the UDVM memory. 634 working_memory_start 636 The working_memory_start parameter specifies the start of the UDVM 637 memory area that can be modified. Memory addresses below this 638 value are considered read-only by the UDVM. 640 Price et al. [PAGE 12] 641 working_memory_end 643 The working_memory_end parameter specifies the end of the UDVM 644 memory area that can be modified. Memory addresses above this 645 value are considered read-only by the UDVM. 647 cycles_per_bit 649 The cycles_per_bit parameter specifies the number of "CPU cycles" 650 that can be used to decompress a single bit of data. One CPU cycle 651 typically corresponds to a single UDVM instruction, although some 652 of the high-level instructions may require additional cycles. 654 cycles_per_message 656 The cycles_per_message parameter specifies the number of additional 657 CPU cycles made available at the start of a compressed message. 658 These cycles can be useful when decompressing algorithms that 659 download additional data on a per-message basis, for example a new 660 set of Huffman codes as with [DEFLATE]. 662 The total number of "CPU cycles" available for each compressed 663 message is specified by the following formula: 665 total_cycles = message_size * cycles_per_bit + cycles_per_message 667 first_instruction 669 The first_instruction parameter specifies the memory address of the 670 first instruction to be executed when the UDVM is initialized. 672 Initial memory contents 674 For each new compressed message the UDVM memory is reinitialized 675 with contents defined by the application. For example, the 676 application may be able to download UDVM bytecode for a 677 decompression algorithm before the first compressed message 678 arrives. In this case, for each new compressed message the UDVM 679 memory is initialized already containing a copy of the 680 decompression algorithm. 682 Initial state 684 As well as deciding the initial contents of the UDVM memory, the 685 application can also store useful information in the form of state. 686 This predefined state will typically contain optional data that can 687 be used to improve the overall compression ratio, for example a 688 well-known decompression algorithm or a dictionary of commonly used 689 [SIP] phrases. Note that unlike state created on the fly by the 690 UDVM, there is no need for the application-defined state to use an 691 [MD5] hash as the state identifier. 693 Price et al. [PAGE 13] 694 4. Overview of the UDVM 696 This chapter describes some basic features of the UDVM, including the 697 memory allocation, well-known variables and instruction parameters. 699 4.1. UDVM memory allocation 701 The memory available to the UDVM is partitioned into a number of 702 sections, providing space for program code, variables and 703 miscellaneous data: 705 <----- working_memory_size ------> 707 | Fixed values | Variables | Miscellaneous data | Program code | 708 +--------------+-----------+--------------------+--------------+ 710 <--------------------- overall_memory_size --------------------> 712 Figure 3: Memory allocation in the UDVM 714 Recall that the amount of memory available to the UDVM is defined by 715 the application-specific parameters overall_memory_size, 716 working_memory_start and working_memory_end. Note that all of these 717 parameters are initialized by the application, but can be 718 renegotiated on the fly using the feedback mechanism of Chapter 7. 720 The memory area from Address (working_memory_start) to Address 721 (working_memory_end) inclusive can be used to store arbitrary data 722 (variables, program code, Huffman codes etc.). UDVM instructions are 723 allowed to read from or write to any address in this memory area. 725 The first part of this memory area is typically used to store a 726 number of 2-byte variables. UDVM instructions can reference these 727 variables using a special instruction parameter as described in 728 Section 4.3. 730 The memory area from Address 0 to Address (working_memory_start - 1) 731 and from Address (working_memory_end + 1) to Address 732 (overall_memory_size - 1) inclusive is write-protected, so UDVM 733 instructions can read from this memory area but cannot write to it. 734 This memory area is intended for storing UDVM bytecode that can be 735 compiled. 737 Any attempt to read memory addresses beyond the overall memory size 738 or to write to memory addresses outside the working memory area MUST 739 cause a decompression failure (see Section 5.3). 741 The first part of the write-protected UDVM memory is intended for 742 storing variables whose values no longer need to be modified. The 743 second part of the write-protected memory is intended for storing 744 program code including UDVM instructions and their associated 745 parameters. Note that if an instruction references a variable that 746 has been write-protected, the compiled version of the instruction 748 Price et al. [PAGE 14] 749 will typically run faster than if the referenced variable lies in the 750 working memory area. 752 4.2. Well-known variables 754 The first few variables in the UDVM memory have special tasks, for 755 example specifying the location of the stack used by the CALL and 756 RETURN instructions. Each of these well-known variables is a 2-byte 757 integer. 759 The following list gives the name of each well-known variable and the 760 memory address at which the variable can be found: 762 Name: Starting memory address: 764 byte_copy_left 0 765 byte_copy_right 2 766 stack_location 4 768 The MSBs of each variable are always stored before the LSBs. So, for 769 example, the MSBs of stack_location are stored at Address 4 whilst 770 the LSBs are stored at Address 5. 772 The use of each well-known variable is described in the following 773 sections of the draft. 775 4.3. Instruction parameters 777 Each of the UDVM instructions is followed by 0 or more bytes 778 containing the parameters required by the instruction. 780 To reduce the code size of a typical UDVM program, each parameter for 781 a UDVM instruction is compressed using variable-length encoding. The 782 aim is to store more common parameter values using fewer bits than 783 rarely occurring values. 785 Three different types of parameter are available: the literal, the 786 reference and the multitype. The parameter types that follow each 787 UDVM instruction are specified in Chapter 6. 789 The UDVM bytecode for each parameter type is illustrated in Figure 4 790 to Figure 6, together with the integer values represented by the 791 bytecode. 793 Note that the MSBs in the bytecode are illustrated as preceding the 794 LSBs. Also, any string of bits marked with k consecutive "n"s is to 795 be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the 796 MSBs of n illustrated as preceding the LSBs). 798 The decoded integer value of the bytecode can be interpreted in two 799 ways. In some cases it is taken to be the actual value of the 800 parameter. In other cases it is taken to be a memory address at which 801 the 2-byte parameter value can be found (MSBs found at the specified 802 address, LSBs found at the following address). The latter case is 804 Price et al. [PAGE 15] 805 denoted by memory[X] where X is the address and memory[X] is the 2- 806 byte value starting at Address X. 808 The simplest parameter type is the literal (#), which encodes a 809 constant integer from 0 to 65535 inclusive. A literal parameter may 810 require between 1 and 3 bytes depending on its value. 812 Bytecode: Parameter value: Range: 814 0nnnnnnn N 0 - 127 815 10nnnnnn nnnnnnnn N 0 - 16383 816 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 818 Figure 4: Bytecode for a literal (#) parameter 820 The second parameter type is the reference ($), which is always used 821 to access a 2-byte value located elsewhere in the UDVM memory. The 822 bytecode for a reference parameter is decoded to be a constant 823 integer from 0 to 65535 inclusive, which is interpreted as the memory 824 address containing the actual value of the parameter. 826 Bytecode: Parameter value: Range: 828 0nnnnnnn memory[2 * N] 0 - 254 829 10nnnnnn nnnnnnnn memory[2 * N] 0 - 32766 830 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 832 Figure 5: Bytecode for a reference ($) parameter 834 The third kind of parameter is the multitype (%), which can be used 835 to encode both actual values and memory addresses. The multitype 836 parameter also offers efficient encoding for small integer values 837 (both positive and negative) and for powers of 2. 839 Bytecode: Parameter value: Range: 841 00nnnnnn N 0 - 63 842 01nnnnnn memory[2 * N] 0 - 126 843 1000011n 2 ^ (N + 6) 64 - 128 844 10001nnn 2 ^ (N + 8) 256 - 32768 845 111nnnnn N + 65504 65504 - 65535 846 1001nnnn nnnnnnnn N + 61440 61440 - 65535 847 101nnnnn nnnnnnnn N 0 - 8191 848 110nnnnn nnnnnnnn memory[N] 0 - 8191 849 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 850 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 852 Figure 6: Bytecode for a multitype (%) parameter 854 4.4. Byte copying 856 A number of UDVM instructions require a string of bytes to be copied 857 to and from areas of the UDVM memory. This section defines how the 858 byte copying operation should be performed. 860 Price et al. [PAGE 16] 861 In general, the string of bytes is copied in ascending order of 862 memory address. So if a byte is copied from/to Address n then the 863 next byte is copied from/to Address n + 1. As usual, if a byte is 864 read from an address beyond the overall memory size or is written to 865 an address outside the working memory area then decompression failure 866 occurs. 868 Note however that if a byte is copied from/to the memory address 869 specified in byte_copy_right, the byte copy operation continues by 870 copying the next byte from/to the memory address specified in 871 byte_copy_left. This is useful for setting up a "circular buffer" 872 within the UDVM memory. 874 Note that the string of bytes is copied on a purely byte-by-byte 875 basis. In particular, some of the later bytes to be copied may 876 themselves have been written into the UDVM memory by the byte copying 877 operation currently being performed. 879 Equally, it is possible for a byte copying operation to overwrite the 880 instruction that called the byte copy. If this occurs then the byte 881 copying operation MUST be completed as if the original instruction 882 were still in place in the UDVM memory (this also applies if 883 byte_copy_left or byte_copy_right are overwritten). 885 5. Decompressing a compressed message 887 This chapter lists the steps involved in the decompression of a 888 single compressed message. 890 5.1. Invoking the UDVM 892 Whenever the application receives a message to be decompressed, it 893 invokes a new instance of the UDVM. The overall_memory_size and 894 initial contents of the UDVM memory are initialized using the 895 corresponding application-defined parameters. The following steps are 896 then taken: 898 1.) The number of remaining CPU cycles is set equal to the 899 application-defined parameter cycles_per_message. 901 Notes: 903 The amount of compressed data available to the UDVM is exactly one 904 compressed message. If the transport mechanism is stream-oriented 905 then the UDVM uses the reserved byte string 0xFFFF to delimit the 906 compressed messages: the UDVM takes the data between a pair of 907 neighboring reserved byte strings to be a single compressed message. 908 The reserved byte string itself is not considered to be part of the 909 compressed message. 911 For a stream-oriented transport, the UDVM parses the compressed data 912 stream for instances of 0xFF and takes the following actions: 914 Price et al. [PAGE 17] 915 Occurs in data stream: Action: 917 0xFFFF Delimit compressed message 918 0xFF00 Replace with 0xFF 919 0xFF01 - 0xFFFE Decompression failure 921 The reserved character 0xFF00 is useful for byte stuffing (if a 922 compression algorithm generates compressed data containing the 923 character 0xFF then it should be replaced by the character 0xFF00 to 924 avoid accidentally inserting a message delimiter into the compressed 925 data stream). 927 The compressed data is not provided to the UDVM by default. Instead, 928 the UDVM requests compressed data using the INPUT instructions 929 (useful when running over a stream-oriented transport since there is 930 no need to wait for the entire compressed message before 931 decompression can begin). Note that in particular, this means that 932 the application MUST define the initial contents of the UDVM memory 933 to contain at least one INPUT instruction. See Appendix B for an 934 example of how the application might initialize the UDVM memory. 936 The application MUST NOT make more than one compressed message 937 available to a given instance of the UDVM. In particular, the 938 application MUST NOT concatenate two messages to form a single 939 compressed message. This is because compressed messages are typically 940 padded with trailing zero bits so that they are a whole number of 941 bytes long. Concatenating two messages would cause these padding bits 942 to be incorrectly interpreted as compressed data. 944 2.) Next, the instructions contained within the UDVM memory are 945 executed beginning at the address specified in first_instruction. 947 Notes: 949 The instructions are executed consecutively unless otherwise 950 indicated (for example when the UDVM encounters a JUMP instruction). 952 If the next instruction to be executed lies outside the available 953 memory then decompression failure occurs (see Section 5.3). 955 3.) Each time an instruction is executed the number of available 956 CPU cycles is decreased by the amount specified in Chapter 6. 957 Additionally, if the UDVM requests n bits of compressed data (using 958 one of the INPUT instructions) then the number of available CPU 959 cycles is increased by n * cycles_per_bit. 961 Notes: 963 This means that the total number of CPU cycles available for 964 processing a compressed message is given by the formula: 966 total_cycles = cycles_per_message + message_size * cycles_per_bit 968 Price et al. [PAGE 18] 969 The reason that this total is not allocated to the UDVM when it is 970 invoked is that the UDVM can begin to decompress a message that has 971 only been partially received. So the total message size may not be 972 known when the UDVM is initialized. 974 4.) The UDVM stops executing instructions when it encounters an 975 END-MESSAGE instruction or if decompression failure occurs. 977 Notes: 979 The UDVM passes uncompressed data to the application using the OUTPUT 980 instruction. The OUTPUT instruction can be used to output a partially 981 decompressed message; it is an application decision whether to use 982 the data immediately or whether to buffer and wait until the entire 983 message has been decompressed. 985 The UDVM passes state creation and feedback requests to the 986 application using the END-MESSAGE instruction. This means that it is 987 only possible to make a state creation and a feedback request once 988 the message has been decompressed, which is necessary since the 989 application typically checks the validity of these requests based on 990 the contents of the decompressed message. 992 5.2. Successful decompression 994 The END-MESSAGE instruction indicates that the compressed message has 995 been successfully decompressed and passed to the application. Note 996 that the actual uncompressed message is outputted beforehand using 997 the OUTPUT instruction; this allows the UDVM to output each part of 998 the message to the application as soon as it has been decompressed. 1000 The END-MESSAGE instruction provides two additional pieces of 1001 information to the application: the state creation request and the 1002 feedback data. The state creation request mechanism is discussed 1003 below; feedback information is discussed separately in Chapter 7. 1005 The UDVM may optionally save part of its memory for retrieval by 1006 later messages. However to prevent malicious storage of a large 1007 amount of unnecessary state information, the application MUST give 1008 permission before any state can be created. The application typically 1009 makes a decision on whether state can be created based on the 1010 contents of the decompressed message, particularly if the message 1011 contains authentication data that can verify whether or not the 1012 sender is legitimate. 1014 The END-MESSAGE instruction requests the creation of state using the 1015 parameters state start and state length, which together denote a byte 1016 string state_value. Provided that the application gives permission, 1017 state_value is byte copied from the UDVM memory (obeying the rules of 1018 Section 4.4) and stored together with a 16-byte state identifier that 1019 can be used to access the state by a later compressed message. 1021 To provide security against malicious access, the identifier for any 1022 item of state created by the UDVM is derived from the [MD5] hash of 1024 Price et al. [PAGE 19] 1025 the state_value to be stored. The state identifier is constructed by 1026 taking the 16-byte [MD5] hash and replacing all but the first 1027 hash_length most significant bytes with zeroes. Note that if 1028 hash_length is 16 then the unmodified [MD5] hash is the state 1029 identifier. Decompression failure occurs if hash_length is less than 1030 the application-defined parameter minimum_hash_size or greater than 1031 16. 1033 Each item of state stores the following information (accessed by the 1034 state_identifier): 1036 state_identifier 1037 state start 1038 state length 1039 state_value 1040 state_instruction 1042 Note that state_start, state_length and state_instruction are all 1043 parameters from the END-MESSAGE instruction, whereas state_identifier 1044 and state_value are created as specified above. 1046 If a state creation request is made with a state identifier that has 1047 been used by existing state, then the request fails automatically. 1049 This state can subsequently be accessed by using the STATE-REFERENCE 1050 and STATE-EXECUTE instructions (by providing the correct state 1051 identifier). 1053 5.3. Decompression failure 1055 If a compressed message given to the UDVM is corrupted (either 1056 accidentally or maliciously) then the UDVM may terminate with a 1057 decompression failure. 1059 Reasons for decompression failure include the following: 1061 * A compressed or uncompressed message exceeds the maximum size 1062 defined by the application. 1064 * The UDVM exceeds the available CPU cycles for decompressing a 1065 message. 1067 * The UDVM attempts to read a memory address beyond the overall 1068 memory size, or to write into a memory address outside the 1069 working memory area. 1071 * An unknown instruction type is encountered. 1073 * An unknown parameter type is encountered. 1075 * An instruction is encountered that cannot be processed 1076 successfully by the UDVM (for example a RETURN instruction when 1077 no CALL instruction has previously been encountered). 1079 Price et al. [PAGE 20] 1080 * The UDVM attempts to access non-existent state. 1082 * A manual decompression failure is triggered using the 1083 DECOMPRESSION-FAILURE instruction. 1085 If a decompression failure occurs when decompressing a message then 1086 the UDVM informs the application and takes no further action. It is 1087 the responsibility of the application to decide how to cope with the 1088 decompression failure. In general an application SHOULD discard the 1089 compressed message and any decompressed data that has been outputted. 1091 6. UDVM instruction set 1093 The UDVM currently understands 28 instructions, chosen to support the 1094 widest possible range of compression algorithms with the minimum 1095 possible overhead. 1097 Figure 7 lists the different instructions and the bytecode values 1098 used to store the instructions at the UDVM. The cost of each 1099 instruction in CPU cycles is also given: 1101 Instruction: Bytecode value: Cost in CPU cycles: 1103 DECOMPRESSION-FAILURE 0 1 1104 AND 1 1 1105 OR 2 1 1106 NOT 3 1 1107 ADD 4 1 1108 SUBTRACT 5 1 1109 MULTIPLY 6 1 1110 DIVIDE 7 1 1111 LOAD 8 1 1112 MULTILOAD 9 1 + n 1113 WORKING-MEMORY 10 1 1114 COPY 11 1 + length 1115 COPY-LITERAL 12 1 + length 1116 COPY-OFFSET 13 1 + length + offset 1117 JUMP 14 1 1118 COMPARE 15 1 1119 CALL 16 1 1120 RETURN 17 1 1121 SWITCH 18 1 + n 1122 CRC 19 1 + length 1123 END-MESSAGE 20 1 + state length 1124 OUTPUT 21 1 + output_length 1125 NBO 22 1 1126 INPUT-BYTECODE 23 1 + length 1127 INPUT-FIXED 24 1 1128 INPUT-HUFFMAN 25 1 + n 1129 STATE-REFERENCE 26 1 + state_length 1130 STATE-EXECUTE 27 1 + state length 1132 Figure 7: UDVM instructions and corresponding bytecode values 1134 Price et al. [PAGE 21] 1135 Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high- 1136 level instructions may cost additional cycles depending on the value 1137 of one of the instruction parameters. 1139 The only exception when calculating the number of CPU cycles is that 1140 the STATE-EXECUTE instruction takes (1 + state_length) cycles even 1141 though it does not have a state_length parameter; instead the value 1142 of state length is provided by the application as part of the state 1143 being accessed. 1145 All instructions are stored as a single byte to indicate the 1146 instruction type, followed by 0 or more bytes containing the 1147 parameters required by the instruction. The instruction specifies 1148 which of the three parameter types of Section 4.3 is used in each 1149 case. For example, the ADD instruction is followed by two parameters 1150 as shown below: 1152 ADD ($parameter_1, %parameter_2) 1154 When converted into bytecode the number of bytes required by the ADD 1155 instruction depends on the size of each parameter value, and whether 1156 the second (multitype) parameter contains the parameter value itself 1157 or a memory address where the actual value of the parameter can be 1158 found. 1160 The instruction set available for the UDVM offers a mix of low-level 1161 and high-level instructions. The high-level instructions can all be 1162 emulated using the low-level instructions provided, but given a 1163 choice it is generally preferable to use a single instruction rather 1164 than a large number of general-purpose instructions. The resulting 1165 bytecode will be more compact (leading to a higher overall 1166 compression ratio) and decompression will typically be faster because 1167 the implementation of the compression-specific instructions can be 1168 optimized for the UDVM. 1170 Each instruction is explained in more detail below: 1172 6.1. Bit manipulation instructions 1174 The AND, OR and NOT instructions provide simple bit manipulation on 1175 2-byte words. 1177 AND ($parameter_1, %parameter_2) 1178 OR ($parameter_1, %parameter_2) 1179 NOT ($parameter_1) 1181 After the operation is complete, the value of the first parameter is 1182 overwritten with the result. Note that since this parameter is a 1183 reference, the memory address specified by the parameter is always 1184 overwritten and not the parameter itself. 1186 Price et al. [PAGE 22] 1187 6.2. Arithmetic instructions 1189 The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform 1190 arithmetic on 2-byte words. 1192 ADD ($parameter_1, %parameter_2) 1193 SUBTRACT ($parameter_1, %parameter_2) 1194 MULTIPLY ($parameter_1, %parameter_2) 1195 DIVIDE ($parameter_1, %parameter_2) 1197 After the operation is complete, the first parameter is overwritten 1198 with the result. 1200 Note that in all cases the arithmetic operation is performed modulo 1201 2^16. So for example, subtracting 1 from 0 gives the result 65535. 1203 For the SUBTRACT instruction the second parameter is subtracted from 1204 the first. Similarly, for the DIVIDE instruction the first parameter 1205 is divided by the second parameter. Note that if the second parameter 1206 does not divide exactly into the first parameter then the remainder 1207 is ignored. 1209 6.3. Memory management instructions 1211 The following instructions are used to manipulate the UDVM memory. 1212 Bytes can be copied from one area of memory to another, and areas of 1213 memory can be write-protected to make it easier for UDVM code to be 1214 compiled. 1216 6.3.1. LOAD 1218 The LOAD instruction sets a 2-byte variable to a certain specified 1219 value. The format of a LOAD instruction is as follows: 1221 LOAD (%address, %value) 1223 The first parameter specifies the starting address of the 2-byte 1224 variable, whilst the second parameter specifies the value to be 1225 loaded into this variable. As usual, MSBs are stored before LSBs in 1226 the UDVM memory. 1228 6.3.2. MULTILOAD 1230 The MULTILOAD instruction sets a contiguous block of 2-byte variables 1231 to specified values. 1233 MULTILOAD (%address, #n, %value_0, ..., %value_n-1) 1235 The first parameter specifies the starting address of the contiguous 1236 variables, whilst the parameters value_0 through to value_n-1 specify 1237 the values to load into these variables (in the same order as they 1238 appear in the instruction). 1240 Price et al. [PAGE 23] 1241 6.3.3. WORKING-MEMORY 1243 The WORKING-MEMORY instruction is used to prevent part of the UDVM 1244 memory from being modified. This can be very useful when offering 1245 UDVM code for compilation. 1246 WORKING-MEMORY (%memory_start, %memory_end) 1248 The parameters memory_start and memory_end specify the new working 1249 memory area for the UDVM. These parameters replace the application- 1250 defined parameters working_memory_start and working_memory_end, but 1251 only while the current message is being decompressed. When a new 1252 instance of the UDVM is invoked the working memory area is set by the 1253 original application-defined parameters. 1255 If memory_end < memory_start, or if the parameters reference a memory 1256 address beyond the overall UDVM memory size, then decompression 1257 failure occurs. 1259 After the WORKING-MEMORY instruction has been encountered, the only 1260 way to write into UDVM memory within the protected region is to 1261 cancel the protection using another WORKING-MEMORY instruction (or to 1262 invoke a new instance of the UDVM). 1264 6.3.4. COPY 1266 The COPY instruction is used to copy a string of bytes from one part 1267 of the UDVM memory to another. 1269 COPY (%position, %length, %destination) 1271 The position parameter specifies the memory address of the first byte 1272 in the string to be copied, and the length parameter specifies the 1273 number of bytes to be copied. 1275 The destination parameter gives the address to which the first byte 1276 in the string will be copied. 1278 Note that byte copying is performed as per the rules of Section 4.4. 1280 6.3.5. COPY-LITERAL 1282 A modified version of the COPY instruction is given below: 1284 COPY-LITERAL (%position, %length, $destination) 1286 The COPY-LITERAL instruction behaves as a COPY instruction except 1287 that after copying, the destination parameter is replaced with the 1288 memory address immediately following the address to which the final 1289 byte was copied. If the final byte was copied to the memory address 1290 specified in byte_copy_right, the destination parameter is set to the 1291 memory address specified in byte_copy_left. 1293 Price et al. [PAGE 24] 1294 6.3.6. COPY-OFFSET 1296 A further version of the COPY-LITERAL instruction is given below: 1298 COPY-OFFSET (%offset, %length, $destination) 1300 The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction 1301 except that an offset parameter is given instead of a position 1302 parameter. 1304 To derive a suitable position parameter, starting at the memory 1305 address specified by destination, the UDVM counts backwards a total 1306 of offset memory addresses. If the memory address specified in 1307 byte_copy_left is reached, the next memory address is taken to be 1308 byte_copy_right. 1310 The COPY-OFFSET instruction then behaves as a COPY-LITERAL 1311 instruction, taking the position parameter to be the last memory 1312 address reached in the above step. 1314 6.4. Program flow instructions 1316 The following instructions alter the flow of UDVM code. Each 1317 instruction jumps to one of a number of memory addresses based on a 1318 certain specified criterion. Note that all of the instructions give 1319 the memory addresses in the form of deltas relative to the memory 1320 address of the instruction. The actual memory address is calculated 1321 as follows: 1323 memory_address = (memory_address_of_instruction + delta) modulo 2^16 1325 Note that certain I/O instructions (see Section 6.5) can also alter 1326 program flow. 1328 6.4.1. JUMP 1330 The JUMP instruction moves program execution to the specified memory 1331 address. 1333 JUMP (%delta) 1335 Note that if the address (specified as a delta from the address of 1336 the JUMP instruction) lies beyond the overall UDVM memory size then 1337 decompression failure occurs. 1339 6.4.2. COMPARE 1341 The COMPARE instruction compares two parameters and then jumps to one 1342 of three specified memory addresses depending on the result. 1344 COMPARE (%parameter_1, %parameter_2, %delta_1, %delta_2, %delta_3) 1346 If parameter_1 < parameter_2 then the UDVM continues instruction 1347 execution at the (relative) memory address specified by delta 1. If 1349 Price et al. [PAGE 25] 1350 parameter_1 = parameter_2 then it jumps to the address specified by 1351 delta_2. If parameter_1 > parameter_2 then it jumps to the address 1352 specified by delta_3. 1354 6.4.3. CALL and RETURN 1356 The CALL and RETURN instructions provide support for compression 1357 algorithms with a nested structure. 1359 CALL (%delta) 1361 RETURN 1363 The CALL and RETURN instructions make use of a stack of 2-byte 1364 variables stored at the memory address specified by the well-known 1365 variable stack_location. The stack contains the following variables: 1367 Name: Starting memory address: 1369 stack_free stack_location 1370 stack[0] stack_location + 2 1371 stack[1] stack_location + 4 1372 stack[2] stack_location + 6 1373 : : 1375 The MSBs of these variables are stored before the LSBs in the UDVM 1376 memory. 1378 When the UDVM reaches a CALL instruction, it finds the memory address 1379 of the instruction immediately following the CALL instruction and 1380 copies this 2-byte value into stack[stack_free] ready for later 1381 retrieval. It then increases stack_free by 1 and continues 1382 instruction execution at the (relative) memory address specified by 1383 the parameter. 1385 When the UDVM reaches a RETURN instruction it decreases stack_free by 1386 1, and then continues instruction execution at the byte position 1387 stored in stack[stack_free]. 1389 If the variable stack_free is ever increased beyond 65535 or 1390 decreased below 0 then a bad compressed message has been received and 1391 decompression failure occurs (see Section 5.3). 1393 Decompression failure also occurs if one of the above instructions is 1394 encountered and the value of stack_location is smaller than 6 (this 1395 prevents the stack from overwriting the well-known variables). 1397 6.4.4. SWITCH 1399 The SWITCH instruction performs a conditional jump based on the value 1400 of one of its parameters. 1402 SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) 1404 Price et al. [PAGE 26] 1405 When a SWITCH instruction is encountered the UDVM reads the value of 1406 j. It then continues instruction execution at the (relative) address 1407 specified by delta j. 1409 If j specifies a value of n or more, a bad compressed message has 1410 been received and decompression failure occurs. 1412 6.4.5. CRC 1414 The CRC instruction verifies a string of bytes using a 2-byte CRC. 1416 CRC (%value, %position, %length, %delta) 1418 The actual CRC calculation is performed using the generator 1419 polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte 1420 Frame Check Sequence (FCS) of [RFC-1662]. 1422 The position and length parameters define the string of bytes over 1423 which the CRC is evaluated. Byte copying rules are enforced as per 1424 Section 4.4. 1426 Important note: Since a CRC calculation is always performed over a 1427 bitstream, for interoperability it is necessary to define the order 1428 in which bits are supplied within each individual byte. In this case 1429 the MSBs of the byte MUST be supplied to the CRC calculation before 1430 the LSBs. 1432 The value parameter contains the expected integer value of the 2-byte 1433 CRC. If the calculated CRC matches the expected value then the UDVM 1434 continues at the following instruction. Otherwise the UDVM jumps to 1435 the (relative) memory address specified by delta. 1437 6.5. I/O instructions 1439 The following instructions allow the UDVM to interface with its 1440 environment. Note that in the current UDVM architecture all of the 1441 interfaces pass through the application (which has a veto over any 1442 information supplied to or from the UDVM). 1444 6.5.1. END-MESSAGE 1446 The END-MESSAGE instruction successfully terminates the UDVM and 1447 passes feedback and state information to the application. 1449 END-MESSAGE (%hash_length, %state_start, %state_length, 1450 %state_instruction, %feedback_location) 1452 The actions taken by the UDVM upon encountering the END-MESSAGE 1453 instruction are described in Section 5.2. 1455 6.5.2. DECOMPRESSION-FAILURE 1457 The DECOMPRESSION-FAILURE instruction triggers a manual decompression 1458 failure. This is useful if the UDVM program discovers that it cannot 1460 Price et al. [PAGE 27] 1461 successfully decompress the message (e.g. by using the CRC 1462 instruction). 1464 This instruction has no parameters. 1466 6.5.3. OUTPUT 1468 The OUTPUT instruction provides successfully decompressed data to the 1469 application. 1471 OUTPUT (%output_start, %output_length) 1473 The parameters define the starting memory address and length of the 1474 byte string to be provided to the application. Note that the OUTPUT 1475 instruction can be used to output a partially decompressed message; 1476 each time the instruction is encountered it appends a byte string to 1477 the end of the data previously passed to the application via the 1478 OUTPUT instruction. 1480 The string of data is byte copied from the UDVM memory obeying the 1481 rules of Section 4.4. 1483 Decompression failure occurs if the cumulative number of bytes 1484 provided to the application exceeds the application-defined parameter 1485 maximum_uncompressed_size. 1487 Since there is technically a difference between outputting a 0-byte 1488 decompressed message, and not outputting a decompressed message at 1489 all, the OUTPUT instruction needs to distinguish between the two 1490 cases. Thus, if the UDVM terminates before encountering an OUTPUT 1491 instruction it is considered not to have outputted a decompressed 1492 message. If it encounters one or more OUTPUT instructions, each of 1493 which provides 0 bytes of data to the application, then it is 1494 considered to have outputted a 0-byte decompressed message. 1496 6.5.4. NBO 1498 The NBO instruction modifies the order in which compressed bits are 1499 passed to the UDVM. 1501 As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual 1502 bits from within a byte, to avoid ambiguity it is necessary to define 1503 the order in which these bits are read. The default operation is to 1504 read the MSBs before the LSBs, but if the NBO instruction is 1505 encountered then the LSBs are read before the MSBs. Both cases are 1506 illustrated below: 1508 MSB LSB MSB LSB MSB LSB MSB LSB 1510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1511 |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| 1512 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1514 Byte 0 Byte 1 Byte 0 Byte 1 1516 Price et al. [PAGE 28] 1517 Default operation After NBO instruction 1519 The NBO instruction can only be used before bitwise compressed data 1520 is passed to the UDVM. Therefore, a decompression failure occurs if 1521 it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN 1522 instruction has been used. 1524 6.5.5. INPUT-BYTECODE 1526 The INPUT-BYTECODE instruction requests a certain number of bytes of 1527 compressed data from the application. 1529 INPUT-BYTECODE (%length, %destination, %delta) 1531 The length parameter indicates the requested number of bytes of 1532 compressed data, and the destination parameter specifies the starting 1533 memory address to which they should be copied. Byte copying is 1534 performed as per the rules of Section 4.4. 1536 If the instruction requests data that lies beyond the end of the 1537 compressed message, no data is returned. Instead the UDVM moves 1538 program execution to the memory address specified by the formula 1539 (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16. 1541 The INPUT-BYTECODE instruction can only be used before bitwise 1542 compressed data is passed to the UDVM. Therefore, a decompression 1543 failure occurs if it is encountered after an INPUT-FIXED or an INPUT- 1544 HUFFMAN instruction has been used. 1546 6.5.6. INPUT-FIXED 1548 The INPUT-FIXED instruction requests a certain number of bits of 1549 compressed data from the application. 1551 INPUT-FIXED (%length, %destination, %delta) 1553 The length parameter indicates the requested number of bits. If this 1554 parameter does not lie between 1 and 16 inclusive then a 1555 decompression failure occurs. 1557 The destination parameter specifies the memory address to which the 1558 compressed data should be copied. Note that the requested bits are 1559 interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under 1560 default operation the MSBs of this integer are provided first, but if 1561 an NBO instruction has been executed then the LSBs are provided 1562 first. 1564 If the instruction requests data that lies beyond the end of the 1565 compressed message, no data is returned. Instead the UDVM moves 1566 program execution to the memory address specified by the formula 1567 (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 1569 Price et al. [PAGE 29] 1570 6.5.7. INPUT-HUFFMAN 1572 The INPUT-HUFFMAN instruction requests a variable number of bits of 1573 compressed data from the application. The instruction initially 1574 requests a small number of bits and compares the result against a 1575 certain criterion; if the criterion is not met then additional bits 1576 are requested until the criterion is achieved. 1578 The INPUT-HUFFMAN instruction is followed by three mandatory 1579 parameters plus n additional sets of parameters. Every additional set 1580 contains four parameters as shown below: 1582 INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, 1583 %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, 1584 %upper_bound_n, %uncompressed_n) 1586 Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by 1587 the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then 1588 decompression failure occurs. 1590 In all other cases, the behavior of the INPUT-HUFFMAN instruction is 1591 defined below: 1593 1.) Set j = 1. 1595 2.) Request an additional bits_j compressed bits. Interpret the 1596 total (bits_1 + ... + bits_j) bits of compressed data requested so 1597 far as an integer H, with the first bit to be supplied as the MSB and 1598 the last bit to be supplied as the LSB (note that this is always the 1599 case, independently of whether the NBO instruction has been used). 1601 3.) If data is requested that lies beyond the end of the compressed 1602 message, terminate the INPUT-HUFFMAN instruction and move program 1603 execution to the memory address specified by the formula 1604 (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 1606 4.) If (H < lower_bound_j) or (H > upper_bound_j) then set j = j + 1607 1. Then go back to Step 2, unless j > n in which case decompression 1608 failure occurs. 1610 5.) Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the 1611 memory address specified by the destination parameter. 1613 6.5.8. STATE-REFERENCE 1615 The STATE-REFERENCE instruction retrieves some previously stored 1616 state information. 1618 STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, 1619 %state_destination) 1621 The id_start and id_length parameters specify the location of the 1622 state identifier used to retrieve the state information. The state 1623 identifier is always 16 bytes long; if id_length is less than 16 then 1625 Price et al. [PAGE 30] 1626 the remaining least significant bytes of the identifier are padded 1627 with zeroes. 1629 Decompression failure occurs if id_length is greater than 16. 1630 Decompression failure also occurs if no state information matching 1631 the state identifier can be found. 1633 Note that when accessing state information that has been previously 1634 created by the UDVM, the state identifier is always taken from an 1635 [MD5] hash of the state to be retrieved. However this is not 1636 necessarily the case for application-defined state as per Section 1637 3.5. 1639 The state_start and state_length parameters define the starting byte 1640 and number of bytes to copy from the state_value contained in the 1641 identified item of state. If more state is requested than is actually 1642 available then decompression failure occurs. 1644 The state_destination parameter contains a UDVM memory address. The 1645 requested state is byte copied to this memory address using the rules 1646 of Section 4.4. 1648 6.5.9. STATE-EXECUTE 1650 The STATE-EXECUTE instruction retrieves and runs some previously 1651 stored state information. 1653 STATE-EXECUTE (%id_start, %id_length) 1655 The id_start and id_length parameters function as per the STATE- 1656 REFERENCE instruction. 1658 STATE-EXECUTE is similar to STATE-REQUEST except that it does not 1659 require the amount of state being requested or the proposed 1660 destination for the state to be specified explicitly. Instead, it 1661 simply puts the state back into the UDVM memory using the original 1662 parameters from the END-MESSAGE instruction that created the state. 1664 The entire state_value (all state length bytes of it) is byte copied 1665 into the memory address specified by state start The UDVM then jumps 1666 to the (absolute) memory address specified by state_instruction. 1668 Note that state start, state length and state_instruction are all 1669 stored together with state_value as part of an item of state 1670 information. 1672 7. Feedback information 1674 If the transport mechanism offers bidirectional data transport then 1675 the compression ratio can be improved by sending feedback 1676 information. Since feedback data is optional, compressors must be 1677 able to function correctly even if no feedback information is 1678 provided. 1680 Price et al. [PAGE 31] 1681 In the bidirectional UDVM architecture, suppose that Application 2 1682 wishes to send feedback information to Compressor 1. The path taken 1683 by the feedback information is as follows: 1685 Appl. 2 --> Compressor 2 --> UDVM 1 --> Appl. 1 --> Compressor 1 1687 The first hop along the path is between Application 2 and Compressor 1688 2. If permitted by the application, Compressor 2 MAY be supplied with 1689 some or all of the following items of data: 1691 overall_memory_size 1692 cycles_per_bit 1693 cycles_per_message 1694 id lengths and id values of successfully established state 1696 Since the design of each compressor is left as an implementation 1697 decision, there is no need to standardize the format in which this 1698 data is provided to Compressor 2. 1700 The second hop along the path is between Compressor 2 and UDVM 1. For 1701 this step Compressor 2 transmits the feedback information to UDVM 1 1702 across the same transport mechanism used to carry compressed data. 1703 Typically this feedback information is piggybacked onto existing 1704 compressed messages (standalone feedback messages are generally 1705 vetoed by the application due to the lack of a corresponding 1706 decompressed message). 1708 Note that Compressor 2 can send the feedback information compressed 1709 in order to reduce the total number of bits transmitted. Equally, 1710 Compressor 2 may opt not to send feedback information at all. 1712 If Compressor 2 chooses not to send feedback information then it sets 1713 the feedback_location parameter in the END-MESSAGE instruction to 0. 1714 Otherwise, it copies the following block of data to the memory of 1715 UDVM 1 and places the starting memory address of this block in the 1716 feedback_location parameter: 1718 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1719 | UDVM_version | 1720 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1721 | overall_memory_size | 1722 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1723 | cycles_per_bit | 1724 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1725 | cycles_per_message | 1726 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1727 |S| n | 1728 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1729 | id_length 1 | 1730 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1731 | | 1732 : id_value_1 : 1733 | | 1735 Price et al. [PAGE 32] 1736 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1737 | id_length 2 | 1738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1739 | | 1740 : id_value_2 : 1741 | | 1742 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1743 : : 1744 : : 1745 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1746 | id_length n | 1747 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1748 | | 1749 : id_value_n : 1750 | | 1751 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1753 Each of the items of data is explained in greater detail below: 1755 7.1. UDVM version 1757 The first 2 bytes of feedback data specify whether only the basic 1758 version of the UDVM is available, or whether an upgraded version of 1759 the UDVM is available offering additional instructions, feedback data 1760 etc. 1762 The basic version of the UDVM is Version 0, which is the version 1763 described in this document. Upgraded versions MUST be backwards- 1764 compatible with the basic version in the following sense: 1766 * If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION- 1767 FAILURE instructions when running on Version 0 of the UDVM, then 1768 the upgraded version MUST run the bytecode in an identical 1769 manner. 1771 This condition ensures that all bytecode that is valid for Version 0 1772 of the UDVM will continue to be valid for upgraded versions of the 1773 UDVM. However, bytecode that is invalid on Version 0 of the UDVM 1774 (i.e. bytecode that produces a decompression failure that is not 1775 manually triggered) may become valid on upgraded versions. 1777 Examples of how to upgrade the UDVM in a backwards-compatible manner 1778 include: adding new UDVM instructions, adding more items of feedback 1779 data etc. 1781 7.2. Memory size and CPU cycles 1783 The next 6 bytes of feedback data specify new values for the 1784 application-defined parameters overall_memory_size, cycles_per_bit 1785 and cycles_per_message. This allows Application 2 to inform 1786 Compressor 1 that it has additional memory or processing power 1787 available that could be used to improve the overall compression 1788 ratio. 1790 Price et al. [PAGE 33] 1791 Note that the feedback data can only be used to increase the amount 1792 of resources available for Compressor 1 to use. If the feedback data 1793 specifies a parameter value that is smaller than the value already 1794 possessed by Compressor 1, the parameter keeps its original value 1795 (i.e. the feedback data for this parameter is simply ignored). 1797 The reason for this behavior is that if UDVM 2 is initialized with 1798 more memory than expected by Compressor 1 then no problem arises, but 1799 if UDVM 2 is initialized with less memory that expected by Compressor 1800 1 then decompression failure may occur. Therefore, only allowing the 1801 parameter values to increase means that the feedback mechanism is 1802 robust against message loss or reordering on the feedback channel. 1804 The parameters can only be restored to their original values if reset 1805 or renegotiated by the application. 1807 7.3. State identifiers 1809 The variable n specifies the number of state identifiers to be 1810 acknowledged. 1812 Each state identifier is usually the first few bytes from an [MD5] 1813 hash of the state being acknowledged. When a state identifier is 1814 placed in the feedback information of UDVM 1, it is known by 1815 Compressor 1 that the corresponding state has been successfully 1816 established and can be referenced in future by using a STATE- 1817 REFERENCE or a STATE-EXECUTE instruction. The feedback information 1818 includes the length and value of each hash to be acknowledged. 1820 Note that the MSB of n has a special meaning; if set to 1 then it 1821 acknowledges the state that is currently being created by UDVM_1 via 1822 the END-MESSAGE instruction. This saves having to transmit the 1823 id_length and id_value explicitly on the feedback channel. 1825 8. Security considerations 1827 The following chapter identifies the potential security risks 1828 associated with the overall UDVM architecture, and details the 1829 proposed solution for each risk. 1831 ** Avoid snooping into state of other users 1833 State can only be accessed using a state identifier, which is a 1834 (prefix of a) cryptographic hash of the state being referenced. This 1835 implies that the referencing packet already needs knowledge about the 1836 state. To enforce this, a minimum reference length of 48 bits is 1837 RECOMMENDED for applications running over an unsecure transport 1838 mechanism. This also minimises the probability of an accidental state 1839 collision. 1841 Generally, ways to obtain knowledge about the state identifier (e.g., 1842 passive attacks) will also easily provide knowledge about the state 1843 referenced, so no new vulnerability results. 1845 Price et al. [PAGE 34] 1846 The application needs to handle state identifiers with the same care 1847 it would handle the state itself. 1849 ** Avoid DoS vulnerabilities 1851 *** Use of the UDVM as a tool in a DoS attack to another target 1853 The UDVM cannot easily be used as an amplifier in a reflection 1854 attack, as it only generates one decompressed message per incoming 1855 compressed message. This packet is then handed to the application; 1856 the utility as a reflection amplifier is therefore limited by the 1857 utility of the application. 1859 However, it must be noted that the UDVM can be used to generate 1860 larger packets as input to the application than have to be sent from 1861 the malicious sender; this therefore can send smaller packets (at a 1862 lower bandwidth) than are delivered to the application. Depending on 1863 the reflection characteristics of the application, this can be 1864 considered a mild form of amplification. The application MUST limit 1865 the number of packets reflected to a potential target - even if the 1866 UDVM is used to generate a large amount of information from a small 1867 incoming attack packet. 1869 *** Attacking the UDVM as the DoS target by filling it with state 1871 Excessive state can only be installed by a malicious sender (or a set 1872 of malicious senders) with the consent of the application. The system 1873 consisting of UDVM and application is thus approximately as 1874 vulnerable as the application itself, unless it allows the 1875 installation of state from a message where it would not have 1876 installed state itself. 1878 If this is desirable to increase the compression ratio, the effect 1879 can be mitigated by adding feedback at the application level that 1880 indicates whether the state was actually installed - this allows a 1881 system under attack to gracefully degrade by no longer installing 1882 compressor state that is not matched by application state. 1884 *** Attacking the UDVM by faking state or making unauthorized changes 1885 to state 1887 State cannot be destroyed or changed by a malicious sender - it can 1888 only add new state. 1890 *** Attacking the UDVM by sending it looping code 1892 The application sets an upper limit to the number of "CPU cycles" 1893 that can be used per compressed message and per input bit in the 1894 compressed message. The damage inflicted by sending packets with 1895 looping code is therefore limited, although this may still be 1896 substantial if a large number of CPU cycles are offered by the UDVM. 1897 However, this would be true for any decompressor that can receive 1898 packets from anywhere. 1900 Price et al. [PAGE 35] 1901 9. Acknowledgements 1903 Individual compression algorithms such as [DEFLATE] have been 1904 important sources of ideas and knowledge. 1906 Thanks to 1908 Abigail Surtees (abigail.surtees@roke.co.uk) 1909 Mark A West (mark.a.west@roke.co.uk) 1910 Lawrence Conroy (lwc@roke.co.uk) 1911 Christian Schmidt (christian.schmidt@icn.siemens.de) 1912 Max Riegel (maximilian.riegel@icn.siemens.de) 1913 Jan Christoffersson (jan.christoffersson@epl.ericsson.se) 1914 Stefan Forsgren (stefan.forsgren@epl.ericsson.se) 1915 Krister Svanbro (krister.svanbro@epl.ericsson.se) 1916 Christopher Clanton (christopher.clanton@nokia.com) 1917 Khiem Le (khiem.le@nokia.com) 1918 Ka Cheong Leung (kacheong.leung@nokia.com) 1920 for valuable input and review. 1922 10. References 1924 [DEFLATE] "DEFLATE Compressed Data Format Specification version 1925 1.3", P. Deutsch, RFC 1951, Internet Engineering Task 1926 Force, May 1996 1928 [SCTP] "Stream Control Transmission Protocol", Stewart et al, 1929 RFC 2960, Internet Engineering Task Force, October 2000 1931 [SIP] "SIP: Session Initiation Protocol", Handley et al, 1932 RFC 2543, Internet Engineering Task Force, March 1999 1934 [MD5] "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321, 1935 Internet Engineering Task Force, April 1992 1937 [RFC-1662] "PPP in HDLC-like Framing", Simpson et al, Internet 1938 Engineering Task Force, July 1994 1940 [RFC-2026] "The Internet Standards Process - Revision 3", Scott 1941 Bradner, Internet Engineering Task Force, October 1996 1943 [RFC-2119] "Key words for use in RFCs to Indicate Requirement 1944 Levels", Scott Bradner, Internet Engineering Task Force, 1945 March 1997 1947 11. Authors' addresses 1949 Richard Price Tel: +44 1794 833681 1950 Email: richard.price@roke.co.uk 1952 Roke Manor Research Ltd 1953 Romsey, Hants, SO51 0ZN 1955 Price et al. [PAGE 36] 1956 United Kingdom 1958 Jonathan Rosenberg 1959 Email: jdrosen@dynamicsoft.com 1961 dynamicsoft 1962 72 Eagle Rock Avenue 1963 First Floor 1964 East Hanover, NJ 07936 1966 Carsten Bormann Tel: +49 421 218 7024 1967 Email: cabo@tzi.org 1969 Universitaet Bremen TZI 1970 Postfach 330440 1971 D-28334 Bremen, Germany 1973 Hans Hannu Tel: +46 920 20 21 84 1974 Email: hans.hannu@epl.ericsson.se 1976 Box 920 1977 Ericsson Erisoft AB 1978 SE-971 28 Lulea, Sweden 1980 Zhigang Liu Tel: +1 972 894-5935 1981 Email: zhigang.liu@nokia.com 1983 Nokia Research Center 1984 6000 Connection Drive 1985 Irving, TX 75039 1986 USA 1988 Price et al. [PAGE 37] 1989 Appendix A. Mnemonic language 1991 Writing UDVM programs directly in bytecode would be a daunting task, 1992 so a simple mnemonic language is provided to facilitate the creation 1993 of new decompression algorithms. Most importantly, the language 1994 allows the parameters of an instruction to be specified as text names 1995 rather than as integer values. 1997 If an instruction parameter is given as a text name, it should 1998 correspond to exactly one instance of a label, a reserved memory 1999 address or an externally defined keyword. A label is simply a text 2000 name preceded by a colon, for example: 2002 :loop 2003 JUMP (loop) 2005 For any parameters corresponding to a label, the integer value of the 2006 parameter is calculated by the following formula: 2008 parameter_value = (instruction_address - label_address) modulo 2^16 2010 Note that the "label address" is simply the memory address of the 2011 instruction immediately following the label. In particular, the above 2012 example can be rewritten as JUMP (0). 2014 A reserved memory address is specified using the "reserve" keyword 2015 followed by a text_name and (optionally) an integer value. For 2016 example: 2018 reserve apples 2019 reserve pears (8) 2020 reserve bananas 2021 LOAD (bananas, 5) 2023 For any parameters corresponding to a reserved memory address, the 2024 integer value of the parameter is the next free memory address that 2025 has not yet been reserved. Starting at this address, the specified 2026 number of bytes of memory are then reserved (if no value is given 2027 then a total of 2 bytes is reserved). 2029 The first instance of a "reserve" keyword begins reserving memory at 2030 Address 6 (to avoid overwriting the three well-known variables of 2031 Section 4.2). So the above example can be rewritten as LOAD (16, 5). 2033 An externally defined keyword is specified outside of the mnemonic 2034 language. All of the application-defined parameters are considered to 2035 be externally defined keywords and can be referenced in the mnemonic 2036 code (useful for adapting the code based on the available memory or 2037 CPU cycles). The following additional keywords can also be used: 2039 Keyword: Corresponding value: 2041 byte_copy_left 0 2042 byte_copy_right 2 2044 Price et al. [PAGE 38] 2045 stack_location 4 2046 reserved_end See below 2047 bytecode_length See below 2048 total_length See below 2050 The keyword reserved_end specifies the highest reserved memory 2051 address for the entire mnemonic code (taking into account all the 2052 occasions where memory is reserved). 2054 The keyword bytecode_length specifies the total size of the bytecode 2055 corresponding to the mnemonic code. Any instances of bytecode_length 2056 are initially replaced with 3 bytes of zeroes, and then are filled in 2057 after the remainder of the bytecode has been generated. 2059 Similarly, the keyword total_length specifies the total amount of 2060 memory required at the UDVM including bytecode and reserved memory 2061 addresses. 2063 A complete description of the mnemonic language and how it should be 2064 translated into bytecode is given below: 2066 Instructions: Instruction names are given in capitals. Replace 2067 each name with the corresponding 1-byte value as 2068 per Chapter 6. 2070 $: When appended to the front of an instruction 2071 parameter then the parameter is a memory address 2072 rather than a direct value. This symbol is 2073 mandatory for reference parameters, optional for 2074 multitype parameters and disallowed for literals. 2076 Integers: Instruction parameters can be given in the form of 2077 decimal integers. They are converted into the 2078 shortest bytecode capable of representing the 2079 integer by the rules of Section 4.3. 2081 Text references: Instruction parameters can also be given in the 2082 form of lowercase names. These names should match 2083 exactly one label, reserved memory address or 2084 externally defined keyword as described above. 2086 Labels: Label names are given as a colon followed by 2087 lowercase text. They are deleted when converting 2088 the mnemonics to bytecode. 2090 Reserved memory: Memory addresses are reserved using the "reserve" 2091 keyword. The line containing the reserve keyword 2092 is deleted when converting to bytecode. 2094 .LSB: When appended to the end of a text name, the 2095 integer value corresponding to the name is 2096 increased by 1. This is useful for addressing the 2097 LSBs of a 2-byte variable. 2099 Price et al. [PAGE 39] 2100 0b, 0d: Bytecode values can be specified directly in 2101 binary or decimal via the appropriate prefix. The 2102 direct bytecode continues until a character occurs 2103 that is not an integer or whitespace. 2105 Whitespace: All whitespace (plus brackets and commas) just 2106 delimit the instructions. Delete. 2108 Comments: These are indicated by a semicolon and continue 2109 to the end of the line. Delete. 2111 Once the mnemonic code has been converted into bytecode, it can be 2112 executed by copying the bytecode into the UDVM memory beginning at 2113 the first memory address that has not been reserved by an instance of 2114 the "reserve" keyword. Program execution is assumed to begin at this 2115 address. 2117 Note that further to the rules outlined above, well-written mnemonic 2118 code will also have the following properties: 2120 * Any instance of a memory address will be specified as a text 2121 reference rather than an integer value. This ensures that the 2122 mnemonic code is portable. 2124 * The mnemonic code will not write to any memory address except 2125 those reserved by the "reserve" keyword. This ensures that the 2126 code can be compiled. 2128 Appendix B. Example application-defined parameters 2130 This appendix gives some example values for each of the application- 2131 defined parameters. These values are geared towards the compression 2132 of a text-based protocol running over UDP or TCP, for example a 2133 signaling protocol such as [SIP]. 2135 Note that all of the proposed values are fixed and not negotiated 2136 between the two instances of the application invoking the compressor 2137 and the UDVM. This is because it is possible for the application 2138 invoking the UDVM to receive compressed messages from several 2139 different applications, and it is difficult to determine which 2140 message corresponds to which application. [SIP] does this using 2141 "From:" and "To:" fields in the message itself, but these are not 2142 visible until the message has been decompressed. It is simpler just 2143 to fix a set of parameter for every instance of the application. 2145 UDVM_version 0 2146 maximum_compressed_size 65535 2147 maximum_uncompressed_size 65535 2148 minimum_hash_size 6 2149 overall_memory_size 8192 2150 working_memory_start 0 2151 working_memory_end 8191 2152 cycles_per_bit 20 2153 cycles_per_message 2000 2155 Price et al. [PAGE 40] 2156 first_instruction 26 2157 Note that the parameters overall_memory_size, cycles_per_bit and 2158 cycles_per_message can be increased on the fly using the feedback 2159 mechanism of Chapter 7. This mechanism is designed to be function 2160 correctly even when the application invoking the UDVM is sent 2161 compressed messages from several different applications. 2163 The initial contents of the UDVM memory also need to be defined. It 2164 is not enough simply to initialize the memory containing all zeroes, 2165 as the UDVM would be unable to input any compressed data. Instead, 2166 for each new compressed message the memory should be initialized 2167 containing a simple decompressor capable of extracting the first few 2168 bytes of compressed data. These bytes can then be interpreted as UDVM 2169 instructions for a more powerful decompression algorithm, a state 2170 reference to retrieve a previously stored algorithm etc. 2172 As an example, the following mnemonic code can be converted to 2173 bytecode and pasted into the UDVM memory beginning at Address 26: 2175 reserve length 2176 reserve destination 2177 reserve hash (16) 2179 INPUT-BYTECODE (1, length, fail) 2180 COMPARE (length, 16, retrieve_state, retrieve_state, new_code) 2182 :retrieve_state 2184 INPUT-BYTECODE ($length, hash, fail) 2185 STATE-EXECUTE (hash, $length) 2187 :new_code 2189 INPUT-BYTECODE (2, destination, fail) 2190 INPUT-BYTECODE ($length, $destination, fail) 2191 SUBTRACT ($destination, execute_new_code) 2193 :execute_new_code 2195 JUMP ($destination) 2197 :fail 2199 DECOMPRESSION-FAILURE 2201 The mnemonic code requests a single byte of compressed data, which is 2202 considered to be a length from 0 to 255. Lengths from 0 to 16 2203 inclusive are interpreted as the length of a hash value that is used 2204 to retrieve and run bytecode previously stored as state. Lengths from 2205 17 to 255 are interpreted as an amount of new UDVM bytecode to be 2206 extracted from the start of the compressed data. 2208 Price et al. [PAGE 41] 2209 Finally, the application can define initial state that is available 2210 to the UDVM. Examples of application-defined state include common 2211 decompression algorithms, dictionaries of common text phrases etc. 2213 Appendix C. Example decompression algorithms 2215 This appendix gives examples of decompression algorithms which can be 2216 downloaded to the UDVM in the form of bytecode. 2218 C.1. Example UDVM code for simple LZ77 decompression 2220 The first example gives the code required to decompress data from a 2221 very simple LZ77-based algorithm. The UDVM is instructed to interpret 2222 a compressed message as a set of 4-byte characters, where each 2223 character contains a 2-byte position integer followed by a 2-byte 2224 length integer. Taken together these integers point to a previously 2225 received text string in the UDVM memory, which is then copied to the 2226 end of the uncompressed message. 2228 Since the compressor can only send references to strings already 2229 present in the UDVM memory, before the first message is decompressed 2230 the memory must be initialized with a static dictionary containing 2231 the 256 ASCII characters. 2233 The algorithm write-protects the memory containing the UDVM 2234 instructions used to decompress each character, so that they can 2235 easily be compiled to improve the speed of decompression. 2237 A 2-byte CRC over the uncompressed message is appended to the end of 2238 the compressed message, to verify that correct decompression has 2239 occurred. The algorithm also requests that the contents of the UDVM 2240 memory be saved using the state request mechanism, so that it can be 2241 retrieved by sending the appropriate 6-byte hash. 2243 reserve byte_copy_left 2244 reserve byte_copy_right 2245 reserve uncompressed_start 2246 reserve uncompressed_end 2247 reserve uncompressed_length 2248 reserve position 2249 reserve length 2250 reserve static_dictionary (256) 2251 reserve circular_buffer (2048) 2253 WORKING-MEMORY (uncompressed_start, reserved_end) 2254 MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary, 2255 circular_buffer, 0, 0, 0) 2257 :unpack_static_dictionary 2259 ; The following instructions initialize the static dictionary. 2261 COPY-LITERAL (position.LSB, 1, $uncompressed_start) 2262 ADD ($position, 1) 2264 Price et al. [PAGE 42] 2265 COMPARE ($position, 256, unpack_static_dictionary, next_character, 0) 2267 :next_character 2269 INPUT-FIXED (16, position, fail) 2270 INPUT-FIXED (16, length, end_of_message) 2271 COPY-LITERAL ($position, $length, $uncompressed_end) 2272 ADD ($uncompressed_length, $length) 2273 JUMP (next_character) 2275 :fail 2277 DECOMPRESSION-FAILURE 2279 :end_of_message 2281 CRC ($position, $uncompressed_start, $uncompressed_length, fail) 2282 OUTPUT ($uncompressed_start, $uncompressed_length) 2283 END-MESSAGE (6, 0, total_length, next_character, 0) 2285 Price et al. [PAGE 43]