idnits 2.17.1

draft-costanzo-lzju90-mime-01.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors.  All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document.  Please review these documents
          carefully, as they describe your rights and restrictions with
          respect to this document.  Code Components extracted from this
          document must include Simplified BSD License text as described in
          Section 4.e of the Trust Legal Provisions and are provided without
          warranty as described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 4
     longer pages, the longest (page 4) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 10 has weird spacing: '...fts are worki...'

  == Line 11 has weird spacing: '...ments of the ...'

  == Line 12 has weird spacing: '...t other group...'

  == Line 16 has weird spacing: '...and may be ...'

  == Line 20 has weird spacing: '...atus of any ...'

  == (17 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 3, 1998) is 9367 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '256' is mentioned on line 584, but not defined

  -- Looks like a reference, but probably isn't: 'N' on line 276

  == Missing Reference: '80' is mentioned on line 415, but not defined

  -- Looks like a reference, but probably isn't: 'K' on line 593

  ** Downref: Normative reference to an Experimental RFC: RFC 1505 (ref. '1')

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'


     Summary: 11 errors (**), 0 flaws (~~), 10 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    INTERNET-DRAFT                                              A. Costanzo
2    draft-costanzo-lzju90-mime-01.txt           AKC Computer Services Corp.
3    Expires: March 8th, 1999                              September 3, 1998

5                          Definition of the LZJU90
6                     MIME Content Transfer Encoding Type

8    1. Status of this Memo

10   This document is an Internet-Draft.  Internet-Drafts are working
11   documents of the Internet Engineering Task Force (IETF), its areas, and its
12   working groups.  Note that other groups may also distribute working
13   documents as Internet-Drafts.

15   Internet-Drafts are draft documents valid for a maximum of six months
16   and may be updated, replaced, or obsoleted by other documents at any
17   time.  It is inappropriate to use Internet-Drafts as reference material
18   or to cite them other than as "work in progress."

20   To learn the current status of any Internet-Draft, please check the
21   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
22   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
23   munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
24   (US West Coast).

26   Distribution of this memo is unlimited.

28   2. Abstract

30   This memo defines a new transfer encoding type for MIME, namely LZJU90.
31   LZJU90 specifies a section consisting of an encoded binary or text
32   object.  The encoding provides both compression and representation in a
33   text format.

35   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
36   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this
37   document are to be interpreted as described in RFC 2119.

39   3. Introduction

41   The Multipurpose Internet Mail Extensions (MIME) define a facility
42   whereby messages may use a specific transfer encoding scheme to survive
43   the process of transport through various mail delivery systems and
44   transport agents.

46   This Draft specifies and defines the LZJU90 transfer encoding for MIME
47   compliant mail systems.  This transfer encoding was first defined in
48   RFC 1505 [1].

50   Costanzo                                                        [Page 1]
51   EXPIRES IN SIX MONTHS                                  September 3, 1998

53   LZJU90 specifies a section consisting of an encoded binary or text
54   object.  The encoding (defined below) provides both compression and
55   representation in a text format.  This encoding has an advantage over
56   other encoding schemes in that the resulting object is compressed, and
57   is usually much smaller than the same object under a transfer
58   encoding of some other type.  The resulting compressed object is in
59   the character set ISO-10646-UTF-8. [2] [3]

61   3. Definition of the LZJU90 Compressed Encoding

63   LZJU90 is an encoding for a binary or text object to be
64   sent in an Internet mail message.  The encoding provides both
65   compression and representation in a text format that will successfully
66   survive transmission through the many different mailers and gateways
67   that comprise the Internet and connected mail networks.

69   3.1 Overview

71   The encoding first compresses the binary object, using a modified
72   LZ77 algorithm, called LZJU90.  It then encodes each 6 bits of the
73   output of the compression as a text character, using a character set
74   chosen to survive any translations between codes, such as ASCII to
75   EBCDIC.  The 64 six-bit strings 000000 through 111111 are represented
76   by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".
77   The output text begins with a line identifying the encoding.  This is
78   for visual reference only; the content transfer encoding header
79   identifies the section to the user program.  It also names the object
80   that was encoded, usually by a file name.

82   The format of this line is:

84      * LZJU90 <name>

86   where <name> is optional.  For example:

88      * LZJU90 foobar

90   This is followed by the compressed and encoded data, broken into
91   lines where convenient.  It is recommended that lines be broken about
92   every 76 characters.  The decoder must accept lines with 1 to
93   1000 characters on each line.  After this, there is one final line that
94   gives the number of bytes in the original data and a CRC of the
95   original data.  This should match the byte count and CRC found during
96   decompression.

98   This line has the format:

100     * <count> <CRC>

105  where <count> is a decimal number, and CRC is 8 hexadecimal digits.

107  For example:
108     * 4128076 5AC2D50E

110  The count used in encoding the object. This numeral is the total number
111  of lines, including the start and end lines that begin with *.

113  3.2 Specification of the LZJU90 compression

115  The Lempel-Ziv-Storer-Szymanski model of mixing pointers and literal
116  characters is used in the compression algorithm.  Repeated occurrences
117  of strings of octets are replaced by pointers to the earlier
118  occurrence.

120  The data compression is defined by the decoding algorithm.  Any
121  encoder that emits symbols which cause the decoder to produce the
122  original input is defined to be valid.

124  There are many possible strategies for the maximal-string matching
125  that the encoder does; section 3.2.2 gives an example of
126  one such algorithm.

128  Regardless of which algorithm is used, and what tradeoffs
129  are made between compression ratio and execution speed or space,
130  the result can always be decoded by the simple decoder.

132  The compressed data consists of a mixture of unencoded literal
133  characters and copy pointers which point to an earlier occurrence of
134  the string to be encoded.

136  Compressed data contains two types of codewords:

138  LITERAL pass the literal directly to the uncompressed output.

140  COPY    length, offset
141          go back offset characters in the output and copy length
142          characters forward to the current position.

144  To distinguish between codewords, the copy length is used.  A copy
145  length of zero indicates that the following codeword is a literal
146  codeword.  A copy length greater than zero indicates that the
147  following codeword is a copy codeword.

149  To improve copy length encoding, a threshold value of 2 has been
150  subtracted from the original copy length for copy codewords, because
151  the minimum copy length is 3 in this compression scheme.

156  The maximum offset value is set at 32255.  Larger offsets offer
157  extremely low improvements in compression (less than 1 percent,
158  typically).

160  No special encoding is done on the LITERAL characters.  However,
161  unary encoding is used for the copy length and copy offset values to
162  improve compression.  A start-step-stop unary code is used.
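
As an illustration of the two codeword types described above, the following sketch applies already-parsed LITERAL and COPY codewords to an output buffer. It is not part of the sample programs: the names lz_buf, lz_literal and lz_copy are hypothetical, the buffer size is arbitrary, and the sketch ignores both the bit-level encoding and the circular window, treating the offset as a plain distance back from the current output position.

```c
#include <stddef.h>

/* Hypothetical output buffer, for illustration only. */
typedef struct {
    unsigned char data[256];
    size_t len;
} lz_buf;

/* LITERAL: pass the character directly to the output. */
static void lz_literal(lz_buf *b, unsigned char c)
{
    if (b->len < sizeof b->data)
        b->data[b->len++] = c;
}

/* COPY length, offset: go back offset characters in the output and
   copy length characters forward to the current position.  The copy
   runs one character at a time so the source may overlap the
   destination.  Returns 0 on success, -1 on an unusable codeword. */
static int lz_copy(lz_buf *b, size_t length, size_t offset)
{
    size_t i;

    if (offset == 0 || offset > b->len ||
        b->len + length > sizeof b->data)
        return -1;
    for (i = 0; i < length; i++) {
        b->data[b->len] = b->data[b->len - offset];
        b->len++;
    }
    return 0;
}
```

For example, the literals "a", "b", "c" followed by COPY 6, 3 produce "abcabcabc". The copy reads characters that the same copy has just written, which is why the loop moves one character at a time.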

164  A (start, step, stop) unary code of the integers is defined as
165  follows: The Nth codeword has N ones followed by a zero followed by
166  a field of size START + (N * STEP).  If the field width is equal to
167  STOP then the preceding zero can be omitted.  The integers are laid
168  out sequentially through these codewords.  For example, (0, 1, 4)
169  would look like:

171       Codeword      Range
172       0             0
173       10x           1-2
174       110xx         3-6
175       1110xxx       7-14
176       1111xxxx      15-30

178  Following are the actual values used for copy length and copy offset:

180  The copy length is encoded with a (0, 1, 7) code leading to a maximum
181  copy length of 256 by including the THRESHOLD value of 2.

183       Codeword            Range
184       0                   0
185       10x                 3-4
186       110xx               5-8
187       1110xxx             9-16
188       11110xxxx           17-32
189       111110xxxxx         33-64
190       1111110xxxxxx       65-128
191       1111111xxxxxxx      129-256

193  The copy offset is encoded with a (9, 1, 14) code leading to a
194  maximum copy offset of 32255.  Offset 0 is reserved as an end of
195  compressed data flag.

197       Codeword                 Range
198       0xxxxxxxxx               0-511
199       10xxxxxxxxxx             512-1535
200       110xxxxxxxxxxx           1536-3583
201       1110xxxxxxxxxxxx         3584-7679
202       11110xxxxxxxxxxxxx       7680-15871
203       11111xxxxxxxxxxxxxx      15872-32255

208  The 0 has been chosen to signal the start of the field for ease of
209  encoding.  (The bit generator can simply encode one more bit than is
210  significant in the binary representation of the excess.)

212  The stop values are useful in the encoding to prevent out of range
213  values for the lengths and offsets, as well as shortening some codes
214  by one bit.

216  The worst case compression using this scheme is a 1/8 increase in
217  size of the encoded data.  (One zero bit followed by 8 character
218  bits).  After the character encoding, the worst case ratio is 3/2 to
219  the original data.
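
The (start, step, stop) code defined above can be read with a few lines of code. The following is a minimal sketch and not part of the sample programs: bit_reader, next_bit and sss_decode are hypothetical names, and the input is given as a string of '0' and '1' characters for readability rather than as packed bits. Padding with zero bits past the end of the input is an assumption made only to keep the sketch short. The loop follows the same logic as DecodeLength and DecodePosition in the sample decoder of section 3.2.1.

```c
#include <stddef.h>

/* Hypothetical bit source: a string of '0' and '1' characters. */
typedef struct {
    const char *bits;
    size_t pos;
} bit_reader;

static int next_bit(bit_reader *r)
{
    char c = r->bits[r->pos];

    if (c == '\0')
        return 0;            /* assumption: zero padding past the end */
    r->pos++;
    return c == '1';
}

/* Decode one integer of a (start, step, stop) unary code. */
static int sss_decode(bit_reader *r, int start, int step, int stop)
{
    int width, i;
    int base = 0;            /* first integer of the current codeword */
    int value = 0;

    for (width = start; width < stop; width += step) {
        if (next_bit(r) == 0)
            break;           /* a zero ends the unary prefix */
        base += 1 << width;  /* skip the integers of the shorter codeword */
    }
    /* When width reaches stop, no terminating zero is read. */
    for (i = 0; i < width; i++)
        value = (value << 1) | next_bit(r);
    return base + value;
}
```

With the (0, 1, 4) example code, the bit string 11010 decodes to 5. With the copy offset code (9, 1, 14), nineteen one bits decode to the maximum offset 32255, and ten zero bits decode to 0, the end of compressed data flag. With the copy length code (0, 1, 7), the bits 100 decode to 1, which corresponds to a copy length of 3 once the threshold value of 2 is added back.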

221  The minimum copy length of 3 has been chosen because the worst case
222  copy length and offset is 3 bits (3) and 19 bits (32255) for a total
223  of 22 bits to encode a 3 character string (24 bits).

225  3.2.1 Sample Decoder

227  As mentioned previously, the compression is defined by the decoder.
228  Any encoder that produces output that is correctly decoded is by
229  definition correct.

231  The following is an implementation of the decoder, written more for
232  clarity and as much portability as possible, rather than for maximum
233  speed.

235  When optimized for a specific environment, it will run significantly
236  faster.

238  /* LZJU 90 Decoding program */

240  /* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */

242  /* This code is NOT COPYRIGHT, not protected. It is in the true
243     Public Domain. */

245  #include <stdio.h>
246  #include <string.h>

248  typedef unsigned char uchar;
249  typedef unsigned int uint;

251  #define N 32255
252  #define THRESHOLD 3

254  #define STRTP 9
255  #define STEPP 1
256  #define STOPP 14
257  #define STRTL 0
258  #define STEPL 1
259  #define STOPL 7

263  static FILE *in;
264  static FILE *out;

266  static int getbuf;
267  static int getlen;
268  static long in_count;
269  static long out_count;
270  static long crc;
271  static long crctable[256];
272  static uchar xxcodes[] =
273  "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
274  abcdefghijklmnopqrstuvwxyz";
275  static uchar ddcodes[256];
276  static uchar text[N];

278  #define CRCPOLY 0xEDB88320
279  #define CRC_MASK 0xFFFFFFFF
280  #define UPDATE_CRC(crc, c) \
281      crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
282          ^ (crc >> 8)
283  #define START_RECD "* LZJU90"

285  void MakeCrctable()    /* Initialize CRC-32 table */
286  {
287      uint i, j;
288      long r;
289      for (i = 0; i <= 255; i++) {
290          r = i;
291          for (j = 8; j > 0; j--) {
292              if (r & 1)
293                  r = (r >> 1) ^ CRCPOLY;
294              else
295                  r >>= 1;
296          }
297          crctable[i] = r;
298      }
299  }

301  int GetXX()            /* Get xxcode and translate */
302  {
303      int c;
304      do {
305          if ((c = fgetc(in)) == EOF)
306              c = 0;
307      } while (c == '\n');
308      in_count++;
309      return ddcodes[c];
310  }

315  int GetBit()           /* Get one bit from input buffer */
316  {
317      int c;
318      while (getlen <= 0) {
319          c = GetXX();
320          getbuf |= c << (10-getlen);
321          getlen += 6;
322      }
323      c = (getbuf & 0x8000) != 0;
324      getbuf <<= 1;
325      getbuf &= 0xFFFF;
326      getlen--;
327      return(c);
328  }
329  int GetBits(int len)   /* Get len bits */
330  {
331      int c;
332      while (getlen <= 10) {
333          c = GetXX();
334          getbuf |= c << (10-getlen);
335          getlen += 6;
336      }
337      if (getlen < len) {
338          c = (uint)getbuf >> (16-len);
339          getbuf = GetXX();
340          c |= getbuf >> (6+getlen-len);
341          getbuf <<= (10+len-getlen);
342          getbuf &= 0xFFFF;
343          getlen -= len - 6;
344      }
345      else {
346          c = (uint)getbuf >> (16-len);
347          getbuf <<= len;
348          getbuf &= 0xFFFF;
349          getlen -= len;
350      }
351      return(c);
352  }

354  int DecodePosition()   /* Decode offset position pointer */
355  {
356      int c;
357      int width;
358      int plus;
359      int pwr;
360      plus = 0;
361      pwr = 1 << STRTP;
362      for (width = STRTP; width < STOPP; width += STEPP) {
363          c = GetBit();
364          if (c == 0)
369              break;
370          plus += pwr;
371          pwr <<= 1;
372      }

374      if (width != 0)
375          c = GetBits(width);
376      c += plus;
377      return(c);
378  }

380  int DecodeLength()     /* Decode code length */
381  {
382      int c;
383      int width;
384      int plus;
385      int pwr;
386      plus = 0;
387      pwr = 1 << STRTL;
388      for (width = STRTL; width < STOPL; width += STEPL) {
389          c = GetBit();
390          if (c == 0)
391              break;
392          plus += pwr;
393          pwr <<= 1;
394      }
395      if (width != 0)
396          c = GetBits(width);
397      c += plus;
398      return(c);
399  }

401  void InitCodes()       /* Initialize decode table */
402  {
403      int i;
404      for (i = 0; i < 256; i++) ddcodes[i] = 0;
405      for (i = 0; i < 64; i++) ddcodes[xxcodes[i]] = i;
406      return;
407  }

409  int main(int ac, char **av)    /* main program */
410  {
411      int r;
412      int j, k;
413      int c;
414      int pos;
415      char buf[80];
416      char name[3];
417      long num, bytes;

422      if (ac < 3) {
423          fprintf(stderr, "usage: judecode in out\n");
424          return(1);
425      }

427      in = fopen(av[1], "r");
428      if (!in){
429          fprintf(stderr, "Can't open %s\n", av[1]);
430          return(1);
431      }

433      out = fopen(av[2], "wb");
434      if (!out) {
435          fprintf(stderr, "Can't open %s\n", av[2]);
436          fclose(in);
438          return(1);
439      }

441      while (1) {
442          if (fgets(buf, sizeof(buf), in) == NULL) {
443              fprintf(stderr, "Unexpected EOF\n");
444              return(1);
445          }
446          if (strncmp(buf, START_RECD, strlen(START_RECD)) == 0)
447              break;
448      }

450      in_count = 0;
451      out_count = 0;
452      getbuf = 0;
453      getlen = 0;

455      InitCodes();
456      MakeCrctable();

458      crc = CRC_MASK;
459      r = 0;

461      while (feof(in) == 0) {
462          c = DecodeLength();
463          if (c == 0) {
464              c = GetBits(8);
465              UPDATE_CRC(crc, c);
466              out_count++;
467              text[r] = c;
468              fputc(c, out);
469              if (++r >= N)
470                  r = 0;
471          }
476          else {
477              pos = DecodePosition();
478              if (pos == 0)
479                  break;
480              pos--;
481              j = c + THRESHOLD - 1;
482              pos = r - pos - 1;
483              if (pos < 0)
484                  pos += N;
485              for (k = 0; k < j; k++) {
486                  c = text[pos];
487                  text[r] = c;
488                  UPDATE_CRC(crc, c);
489                  out_count++;
490                  fputc(c, out);
491                  if (++r >= N)
492                      r = 0;
493                  if (++pos >= N)
495                      pos = 0;
496              }
497          }
498      }

500      fgetc(in);   /* skip newline */

502      if (fscanf(in, "* %ld %lX", &bytes, &num) != 2) {
503          fprintf(stderr, "CRC record not found\n");
504          return(1);
505      }

507      else if (crc != num) {
508          fprintf(stderr,
509              "CRC error, expected %lX, found %lX\n",
510              crc, num);
511          return(1);
512      }

514      else if (bytes != out_count) {
515          fprintf(stderr,
516              "File size error, expected %lu, found %lu\n",
517              bytes, out_count);
518          return(1);
519      }

521      else
522          fprintf(stderr,
523              "File decoded to %lu bytes correctly\n",
524              out_count);

528      fclose(in);
529      fclose(out);
530      return(0);
531  }

533  3.2.2 An Example of an Encoder

535  Many algorithms are possible for the encoder, with different
536  tradeoffs between speed, size, and complexity.  The following
537  is an example program which is fairly efficient; more sophisticated
538  implementations will run much faster, and produce somewhat
539  better compression.

541  This example also shows that the encoder need not use the entire
542  window available.  Not using the full window costs a small amount of
543  compression, but can greatly increase the speed of some algorithms.

545  /* LZJU 90 Encoding program */

547  /* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */

549  /* This code is NOT COPYRIGHT, not protected. It is in the true
550     Public Domain. */

552  #include <stdio.h>

554  typedef unsigned char uchar;
555  typedef unsigned int uint;

557  #define N 24000       /* Size of window buffer */
558  #define F 256         /* Size of look-ahead buffer */
559  #define THRESHOLD 3
560  #define K 16384       /* Size of hash table */

562  #define STRTP 9
563  #define STEPP 1
564  #define STOPP 14

566  #define STRTL 0
567  #define STEPL 1
568  #define STOPL 7

570  #define CHARSLINE 78

572  static FILE *in;
573  static FILE *out;

575  static int putlen;
576  static int putbuf;
577  static int char_ct;
578  static long in_count;
579  static long out_count;

583  static long crc;
584  static long crctable[256];
585  static uchar xxcodes[] =
586  "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
587  abcdefghijklmnopqrstuvwxyz";
588  uchar window_text[N + F + 1];

590  /* text contains window, plus 1st F of window again
591     (for comparisons) */

593  uint hash_table[K];
594  /* table of pointers into the text */

596  #define CRCPOLY 0xEDB88320
597  #define CRC_MASK 0xFFFFFFFF
598  #define UPDATE_CRC(crc, c) \
599      crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
600          ^ (crc >> 8)

602  void MakeCrctable()
     /* Initialize CRC-32 table */
603  {
604      uint i, j;
605      long r;
606      for (i = 0; i <= 255; i++) {
607          r = i;
608          for (j = 8; j > 0; j--) {
609              if (r & 1)
610                  r = (r >> 1) ^ CRCPOLY;
611              else
612                  r >>= 1;
613          }
614          crctable[i] = r;
615      }
616  }

618  void PutXX(int c)      /* Translate and put xxcode */
619  {
620      c = xxcodes[c & 0x3F];
621      if (++char_ct > CHARSLINE) {
622          char_ct = 1;
623          fputc('\n', out);
624      }
625      fputc(c, out);
626      out_count++;
627  }

631  void PutBits(int c, int len)   /* Put rightmost "len" bits of "c" */
632  {
633      c <<= 16 - len;
634      c &= 0xFFFF;
635      putbuf |= (uint) c >> putlen;
636      c <<= 16 - putlen;
637      c &= 0xFFFF;
638      putlen += len;
639      while (putlen >= 6) {
640          PutXX(putbuf >> 10);
641          putlen -= 6;
642          putbuf <<= 6;
643          putbuf &= 0xFFFF;
644          putbuf |= (uint) c >> 10;
645          c = 0;
646      }
647  }

649  void EncodePosition(int ch)    /* Encode offset position pointer */
650  {
651      int width;
652      int prefix;
653      int pwr;
654      pwr = 1 << STRTP;
655      for (width = STRTP; ch >= pwr; width += STEPP, pwr <<= 1)
656          ch -= pwr;
657      if ((prefix = width - STRTP) != 0)
658          PutBits(0xffff, prefix);
659      if (width < STOPP)
660          width++;
661      /* else if (width > STOPP)
662          abort(); do nothing */
663      PutBits(ch, width);
664  }

666  void EncodeLength(int ch)      /* Encode code length */
667  {
668      int width;
669      int prefix;
670      int pwr;
671      pwr = 1 << STRTL;
672      for (width = STRTL; ch >= pwr; width += STEPL, pwr <<= 1)
673          ch -= pwr;
674      if ((prefix = width - STRTL) != 0)
675          PutBits(0xffff, prefix);
676      if (width < STOPL)
677          width++;
678      /* else if (width > STOPL)
679          abort(); do nothing */
680      PutBits(ch, width);
681  }

685  int main(int ac, char **av)    /* main program */
686  {
687      uint r, s, i, c;
688      uchar *p, *rp;
689      int match_position;
690      int match_length;
691      int len;
692      uint hash, h;

694      if (ac < 3) {
695          fprintf(stderr, "usage: juencode in out\n");
696          return(1);
697      }

699      in = fopen(av[1], "rb");
700      if (!in) {
701          fprintf(stderr, "Can't open %s\n", av[1]);
702          return(1);
703      }

705      out = fopen(av[2], "w");
706      if (!out) {
707          fprintf(stderr, "Can't open %s\n", av[2]);
708          fclose(in);
709          return(1);
710      }

712      char_ct = 0;
713      in_count = 0;
714      out_count = 0;
715      putbuf = 0;
716      putlen = 0;
717      hash = 0;

719      MakeCrctable();
720      crc = CRC_MASK;

722      fprintf(out, "* LZJU90 %s\n", av[1]);

724      /* The hash table initialization is somewhat arbitrary */
725      for (i = 0; i < K; i++) hash_table[i] = i % N;

727      r = 0;
728      s = 0;

730      /* Fill lookahead buffer */

732      for (len = 0; len < F && (c = fgetc(in)) != EOF; len++) {
734          UPDATE_CRC(crc, c);
735          in_count++;
736          window_text[s++] = c;
737      }

741      while (len > 0) {
742          /* look for match in window at hash position */
743          h = ((((window_text[r] << 5) ^ window_text[r+1])
744                  << 5) ^ window_text[r+2]);
745          p = window_text + hash_table[h % K];
746          rp = window_text + r;
747          for (i = 0, match_length = 0; i < F; i++) {
748              if (*p++ != *rp++) break;
749              match_length++;
750          }
751          match_position = r - hash_table[h % K];
752          if (match_position <= 0) match_position += N;

754          if (match_position > N - F - 2) match_length = 0;
755          if (match_position > in_count - len - 2)
756              match_length = 0;   /* ! :-) */

758          if (match_length > len)
759              match_length = len;
760          if (match_length < THRESHOLD) {
761              EncodeLength(0);
762              PutBits(window_text[r], 8);
763              match_length = 1;
764          }
765          else {
766              EncodeLength(match_length - THRESHOLD + 1);
767              EncodePosition(match_position);
768          }

770          for (i = 0; i < match_length &&
771                  (c = fgetc(in)) != EOF; i++) {
772              UPDATE_CRC(crc, c);
773              in_count++;
774              window_text[s] = c;
775              if (s < F - 1)
776                  window_text
777                      [s + N] = c;
778              if (++s > N - 1) s = 0;
779              hash = ((hash << 5) ^ window_text[r]);
780              if (r > 1) hash_table[hash % K] = r - 2;
781              if (++r > N - 1) r = 0;
782          }

784          while (i++ < match_length) {
785              if (++s > N - 1) s = 0;
786              hash = ((hash << 5) ^ window_text[r]);
787              if (r > 1) hash_table[hash % K] = r - 2;
788              if (++r > N - 1 ) r = 0;
789              len--;
790          }
791      }

795      /* end compression indicator */
796      EncodeLength(1);
797      EncodePosition(0);
798      PutBits(0, 7);

800      fprintf(out, "\n* %lu %08lX\n", in_count, crc);
801      fprintf(stderr, "Encoded %lu bytes to %lu symbols\n",
802          in_count, out_count);

804      fclose(in);
805      fclose(out);

807      return(0);
808  }

810  3.3.1 Example of a MIME LZJU90 Compressed Object

812  The following is an example of a MIME LZJU90 compressed object.

814  Content-Type: text/plain;
815      charset="utf-8"
816  Content-Transfer-Encoding: LZJU90

818  * LZJU90 example
819  8-mBtWA7WBVZ3dEBtnCNdU2WkE4owW+l4kkaApW+o4Ir0k33Ao4IE4kk
820  bYtk1XY618NnCQl+OHQ61d+J8FZBVVCVdClZ2-LUI0v+I4EraItasHbG
821  VVg7c8tdk2lCBtr3U86FZANVCdnAcUCNcAcbCMUCdicx0+u4wEETHcRM
822  7tZ2-6Btr268-Eh3cUAlmBth2-IUo3As42laIE2Ao4Yq4G-cHHT-wCEU
823  6tjBtnAci-I++
824  * 190 081E2601

826  4. Security Considerations

828  Security (or the lack of it) is the responsibility of the application
829  domain controlling the decoder of the LZJU90 object.

831  5. References

833  [1] Costanzo, A., Robinson, D., and R. Ullmann, "Encoding Header
834      Field for Internet Messages", RFC 1505, AKC Consulting,
835      Prime Computer, Inc., August 1993.

837  [2] International Organization for Standardization, Information
838      Technology -- Universal Coded Character Set (UCS).  ISO/IEC
839      10646-1:1993, June 1993.

841  [3] International Organization for Standardization, Information
842      Technology -- Universal Coded Character Set (UCS).  ISO/IEC
843      10646-1: 1993/AMD.2: 1996 (E)

847  6. Acknowledgments

849  The author would like to thank Robert Jung, David Robinson and
850  Robert Ullmann for their past contributions to this work.

852  7. Author's Address

854  Al Costanzo
855  AKC Computer Services Corp.
856  P.O. Box 4031
857  Roselle Park, NJ 07204-0531

859  Phone: +1 908 298 9000
860  Email: AL@AKC.COM