| < draft-deutsch-gzip-spec-01.txt | draft-deutsch-gzip-spec-02.txt > | |||
|---|---|---|---|---|
| INTERNET-DRAFT L. Peter Deutsch | INTERNET-DRAFT L. Peter Deutsch | |||
| GZIP 4.3 Aladdin Enterprises | GZIP 4.3 Aladdin Enterprises | |||
| Expires: 17 Aug 1996 12 Feb 1996 | Expires: 16 Sep 1996 11 Mar 1996 | |||
| GZIP file format specification version 4.3 | GZIP file format specification version 4.3 | |||
| File draft-deutsch-gzip-spec-01.txt | File draft-deutsch-gzip-spec-02.txt | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| skipping to change at line 32 ¶ | skipping to change at line 32 ¶ | |||
| To learn the current status of any Internet-Draft, please check the | To learn the current status of any Internet-Draft, please check the | |||
| ``1id-abstracts.txt'' listing contained in the Internet-Drafts | ``1id-abstracts.txt'' listing contained in the Internet-Drafts | |||
| Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), | Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), | |||
| munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or | munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or | |||
| ftp.isi.edu (US West Coast). | ftp.isi.edu (US West Coast). | |||
| Distribution of this memo is unlimited. | Distribution of this memo is unlimited. | |||
| Notices | Notices | |||
| Copyright (C) 1996 L. Peter Deutsch | Copyright (c) 1996 L. Peter Deutsch | |||
| Permission is granted to copy and distribute this document for any | Permission is granted to copy and distribute this document for any | |||
| purpose and without charge, including translations into other | purpose and without charge, including translations into other | |||
| languages and incorporation into compilations, provided that it is | languages and incorporation into compilations, provided that it is | |||
| copied as a whole (including the copyright notice and this notice) | copied as a whole (including the copyright notice and this notice) | |||
| and with no changes. | and with no changes. | |||
| Deutsch [Page 1] | ||||
| Abstract | Abstract | |||
| This specification defines a lossless compressed data format that is | This specification defines a lossless compressed data format that is | |||
| compatible with the widely used GZIP utility. The format includes a | compatible with the widely used GZIP utility. The format includes a | |||
| cyclic redundancy check value for detecting data corruption. The | cyclic redundancy check value for detecting data corruption. The | |||
| format presently uses the DEFLATE method of compression but can be | format presently uses the DEFLATE method of compression but can be | |||
| easily extended to use other compression methods. The format can be | easily extended to use other compression methods. The format can be | |||
| implemented readily in a manner not covered by patents. | implemented readily in a manner not covered by patents. | |||
| Table of contents | Table of Contents | |||
| 1. Introduction ................................................... 2 | 1. Introduction ................................................... 2 | |||
| 1.1 Purpose .................................................... 2 | 1.1. Purpose ................................................... 2 | |||
| 1.2 Intended audience .......................................... 2 | 1.2. Intended audience ......................................... 3 | |||
| 1.3 Scope ...................................................... 3 | 1.3. Scope ..................................................... 3 | |||
| 1.4 Compliance ................................................. 3 | 1.4. Compliance ................................................ 3 | |||
| 1.5 Definitions of terms and conventions used .................. 3 | 1.5. Definitions of terms and conventions used ................. 3 | |||
| 1.6 Changes from previous versions ............................. 3 | 1.6. Changes from previous versions ............................ 3 | |||
| 2. Detailed specification ......................................... 3 | 2. Detailed specification ......................................... 4 | |||
| 2.1 Overall conventions ........................................ 3 | 2.1. Overall conventions ....................................... 4 | |||
| 2.2 File format ................................................ 4 | 2.2. File format ............................................... 5 | |||
| 2.3 Member format .............................................. 4 | 2.3. Member format ............................................. 5 | |||
| 2.3.1. Member header and trailer ........................... 5 | 2.3.1. Member header and trailer ........................... 5 | |||
| 2.3.1.1. Extra field ....................................... 8 | * 2.3.1.1. Extra field ...................... 8 | |||
| 2.3.1.2. Compliance ........................................ 8 | * 2.3.1.2. Compliance ....................... 9 | |||
| 3. References ..................................................... 9 | 3. References .................................................. 9 | |||
| 3.1 Related standards .......................................... 9 | ||||
| 3.2 Other related publications ................................. 9 | ||||
| 4. Security considerations ........................................ 9 | 4. Security considerations .................................... 10 | |||
| 5. Acknowledgements .............................................. 10 | 5. Acknowledgements ........................................... 10 | |||
| 6. Author's address .............................................. 10 | 6. Author's address ........................................... 10 | |||
| 7. Appendix: Jean-loup Gailly's gzip utility ..................... 10 | 7. Appendix: Jean-loup Gailly's gzip utility .................. 11 | |||
| 8. Appendix: Sample CRC Code ..................................... 11 | 8. Appendix: Sample CRC Code .................................. 11 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Purpose | 1.1. Purpose | |||
| The purpose of this specification is to define a lossless | The purpose of this specification is to define a lossless | |||
| compressed data format that: | compressed data format that: | |||
| o Is independent of CPU type, operating system, file system, | * Is independent of CPU type, operating system, file system, | |||
| and character set, and hence can be used for interchange; | and character set, and hence can be used for interchange; | |||
| o Can compress or decompress a data stream (as opposed to a | * Can compress or decompress a data stream (as opposed to a | |||
| randomly accessible file) to produce another data stream, | randomly accessible file) to produce another data stream, | |||
| using only an a priori bounded amount of intermediate | using only an a priori bounded amount of intermediate | |||
| storage, and hence can be used in data communications or | storage, and hence can be used in data communications or | |||
| similar structures such as Unix filters; | similar structures such as Unix filters; | |||
| o Compresses data with efficiency comparable to the best | * Compresses data with efficiency comparable to the best | |||
| currently available general-purpose compression methods, and | currently available general-purpose compression methods, and | |||
| in particular considerably better than the 'compress' | in particular considerably better than the 'compress' | |||
| program; | program; | |||
| o Can be implemented readily in a manner not covered by | * Can be implemented readily in a manner not covered by | |||
| patents, and hence can be practiced freely; | patents, and hence can be practiced freely; | |||
| o Is compatible with the file format produced by the current | * Is compatible with the file format produced by the current | |||
| widely used gzip utility, in that conforming decompressors | widely used gzip utility, in that conforming decompressors | |||
| will be able to read data produced by the existing gzip | will be able to read data produced by the existing gzip | |||
| compressor. | compressor. | |||
| The data format defined by this specification does not attempt to: | The data format defined by this specification does not attempt to: | |||
| o Provide random access to compressed data; | * Provide random access to compressed data; | |||
| o Compress specialized data (e.g., raster graphics) as well as | * Compress specialized data (e.g., raster graphics) as well as | |||
| the best currently available specialized algorithms. | the best currently available specialized algorithms. | |||
| 1.2. Intended audience | 1.2. Intended audience | |||
| This specification is intended for use by implementors of software | This specification is intended for use by implementors of software | |||
| to compress data into gzip format and/or decompress data from gzip | to compress data into gzip format and/or decompress data from gzip | |||
| format. | format. | |||
| The text of the specification assumes a basic background in | The text of the specification assumes a basic background in | |||
| programming at the level of bits and other primitive data | programming at the level of bits and other primitive data | |||
| representations. | representations. | |||
| 1.3. Scope | 1.3. Scope | |||
| skipping to change at line 136 ¶ | skipping to change at line 138 ¶ | |||
| specifications presented here; a compliant compressor must produce | specifications presented here; a compliant compressor must produce | |||
| files that conform to all the specifications presented here. The | files that conform to all the specifications presented here. The | |||
| material in the appendices is not part of the specification per se | material in the appendices is not part of the specification per se | |||
| and is not relevant to compliance. | and is not relevant to compliance. | |||
| 1.5. Definitions of terms and conventions used | 1.5. Definitions of terms and conventions used | |||
| byte: 8 bits stored or transmitted as a unit (same as an octet). | byte: 8 bits stored or transmitted as a unit (same as an octet). | |||
| (For this specification, a byte is exactly 8 bits, even on | (For this specification, a byte is exactly 8 bits, even on | |||
| machines which store a character on a number of bits different | machines which store a character on a number of bits different | |||
| from 8.) See Section 2.1, below for the numbering of bits within | from 8.) See below for the numbering of bits within a byte. | |||
| a byte. | ||||
| 1.6. Changes from previous versions | 1.6. Changes from previous versions | |||
| There have been no technical changes to the gzip format since | There have been no technical changes to the gzip format since | |||
| version 4.1 of this specification. In version 4.2, some | version 4.1 of this specification. In version 4.2, some | |||
| terminology was changed, and the sample CRC code was rewritten for | terminology was changed, and the sample CRC code was rewritten for | |||
| clarity and to eliminate the requirement for the caller to do pre- | clarity and to eliminate the requirement for the caller to do pre- | |||
| and post-conditioning. Version 4.3 is a conversion of the | and post-conditioning. Version 4.3 is a conversion of the | |||
| specification to Internet Draft style. | specification to Internet Draft style. | |||
| 2. Detailed specification | 2. Detailed specification | |||
| 2.1. Overall conventions | 2.1. Overall conventions | |||
| In the diagrams below, a box like this: | In the diagrams below, a box like this: | |||
| +---+ | +---+ | |||
| | | <-- the vertical bars might be missing | | | <-- the vertical bars might be missing | |||
| +---+ | +---+ | |||
| represents one byte; a box like this: | represents one byte; a box like this: | |||
| +==============+ | +==============+ | |||
| | | | | | | |||
| +==============+ | +==============+ | |||
| represents a variable number of bytes. | represents a variable number of bytes. | |||
| Bytes stored within a computer do not have a 'bit order', since | Bytes stored within a computer do not have a 'bit order', since | |||
| they are always treated as a unit. However, a byte considered as | they are always treated as a unit. However, a byte considered as | |||
| skipping to change at line 198 ¶ | skipping to change at line 200 ¶ | |||
| 0 1 | 0 1 | |||
| +--------+--------+ | +--------+--------+ | |||
| |00001000|00000010| | |00001000|00000010| | |||
| +--------+--------+ | +--------+--------+ | |||
| ^ ^ | ^ ^ | |||
| | | | | | | |||
| | + more significant byte = 2 x 256 | | + more significant byte = 2 x 256 | |||
| + less significant byte = 8 | + less significant byte = 8 | |||
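The byte-order diagram above can be sketched in C. This is a minimal illustration, not part of either draft; the helper names `gz_get_le16` and `gz_get_le32` are hypothetical. It assumes only what the diagram shows: the less significant byte comes first, so the bytes 8 and 2 decode to 8 + 2 x 256 = 520.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 2-byte number stored with the
   less significant byte first, as in the diagram above. */
static uint16_t gz_get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: the same rule applied to a 4-byte number
   (the width of MTIME, CRC32, and ISIZE in the member format). */
static uint32_t gz_get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Decoding byte by byte, rather than casting the buffer to an integer pointer, keeps the result independent of the host CPU's own byte order.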
| 2.2. File format | 2.2. File format | |||
| A gzip file consists of a series of "members" (compressed data | A gzip file consists of a series of "members" (compressed data | |||
| sets). The format of each member is specified in the following | sets). The format of each member is specified in the following | |||
| section. The members simply appear one after another in the file, | section. The members simply appear one after another in the file, | |||
| with no additional information before, between, or after them. | with no additional information before, between, or after them. | |||
| 2.3. Member format | 2.3. Member format | |||
| Each member has the following structure: | Each member has the following structure: | |||
| +---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+ | |||
| |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->) | |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->) | |||
| +---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+ | |||
| (if FLG.FEXTRA set) | (if FLG.FEXTRA set) | |||
| +---+---+=================================+ | +---+---+=================================+ | |||
| | XLEN |...XLEN bytes of 'extra field'...| (more-->) | | XLEN |...XLEN bytes of 'extra field'...| (more-->) | |||
| +---+---+=================================+ | +---+---+=================================+ | |||
| skipping to change at line 249 ¶ | skipping to change at line 251 ¶ | |||
| |...compressed blocks...| (more-->) | |...compressed blocks...| (more-->) | |||
| +=======================+ | +=======================+ | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+ | |||
| | CRC32 | ISIZE | | | CRC32 | ISIZE | | |||
| +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+ | |||
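As a rough sketch of the first diagram row, the ten fixed header bytes could be unpacked as follows. The struct `gz_fixed_header` and the function `gz_read_fixed_header` are hypothetical names chosen for illustration. The code only copies fields out by position and does not validate them.

```c
#include <stddef.h>
#include <stdint.h>

#define GZ_FIXED_HEADER_LEN 10

/* Hypothetical in-memory view of the 10 fixed header bytes. */
struct gz_fixed_header {
    uint8_t  id1, id2;  /* bytes 0-1 */
    uint8_t  cm;        /* byte 2 */
    uint8_t  flg;       /* byte 3 */
    uint32_t mtime;     /* bytes 4-7, less significant byte first */
    uint8_t  xfl;       /* byte 8 */
    uint8_t  os;        /* byte 9 */
};

/* Returns 0 on success, -1 if fewer than 10 bytes are available. */
static int gz_read_fixed_header(const unsigned char *buf, size_t len,
                                struct gz_fixed_header *out)
{
    if (len < GZ_FIXED_HEADER_LEN)
        return -1;
    out->id1 = buf[0];
    out->id2 = buf[1];
    out->cm  = buf[2];
    out->flg = buf[3];
    out->mtime = (uint32_t)buf[4]
               | ((uint32_t)buf[5] << 8)
               | ((uint32_t)buf[6] << 16)
               | ((uint32_t)buf[7] << 24);
    out->xfl = buf[8];
    out->os  = buf[9];
    return 0;
}
```

Reading fields by offset avoids any dependence on compiler struct padding, which overlaying the struct directly on the buffer would introduce.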
| 2.3.1. Member header and trailer | 2.3.1. Member header and trailer | |||
| ID1 (IDentification 1) | ID1 (IDentification 1) | |||
| ID2 (IDentification 2) | ID2 (IDentification 2) | |||
| These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 | These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 | |||
| (0x8b, \213), to identify the file as being in gzip format. | (0x8b, \213), to identify the file as being in gzip format. | |||
| CM (Compression Method) | CM (Compression Method) | |||
| This identifies the compression method used in the file. CM | This identifies the compression method used in the file. CM | |||
| = 0-7 are reserved. CM = 8 denotes the 'deflate' | = 0-7 are reserved. CM = 8 denotes the 'deflate' | |||
| compression method, which is the one customarily used by | compression method, which is the one customarily used by | |||
| gzip and which is documented elsewhere. | gzip and which is documented elsewhere. | |||
| FLG (FLaGs) | FLG (FLaGs) | |||
| This flag byte is divided into individual bits as follows: | This flag byte is divided into individual bits as follows: | |||
| bit 0 FTEXT | bit 0 FTEXT | |||
| bit 1 FHCRC | bit 1 FHCRC | |||
| bit 2 FEXTRA | bit 2 FEXTRA | |||
| bit 3 FNAME | bit 3 FNAME | |||
| bit 4 FCOMMENT | bit 4 FCOMMENT | |||
| bit 5 reserved | bit 5 reserved | |||
| bit 6 reserved | bit 6 reserved | |||
| bit 7 reserved | bit 7 reserved | |||
| skipping to change at line 306 ¶ | skipping to change at line 304 ¶ | |||
| meaning in gzip 1.2.4.] | meaning in gzip 1.2.4.] | |||
| If FEXTRA is set, optional extra fields are present, as | If FEXTRA is set, optional extra fields are present, as | |||
| described in a following section. | described in a following section. | |||
| If FNAME is set, an original file name is present, | If FNAME is set, an original file name is present, | |||
| terminated by a zero byte. The name must consist of ISO | terminated by a zero byte. The name must consist of ISO | |||
| 8859-1 (LATIN-1) characters; on operating systems using | 8859-1 (LATIN-1) characters; on operating systems using | |||
| EBCDIC or any other character set for file names, the name | EBCDIC or any other character set for file names, the name | |||
| must be translated to the ISO LATIN-1 character set. This | must be translated to the ISO LATIN-1 character set. This | |||
| is the original name of the file being compressed, with any | is the original name of the file being compressed, with any | |||
| directory components removed, and, if the file being | directory components removed, and, if the file being | |||
| compressed is on a file system with case insensitive names, | compressed is on a file system with case insensitive names, | |||
| forced to lower case. There is no original file name if the | forced to lower case. There is no original file name if the | |||
| data was compressed from a source other than a named file; | data was compressed from a source other than a named file; | |||
| for example, if the source was stdin on a Unix system, there | for example, if the source was stdin on a Unix system, there | |||
| is no file name. | is no file name. | |||
| If FCOMMENT is set, a zero-terminated file comment is | If FCOMMENT is set, a zero-terminated file comment is | |||
| present. This comment is not interpreted; it is only | present. This comment is not interpreted; it is only | |||
| intended for human consumption. The comment must consist of | intended for human consumption. The comment must consist of | |||
| ISO 8859-1 (LATIN-1) characters. Line breaks should be | ISO 8859-1 (LATIN-1) characters. Line breaks should be | |||
| denoted by a single line feed character (10 decimal). | denoted by a single line feed character (10 decimal). | |||
| Reserved FLG bits must be zero. | Reserved FLG bits must be zero. | |||
| MTIME (Modification TIME) | MTIME (Modification TIME) | |||
| This gives the most recent modification time of the original | This gives the most recent modification time of the original | |||
| file being compressed. The time is in Unix format, i.e., | file being compressed. The time is in Unix format, i.e., | |||
| seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this | seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this | |||
| may cause problems for MS-DOS and other systems that use | may cause problems for MS-DOS and other systems that use | |||
| local rather than Universal time.) If the compressed data | local rather than Universal time.) If the compressed data | |||
| did not come from a file, MTIME is set to the time at which | did not come from a file, MTIME is set to the time at which | |||
| compression started. MTIME = 0 means no time stamp is | compression started. MTIME = 0 means no time stamp is | |||
| available. | available. | |||
| XFL (eXtra FLags) | XFL (eXtra FLags) | |||
| These flags are available for use by specific compression | These flags are available for use by specific compression | |||
| methods. The 'deflate' method (CM = 8) sets these flags as | methods. The 'deflate' method (CM = 8) sets these flags as | |||
| follows: | follows: | |||
| XFL = 2 - compressor used maximum compression, | XFL = 2 - compressor used maximum compression, | |||
| slowest algorithm | slowest algorithm | |||
| XFL = 4 - compressor used fastest algorithm | XFL = 4 - compressor used fastest algorithm | |||
| OS (Operating System) | OS (Operating System) | |||
| This identifies the type of file system on which compression | This identifies the type of file system on which compression | |||
| took place. This may be useful in determining end-of-line | took place. This may be useful in determining end-of-line | |||
| convention for text files. The currently defined values are | convention for text files. The currently defined values are | |||
| as follows: | as follows: | |||
| 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32) | 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32) | |||
| 1 - Amiga | 1 - Amiga | |||
| 2 - VMS (or OpenVMS) | 2 - VMS (or OpenVMS) | |||
| 3 - Unix | 3 - Unix | |||
| 4 - VM/CMS | 4 - VM/CMS | |||
| 5 - Atari TOS | 5 - Atari TOS | |||
| 6 - HPFS filesystem (OS/2, NT) | 6 - HPFS filesystem (OS/2, NT) | |||
| 7 - Macintosh | 7 - Macintosh | |||
| 8 - Z-System | 8 - Z-System | |||
| 9 - CP/M | 9 - CP/M | |||
| 10 - TOPS-20 | 10 - TOPS-20 | |||
| 11 - NTFS filesystem (NT) | 11 - NTFS filesystem (NT) | |||
| 12 - QDOS | 12 - QDOS | |||
| 13 - Acorn RISCOS | 13 - Acorn RISCOS | |||
| 255 - unknown | 255 - unknown | |||
| XLEN (eXtra LENgth) | XLEN (eXtra LENgth) | |||
| If FLG.FEXTRA is set, this gives the length of the optional | If FLG.FEXTRA is set, this gives the length of the optional | |||
| extra field. See below for details. | extra field. See below for details. | |||
| CRC32 (CRC-32) | CRC32 (CRC-32) | |||
| This contains a Cyclic Redundancy Check value of the | This contains a Cyclic Redundancy Check value of the | |||
| uncompressed data computed according to CRC-32 algorithm | uncompressed data computed according to CRC-32 algorithm | |||
| used in the ISO 3309 standard and in section 8.1.1.6.2 of | used in the ISO 3309 standard and in section 8.1.1.6.2 of | |||
| ITU-T recommendation V.42. (See http://www.iso.ch for | ITU-T recommendation V.42. (See http://www.iso.ch for | |||
| ordering ISO documents. See gopher://info.itu.ch for an | ordering ISO documents. See gopher://info.itu.ch for an | |||
| online version of ITU-T V.42.) | online version of ITU-T V.42.) | |||
| ISIZE (Input SIZE) | ISIZE (Input SIZE) | |||
| This contains the size of the original (uncompressed) input | This contains the size of the original (uncompressed) input | |||
| data modulo 2^32. | data modulo 2^32. | |||
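The trailer check described by CRC32 and ISIZE might look like the sketch below. It is not the sample code from the draft's appendix. It uses a simple bit-at-a-time CRC-32 with the reflected polynomial 0xEDB88320, and the names `gz_crc32`, `gz_le32`, and `gz_trailer_matches` are hypothetical. It assumes the whole uncompressed member is available in memory, which a streaming decompressor would not require.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320).
   The initial all-ones value and the final complement are done
   here, so the caller needs no pre- or post-conditioning. */
static uint32_t gz_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Decode a 4-byte number stored less significant byte first. */
static uint32_t gz_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the 8-byte trailer (CRC32 then ISIZE) matches the
   uncompressed data, 0 otherwise. */
static int gz_trailer_matches(const unsigned char trailer[8],
                              const unsigned char *data, size_t len)
{
    /* Conversion to uint32_t reduces the length modulo 2^32. */
    uint32_t isize = (uint32_t)len;
    return gz_le32(trailer) == gz_crc32(data, len)
        && gz_le32(trailer + 4) == isize;
}
```

Because ISIZE is stored modulo 2^32, it cannot by itself distinguish inputs whose sizes differ by a multiple of 4 GiB; the CRC32 comparison is the stronger of the two checks.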
| 2.3.1.1. Extra field | 2.3.1.1. Extra field | |||
| If the FLG.FEXTRA bit is set, an "extra field" is present in | If the FLG.FEXTRA bit is set, an "extra field" is present in | |||
| the header, with total length XLEN bytes. It consists of a | the header, with total length XLEN bytes. It consists of a | |||
| series of subfields, each of the form: | series of subfields, each of the form: | |||
| +---+---+---+---+==================================+ | +---+---+---+---+==================================+ | |||
| skipping to change at line 409 ¶ | skipping to change at line 401 ¶ | |||
| with some mnemonic value. Jean-loup Gailly | with some mnemonic value. Jean-loup Gailly | |||
| <gzip@prep.ai.mit.edu> is maintaining a registry of subfield | <gzip@prep.ai.mit.edu> is maintaining a registry of subfield | |||
| IDs; please send him any subfield ID you wish to use. Subfield | IDs; please send him any subfield ID you wish to use. Subfield | |||
| IDs with SI2 = 0 are reserved for future use. The following | IDs with SI2 = 0 are reserved for future use. The following | |||
| IDs are currently defined: | IDs are currently defined: | |||
| SI1 SI2 Data | SI1 SI2 Data | |||
| ---------- ---------- ---- | ---------- ---------- ---- | |||
| 0x41 ('A') 0x70 ('P') Apollo file type information | 0x41 ('A') 0x70 ('P') Apollo file type information | |||
| LEN gives the length of the subfield data, excluding the 4 | LEN gives the length of the subfield data, excluding the 4 | |||
| initial bytes. | initial bytes. | |||
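One way to locate a subfield inside the extra field is sketched below. The function name `gz_find_subfield` is hypothetical. The sketch assumes each subfield consists of SI1, SI2, and a two-byte LEN (less significant byte first) followed by LEN data bytes, which is what the partly elided diagram and the LEN description above suggest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: search the XLEN-byte extra field for the
   subfield with ID (si1, si2). On success, stores the data length
   in *out_len and returns a pointer to the subfield data. Returns
   NULL if the subfield is absent or the extra field is malformed. */
static const unsigned char *gz_find_subfield(const unsigned char *extra,
                                             size_t xlen,
                                             uint8_t si1, uint8_t si2,
                                             uint16_t *out_len)
{
    size_t pos = 0;
    while (xlen - pos >= 4) {
        /* LEN excludes the 4 initial bytes (SI1, SI2, LEN). */
        uint16_t len = (uint16_t)(extra[pos + 2] | (extra[pos + 3] << 8));
        if (xlen - pos - 4 < len)
            return NULL;            /* subfield overruns XLEN */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *out_len = len;
            return extra + pos + 4;
        }
        pos += 4 + (size_t)len;
    }
    return NULL;
}
```

The bounds check before each step matters because XLEN and LEN both come from untrusted input.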
| 2.3.1.2. Compliance | 2.3.1.2. Compliance | |||
| A compliant compressor must produce files with correct ID1, | A compliant compressor must produce files with correct ID1, | |||
| ID2, CM, CRC32, and ISIZE, but may set all the other fields in | ID2, CM, CRC32, and ISIZE, but may set all the other fields in | |||
| the fixed-length part of the header to default values (255 for | the fixed-length part of the header to default values (255 for | |||
| OS, 0 for all others). The compressor must set all reserved | OS, 0 for all others). The compressor must set all reserved | |||
| bits to zero. | bits to zero. | |||
| A compliant decompressor must check ID1, ID2, and CM, and | A compliant decompressor must check ID1, ID2, and CM, and | |||
| provide an error indication if any of these have incorrect | provide an error indication if any of these have incorrect | |||
| values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC | values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC | |||
| at least so it can skip over the optional fields if they are | at least so it can skip over the optional fields if they are | |||
| present. It need not examine any other part of the header or | present. It need not examine any other part of the header or | |||
| trailer; in particular, a decompressor may ignore FTEXT and OS | trailer; in particular, a decompressor may ignore FTEXT and OS | |||
| and always produce binary output, and still be compliant. A | and always produce binary output, and still be compliant. A | |||
| compliant decompressor must give an error indication if any | compliant decompressor must give an error indication if any | |||
| reserved bit is non-zero, since such a bit could indicate the | reserved bit is non-zero, since such a bit could indicate the | |||
| presence of a new field that would cause subsequent data to be | presence of a new field that would cause subsequent data to be | |||
| interpreted incorrectly. | interpreted incorrectly. | |||
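The decompressor obligations above can be sketched as a single header-skipping routine. The names `gz_header_length`, `gz_skip_zstring`, and the `GZ_*` constants are hypothetical. Two assumptions go beyond the text shown here: that the optional fields appear in the order extra field, file name, comment, header CRC, and that the header CRC occupies two bytes. The diagram rows that would confirm both are elided from this diff.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    GZ_FHCRC    = 1 << 1,
    GZ_FEXTRA   = 1 << 2,
    GZ_FNAME    = 1 << 3,
    GZ_FCOMMENT = 1 << 4,
    GZ_RESERVED = 0xE0      /* bits 5-7 must be zero */
};

/* Returns the index just past the zero terminator, or 0 if the
   buffer ends before a terminator is found. */
static size_t gz_skip_zstring(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len) {
        if (buf[pos++] == 0)
            return pos;
    }
    return 0;
}

/* Returns the offset of the compressed data, or -1 on any error. */
static long gz_header_length(const unsigned char *buf, size_t len)
{
    size_t pos = 10;
    unsigned flg;

    if (len < 10)
        return -1;
    if (buf[0] != 31 || buf[1] != 139)  /* ID1, ID2 */
        return -1;
    if (buf[2] != 8)                    /* CM: only 'deflate' is defined */
        return -1;
    flg = buf[3];
    if (flg & GZ_RESERVED)              /* may signal an unknown field */
        return -1;
    if (flg & GZ_FEXTRA) {
        size_t xlen;
        if (len - pos < 2)
            return -1;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return -1;
        pos += xlen;
    }
    if (flg & GZ_FNAME) {
        pos = gz_skip_zstring(buf, len, pos);
        if (pos == 0)
            return -1;
    }
    if (flg & GZ_FCOMMENT) {
        pos = gz_skip_zstring(buf, len, pos);
        if (pos == 0)
            return -1;
    }
    if (flg & GZ_FHCRC) {               /* assumed 2-byte header CRC */
        if (len - pos < 2)
            return -1;
        pos += 2;
    }
    return (long)pos;
}
```

FTEXT, MTIME, XFL, and OS are deliberately ignored, which the compliance text permits.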
| 3. References | 3. References | |||
| 3.1. Related standards | ||||
| "Information Processing - 8-bit single-byte coded graphic | [1] "Information Processing - 8-bit single-byte coded graphic | |||
| character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987). | character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987). | |||
| The ISO 8859-1 (Latin-1) character set is a superset of 7-bit | The ISO 8859-1 (Latin-1) character set is a superset of 7-bit | |||
| ASCII. Files defining this character set may be obtained from | ASCII. Files defining this character set may be obtained from | |||
| ftp.uu.net:/graphics/png/documents/iso_8859-1.* | ftp.uu.net:/graphics/png/documents/iso_8859-1.* | |||
| ISO 3309 | [2] ISO 3309 | |||
| ITU-T recommendation V.42 | [3] ITU-T recommendation V.42 | |||
| 3.2. Other related publications | ||||
| [1] Deutsch, L.P.,"'Deflate' Compressed Data Format | [4] Deutsch, L.P.,"'Deflate' Compressed Data Format Specification". | |||
| Specification". available in | available in ftp.uu.net:/pub/archiving/zip/doc/deflate-*.doc | |||
| ftp.uu.net:/pub/archiving/zip/doc/deflate-*.doc | ||||
| [2] Gailly, J.-L., gzip documentation, available in | [5] Gailly, J.-L., gzip documentation, available in | |||
| prep.ai.mit.edu:/pub/gnu/gzip-*.tar | prep.ai.mit.edu:/pub/gnu/gzip-*.tar | |||
| [3] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via | [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table | |||
| Table Look-Up", Communications of the ACM, 31(8), pp.1008-1013. | Look-Up", Communications of the ACM, 31(8), pp.1008-1013. | |||
| [4] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal, | [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal, | |||
| pp.118-133. | pp.118-133. | |||
| [5] ftp.adelaide.edu.au:/pub/rocksoft/papers/crc_v3.txt, | [8] ftp.adelaide.edu.au:/pub/rocksoft/papers/crc_v3.txt, describing | |||
| describing the CRC concept. | the CRC concept. | |||
| 4. Security considerations | 4. Security considerations | |||
| Any data compression method involves the reduction of redundancy in | Any data compression method involves the reduction of redundancy in | |||
| the data. Consequently, any corruption of the data is likely to have | the data. Consequently, any corruption of the data is likely to have | |||
| severe effects and be difficult to correct. Uncompressed text, on | severe effects and be difficult to correct. Uncompressed text, on | |||
| the other hand, will probably still be readable despite the presence | the other hand, will probably still be readable despite the presence | |||
| of some corrupted bytes. | of some corrupted bytes. | |||
| It is recommended that systems using this data format provide some | It is recommended that systems using this data format provide some | |||
| means of validating the integrity of the compressed data, such as by | means of validating the integrity of the compressed data, such as by | |||
| setting and checking the CRC-32 check value. | setting and checking the CRC-32 check value. | |||
| 5. Acknowledgements | 5. Acknowledgements | |||
| Trademarks cited in this document are the property of their | Trademarks cited in this document are the property of their | |||
| respective owners. | respective owners. | |||
| Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler, | Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler, | |||
| the related software described in this specification. Glenn | the related software described in this specification. Glenn | |||
| Randers-Pehrson converted this document to Internet Draft and HTML | Randers-Pehrson converted this document to Internet Draft and HTML | |||
| format. | format. | |||
| skipping to change at line 512 ¶ | skipping to change at line 498 ¶ | |||
| sent by email to | sent by email to | |||
| Jean-loup Gailly <gzip@prep.ai.mit.edu> and | Jean-loup Gailly <gzip@prep.ai.mit.edu> and | |||
| Mark Adler <madler@alumni.caltech.edu> | Mark Adler <madler@alumni.caltech.edu> | |||
| Editorial comments on this specification can be sent by email to | Editorial comments on this specification can be sent by email to | |||
| L. Peter Deutsch <ghost@aladdin.com> and | L. Peter Deutsch <ghost@aladdin.com> and | |||
| Glenn Randers-Pehrson <randeg@alumni.rpi.edu> | Glenn Randers-Pehrson <randeg@alumni.rpi.edu> | |||
| 7. Appendix: Jean-loup Gailly's gzip utility | 7. Appendix: Jean-loup Gailly's gzip utility | |||
| The most widely used implementation of gzip compression, and the | The most widely used implementation of gzip compression, and the | |||
| original documentation on which this specification is based, were | original documentation on which this specification is based, were | |||
| created by Jean-loup Gailly <gzip@prep.ai.mit.edu>. Since this | created by Jean-loup Gailly <gzip@prep.ai.mit.edu>. Since this | |||
| implementation is a de facto standard, we mention some more of its | implementation is a de facto standard, we mention some more of its | |||
| features here. Again, the material in this section is not part of | features here. Again, the material in this section is not part of | |||
| the specification per se, and implementations need not follow it to | the specification per se, and implementations need not follow it to | |||
| be compliant. | be compliant. | |||
| When compressing or decompressing a file, gzip preserves the | When compressing or decompressing a file, gzip preserves the | |||
| protection, ownership, and modification time attributes on the local | protection, ownership, and modification time attributes on the local | |||
| file system, since there is no provision for representing protection | file system, since there is no provision for representing protection | |||
| attributes in the gzip file format itself. Since the file format | attributes in the gzip file format itself. Since the file format | |||
| includes a modification time, the gzip decompressor provides a | includes a modification time, the gzip decompressor provides a | |||
| command line switch that assigns the modification time from the file, | command line switch that assigns the modification time from the file, | |||
| rather than the local modification time of the compressed input, to | rather than the local modification time of the compressed input, to | |||
| the decompressed output. | the decompressed output. | |||
| 8. Appendix: Sample CRC Code | 8. Appendix: Sample CRC Code | |||
| The following sample code represents a practical implementation of | The following sample code represents a practical implementation of | |||
| the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42 | the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42 | |||
| for a formal specification.) | for a formal specification.) | |||
| The sample code is in the ANSI C programming language. Non C users | The sample code is in the ANSI C programming language. Non C users | |||
| may find it easier to read with these hints: | may find it easier to read with these hints: | |||
| & Bitwise AND operator. | & Bitwise AND operator. | |||
| skipping to change at line 563 ¶ | skipping to change at line 549 ¶ | |||
| /* Flag: has the table been computed? Initially false. */ | /* Flag: has the table been computed? Initially false. */ | |||
| int crc_table_computed = 0; | int crc_table_computed = 0; | |||
| /* Make the table for a fast CRC. */ | /* Make the table for a fast CRC. */ | |||
| void make_crc_table(void) | void make_crc_table(void) | |||
| { | { | |||
| unsigned long c; | unsigned long c; | |||
| int n, k; | int n, k; | |||
| for (n = 0; n < 256; n++) { | for (n = 0; n < 256; n++) { | |||
| c = (unsigned long) n; | c = (unsigned long) n; | |||
| for (k = 0; k < 8; k++) { | for (k = 0; k < 8; k++) { | |||
| if (c & 1) { | if (c & 1) { | |||
| c = 0xedb88320L ^ (c >> 1); | c = 0xedb88320L ^ (c >> 1); | |||
| } else { | } else { | |||
| c = c >> 1; | c = c >> 1; | |||
| } | } | |||
| } | } | |||
| crc_table[n] = c; | crc_table[n] = c; | |||
| } | } | |||
| crc_table_computed = 1; | crc_table_computed = 1; | |||
| } | } | |||
| /* | /* | |||
| Update a running crc with the bytes buf[0..len-1] and return | Update a running crc with the bytes buf[0..len-1] and return | |||
| the updated crc. The crc should be initialized to zero. Pre- and | the updated crc. The crc should be initialized to zero. Pre- and | |||
| post-conditioning (one's complement) is performed within this | post-conditioning (one's complement) is performed within this | |||
| function so it shouldn't be done by the caller. Usage example: | function so it shouldn't be done by the caller. Usage example: | |||
| unsigned long crc = 0L; | unsigned long crc = 0L; | |||
| while (read_buffer(buffer, length) != EOF) { | while (read_buffer(buffer, length) != EOF) { | |||
| crc = update_crc(crc, buffer, length); | crc = update_crc(crc, buffer, length); | |||
| End of changes. 52 change blocks. | ||||
| 78 lines changed or deleted | 64 lines changed or added | |||
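The diff skips the table declaration and cuts off before the body of update_crc, so the sample CRC code cannot be compiled as shown. The following is a minimal, self-contained sketch of how the pieces might fit together. It follows the comment above: the running crc starts at zero, and the one's-complement pre- and post-conditioning happens inside the function. The crc_table declaration, the body of update_crc, the const qualifier, the static linkage, and the helper name crc32_of are assumptions made for illustration and are not taken from either draft revision.

```c
/* Sketch only: the parts not visible in the diff are assumptions. */

/* Table of CRCs of all 8-bit messages (declaration assumed). */
static unsigned long crc_table[256];

/* Flag: has the table been computed? Initially false. */
static int crc_table_computed = 0;

/* Make the table for a fast CRC (same loop as in the diff). */
static void make_crc_table(void)
{
  unsigned long c;
  int n, k;

  for (n = 0; n < 256; n++) {
    c = (unsigned long) n;
    for (k = 0; k < 8; k++) {
      if (c & 1) {
        c = 0xedb88320L ^ (c >> 1);
      } else {
        c = c >> 1;
      }
    }
    crc_table[n] = c;
  }
  crc_table_computed = 1;
}

/* Update a running crc with buf[0..len-1] and return the updated crc.
   The caller starts with crc = 0; the complementing is done here. */
unsigned long update_crc(unsigned long crc, const unsigned char *buf, int len)
{
  unsigned long c = crc ^ 0xffffffffL;   /* pre-conditioning */
  int n;

  if (!crc_table_computed)
    make_crc_table();
  for (n = 0; n < len; n++) {
    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
  }
  return c ^ 0xffffffffL;                /* post-conditioning */
}

/* Hypothetical convenience wrapper: CRC of a single complete buffer. */
unsigned long crc32_of(const unsigned char *buf, int len)
{
  return update_crc(0L, buf, len);
}
```

Because the complementing is undone and redone on each call, feeding the data in several chunks through update_crc should give the same result as one call over the whole buffer. For the nine ASCII bytes "123456789", this variant of CRC-32 yields the commonly quoted check value 0xCBF43926.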