| draft-yergeau-rfc2279bis-03.txt | draft-yergeau-rfc2279bis-04.txt | |||
|---|---|---|---|---|
| Network Working Group F. Yergeau | Network Working Group F. Yergeau | |||
| Internet-Draft Alis Technologies | Internet-Draft Alis Technologies | |||
| Expires: August 7, 2003 February 6, 2003 | Expires: August 18, 2003 February 17, 2003 | |||
| UTF-8, a transformation format of ISO 10646 | UTF-8, a transformation format of ISO 10646 | |||
| draft-yergeau-rfc2279bis-03 | draft-yergeau-rfc2279bis-04 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that other | |||
| other groups may also distribute working documents as Internet- | groups may also distribute working documents as Internet-Drafts. | |||
| Drafts. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at http:// | The list of current Internet-Drafts can be accessed at | |||
| www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on August 7, 2003. | This Internet-Draft will expire on August 18, 2003. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2003). All Rights Reserved. | |||
| Abstract | Abstract | |||
| ISO/IEC 10646-1 defines a large character set called the Universal | ISO/IEC 10646-1 defines a large character set called the Universal | |||
| Character Set (UCS) which encompasses most of the world's writing | Character Set (UCS) which encompasses most of the world's writing | |||
| systems. The originally proposed encodings of the UCS, however, were | systems. The originally proposed encodings of the UCS, however, were | |||
| not compatible with many current applications and protocols, and this | not compatible with many current applications and protocols, and this | |||
| has led to the development of UTF-8, the object of this memo. UTF-8 | has led to the development of UTF-8, the object of this memo. UTF-8 | |||
| has the characteristic of preserving the full US-ASCII range, | has the characteristic of preserving the full US-ASCII range, | |||
| providing compatibility with file systems, parsers and other software | providing compatibility with file systems, parsers and other software | |||
| that rely on US-ASCII values but are transparent to other values. | that rely on US-ASCII values but are transparent to other values. | |||
| This memo obsoletes and replaces RFC 2279. | This memo obsoletes and replaces RFC 2279. | |||
| Discussion of this draft should take place on the ietf- | ||||
| charsets@iana.org mailing list. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Notational conventions . . . . . . . . . . . . . . . . . . . . 5 | 2. Notational conventions . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3. UTF-8 definition . . . . . . . . . . . . . . . . . . . . . . . 6 | 3. UTF-8 definition . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 4. Syntax of UTF-8 Byte Sequences . . . . . . . . . . . . . . . . 8 | 4. Syntax of UTF-8 Byte Sequences . . . . . . . . . . . . . . . . 6 | |||
| 5. Versions of the standards . . . . . . . . . . . . . . . . . . 9 | 5. Versions of the standards . . . . . . . . . . . . . . . . . . 6 | |||
| 6. Byte order mark (BOM) . . . . . . . . . . . . . . . . . . . . 10 | 6. Byte order mark (BOM) . . . . . . . . . . . . . . . . . . . . 7 | |||
| 7. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | 7. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 8. MIME registration . . . . . . . . . . . . . . . . . . . . . . 13 | 8. MIME registration . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 15 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 12. Changes from RFC 2279 . . . . . . . . . . . . . . . . . . . . 17 | 12. Changes from RFC 2279 . . . . . . . . . . . . . . . . . . . . 11 | |||
| Normative references . . . . . . . . . . . . . . . . . . . . . 19 | Normative references . . . . . . . . . . . . . . . . . . . . . 12 | |||
| Informative references . . . . . . . . . . . . . . . . . . . . 20 | Informative references . . . . . . . . . . . . . . . . . . . . 12 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . 21 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| Full Copyright Statement . . . . . . . . . . . . . . . . . . . 22 | Intellectual Property and Copyright Statements . . . . . . . . 14 | |||
| 1. Introduction | 1. Introduction | |||
| ISO/IEC 10646 [ISO.10646] defines a large character set called the | ISO/IEC 10646 [ISO.10646] defines a large character set called the | |||
| Universal Character Set (UCS), which encompasses most of the world's | Universal Character Set (UCS), which encompasses most of the world's | |||
| writing systems. The same set of characters is defined by the | writing systems. The same set of characters is defined by the Unicode | |||
| Unicode standard [UNICODE], which further defines additional | standard [UNICODE], which further defines additional character | |||
| character properties and other application details of great interest | properties and other application details of great interest to | |||
| to implementers. Up to the present time, changes in Unicode and | implementers. Up to the present time, changes in Unicode and | |||
| amendments and additions to ISO/IEC 10646 have tracked each other, so | amendments and additions to ISO/IEC 10646 have tracked each other, so | |||
| that the character repertoires and code point assignments have | that the character repertoires and code point assignments have | |||
| remained in sync. The relevant standardization committees have | remained in sync. The relevant standardization committees have | |||
| committed to maintain this very useful synchronism. | committed to maintain this very useful synchronism. | |||
| ISO/IEC 10646 and Unicode define several encoding forms of their | ISO/IEC 10646 and Unicode define several encoding forms of their | |||
| common repertoire: UTF-8, UCS-2, UTF-16, UCS-4 and UTF-32. In an | common repertoire: UTF-8, UCS-2, UTF-16, UCS-4 and UTF-32. In an | |||
| encoding form, each character is represented as one or more encoding | encoding form, each character is represented as one or more encoding | |||
| units. All standard UCS encoding forms except UTF-8 have an encoding | units. All standard UCS encoding forms except UTF-8 have an encoding | |||
| unit larger than one octet, making them hard to use in many current | unit larger than one octet, making them hard to use in many current | |||
| applications and protocols that assume 8 or even 7 bit characters. | applications and protocols that assume 8 or even 7 bit characters. | |||
| UTF-8, the object of this memo, has a one-octet encoding unit. It | UTF-8, the object of this memo, has a one-octet encoding unit. It | |||
| uses all bits of an octet, but has the quality of preserving the full | uses all bits of an octet, but has the quality of preserving the full | |||
| US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one | US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one | |||
| octet having the normal US-ASCII value, and any octet with such a | octet having the normal US-ASCII value, and any octet with such a | |||
| value can only stand for a US-ASCII character, and nothing else. | value can only stand for a US-ASCII character, and nothing else. | |||
| UTF-8 encodes UCS characters as a varying number of octets, where the | UTF-8 encodes UCS characters as a varying number of octets, where the | |||
| number of octets, and the value of each, depend on the integer value | number of octets, and the value of each, depend on the integer value | |||
| assigned to the character in ISO/IEC 10646 (the character number, | assigned to the character in ISO/IEC 10646 (the character number, | |||
| a.k.a. code point or Unicode scalar value). This encoding form has | a.k.a. code point or Unicode scalar value). This encoding form has | |||
| the following characteristics (all values are in hexadecimal): | the following characteristics (all values are in hexadecimal): | |||
| o Character numbers from U+0000 to U+007F (US-ASCII repertoire) | o Character numbers from U+0000 to U+007F (US-ASCII repertoire) | |||
| correspond to octets 00 to 7F (7 bit US-ASCII values). A direct | correspond to octets 00 to 7F (7 bit US-ASCII values). A direct | |||
| consequence is that a plain ASCII string is also a valid UTF-8 | consequence is that a plain ASCII string is also a valid UTF-8 | |||
| string. | string. | |||
| o US-ASCII octet values do not appear otherwise in a UTF-8 encoded | o US-ASCII octet values do not appear otherwise in a UTF-8 encoded | |||
| character stream. This provides compatibility with file systems | character stream. This provides compatibility with file systems | |||
| or other software (e.g. the printf() function in C libraries) | or other software (e.g. the printf() function in C libraries) that | |||
| that parse based on US-ASCII values but are transparent to other | parse based on US-ASCII values but are transparent to other | |||
| values. | values. | |||
| o Round-trip conversion is easy between UTF-8 and other encoding | o Round-trip conversion is easy between UTF-8 and other encoding | |||
| forms. | forms. | |||
| o The first octet of a multi-octet sequence indicates the number of | o The first octet of a multi-octet sequence indicates the number of | |||
| octets in the sequence. | octets in the sequence. | |||
| o The octet values C0, C1, FE and FF never appear. If the range of | o The octet values C0, C1, FE and FF never appear. If the range of | |||
| character numbers is restricted to U+0000..U+10FFFF (the UTF-16 | character numbers is restricted to U+0000..U+10FFFF (the UTF-16 | |||
| accessible range), then the octet values F5..FD also never appear. | accessible range), then the octet values F5..FD also never appear. | |||
| o Character boundaries are easily found from anywhere in an octet | o Character boundaries are easily found from anywhere in an octet | |||
| stream. | stream. | |||
| o The lexicographic sorting order of UTF-8 strings is the same as if | o The lexicographic sorting order of UTF-8 strings is the same as if | |||
| ordered by character numbers. Of course this is of limited | ordered by character numbers. Of course this is of limited | |||
| interest since a sort order based on character numbers is not | interest since a sort order based on character numbers is not | |||
| culturally valid. | culturally valid. | |||
| o The Boyer-Moore fast search algorithm can be used with UTF-8 data. | o The Boyer-Moore fast search algorithm can be used with UTF-8 data. | |||
| o UTF-8 strings can be fairly reliably recognized as such by a | o UTF-8 strings can be fairly reliably recognized as such by a | |||
| simple algorithm, i.e. the probability that a string of | simple algorithm, i.e. the probability that a string of characters | |||
| characters in any other encoding appears as valid UTF-8 is low, | in any other encoding appears as valid UTF-8 is low, diminishing | |||
| diminishing with increasing string length. | with increasing string length. | |||
| UTF-8 was originally a project of the X/Open Joint | UTF-8 was originally a project of the X/Open Joint | |||
| Internationalization Group (XOJIG) with the objective to specify a File | Internationalization Group (XOJIG) with the objective to specify a File | |||
| System Safe UCS Transformation Format [FSS_UTF] that is compatible | System Safe UCS Transformation Format [FSS_UTF] that is compatible | |||
| with UNIX systems, supporting multilingual text in a single encoding. | with UNIX systems, supporting multilingual text in a single encoding. | |||
| The original authors were Gary Miller, Greger Leijonhufvud and John | The original authors were Gary Miller, Greger Leijonhufvud and John | |||
| Entenmann. Later, Ken Thompson and Rob Pike did significant work for | Entenmann. Later, Ken Thompson and Rob Pike did significant work for | |||
| the formal definition of UTF-8. | the formal definition of UTF-8. | |||
| 2. Notational conventions | 2. Notational conventions | |||
| skipping to change at page 6, line 7 ¶ | skipping to change at page 4, line 44 ¶ | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| UCS characters are designated by the U+HHHH notation, where HHHH is a | UCS characters are designated by the U+HHHH notation, where HHHH is a | |||
| string of 4 to 6 hexadecimal digits representing the character | string of 4 to 6 hexadecimal digits representing the character | |||
| number in ISO/IEC 10646. | number in ISO/IEC 10646. | |||
| 3. UTF-8 definition | 3. UTF-8 definition | |||
| UTF-8 is defined by the Unicode Standard [UNICODE]. Descriptions and | UTF-8 is defined by the Unicode Standard [UNICODE]. Descriptions and | |||
| formulae can also be found in Annex D of ISO/IEC 10646-1 [ISO.10646]. | formulae can also be found in Annex D of ISO/IEC 10646-1 [ISO.10646]. | |||
| In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 | In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 | |||
| accessible range) are encoded using sequences of 1 to 4 octets. The | accessible range) are encoded using sequences of 1 to 4 octets. The | |||
| only octet of a "sequence" of one has the higher-order bit set to 0, | only octet of a "sequence" of one has the higher-order bit set to 0, | |||
| the remaining 7 bits being used to encode the character number. In a | the remaining 7 bits being used to encode the character number. In a | |||
| sequence of n octets, n>1, the initial octet has the n higher-order | sequence of n octets, n>1, the initial octet has the n higher-order | |||
| bits set to 1, followed by a bit set to 0. The remaining bit(s) of | bits set to 1, followed by a bit set to 0. The remaining bit(s) of | |||
| that octet contain bits from the number of the character to be | that octet contain bits from the number of the character to be | |||
| encoded. The following octet(s) all have the higher-order bit set to | encoded. The following octet(s) all have the higher-order bit set to | |||
| 1 and the following bit set to 0, leaving 6 bits in each to contain | 1 and the following bit set to 0, leaving 6 bits in each to contain | |||
| bits from the character to be encoded. | bits from the character to be encoded. | |||
| The table below summarizes the format of these different octet types. | The table below summarizes the format of these different octet types. | |||
| The letter x indicates bits available for encoding bits of the | The letter x indicates bits available for encoding bits of the | |||
| character number. | character number. | |||
| skipping to change at page 6, line 37 ¶ | skipping to change at page 5, line 25 ¶ | |||
| --------------------+--------------------------------------------- | --------------------+--------------------------------------------- | |||
| 0000 0000-0000 007F | 0xxxxxxx | 0000 0000-0000 007F | 0xxxxxxx | |||
| 0000 0080-0000 07FF | 110xxxxx 10xxxxxx | 0000 0080-0000 07FF | 110xxxxx 10xxxxxx | |||
| 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx | 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx | |||
| 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | |||
| Encoding a character to UTF-8 proceeds as follows: | Encoding a character to UTF-8 proceeds as follows: | |||
| 1. Determine the number of octets required from the character number | 1. Determine the number of octets required from the character number | |||
| and the first column of the table above. It is important to note | and the first column of the table above. It is important to note | |||
| that the rows of the table are mutually exclusive, i.e. there is | that the rows of the table are mutually exclusive, i.e. there is | |||
| only one valid way to encode a given character. | only one valid way to encode a given character. | |||
| 2. Prepare the high-order bits of the octets as per the second | 2. Prepare the high-order bits of the octets as per the second | |||
| column of the table. | column of the table. | |||
| 3. Fill in the bits marked x from the bits of the character number, | 3. Fill in the bits marked x from the bits of the character number, | |||
| expressed in binary. Start by putting the lowest-order bit of | expressed in binary. Start by putting the lowest-order bit of the | |||
| the character number in the lowest-order position of the last | character number in the lowest-order position of the last octet | |||
| octet of the sequence, then put the next higher-order bit of the | of the sequence, then put the next higher-order bit of the | |||
| character number in the next higher-order position of that octet, | character number in the next higher-order position of that octet, | |||
| etc. When the x bits of the last octet are filled in, move on to | etc. When the x bits of the last octet are filled in, move on to | |||
| the next to last octet, then to the preceding one, etc. until | the next to last octet, then to the preceding one, etc. until all | |||
| all x bits are filled in. | x bits are filled in. | |||
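
The three encoding steps above can be sketched in Python as follows. This is a minimal illustration, not text from either draft; the function name `utf8_encode` is hypothetical. It also rejects U+D800..U+DFFF, which the definition prohibits encoding.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one character number to UTF-8 (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("character number outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("U+D800..U+DFFF are reserved for UTF-16 surrogates")

    # Step 1: pick the number of octets from the first column of the table.
    if cp <= 0x7F:
        return bytes([cp])          # 0xxxxxxx
    if cp <= 0x7FF:
        n, lead = 2, 0xC0           # 110xxxxx
    elif cp <= 0xFFFF:
        n, lead = 3, 0xE0           # 1110xxxx
    else:
        n, lead = 4, 0xF0           # 11110xxx

    # Steps 2 and 3: fill the x bits starting from the lowest-order bits
    # of the last octet, then work back towards the initial octet.
    octets = []
    for _ in range(n - 1):
        octets.append(0x80 | (cp & 0x3F))   # 10xxxxxx
        cp >>= 6
    octets.append(lead | cp)                # remaining bits go in the initial octet
    return bytes(reversed(octets))
```

For instance, `utf8_encode(0x233B4)` yields the four octets F0 A3 8E B4.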
| The definition of UTF-8 prohibits encoding character numbers between | The definition of UTF-8 prohibits encoding character numbers between | |||
| U+D800 and U+DFFF, which are reserved for use with the UTF-16 | U+D800 and U+DFFF, which are reserved for use with the UTF-16 | |||
| encoding form (as surrogate pairs) and do not directly represent | encoding form (as surrogate pairs) and do not directly represent | |||
| characters. When encoding in UTF-8 from UTF-16 data, it is necessary | characters. When encoding in UTF-8 from UTF-16 data, it is necessary | |||
| to first decode the UTF-16 data to obtain character numbers, which | to first decode the UTF-16 data to obtain character numbers, which | |||
| are then encoded in UTF-8 as described above. This contrasts with | are then encoded in UTF-8 as described above. This contrasts with | |||
| CESU-8 [CESU-8], which is a UTF-8-like encoding that is not meant for | CESU-8 [CESU-8], which is a UTF-8-like encoding that is not meant for | |||
| use on the Internet. CESU-8 operates similarly to UTF-8 but encodes | use on the Internet. CESU-8 operates similarly to UTF-8 but encodes | |||
| the UTF-16 code values (16-bit quantities) instead of the character | the UTF-16 code values (16-bit quantities) instead of the character | |||
| number (code point). This leads to different results for character | number (code point). This leads to different results for character | |||
| numbers above 0xFFFF; the CESU-8 encoding of those characters is NOT | numbers above 0xFFFF; the CESU-8 encoding of those characters is NOT | |||
| valid UTF-8. | valid UTF-8. | |||
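
To make the contrast concrete, the sketch below encodes a character number above 0xFFFF the CESU-8 way, by first splitting it into two UTF-16 code values and then encoding each as if it were a character number. The helper name `cesu8_encode_supplementary` is hypothetical, and the sketch relies on Python's `surrogatepass` error handler purely to emit the otherwise prohibited octets.

```python
def cesu8_encode_supplementary(cp: int) -> bytes:
    """CESU-8 style encoding of a character number above 0xFFFF (illustrative)."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("expected a character number in U+10000..U+10FFFF")
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)      # first UTF-16 code value
    low = 0xDC00 | (v & 0x3FF)     # second UTF-16 code value
    # Each 16-bit code value is encoded separately as three octets.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in (high, low))


utf8_form = chr(0x233B4).encode("utf-8")            # four octets
cesu8_form = cesu8_encode_supplementary(0x233B4)    # six octets
```

Here `utf8_form` is F0 A3 8E B4 while `cesu8_form` is ED A1 8C ED BE B4, and a strict UTF-8 decoder rejects the latter.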
| Decoding a UTF-8 character proceeds as follows: | Decoding a UTF-8 character proceeds as follows: | |||
| 1. Initialize a binary number with all bits set to 0. Up to 21 bits | 1. Initialize a binary number with all bits set to 0. Up to 21 bits | |||
| may be needed. | may be needed. | |||
| 2. Determine which bits encode the character number from the number | 2. Determine which bits encode the character number from the number | |||
| of octets in the sequence and the second column of the table | of octets in the sequence and the second column of the table | |||
| above (the bits marked x). | above (the bits marked x). | |||
| 3. Distribute the bits from the sequence to the binary number, first | 3. Distribute the bits from the sequence to the binary number, first | |||
| the lower-order bits from the last octet of the sequence and | the lower-order bits from the last octet of the sequence and | |||
| proceeding to the left until no x bits are left. The binary | proceeding to the left until no x bits are left. The binary | |||
| number is now equal to the character number. | number is now equal to the character number. | |||
| Implementations of the decoding algorithm above MUST protect against | Implementations of the decoding algorithm above MUST protect against | |||
| decoding invalid sequences. For instance, a naive implementation may | decoding invalid sequences. For instance, a naive implementation may | |||
| decode the overlong UTF-8 sequence C0 80 into the character U+0000, | decode the overlong UTF-8 sequence C0 80 into the character U+0000, | |||
| or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding | or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding | |||
| invalid sequences may have security consequences or cause other | invalid sequences may have security consequences or cause other | |||
| problems. See Security Considerations (Section 10) below. | problems. See Security Considerations (Section 10) below. | |||
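
One possible way to combine the decoding steps with the required protection is sketched below. The function name `utf8_decode_char` and its return convention are assumptions made for illustration; the checks mirror the invalid cases named above (overlong forms, surrogates) plus truncated input and values above U+10FFFF.

```python
from typing import Tuple


def utf8_decode_char(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one character at data[pos]; return (character number, next position)."""
    first = data[pos]
    if first <= 0x7F:
        return first, pos + 1
    # Sequence length, payload bits of the initial octet, and the smallest
    # character number that legitimately needs this many octets.
    if 0xC2 <= first <= 0xDF:
        n, cp, minimum = 2, first & 0x1F, 0x80
    elif 0xE0 <= first <= 0xEF:
        n, cp, minimum = 3, first & 0x0F, 0x800
    elif 0xF0 <= first <= 0xF4:
        n, cp, minimum = 4, first & 0x07, 0x10000
    else:
        # Covers C0, C1, F5..FF and stray 10xxxxxx octets.
        raise ValueError(f"invalid initial octet {first:02X}")

    tail = data[pos + 1:pos + n]
    if len(tail) != n - 1:
        raise ValueError("truncated sequence")
    for octet in tail:
        if octet & 0xC0 != 0x80:
            raise ValueError(f"invalid continuation octet {octet:02X}")
        cp = (cp << 6) | (octet & 0x3F)

    if cp < minimum:
        raise ValueError("overlong sequence")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("encoded surrogate")
    if cp > 0x10FFFF:
        raise ValueError("character number above U+10FFFF")
    return cp, pos + n
```

With this sketch, both C0 80 and ED A1 8C ED BE B4 raise `ValueError` instead of silently producing U+0000 or U+233B4.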
| 4. Syntax of UTF-8 Byte Sequences | 4. Syntax of UTF-8 Byte Sequences | |||
| A UTF-8 string is a sequence of octets representing a sequence of UCS | A UTF-8 string is a sequence of octets representing a sequence of UCS | |||
| characters. An octet sequence is valid UTF-8 only if it matches the | characters. An octet sequence is valid UTF-8 only if it matches the | |||
| following syntax, which is derived from the rules for encoding UTF-8 | following syntax, which is derived from the rules for encoding UTF-8 | |||
| and is expressed in the ABNF of [RFC2234]. | and is expressed in the ABNF of [RFC2234]. | |||
| UTF8-octets = *( UTF8-char ) | UTF8-octets = *( UTF8-char ) | |||
| UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4 | UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4 | |||
| UTF8-1 = %x00-7F | UTF8-1 = %x00-7F | |||
| UTF8-2 = %xC2-DF UTF8-tail | UTF8-2 = %xC2-DF UTF8-tail | |||
| UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) / | UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) / | |||
| %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail ) | %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail ) | |||
| UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) / | UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) / | |||
| %xF4 %x80-8F 2( UTF8-tail ) | %xF4 %x80-8F 2( UTF8-tail ) | |||
| UTF8-tail = %x80-BF | UTF8-tail = %x80-BF | |||
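
As an illustration, the ABNF above can be transcribed almost mechanically into a regular expression over bytes. The names `UTF8_OCTETS` and `is_valid_utf8` are hypothetical, and this is a sketch of one way to apply the grammar rather than a recommended implementation.

```python
import re

TAIL = rb"[\x80-\xBF]"                        # UTF8-tail
ALTERNATIVES = [
    rb"[\x00-\x7F]",                          # UTF8-1
    rb"[\xC2-\xDF]" + TAIL,                   # UTF8-2
    rb"\xE0[\xA0-\xBF]" + TAIL,               # UTF8-3, excludes overlong forms
    rb"[\xE1-\xEC]" + TAIL + rb"{2}",
    rb"\xED[\x80-\x9F]" + TAIL,               # UTF8-3, excludes U+D800..U+DFFF
    rb"[\xEE-\xEF]" + TAIL + rb"{2}",
    rb"\xF0[\x90-\xBF]" + TAIL + rb"{2}",     # UTF8-4, excludes overlong forms
    rb"[\xF1-\xF3]" + TAIL + rb"{3}",
    rb"\xF4[\x80-\x8F]" + TAIL + rb"{2}",     # UTF8-4, stops at U+10FFFF
]
# UTF8-octets = *( UTF8-char )
UTF8_OCTETS = re.compile(rb"(?:" + rb"|".join(ALTERNATIVES) + rb")*")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole octet sequence matches the UTF8-octets rule."""
    return UTF8_OCTETS.fullmatch(data) is not None
```

Because the second octet ranges are narrowed for E0, ED, F0 and F4, the pattern rejects overlong forms, encoded surrogates and values above U+10FFFF without any separate numeric check.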
| 5. Versions of the standards | 5. Versions of the standards | |||
| ISO/IEC 10646 is updated from time to time by publication of | ISO/IEC 10646 is updated from time to time by publication of | |||
| amendments and additional parts; similarly, new versions of the | amendments and additional parts; similarly, new versions of the | |||
| Unicode standard are published over time. Each new version obsoletes | Unicode standard are published over time. Each new version obsoletes | |||
| and replaces the previous one, but implementations, and more | and replaces the previous one, but implementations, and more | |||
| significantly data, are not updated instantly. | significantly data, are not updated instantly. | |||
| In general, the changes amount to adding new characters, which does | In general, the changes amount to adding new characters, which does | |||
| not pose particular problems with old data. In 1996, Amendment 5 to | not pose particular problems with old data. In 1996, Amendment 5 to | |||
| the 1993 edition of ISO/IEC 10646 and Unicode 2.0 moved and expanded | the 1993 edition of ISO/IEC 10646 and Unicode 2.0 moved and expanded | |||
| the Korean Hangul block, thereby making any previous data containing | the Korean Hangul block, thereby making any previous data containing | |||
| Hangul characters invalid under the new version. Unicode 2.0 has the | Hangul characters invalid under the new version. Unicode 2.0 has the | |||
| same difference from Unicode 1.1. The justification for allowing | same difference from Unicode 1.1. The justification for allowing such | |||
| such an incompatible change was that there were no major | an incompatible change was that there were no major implementations | |||
| implementations and no significant amounts of data containing Hangul. | and no significant amounts of data containing Hangul. The incident | |||
| The incident has been dubbed the "Korean mess", and the relevant | has been dubbed the "Korean mess", and the relevant committees have | |||
| committees have pledged to never, ever again make such an | pledged to never, ever again make such an incompatible change (see | |||
| incompatible change (see Unicode Consortium Policies [1]). | Unicode Consortium Policies [1]). | |||
| New versions, and in particular any incompatible changes, have | New versions, and in particular any incompatible changes, have | |||
| consequences regarding MIME charset labels, to be discussed in MIME | consequences regarding MIME charset labels, to be discussed in MIME | |||
| registration (Section 8). | registration (Section 8). | |||
| 6. Byte order mark (BOM) | 6. Byte order mark (BOM) | |||
| The UCS character U+FEFF "ZERO WIDTH NO-BREAK SPACE" is also known | The UCS character U+FEFF "ZERO WIDTH NO-BREAK SPACE" is also known | |||
| informally as "BYTE ORDER MARK" (abbreviated "BOM"). This character | informally as "BYTE ORDER MARK" (abbreviated "BOM"). This character | |||
| can be used as a genuine "ZERO WIDTH NO-BREAK SPACE" within text, but | can be used as a genuine "ZERO WIDTH NO-BREAK SPACE" within text, but | |||
| the BOM name hints at a second possible usage of the character: to | the BOM name hints at a second possible usage of the character: to | |||
| prepend a U+FEFF character to a stream of UCS characters as a | prepend a U+FEFF character to a stream of UCS characters as a | |||
| "signature". A receiver of such a serialized stream may then use the | "signature". A receiver of such a serialized stream may then use the | |||
| initial character as a hint that the stream consists of UCS | initial character as a hint that the stream consists of UCS | |||
| characters and also to recognize which UCS encoding is involved and, | characters and also to recognize which UCS encoding is involved and, | |||
| with encodings having a multi-octet encoding unit, as a way to | with encodings having a multi-octet encoding unit, as a way to | |||
| recognize the serialization order of the octets. UTF-8 having a | recognize the serialization order of the octets. UTF-8 having a | |||
| single-octet encoding unit, this last function is useless and the BOM | single-octet encoding unit, this last function is useless and the BOM | |||
| will always appear as the octet sequence EF BB BF. | will always appear as the octet sequence EF BB BF. | |||
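
The octets EF BB BF follow directly from the three-octet row of the table in Section 3, as the short sketch below shows. The helper name `has_utf8_signature` is hypothetical; it only reports the presence of the signature and leaves the data untouched.

```python
# U+FEFF placed by hand into the pattern 1110xxxx 10xxxxxx 10xxxxxx.
cp = 0xFEFF
signature = bytes([
    0xE0 | (cp >> 12),            # 1110xxxx: top 4 bits
    0x80 | ((cp >> 6) & 0x3F),    # 10xxxxxx: middle 6 bits
    0x80 | (cp & 0x3F),           # 10xxxxxx: low 6 bits
])


def has_utf8_signature(data: bytes) -> bool:
    """Report whether an octet stream starts with the encoded U+FEFF."""
    return data.startswith(signature)
```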
| It is important to understand that the character U+FEFF appearing at | It is important to understand that the character U+FEFF appearing at | |||
| any position other than the beginning of a stream MUST be interpreted | any position other than the beginning of a stream MUST be interpreted | |||
| with the semantics for the zero-width non-breaking space, and MUST | with the semantics for the zero-width non-breaking space, and MUST | |||
| NOT be interpreted as a signature. When interpreted as a signature, | NOT be interpreted as a signature. When interpreted as a signature, | |||
| the Unicode standard suggests that an initial U+FEFF character may be | the Unicode standard suggests that an initial U+FEFF character may be | |||
| stripped before processing the text. Such stripping is necessary in | stripped before processing the text. Such stripping is necessary in | |||
| some cases (e.g. when concatenating two strings, because otherwise | some cases (e.g. when concatenating two strings, because otherwise | |||
| the resulting string may contain an unintended "ZERO WIDTH NO-BREAK | the resulting string may contain an unintended "ZERO WIDTH NO-BREAK | |||
| SPACE" at the connection point), but might affect an external process | SPACE" at the connection point), but might affect an external process | |||
| at a different layer (such as a digital signature or a count of the | at a different layer (such as a digital signature or a count of the | |||
| characters) that is relying on the presence of all characters in the | characters) that is relying on the presence of all characters in the | |||
| stream. It is therefore RECOMMENDED to avoid stripping an initial | stream. It is therefore RECOMMENDED to avoid stripping an initial | |||
| U+FEFF interpreted as a signature without a good reason, to ignore it | U+FEFF interpreted as a signature without a good reason, to ignore it | |||
| instead of stripping it when appropriate (such as for display) and to | instead of stripping it when appropriate (such as for display) and to | |||
| strip it only when really necessary. | strip it only when really necessary. | |||
| U+FEFF in the first position of a stream MAY be interpreted as a | U+FEFF in the first position of a stream MAY be interpreted as a | |||
| zero-width non-breaking space, and is not always a signature. In an | zero-width non-breaking space, and is not always a signature. In an | |||
| attempt at diminishing this uncertainty, Unicode 3.2 adds a new | attempt at diminishing this uncertainty, Unicode 3.2 adds a new | |||
| character, U+2060 "WORD JOINER", with exactly the same semantics and | character, U+2060 "WORD JOINER", with exactly the same semantics and | |||
| usage as U+FEFF except for the signature function, and strongly | usage as U+FEFF except for the signature function, and strongly | |||
| recommends its exclusive use for expressing word-joining semantics. | recommends its exclusive use for expressing word-joining semantics. | |||
| Eventually, following this recommendation will make it all but | Eventually, following this recommendation will make it all but | |||
| certain that any initial U+FEFF is a signature, not an intended "ZERO | certain that any initial U+FEFF is a signature, not an intended "ZERO | |||
| WIDTH NO-BREAK SPACE". | WIDTH NO-BREAK SPACE". | |||
| In the meantime, the uncertainty unfortunately remains and may affect | In the meantime, the uncertainty unfortunately remains and may affect | |||
| Internet protocols. Protocol specifications MAY restrict usage of | Internet protocols. Protocol specifications MAY restrict usage of | |||
| U+FEFF as a signature in order to reduce or eliminate the potential | U+FEFF as a signature in order to reduce or eliminate the potential | |||
| ill effects of this uncertainty. In the interest of striking a | ill effects of this uncertainty. In the interest of striking a | |||
| balance between the advantages (reduction of uncertainty) and | balance between the advantages (reduction of uncertainty) and | |||
| drawbacks (loss of the signature function) of such restrictions, it | drawbacks (loss of the signature function) of such restrictions, it | |||
| is useful to distinguish a few cases: | is useful to distinguish a few cases: | |||
| o A protocol SHOULD forbid use of U+FEFF as a signature for those | o A protocol SHOULD forbid use of U+FEFF as a signature for those | |||
| textual protocol elements that the protocol mandates to be always | textual protocol elements that the protocol mandates to be always | |||
| UTF-8, the signature function being totally useless in those | UTF-8, the signature function being totally useless in those | |||
| cases. | cases. | |||
| o A protocol SHOULD also forbid use of U+FEFF as a signature for | o A protocol SHOULD also forbid use of U+FEFF as a signature for | |||
| skipping to change at page 11, line 34 ¶ | skipping to change at page 8, line 48 ¶ | |||
| always use the mechanisms properly. The latter two cases are | always use the mechanisms properly. The latter two cases are | |||
| likely to occur with larger protocol elements such as MIME | likely to occur with larger protocol elements such as MIME | |||
| entities, especially when implementations of the protocol will | entities, especially when implementations of the protocol will | |||
| obtain such entities from file systems, from protocols that do not | obtain such entities from file systems, from protocols that do not | |||
| have encoding identification mechanisms for payloads (such as FTP) | have encoding identification mechanisms for payloads (such as FTP) | |||
| or from other protocols that do not guarantee proper | or from other protocols that do not guarantee proper | |||
| identification of character encoding (such as HTTP). | identification of character encoding (such as HTTP). | |||
| When a protocol forbids use of U+FEFF as a signature for a certain | When a protocol forbids use of U+FEFF as a signature for a certain | |||
| protocol element, then any initial U+FEFF in that protocol element | protocol element, then any initial U+FEFF in that protocol element | |||
| MUST be interpreted as a "ZERO WIDTH NO-BREAK SPACE". When a | MUST be interpreted as a "ZERO WIDTH NO-BREAK SPACE". When a protocol | |||
| protocol does NOT forbid use of U+FEFF as a signature for a certain | does NOT forbid use of U+FEFF as a signature for a certain protocol | |||
| protocol element, then implementations SHOULD be prepared to handle a | element, then implementations SHOULD be prepared to handle a | |||
| signature in that element and react appropriately: using the | signature in that element and react appropriately: using the | |||
| signature to identify the character encoding as necessary and | signature to identify the character encoding as necessary and | |||
| stripping or ignoring the signature as appropriate. | stripping or ignoring the signature as appropriate. | |||
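The two cases above can be sketched in Python. This is only an illustration, not part of the draft: the function name `split_signature` and the `signature_allowed` flag are hypothetical, standing in for whatever a given protocol specifies for the element being parsed.

```python
UTF8_SIGNATURE = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8


def split_signature(data: bytes, signature_allowed: bool):
    """Return (had_signature, text) for one UTF-8 protocol element.

    signature_allowed is a hypothetical flag: True when the protocol does
    NOT forbid U+FEFF as a signature for this element.
    """
    # Python's plain "utf-8" codec is strict and keeps an initial U+FEFF.
    text = data.decode("utf-8")
    if signature_allowed and data.startswith(UTF8_SIGNATURE):
        # Treat the initial U+FEFF as a signature and hand back the rest.
        return True, text[1:]
    # Signature forbidden (or absent): an initial U+FEFF, if any, stays in
    # the text as a ZERO WIDTH NO-BREAK SPACE.
    return False, text
```

Stripping is shown here for brevity; a caller that only needs to ignore the signature (for display, say) could keep the original octets and use the returned flag instead.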
| 7. Examples | 7. Examples | |||
| The character sequence U+0041 U+2262 U+0391 U+002E "A<NOT IDENTICAL | The character sequence U+0041 U+2262 U+0391 U+002E "A<NOT IDENTICAL | |||
| TO><ALPHA>." is encoded in UTF-8 as follows: | TO><ALPHA>." is encoded in UTF-8 as follows: | |||
| --+--------+-----+-- | --+--------+-----+-- | |||
| skipping to change at page 13, line 22 ¶ | skipping to change at page 10, line 7 ¶ | |||
| edition (Korean block), encoded to a sequence of octets using the | edition (Korean block), encoded to a sequence of octets using the | |||
| encoding scheme outlined above. UTF-8 is suitable for use in MIME | encoding scheme outlined above. UTF-8 is suitable for use in MIME | |||
| content types under the "text" top-level type. | content types under the "text" top-level type. | |||
| It is noteworthy that the label "UTF-8" does not contain a version | It is noteworthy that the label "UTF-8" does not contain a version | |||
| identification, referring generically to ISO/IEC 10646. This is | identification, referring generically to ISO/IEC 10646. This is | |||
| intentional, the rationale being as follows: | intentional, the rationale being as follows: | |||
| A MIME charset label is designed to give just the information needed | A MIME charset label is designed to give just the information needed | |||
| to interpret a sequence of bytes received on the wire into a sequence | to interpret a sequence of bytes received on the wire into a sequence | |||
| of characters, nothing more (see [RFC2045], section 2.2). As long as | of characters, nothing more (see [RFC2045], section 2.2). As long as | |||
| a character set standard does not change incompatibly, version | a character set standard does not change incompatibly, version | |||
| numbers serve no purpose, because one gains nothing by learning from | numbers serve no purpose, because one gains nothing by learning from | |||
| the tag that newly assigned characters may be received that one | the tag that newly assigned characters may be received that one | |||
| doesn't know about. The tag itself doesn't teach anything about the | doesn't know about. The tag itself doesn't teach anything about the | |||
| new characters, which are going to be received anyway. | new characters, which are going to be received anyway. | |||
| Hence, as long as the standards evolve compatibly, the apparent | Hence, as long as the standards evolve compatibly, the apparent | |||
| advantage of having labels that identify the versions is only that, | advantage of having labels that identify the versions is only that, | |||
| apparent. But there is a disadvantage to such version-dependent | apparent. But there is a disadvantage to such version-dependent | |||
| labels: when an older application receives data accompanied by a | labels: when an older application receives data accompanied by a | |||
| skipping to change at page 15, line 9 ¶ | skipping to change at page 10, line 48 ¶ | |||
| 9. IANA Considerations | 9. IANA Considerations | |||
| The entry for UTF-8 in the IANA charset registry should be updated to | The entry for UTF-8 in the IANA charset registry should be updated to | |||
| point to this memo. | point to this memo. | |||
| 10. Security Considerations | 10. Security Considerations | |||
| Implementers of UTF-8 need to consider the security aspects of how | Implementers of UTF-8 need to consider the security aspects of how | |||
| they handle illegal UTF-8 sequences. It is conceivable that in some | they handle illegal UTF-8 sequences. It is conceivable that in some | |||
| circumstances an attacker would be able to exploit an incautious UTF- | circumstances an attacker would be able to exploit an incautious | |||
| 8 parser by sending it an octet sequence that is not permitted by the | UTF-8 parser by sending it an octet sequence that is not permitted by | |||
| UTF-8 syntax. | the UTF-8 syntax. | |||
| A particularly subtle form of this attack can be carried out against | A particularly subtle form of this attack can be carried out against | |||
| a parser which performs security-critical validity checks against the | a parser which performs security-critical validity checks against the | |||
| UTF-8 encoded form of its input, but interprets certain illegal octet | UTF-8 encoded form of its input, but interprets certain illegal octet | |||
| sequences as characters. For example, a parser might prohibit the | sequences as characters. For example, a parser might prohibit the | |||
| NUL character when encoded as the single-octet sequence 00, but | NUL character when encoded as the single-octet sequence 00, but | |||
| erroneously allow the illegal two-octet sequence C0 80 and interpret | erroneously allow the illegal two-octet sequence C0 80 and interpret | |||
| it as a NUL character. Another example might be a parser which | it as a NUL character. Another example might be a parser which | |||
| prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the | prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the | |||
| illegal octet sequence 2F C0 AE 2E 2F. This last exploit has | illegal octet sequence 2F C0 AE 2E 2F. This last exploit has actually | |||
| actually been used in a widespread virus attacking Web servers in | been used in a widespread virus attacking Web servers in 2001; the | |||
| 2001; the security threat is thus very real. | security threat is thus very real. | |||
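As a rough illustration of the defence, the sketch below validates the octets with a strict decoder before any security check looks at them. It assumes Python's built-in "utf-8" codec, which rejects overlong forms; the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is a well-formed UTF-8 octet sequence."""
    try:
        # Strict decoding: overlong forms such as C0 80 raise an error
        # instead of being silently mapped to a character.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The two illegal sequences mentioned in the text are rejected outright,
# so a later check for NUL or "/../" never sees a disguised form.
overlong_nul = b"\xc0\x80"
disguised_dotdot = b"\x2f\xc0\xae\x2e\x2f"
plain_dotdot = b"\x2f\x2e\x2e\x2f"
```

Running the validity check first means the "/../" test only ever has to consider the single legal encoding of each character.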
| Another security issue occurs when encoding to UTF-8: the ISO/IEC | Another security issue occurs when encoding to UTF-8: the ISO/IEC | |||
| 10646 description of UTF-8 allows encoding character numbers up to | 10646 description of UTF-8 allows encoding character numbers up to | |||
| U+7FFFFFFF, yielding sequences of up to 6 bytes. There is therefore | U+7FFFFFFF, yielding sequences of up to 6 bytes. There is therefore | |||
| a risk of buffer overflow if the range of character numbers is not | a risk of buffer overflow if the range of character numbers is not | |||
| explicitly limited to U+10FFFF or if buffer sizing doesn't take into | explicitly limited to U+10FFFF or if buffer sizing doesn't take into | |||
| account the possibility of 5- and 6-byte sequences. | account the possibility of 5- and 6-byte sequences. | |||
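One way to guard against this when encoding is to reject out-of-range character numbers before sizing any buffer. The sketch below is an assumption-laden illustration: `utf8_length` is a hypothetical helper, and the rejection of the surrogate range D800-DFFF reflects the UTF-8 definition rather than anything stated in this paragraph.

```python
def utf8_length(code_point: int) -> int:
    """Number of octets needed to encode code_point in UTF-8 (1 to 4)."""
    if code_point < 0 or code_point > 0x10FFFF:
        # Explicitly limit the range, so 5- and 6-octet forms never occur.
        raise ValueError("character number outside 0000-10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate values are not characters and are not encoded.
        raise ValueError("surrogate value cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4
```

With this check in place, a buffer of four octets per character is always sufficient.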
| 11. Acknowledgements | 11. Acknowledgements | |||
| The following have participated in the drafting and discussion of | The following have participated in the drafting and discussion of | |||
| this memo: James E. Agenbroad, Harald Alvestrand, Andries Brouwer, | this memo: James E. Agenbroad, Harald Alvestrand, Andries Brouwer, | |||
| Mark Davis, Martin J. Dürst, Patrick Fältström, Ned Freed, David | Mark Davis, Martin J. Duerst, Patrick Faltstrom, Ned Freed, David | |||
| Goldsmith, Tony Hansen, Edwin F. Hart, Paul Hoffman, David Hopwood, | Goldsmith, Tony Hansen, Edwin F. Hart, Paul Hoffman, David Hopwood, | |||
| Simon Josefsson, Kent Karlsson, Markus Kuhn, Michael Kung, Alain | Simon Josefsson, Kent Karlsson, Dan Kohn, Markus Kuhn, Michael Kung, | |||
| LaBonté, Ira McDonald, Alexey Melnikov, John Gardiner Myers, Dan | Alain LaBonte, Ira McDonald, Alexey Melnikov, MURATA Makoto, John | |||
| Oscarsson, Murray Sargent, Markus Scherer, Keld Simonsen, Arnold | Gardiner Myers, Dan Oscarsson, Roozbeh Pournader, Murray Sargent, | |||
| Winkler, Kenneth Whistler and Misha Wolf. | Markus Scherer, Keld Simonsen, Arnold Winkler, Kenneth Whistler and | |||
| Misha Wolf. | ||||
| 12. Changes from RFC 2279 | 12. Changes from RFC 2279 | |||
| o Restricted the range of characters to 0000-10FFFF (the UTF-16 | o Restricted the range of characters to 0000-10FFFF (the UTF-16 | |||
| accessible range). | accessible range). | |||
| o Made Unicode the source of the normative definition of UTF-8, | o Made Unicode the source of the normative definition of UTF-8, | |||
| keeping ISO/IEC 10646 as the reference for characters. | keeping ISO/IEC 10646 as the reference for characters. | |||
| o Significantly shortened Introduction. No more mention of UTF-1 or | o Straightened out terminology. UTF-8 now described in terms of an | |||
| UTF-7, of Transformation Formats. | encoding form of the character number. UCS-2 and UCS-4 almost | |||
| o Straightened out terminology. UTF-8 now described in terms of an | ||||
| encoding form of the character number. UCS-2 and UCS-4 almost | ||||
| disappeared. | disappeared. | |||
| o Turned the note warning against decoding of invalid sequences into | o Turned the note warning against decoding of invalid sequences into | |||
| a normative MUST NOT. | a normative MUST NOT. | |||
| o Added a new section about the UTF-8 BOM, with advice for | o Added a new section about the UTF-8 BOM, with advice for | |||
| protocols. | protocols. | |||
| o Updated a couple of references (10646-1:2000, Unicode 3.2, RFC | ||||
| 2978). | ||||
| o Added TOC. | ||||
| o Removed suggested UNICODE-1-1-UTF-8 MIME charset registration. | o Removed suggested UNICODE-1-1-UTF-8 MIME charset registration. | |||
| o Added new "Notational conventions" section about RFC 2119 and | ||||
| U+HHHH notation. | ||||
| o Added pointer to Unicode Consortium Policies in "Versions of the | ||||
| standards" section. | ||||
| o Added a fourth example with a non-BMP character and a BOM. | ||||
| o Added a paragraph about U+2060 WORD JOINER. | ||||
| o Enumerate more byte values impossible in UTF-8, either as a result | ||||
| of forbidding overlong sequences or of restricting to the UTF-16 | ||||
| accessible range. | ||||
| o Added "IANA Considerations" section to ask that the UTF-8 entry in | ||||
| the charset registry point to this memo. | ||||
| o Added an ABNF syntax for valid UTF-8 octet sequences | o Added an ABNF syntax for valid UTF-8 octet sequences | |||
| o Added some warning language about CESU-8 | ||||
| o Split References into Normative and Informative | ||||
| Normative references | Normative references | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | |||
| Specifications: ABNF", RFC 2234, November 1997. | Specifications: ABNF", RFC 2234, November 1997. | |||
| [ISO.10646] International Organization for Standardization, | [ISO.10646] | |||
| "Information Technology - Universal Multiple-octet coded | International Organization for Standardization, | |||
| Character Set (UCS)", ISO/IEC Standard 10646, comprised | "Information Technology - Universal Multiple-octet coded | |||
| of ISO/IEC 10646-1:2000, "Information technology -- | Character Set (UCS)", ISO/IEC Standard 10646, comprised | |||
| Universal Multiple-Octet Coded Character Set (UCS) -- | of ISO/IEC 10646-1:2000, "Information technology -- | |||
| Part 1: Architecture and Basic Multilingual Plane", ISO/ | Universal Multiple-Octet Coded Character Set (UCS) -- Part | |||
| IEC 10646-2:2001, "Information technology -- Universal | 1: Architecture and Basic Multilingual Plane", ISO/IEC | |||
| Multiple-Octet Coded Character Set (UCS) -- Part 2: | 10646-2:2001, "Information technology -- Universal | |||
| Supplementary Planes" and ISO/IEC 10646-1:2000/Amd | Multiple-Octet Coded Character Set (UCS) -- Part 2: | |||
| 1:2002, "Mathematical symbols and other characters". | Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, | |||
| "Mathematical symbols and other characters". | ||||
| [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | |||
| 3.2", defined by The Unicode Standard, Version 3.0 | 3.2", defined by The Unicode Standard, Version 3.0 | |||
| (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), | (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), | |||
| as amended by the Unicode Standard Annex #27: Unicode | as amended by the Unicode Standard Annex #27: Unicode 3.1 | |||
| 3.1 (see http://www.unicode.org/reports/tr27) and by the | (see http://www.unicode.org/reports/tr27) and by the | |||
| Unicode Standard Annex #28: Unicode 3.2 (see http:// | Unicode Standard Annex #28: Unicode 3.2 (see | |||
| www.unicode.org/reports/tr28), March 2002, <http:// | http://www.unicode.org/reports/tr28), March 2002, | |||
| www.unicode.org/unicode/standard/versions/ | <http://www.unicode.org/unicode/standard/versions/ | |||
| enumeratedversions.html#Unicode_3_2_0>. | enumeratedversions.html#Unicode_3_2_0>. | |||
| Informative references | Informative references | |||
| [CESU-8] Phipps, T., "Compatibility Encoding Scheme for UTF-16: 8- | [CESU-8] Phipps, T., "Compatibility Encoding Scheme for UTF-16: | |||
| Bit (CESU-8)", UTR 26, April 2002, <http:// | 8-Bit (CESU-8)", UTR 26, April 2002, | |||
| www.unicode.org/unicode/reports/tr26/>. | <http://www.unicode.org/unicode/reports/tr26/>. | |||
| [FSS_UTF] X/Open Company Ltd., "X/Open CAE Specification C501 -- | [FSS_UTF] X/Open Company Ltd., "X/Open CAE Specification C501 -- | |||
| File System Safe UCS Transformation Format (FSS_UTF)", | File System Safe UCS Transformation Format (FSS_UTF)", | |||
| ISBN 1-85912-082-2, April 1995. | ISBN 1-85912-082-2, April 1995. | |||
| [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
| Extensions (MIME) Part One: Format of Internet Message | Extensions (MIME) Part One: Format of Internet Message | |||
| Bodies", RFC 2045, November 1996. | Bodies", RFC 2045, November 1996. | |||
| [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | |||
| Procedures", BCP 19, RFC 2978, October 2000. | Procedures", BCP 19, RFC 2978, October 2000. | |||
| [US-ASCII] American National Standards Institute, "Coded Character | [US-ASCII] | |||
| Set - 7-bit American Standard Code for Information | American National Standards Institute, "Coded Character | |||
| Interchange", ANSI X3.4, 1986. | Set - 7-bit American Standard Code for Information | |||
| Interchange", ANSI X3.4, 1986. | ||||
| URIs | URIs | |||
| [1] <http://www.unicode.org/unicode/standard/policies.html> | [1] <http://www.unicode.org/unicode/standard/policies.html> | |||
| Author's Address | Author's Address | |||
| François Yergeau | Francois Yergeau | |||
| Alis Technologies | Alis Technologies | |||
| 100, boul. Alexis-Nihon, bureau 600 | 100, boul. Alexis-Nihon, bureau 600 | |||
| Montréal, QC H4M 2P2 | Montreal, QC H4M 2P2 | |||
| Canada | Canada | |||
| Phone: +1 514 747 2547 | Phone: +1 514 747 2547 | |||
| Fax: +1 514 747 2561 | Fax: +1 514 747 2561 | |||
| EMail: fyergeau@alis.com | EMail: fyergeau@alis.com | |||
| Intellectual Property Statement | ||||
| The IETF takes no position regarding the validity or scope of any | ||||
| intellectual property or other rights that might be claimed to | ||||
| pertain to the implementation or use of the technology described in | ||||
| this document or the extent to which any license under such rights | ||||
| might or might not be available; neither does it represent that it | ||||
| has made any effort to identify any such rights. Information on the | ||||
| IETF's procedures with respect to rights in standards-track and | ||||
| standards-related documentation can be found in BCP-11. Copies of | ||||
| claims of rights made available for publication and any assurances of | ||||
| licenses to be made available, or the result of an attempt made to | ||||
| obtain a general license or permission for the use of such | ||||
| proprietary rights by implementors or users of this specification can | ||||
| be obtained from the IETF Secretariat. | ||||
| The IETF invites any interested party to bring to its attention any | ||||
| copyrights, patents or patent applications, or other proprietary | ||||
| rights which may cover technology that may be required to practice | ||||
| this standard. Please address the information to the IETF Executive | ||||
| Director. | ||||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2003). All Rights Reserved. | |||
| This document and translations of it may be copied and furnished to | This document and translations of it may be copied and furnished to | |||
| others, and derivative works that comment on or otherwise explain it | others, and derivative works that comment on or otherwise explain it | |||
| or assist in its implementation may be prepared, copied, published | or assist in its implementation may be prepared, copied, published | |||
| and distributed, in whole or in part, without restriction of any | and distributed, in whole or in part, without restriction of any | |||
| kind, provided that the above copyright notice and this paragraph are | kind, provided that the above copyright notice and this paragraph are | |||
| included on all such copies and derivative works. However, this | included on all such copies and derivative works. However, this | |||
| document itself may not be modified in any way, such as by removing | document itself may not be modified in any way, such as by removing | |||
| the copyright notice or references to the Internet Society or other | the copyright notice or references to the Internet Society or other | |||
| Internet organizations, except as needed for the purpose of | Internet organizations, except as needed for the purpose of | |||
| developing Internet standards in which case the procedures for | developing Internet standards in which case the procedures for | |||
| copyrights defined in the Internet Standards process must be | copyrights defined in the Internet Standards process must be | |||
| followed, or as required to translate it into languages other than | followed, or as required to translate it into languages other than | |||
| English. | English. | |||
| The limited permissions granted above are perpetual and will not be | The limited permissions granted above are perpetual and will not be | |||
| revoked by the Internet Society or its successors or assigns. | revoked by the Internet Society or its successors or assignees. | |||
| This document and the information contained herein is provided on an | This document and the information contained herein is provided on an | |||
| "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING | "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING | |||
| TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING | TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING | |||
| BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION | BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION | |||
| HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF | HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF | |||
| MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
| Acknowledgement | Acknowledgement | |||
| End of changes. 67 change blocks. | ||||
| 166 lines changed or deleted | 159 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||