| draft-yergeau-rfc2279bis-04.txt | draft-yergeau-rfc2279bis-05.txt | |||
|---|---|---|---|---|
| Network Working Group F. Yergeau | Network Working Group F. Yergeau | |||
| Internet-Draft Alis Technologies | Internet-Draft Alis Technologies | |||
| Expires: August 18, 2003 February 17, 2003 | Expires: December 8, 2003 June 9, 2003 | |||
| UTF-8, a transformation format of ISO 10646 | UTF-8, a transformation format of ISO 10646 | |||
| draft-yergeau-rfc2279bis-04 | draft-yergeau-rfc2279bis-05 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that other | Task Force (IETF), its areas, and its working groups. Note that other | |||
| groups may also distribute working documents as Internet-Drafts. | groups may also distribute working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at http:// | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on August 18, 2003. | This Internet-Draft will expire on December 8, 2003. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2003). All Rights Reserved. | |||
| Abstract | Abstract | |||
| ISO/IEC 10646-1 defines a large character set called the Universal | ISO/IEC 10646-1 defines a large character set called the Universal | |||
| Character Set (UCS) which encompasses most of the world's writing | Character Set (UCS) which encompasses most of the world's writing | |||
| systems. The originally proposed encodings of the UCS, however, were | systems. The originally proposed encodings of the UCS, however, were | |||
| skipping to change at page 2, line 16 ¶ | skipping to change at page 2, line 16 ¶ | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Notational conventions . . . . . . . . . . . . . . . . . . . . 4 | 2. Notational conventions . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3. UTF-8 definition . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. UTF-8 definition . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 4. Syntax of UTF-8 Byte Sequences . . . . . . . . . . . . . . . . 6 | 4. Syntax of UTF-8 Byte Sequences . . . . . . . . . . . . . . . . 6 | |||
| 5. Versions of the standards . . . . . . . . . . . . . . . . . . 6 | 5. Versions of the standards . . . . . . . . . . . . . . . . . . 6 | |||
| 6. Byte order mark (BOM) . . . . . . . . . . . . . . . . . . . . 7 | 6. Byte order mark (BOM) . . . . . . . . . . . . . . . . . . . . 7 | |||
| 7. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 7. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 8. MIME registration . . . . . . . . . . . . . . . . . . . . . . 9 | 8. MIME registration . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 12. Changes from RFC 2279 . . . . . . . . . . . . . . . . . . . . 11 | 12. Changes from RFC 2279 . . . . . . . . . . . . . . . . . . . . 12 | |||
| Normative references . . . . . . . . . . . . . . . . . . . . . 12 | Normative references . . . . . . . . . . . . . . . . . . . . . 12 | |||
| Informative references . . . . . . . . . . . . . . . . . . . . 12 | Informative references . . . . . . . . . . . . . . . . . . . . 13 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . 13 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 14 | Intellectual Property and Copyright Statements . . . . . . . . 15 | |||
| 1. Introduction | 1. Introduction | |||
| ISO/IEC 10646 [ISO.10646] defines a large character set called the | ISO/IEC 10646 [ISO.10646] defines a large character set called the | |||
| Universal Character Set (UCS), which encompasses most of the world's | Universal Character Set (UCS), which encompasses most of the world's | |||
| writing systems. The same set of characters is defined by the Unicode | writing systems. The same set of characters is defined by the Unicode | |||
| standard [UNICODE], which further defines additional character | standard [UNICODE], which further defines additional character | |||
| properties and other application details of great interest to | properties and other application details of great interest to | |||
| implementers. Up to the present time, changes in Unicode and | implementers. Up to the present time, changes in Unicode and | |||
| amendments and additions to ISO/IEC 10646 have tracked each other, so | amendments and additions to ISO/IEC 10646 have tracked each other, so | |||
| skipping to change at page 3, line 34 ¶ | skipping to change at page 3, line 34 ¶ | |||
| UTF-8, the object of this memo, has a one-octet encoding unit. It | UTF-8, the object of this memo, has a one-octet encoding unit. It | |||
| uses all bits of an octet, but has the quality of preserving the full | uses all bits of an octet, but has the quality of preserving the full | |||
| US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one | US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one | |||
| octet having the normal US-ASCII value, and any octet with such a | octet having the normal US-ASCII value, and any octet with such a | |||
| value can only stand for a US-ASCII character, and nothing else. | value can only stand for a US-ASCII character, and nothing else. | |||
| UTF-8 encodes UCS characters as a varying number of octets, where the | UTF-8 encodes UCS characters as a varying number of octets, where the | |||
| number of octets, and the value of each, depend on the integer value | number of octets, and the value of each, depend on the integer value | |||
| assigned to the character in ISO/IEC 10646 (the character number, | assigned to the character in ISO/IEC 10646 (the character number, | |||
| a.k.a. code point or Unicode scalar value). This encoding form has | a.k.a. code position, code point or Unicode scalar value). This | |||
| the following characteristics (all values are in hexadecimal): | encoding form has the following characteristics (all values are in | |||
| hexadecimal): | ||||
| o Character numbers from U+0000 to U+007F (US-ASCII repertoire) | o Character numbers from U+0000 to U+007F (US-ASCII repertoire) | |||
| correspond to octets 00 to 7F (7 bit US-ASCII values). A direct | correspond to octets 00 to 7F (7 bit US-ASCII values). A direct | |||
| consequence is that a plain ASCII string is also a valid UTF-8 | consequence is that a plain ASCII string is also a valid UTF-8 | |||
| string. | string. | |||
| o US-ASCII octet values do not appear otherwise in a UTF-8 encoded | o US-ASCII octet values do not appear otherwise in a UTF-8 encoded | |||
| character stream. This provides compatibility with file systems | character stream. This provides compatibility with file systems | |||
| or other software (e.g. the printf() function in C libraries) that | or other software (e.g. the printf() function in C libraries) that | |||
| parse based on US-ASCII values but are transparent to other | parse based on US-ASCII values but are transparent to other | |||
| values. | values. | |||
| o Round-trip conversion is easy between UTF-8 and other encoding | o Round-trip conversion is easy between UTF-8 and other encoding | |||
| forms. | forms. | |||
| o The first octet of a multi-octet sequence indicates the number of | o The first octet of a multi-octet sequence indicates the number of | |||
| octets in the sequence. | octets in the sequence. | |||
| o The octet values C0, C1, FE and FF never appear. If the range of | o The octet values C0, C1, F5 to FF never appear. | |||
| character numbers is restricted to U+0000..U+10FFFF (the UTF-16 | ||||
| accessible range), then the octet values F5..FD also never appear. | ||||
| o Character boundaries are easily found from anywhere in an octet | o Character boundaries are easily found from anywhere in an octet | |||
| stream. | stream. | |||
| o The lexicographic sorting order of UTF-8 strings is the same as if | o The byte-value lexicographic sorting order of UTF-8 strings is the | |||
| ordered by character numbers. Of course this is of limited | same as if ordered by character numbers. Of course this is of | |||
| interest since a sort order based on character numbers is not | limited interest since a sort order based on character numbers is | |||
| culturally valid. | not culturally valid. | |||
| o The Boyer-Moore fast search algorithm can be used with UTF-8 data. | o The Boyer-Moore fast search algorithm can be used with UTF-8 data. | |||
| o UTF-8 strings can be fairly reliably recognized as such by a | o UTF-8 strings can be fairly reliably recognized as such by a | |||
| simple algorithm, i.e. the probability that a string of characters | simple algorithm, i.e. the probability that a string of characters | |||
| in any other encoding appears as valid UTF-8 is low, diminishing | in any other encoding appears as valid UTF-8 is low, diminishing | |||
| with increasing string length. | with increasing string length. | |||
| UTF-8 was originally a project of the X/Open Joint | UTF-8 was originally a project of the X/Open Joint | |||
| Internationalization Group XOJIG with the objective to specify a File | Internationalization Group XOJIG with the objective to specify a File | |||
| skipping to change at page 6, line 28 ¶ | skipping to change at page 6, line 28 ¶ | |||
| Implementations of the decoding algorithm above MUST protect against | Implementations of the decoding algorithm above MUST protect against | |||
| decoding invalid sequences. For instance, a naive implementation may | decoding invalid sequences. For instance, a naive implementation may | |||
| decode the overlong UTF-8 sequence C0 80 into the character U+0000, | decode the overlong UTF-8 sequence C0 80 into the character U+0000, | |||
| or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding | or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding | |||
| invalid sequences may have security consequences or cause other | invalid sequences may have security consequences or cause other | |||
| problems. See Security Considerations (Section 10) below. | problems. See Security Considerations (Section 10) below. | |||
| 4. Syntax of UTF-8 Byte Sequences | 4. Syntax of UTF-8 Byte Sequences | |||
| For the convenience of implementors using ABNF, a definition of UTF-8 | ||||
| in ABNF syntax is given here. | ||||
| A UTF-8 string is a sequence of octets representing a sequence of UCS | A UTF-8 string is a sequence of octets representing a sequence of UCS | |||
| characters. An octet sequence is valid UTF-8 only if it matches the | characters. An octet sequence is valid UTF-8 only if it matches the | |||
| following syntax, which is derived from the rules for encoding UTF-8 | following syntax, which is derived from the rules for encoding UTF-8 | |||
| and is expressed in the ABNF of [RFC2234]. | and is expressed in the ABNF of [RFC2234]. | |||
| UTF8-octets = *( UTF8-char ) | UTF8-octets = *( UTF8-char ) | |||
| UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4 | UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4 | |||
| UTF8-1 = %x00-7F | UTF8-1 = %x00-7F | |||
| UTF8-2 = %xC2-DF UTF8-tail | UTF8-2 = %xC2-DF UTF8-tail | |||
| UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) / | UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) / | |||
| %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail ) | %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail ) | |||
| UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) / | UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) / | |||
| %xF4 %x80-8F 2( UTF8-tail ) | %xF4 %x80-8F 2( UTF8-tail ) | |||
| UTF8-tail = %x80-BF | UTF8-tail = %x80-BF | |||
| 5. Versions of the standards | NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This | |||
| grammar is believed to describe the same thing as what Unicode | ||||
| describes, but does not claim to be authoritative. Implementors are | ||||
| urged to rely on the authoritative source, rather than on this ABNF. | ||||
| 5. Versions of the standards | ||||
| ISO/IEC 10646 is updated from time to time by publication of | ISO/IEC 10646 is updated from time to time by publication of | |||
| amendments and additional parts; similarly, new versions of the | amendments and additional parts; similarly, new versions of the | |||
| Unicode standard are published over time. Each new version obsoletes | Unicode standard are published over time. Each new version obsoletes | |||
| and replaces the previous one, but implementations, and more | and replaces the previous one, but implementations, and more | |||
| significantly data, are not updated instantly. | significantly data, are not updated instantly. | |||
| In general, the changes amount to adding new characters, which does | In general, the changes amount to adding new characters, which does | |||
| not pose particular problems with old data. In 1996, Amendment 5 to | not pose particular problems with old data. In 1996, Amendment 5 to | |||
| the 1993 edition of ISO/IEC 10646 and Unicode 2.0 moved and expanded | the 1993 edition of ISO/IEC 10646 and Unicode 2.0 moved and expanded | |||
| the Korean Hangul block, thereby making any previous data containing | the Korean Hangul block, thereby making any previous data containing | |||
| skipping to change at page 11, line 22 ¶ | skipping to change at page 11, line 32 ¶ | |||
| been used in a widespread virus attacking Web servers in 2001; the | been used in a widespread virus attacking Web servers in 2001; the | |||
| security threat is thus very real. | security threat is thus very real. | |||
| Another security issue occurs when encoding to UTF-8: the ISO/IEC | Another security issue occurs when encoding to UTF-8: the ISO/IEC | |||
| 10646 description of UTF-8 allows encoding character numbers up to | 10646 description of UTF-8 allows encoding character numbers up to | |||
| U+7FFFFFFF, yielding sequences of up to 6 bytes. There is therefore | U+7FFFFFFF, yielding sequences of up to 6 bytes. There is therefore | |||
| a risk of buffer overflow if the range of character numbers is not | a risk of buffer overflow if the range of character numbers is not | |||
| explicitly limited to U+10FFFF or if buffer sizing doesn't take into | explicitly limited to U+10FFFF or if buffer sizing doesn't take into | |||
| account the possibility of 5- and 6-byte sequences. | account the possibility of 5- and 6-byte sequences. | |||
| Security may also be impacted by a characteristic of several | ||||
| character encodings, including UTF-8: the "same thing" (as far as a | ||||
| user can tell) can be represented by several distinct character | ||||
| sequences. For instance, an e with acute accent can be represented by | ||||
| the precomposed U+00E9 E ACUTE character or by the canonically | ||||
| equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though | ||||
| UTF-8 provides a single byte sequence for each character sequence, | ||||
| the existence of multiple character sequences for "the same thing" | ||||
| may have security consequences whenever string matching, indexing, | ||||
| searching, sorting, regular expression matching and selection are | ||||
| involved. An example would be string matching of an identifier | ||||
| appearing in a credential and in access control list entries. This | ||||
| issue is amenable to solutions based on Unicode Normalization Forms, | ||||
| see [UAX15]. | ||||
| 11. Acknowledgements | 11. Acknowledgements | |||
| The following have participated in the drafting and discussion of | The following have participated in the drafting and discussion of | |||
| this memo: James E. Agenbroad, Harald Alvestrand, Andries Brouwer, | this memo: James E. Agenbroad, Harald Alvestrand, Andries Brouwer, | |||
| Mark Davis, Martin J. Duerst, Patrick Faltstrom, Ned Freed, David | Mark Davis, Martin J. Duerst, Patrick Faltstrom, Ned Freed, David | |||
| Goldsmith, Tony Hansen, Edwin F. Hart, Paul Hoffman, David Hopwood, | Goldsmith, Tony Hansen, Edwin F. Hart, Paul Hoffman, David Hopwood, | |||
| Simon Josefsson, Kent Karlsson, Dan Kohn, Markus Kuhn, Michael Kung, | Simon Josefsson, Kent Karlsson, Dan Kohn, Markus Kuhn, Michael Kung, | |||
| Alain LaBonte, Ira McDonald, Alexey Melnikov, MURATA Makoto, John | Alain LaBonte, Ira McDonald, Alexey Melnikov, MURATA Makoto, John | |||
| Gardiner Myers, Dan Oscarsson, Roozbeh Pournader, Murray Sargent, | Gardiner Myers, Chris Newman, Dan Oscarsson, Roozbeh Pournader, | |||
| Markus Scherer, Keld Simonsen, Arnold Winkler, Kenneth Whistler and | Murray Sargent, Markus Scherer, Keld Simonsen, Arnold Winkler, | |||
| Misha Wolf. | Kenneth Whistler and Misha Wolf. | |||
| 12. Changes from RFC 2279 | 12. Changes from RFC 2279 | |||
| o Restricted the range of characters to 0000-10FFFF (the UTF-16 | o Restricted the range of characters to 0000-10FFFF (the UTF-16 | |||
| accessible range). | accessible range). | |||
| o Made Unicode the source of the normative definition of UTF-8, | o Made Unicode the source of the normative definition of UTF-8, | |||
| keeping ISO/IEC 10646 as the reference for characters. | keeping ISO/IEC 10646 as the reference for characters. | |||
| o Straightened out terminology. UTF-8 now described in terms of an | o Straightened out terminology. UTF-8 now described in terms of an | |||
| skipping to change at page 12, line 9 ¶ | skipping to change at page 12, line 32 ¶ | |||
| o Turned the note warning against decoding of invalid sequences into | o Turned the note warning against decoding of invalid sequences into | |||
| a normative MUST NOT. | a normative MUST NOT. | |||
| o Added a new section about the UTF-8 BOM, with advice for | o Added a new section about the UTF-8 BOM, with advice for | |||
| protocols. | protocols. | |||
| o Removed suggested UNICODE-1-1-UTF-8 MIME charset registration. | o Removed suggested UNICODE-1-1-UTF-8 MIME charset registration. | |||
| o Added an ABNF syntax for valid UTF-8 octet sequences | o Added an ABNF syntax for valid UTF-8 octet sequences | |||
| o Expanded Security Considerations section, in particular impact of | ||||
| Unicode normalization | ||||
| Normative references | Normative references | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | ||||
| Specifications: ABNF", RFC 2234, November 1997. | ||||
| [ISO.10646] | [ISO.10646] | |||
| International Organization for Standardization, | International Organization for Standardization, | |||
| "Information Technology - Universal Multiple-octet coded | "Information Technology - Universal Multiple-octet coded | |||
| Character Set (UCS)", ISO/IEC Standard 10646, comprised | Character Set (UCS)", ISO/IEC Standard 10646, comprised | |||
| of ISO/IEC 10646-1:2000, "Information technology -- | of ISO/IEC 10646-1:2000, "Information technology -- | |||
| Universal Multiple-Octet Coded Character Set (UCS) -- Part | Universal Multiple-Octet Coded Character Set (UCS) -- Part | |||
| 1: Architecture and Basic Multilingual Plane", ISO/IEC | 1: Architecture and Basic Multilingual Plane", ISO/IEC | |||
| 10646-2:2001, "Information technology -- Universal | 10646-2:2001, "Information technology -- Universal | |||
| Multiple-Octet Coded Character Set (UCS) -- Part 2: | Multiple-Octet Coded Character Set (UCS) -- Part 2: | |||
| Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, | Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, | |||
| "Mathematical symbols and other characters". | "Mathematical symbols and other characters". | |||
| [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version | |||
| 3.2", defined by The Unicode Standard, Version 3.0 | 4.0", defined by The Unicode Standard, Version 4.0 | |||
| (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), | (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | |||
| as amended by the Unicode Standard Annex #27: Unicode 3.1 | April 2003, <http://www.unicode.org/unicode/standard/ | |||
| (see http://www.unicode.org/reports/tr27) and by the | versions/enumeratedversions.html#Unicode_4_0_0>. | |||
| Unicode Standard Annex #28: Unicode 3.2 (see | ||||
| http://www.unicode.org/reports/tr28), March 2002, | ||||
| <http://www.unicode.org/unicode/standard/versions/ | ||||
| enumeratedversions.html#Unicode_3_2_0>. | ||||
| Informative references | Informative references | |||
| [CESU-8] Phipps, T., "Compatibility Encoding Scheme for UTF-16: | [CESU-8] Phipps, T., "Unicode Technical Report #26: Compatibility | |||
| 8-Bit (CESU-8)", UTR 26, April 2002, | Encoding Scheme for UTF-16: 8-Bit (CESU-8)", UTR 26, April | |||
| <http://www.unicode.org/unicode/reports/tr26/>. | 2002, <http://www.unicode.org/unicode/reports/tr26/>. | |||
| [FSS_UTF] X/Open Company Ltd., "X/Open CAE Specification C501 -- | [FSS_UTF] X/Open Company Ltd., "X/Open CAE Specification C501 -- | |||
| File System Safe UCS Transformation Format (FSS_UTF)", | File System Safe UCS Transformation Format (FSS_UTF)", | |||
| ISBN 1-85912-082-2, April 1995. | ISBN 1-85912-082-2, April 1995. | |||
| [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
| Extensions (MIME) Part One: Format of Internet Message | Extensions (MIME) Part One: Format of Internet Message | |||
| Bodies", RFC 2045, November 1996. | Bodies", RFC 2045, November 1996. | |||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | ||||
| Specifications: ABNF", RFC 2234, November 1997. | ||||
| [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | |||
| Procedures", BCP 19, RFC 2978, October 2000. | Procedures", BCP 19, RFC 2978, October 2000. | |||
| [UAX15] Davis, M. and M. Duerst, "Unicode Standard Annex #15: | ||||
| Unicode Normalization Forms", An integral part of The | ||||
| Unicode Standard, Version 4.0.0, April 2003, <http:// | ||||
| www.unicode.org/unicode/reports/tr15>. | ||||
| [US-ASCII] | [US-ASCII] | |||
| American National Standards Institute, "Coded Character | American National Standards Institute, "Coded Character | |||
| Set - 7-bit American Standard Code for Information | Set - 7-bit American Standard Code for Information | |||
| Interchange", ANSI X3.4, 1986. | Interchange", ANSI X3.4, 1986. | |||
| URIs | URIs | |||
| [1] <http://www.unicode.org/unicode/standard/policies.html> | [1] <http://www.unicode.org/unicode/standard/policies.html> | |||
| Author's Address | Author's Address | |||
| End of changes. 21 change blocks. | ||||
| 37 lines changed or deleted | 62 lines changed or added | |||
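
The ABNF in Section 4 of the compared drafts can be read almost directly as a byte-level checker. The following is a minimal sketch in Python, not part of either draft: the function name `is_valid_utf8` is a hypothetical name chosen for illustration, and, as the draft's own NOTE says of the grammar, it does not claim to be authoritative. It accepts an octet string only if every sequence matches one of the UTF8-1 to UTF8-4 alternatives, so the octets C0, C1 and F5 to FF, overlong forms and encoded surrogates are all rejected.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data matches UTF8-octets = *( UTF8-char ).

    Illustrative sketch only; the name and structure are assumptions,
    not taken from the drafts.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:                       # UTF8-1 = %x00-7F
            i += 1
            continue
        # For each lead octet: number of trailing octets, and the
        # allowed range of the first trailing octet.
        if 0xC2 <= b0 <= 0xDF:               # UTF8-2
            tails, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:                     # UTF8-3, excludes overlongs
            tails, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            tails, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:                     # UTF8-3, excludes surrogates
            tails, lo, hi = 2, 0x80, 0x9F
        elif b0 == 0xF0:                     # UTF8-4, excludes overlongs
            tails, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            tails, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:                     # UTF8-4, stops at U+10FFFF
            tails, lo, hi = 3, 0x80, 0x8F
        else:
            # Stray tail octet (80-BF) or one of C0, C1, F5-FF.
            return False
        if i + tails >= n:                   # sequence is truncated
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, tails + 1):        # remaining UTF8-tail octets
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += tails + 1
    return True
```

With this sketch, the two invalid sequences quoted in the drafts, `bytes.fromhex("C080")` and `bytes.fromhex("EDA18CEDBEB4")`, are rejected, while `bytes.fromhex("F0A38EB4")`, the four-octet encoding of U+233B4, is accepted. The checker only covers syntax. It does nothing about the normalization issue added to the Security Considerations in -05, which concerns distinct character sequences that are each valid UTF-8.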