idnits 2.17.1 draft-zhu-apng-cc-encoding-v2-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. 
** Bad filename characters: the document name given in the document, 'draft-apng-cc-encoding-03.8', contains other characters than digits, lowercase letters and dash. ** Missing revision: the document name given in the document, 'draft-apng-cc-encoding-03.8', does not give the document revision number ~~ Missing draftname component: the document name given in the document, 'draft-apng-cc-encoding-03.8', does not seem to contain all the document name components required ('draft' prefix, document source, document name, and revision) -- see https://www.ietf.org/id-info/guidelines#naming for more information. == Mismatching filename: the document gives the document name as 'draft-apng-cc-encoding-03.8', but the file name used is 'draft-zhu-apng-cc-encoding-v2-01' ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1042 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 72 instances of too long lines in the document, the longest one being 7 characters in excess of 72. ** The abstract seems to contain references ([ISO-10646], [RFC-1036], [RFC-822]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. 
** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 483: '... implementations SHOULD at least suppo...' Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 21 has weird spacing: '...-Drafts are ...' == Line 22 has weird spacing: '...F), its areas...' == Line 23 has weird spacing: '...ay also distr...' == Line 86 has weird spacing: '...similar to e...' == Line 107 has weird spacing: '...er sets used ...' == (9 more instances...) == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. -- The document date (July 1995) is 10513 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '0xa1-0xfe' is mentioned on line 393, but not defined == Missing Reference: '0x40-0x7e' is mentioned on line 393, but not defined == Missing Reference: '0x81-0xa0' is mentioned on line 394, but not defined == Missing Reference: 'RFC-1502' is mentioned on line 568, but not defined == Unused Reference: 'GB-13132' is defined on line 927, but no explicit reference was found in the text == Unused Reference: 'MIME-1' is defined on line 952, but no explicit reference was found in the text == Unused Reference: 'MIME-2' is defined on line 957, but no explicit reference was found in the text == Unused Reference: 'SMTP' is defined on line 983, but no explicit reference was found in the text == Unused Reference: 'Unicode92' is defined on line 994, but no explicit reference was found in the text == Unused Reference: 'Unicode93' is defined on line 998, but no explicit reference was found in the text == Unused Reference: 'Unicode4' is defined on line 1002, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'ASCII' -- Possible downref: Non-RFC (?) normative reference: ref. 'BIG5' -- Possible downref: Non-RFC (?) normative reference: ref. 'CJK' -- Possible downref: Non-RFC (?) normative reference: ref. 'CNS-5205' -- Possible downref: Non-RFC (?) normative reference: ref. 'CNS-11643' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-1988' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-2312' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-7589' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-7590' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-8565' -- Possible downref: Non-RFC (?) 
normative reference: ref. 'GB-12345' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13000' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13131' -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13132' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-2022' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10021' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISOREG' ** Obsolete normative reference: RFC 1521 (ref. 'MIME-1') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 1522 (ref. 'MIME-2') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) ** Obsolete normative reference: RFC 1036 (Obsoleted by RFC 5536, RFC 5537) ** Downref: Normative reference to an Informational RFC: RFC 1468 ** Downref: Normative reference to an Informational RFC: RFC 1557 ** Downref: Normative reference to an Experimental RFC: RFC 1641 ** Obsolete normative reference: RFC 1642 (Obsoleted by RFC 2152) ** Obsolete normative reference: RFC 1700 (Obsoleted by RFC 3232) ** Obsolete normative reference: RFC 821 (ref. 'SMTP') (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 1651 (ref. 'SMTPEXT') (Obsoleted by RFC 1869) -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode92' -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode93' -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode4' Summary: 24 errors (**), 1 flaw (~~), 22 warnings (==), 23 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group HF. 
Zhu 3 Internet Draft: Chinese Character Encoding Tsinghua U 4 Document: internet-drafts/draft-apng-cc-encoding-03.8.txt DY. Hu 5 Tsinghua U 6 ZG. Wang 7 CITS 8 TC. Kao 9 III 10 WC. Chang 11 III 12 M. Crispin 13 U Washington 15 July 1995 17 Chinese Character Encoding for Internet Messages 19 Status of this Memo 21 This document is an Internet-Draft. Internet-Drafts are working 22 documents of the Internet Engineering Task Force (IETF), its areas, 23 and its working groups. Note that other groups may also distribute 24 working documents as Internet-Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 To learn the current status of any Internet-Draft, please check the 32 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 33 Directories on ds.internic.net (US East Coast), nic.nordu.net 34 (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific 35 Rim). 37 This is a draft document of APNG-CC, the Chinese Character 38 sub-working group of the I18N/L10N (Internationalization and 39 Localization) working group of APNG (Asia-Pacific Networking Group). 40 A revised version of this draft document will be submitted to the RFC 41 editor as an Informational RFC for the Internet Community. 42 Discussion and suggestions for improvement are requested, and should 43 be sent to apng-cc@apng.org or zhf@net.edu.cn (the coordinator). This 44 document will expire before February 30, 1996. Distribution of this 45 memo is unlimited. 47 Abstract 49 This memo provides methods for transporting Chinese characters 50 through, but not limited to, electronic mail [RFC-822] and network 51 news [RFC-1036] in the Internet community. 
53 Introduction 55 As the Internet reaches more and more Chinese-speaking people around the 56 world, the need has increased for the ability to send documents 57 containing Chinese characters on the Internet. The methods 58 described in this document provide means of transporting existing 59 Chinese character sets while leaving sufficient space for future 60 extension. 62 This document describes three groups of encodings: 63 1. ISO-2022-CN and ISO-2022-CN-EXT 64 2. CN-GB and CN-Big5 65 3. ISO/IEC 10646/Unicode 67 The encodings in the first group are designed with interoperability in 68 mind and are encouraged in this document; they are 7-bit, support 69 both simplified and traditional characters using both GB and CNS/Big5, 70 and do not impose any unusual quoting requirements on ASCII characters. 71 The second group of encodings describes current common domestic 72 usage. The third group of encodings refers to the universal 73 multilingual character set defined by ISO. 75 Note: ISO/IEC 10646 [ISO-10646] defines a 32-bit character space 76 with the intent to encode all characters in the world. Currently, only 77 the lowest 16-bit plane of ISO 10646, the Basic Multilingual Plane (BMP), 78 is defined. The BMP is code-by-code identical to Unicode [Unicode 1.1]. 80 Specification 82 1. 7-bit Chinese encodings: ISO-2022-CN and ISO-2022-CN-EXT 84 1.1 Description 86 ISO-2022-CN is based upon ISO 2022 [ISO-2022], similar to earlier 87 work on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for the Japanese 88 and Korean languages. It is 7-bit, and supports both simplified Chinese 89 characters using GB 2312-80 [GB-2312] and traditional Chinese characters 90 using the first two planes of CNS 11643 [CNS-11643], as well as ASCII 91 [ASCII] characters. 93 ISO-2022-CN-EXT is a superset of ISO-2022-CN that additionally 94 supports other GB character sets and planes of CNS 11643. 96 Since ISO-2022-CN and ISO-2022-CN-EXT are 7-bit encodings, they do 97 not require the 8-bit SMTP extensions.
ISO-2022-CN supports almost 98 all the characters which appear in Big5 [BIG5] except for two duplicate 99 characters that were included in Big5 by mistake. 101 1.2 ISO-2022-CN 103 The starting code of ISO-2022-CN is ASCII. ASCII and Chinese characters 104 are distinguished by the use of designations (ESC sequences) and shift 105 functions. 107 Designations define the Chinese character sets used in the text. 108 There are three kinds of designations: SOdesignation, SS2designation 109 and SS3designation. 111 The SOdesignation is in the form ESC $ ) <F>, where <F> is the 112 "final character" assigned to the character set by ISO (refer to the 113 ISO registry [ISOREG] for more details). The SS2designation is in 114 the form ESC $ * <F>, and the SS3designation is in the form ESC $ + 115 <F>. A designation overrides any previous designation for 116 subsequent bytes in the text. 118 There are four kinds of shifts: SI, SO, SS2 and SS3. 120 The shift SI (one byte with hexadecimal value 0F) declares that 121 subsequent bytes are interpreted in ASCII. 123 The shift SO (one byte with hexadecimal value 0E) declares that 124 subsequent bytes are interpreted in the character set defined by 125 SOdesignation. 127 The shift SS2 (two bytes with hexadecimal values 1B 4E) declares 128 that the subsequent TWO bytes are interpreted in the character set 129 defined by SS2designation, after which the previous interpretation 130 (from SI or SO) is restored. 132 The shift SS3 (two bytes with hexadecimal values 1B 4F) declares 133 that the subsequent TWO bytes are interpreted in the character set 134 defined by SS3designation, after which the previous interpretation 135 (from SI or SO) is restored. 137 For example, the sequence: 138 ESC $ ) A SO c_char1 ... c_char1 ESC $ ) G c_char2 ... 
c_char2 SI 139 transfers mixed simplified Chinese and traditional Chinese text, in 140 which each c_char1 is a simplified Chinese character from GB-2312 and 141 each c_char2 is a traditional Chinese character from CNS-11643-plane 1. 143 The escape sequences, shift functions and character sets used in an 144 ISO-2022-CN text are as follows: 146 Character sets Shift in with 147 -------------------------------------------------------------------- 148 ASCII SI 149 GB 2312, CNS 11643-plane-1 SO 150 CNS 11643-plane-2 SS2 152 ESC $ ) A Indicates the bytes following SO are Chinese characters 153 as defined in GB 2312-80, until another SOdesignation 154 appears 156 ESC $ ) G Indicates the bytes following SO are as defined in 157 CNS 11643-plane-1, until another SOdesignation appears 159 ESC $ * H Indicates the two bytes immediately following SS2 are a 160 Chinese character as defined in CNS 11643-plane-2, until 161 another SS2designation appears 163 If there are any GB or CNS characters on a line, a designation for 164 the corresponding character set should be used so that each line has 165 its own character set information and the text can be displayed 166 correctly when the user scrolls back in a window. Also, there must be a shift 167 to ASCII (SI) before the end of the line (i.e., before the CRLF). In 168 other words, each line starts in ASCII, and ends in ASCII. 170 The name given to this character encoding is "ISO-2022-CN". This name 171 is intended to be used as the "charset" parameter in MIME [MIME-1, 172 MIME-2] messages. 174 Content-Type: text/plain; charset=iso-2022-cn 176 The ISO-2022-CN encoding is already in 7-bit form, so it is not 177 necessary to use a Content-Transfer-Encoding header. 179 Other restrictions are given in the "Formal Syntax of ISO-2022-CN 180 and ISO-2022-CN-EXT" part at the end of this document. 182 1.3 ISO-2022-CN-EXT 184 ISO-2022-CN-EXT supports all characters in existing GB, Big5 and CNS 185 11643 character sets.
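As an illustration of the ISO-2022-CN line rules in section 1.2 (a designation before the first Chinese character on a line, SO before Chinese text, and SI before the end of the line), the following Python sketch encodes one line of mixed ASCII and GB 2312 text. It is not part of the specification. The function name is hypothetical, only the GB 2312 designation (ESC $ ) A) is handled, and the sketch assumes that Python's "gb2312" codec yields the 8-bit form of each character, so that clearing the top bit of each byte gives the 7-bit GB 2312 code.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
SO_DESIGNATE_GB2312 = ESC + b"$)A"  # ESC $ ) A


def encode_line_iso2022cn_gb(line: str) -> bytes:
    """Encode one line (without CRLF) of ASCII and GB 2312 text.

    Hypothetical helper: raises UnicodeEncodeError for characters
    outside GB 2312, and does not handle CNS 11643 at all.
    """
    out = bytearray()
    designated = False  # has ESC $ ) A been emitted on this line?
    shifted = False     # are we currently shifted out (SO)?
    for ch in line:
        if ord(ch) < 0x80:
            if shifted:
                out += SI  # return to ASCII before an ASCII character
                shifted = False
            out += ch.encode("ascii")
        else:
            eight_bit = ch.encode("gb2312")  # two bytes, MSB set
            if not designated:
                out += SO_DESIGNATE_GB2312  # each line carries its own designation
                designated = True
            if not shifted:
                out += SO
                shifted = True
            out += bytes(b & 0x7F for b in eight_bit)  # clear MSB: 7-bit code
    if shifted:
        out += SI  # every line must end in ASCII
    return bytes(out)
```

Calling the function once per line, and appending CRLF afterwards, keeps the per-line state separate, which matches the requirement that each line starts and ends in ASCII.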
187 The escape sequences, shift functions and character sets used in an 188 ISO-2022-CN-EXT text are as follows: 190 Character sets Shift in with 191 -------------------------------------------------------------------- 192 ASCII SI 193 GB 2312, GB 12345, CNS 11643-plane-1, GB 2312+GB 8565 SO 194 GB 7589, GB 13131, CNS 11643-plane-2 SS2 195 GB 7590, GB 13132 or other new GBs, CNS 11643-plane-3 or SS3 196 higher planes of CNS 11643 198 Note: Currently, there are some GB sets that have not been 199 registered in ISO. Here , , , 200 and represent the final character that will be assigned 201 by ISO for those sets. 203 ESC $ ) A Indicates the bytes following SO are Chinese characters 204 as defined in GB 2312-80, until another SOdesignation 205 appears 207 ESC $ * 208 Indicates the two bytes immediately following SS2 are a 209 Chinese character as defined in GB 7589-87 [GB-7589], 210 until another SS2designation appears 212 ESC $ + 213 Indicates the two bytes immediately following SS3 are a 214 Chinese character as defined in GB 7590-87 [GB-7590], 215 until another SS3designation appears 217 ESC $ ) 218 Indicates the bytes following SO are as defined in 219 GB 12345-90 [GB-12345], until another SOdesignation 220 appears 222 ESC $ * 223 Indicates the two bytes immediately following SS2 are a 224 Chinese character as defined in GB 13131-91 [GB-13131], 225 until another SS2designation appears 227 ESC $ + 228 Indicates the two bytes immediately following SS3 are a 229 Chinese character as defined in GB 13132-91 [GB-13132], 230 until another SS3designation appears 232 ESC $ ) E Indicates the bytes following SO are as defined in GB 2312+ 233 GB 8565 [GB-8565], until another SOdesignation appears 235 ESC $ ) G Indicates the bytes following SO are as defined in 236 CNS 11643-plane-1, until another SOdesignation appears 237 ESC $ * H Indicates the two bytes immediately following SS2 are a 238 Chinese character as defined in CNS 11643-plane-2, until 239 another SS2designation appears 240 
ESC $ + I Indicates the two bytes immediately following SS3 are a 241 Chinese character as defined in CNS 11643-plane-3, 242 until another SS3designation appears 243 ESC $ + J Indicates the two bytes immediately following SS3 are a 244 Chinese character as defined in CNS 11643-plane-4, 245 until another SS3designation appears 246 ESC $ + K Indicates the two bytes immediately following SS3 are a 247 Chinese character as defined in CNS 11643-plane-5, 248 until another SS3designation appears 249 ESC $ + L Indicates the two bytes immediately following SS3 are a 250 Chinese character as defined in CNS 11643-plane-6, 251 until another SS3designation appears 252 ESC $ + M Indicates the two bytes immediately following SS3 are a 253 Chinese character as defined in CNS 11643-plane-7, 254 until another SS3designation appears 256 As in ISO-2022-CN, each line should start in ASCII, and end in ASCII, 257 and should have its own designation information before any Chinese 258 characters appear. 260 The name given to this character encoding is "ISO-2022-CN-EXT". This name 261 is intended to be used as the "charset" parameter in MIME messages. 263 Content-Type: text/plain; charset=ISO-2022-CN-EXT 265 The ISO-2022-CN-EXT encoding is also in 7-bit form, so it is not 266 necessary to use a Content-Transfer-Encoding header. 268 Other restrictions are given in the "Formal Syntax of ISO-2022-CN and 269 ISO-2022-CN-EXT" part at the end of this document. 271 1.4 How to Support Big5 or other internal codesets with ISO-2022-CN 272 and ISO-2022-CN-EXT 274 There are many different Chinese internal coding systems [CJK], 275 such as Big5, GB internal code, CCCII (an encoding for library systems 276 in Taiwan), XGB (the codepage for Microsoft simplified Chinese Windows 277 95) etc. 
ISO-2022-CN and ISO-2022-CN-EXT, which are 7-bit and do not 278 lose information when text moves between different codesets, increase 279 interoperability and are ideal interchange encodings for the various 280 internal Chinese codesets in international communication. 282 For instance, ISO-2022-CN and ISO-2022-CN-EXT can be used to support 283 Big5, because CNS-11643-plane 1 and 2 incorporate all Chinese characters 284 in Big5 except two duplicate characters that were included in Big5 by 285 mistake. 287 Because Big5 and CNS-11643 order their codes differently, a 288 conversion table is needed to convert Big5 to and from CNS-11643. The 289 conversion table is attached as an appendix in this document. 291 Public domain software (either binary or source in C) is also available 292 from several sites on the Internet: 294 1) Beijing: 296 ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/ 297 (IP address: 166.111.1.11) 299 2) Taiwan: 301 ftp://tpdns.seed.net.tw/Pub/Chinese/DOS/code-convert/chcode.zip 302 (IP address: 139.175.1.12) 304 3) US: 306 ftp://ftp.ifcss.org/pub/software/unix/convert/BeTTY-1.534.tar.gz 307 (IP address: 128.123.1.55) 309 4) Japan: 310 ftp://etlport.etl.go.jp/pub/iso-2022-cn/ 311 (IP address: 192.31.197.99) 313 2. 8-bit Chinese encodings: CN-GB and CN-Big5 315 The CN-GB and CN-Big5 charset names are given below. 316 Among other things, these support current practice; specifically, 317 CN-GB reflects the current usage for simplified Chinese e-mail, 318 and CN-Big5 reflects the current usage for traditional Chinese e-mail. 320 Note: the use of 8-bit character sets requires the use of 321 either an 8-to-7 Content-Transfer-Encoding mechanism such as 322 "BASE64" or "QUOTED-PRINTABLE" if the network is not 8-bit clean, 323 or the 8-bit SMTP extensions [SMTPEXT] with the "8BIT" 324 Content-Transfer-Encoding on 8-bit clean networks.
Otherwise, 325 an 8-bit message which passes through a 7-bit mailer is likely 326 to have the 8th bit truncated, resulting in an unreadable 327 message. Although "just send 8-bit data" has been common 328 practice in the past, it is incorrect according to the 329 Internet standards and causes interoperability problems. 331 2.1 CN-GB 333 E-mail using GB characters is sent in this way: 335 GB 2312-80 characters are used with ASCII characters, 336 not GB 1988-80 [GB-1988]. 338 GB 2312-80 is itself a 7-bit code. To avoid conflict with ASCII, 339 the MSB (bit-8) of each byte of a GB 2312-80 character is set to 340 1, so that the byte becomes an 8-bit byte. A byte whose MSB is not set is 341 interpreted as ASCII. This constructs a character set named "GB 342 Internal Code". 344 This method is also adopted in the .gb files in the Internet. 346 To use this character scheme with MIME, CN-GB is used as the value 347 for the charset parameter: 349 Content-Type: text/plain; charset=cn-gb 351 GB-12345 is the traditional form of GB-2312; the charset name given 352 to this set is CN-GB-12345-90. 354 There is also a kind of dependent character set that can only be used 355 with one of the above sets. For example, if GB 8565 is used, it can 356 only be used with GB 2312 or GB 12345. In this case, "+" is permitted 357 to appear in the charset name, e.g. CN-GB-2312-80+GB-8565-88. 359 Like CN-GB, CN-GB-12345-90 and CN-GB-2312-80+GB-8565-88 also support 360 ASCII; the MSB of each byte of a Chinese character should be set to 1 in order to 361 distinguish it from ASCII. 363 Note: There are some supplementary character sets in GB, i.e. GB 7589-87, 364 GB 7590-87, GB 13131-91 and GB 13132-91. Normally, they won't 365 be used independently without using GB-2312 or GB-12345, so they 366 need not be registered. Characters in these standards 367 can be supported with ISO-2022-CN and ISO-2022-CN-EXT.
If, in the 368 future, they do need to be used with "charset" names in some cases, 369 it is the responsibility of any interested third party (the 370 standardization organization itself or anybody else) to write the 371 necessary documents and do the IANA registration for them. It is 372 greatly encouraged that their charset names should also take the form 373 of CN-GB-<number>-<year>, as in CN-GB-12345-90. Here, <number> is the 374 GB standard number, and <year> is the year of edition represented 375 with the last two digits of the year. They should be coded in 8-bit 376 as CN-GB. 378 To avoid hindering interoperability, CN-GB is encouraged to be used 379 whenever possible. 381 2.2 CN-Big5 383 Big5 is a character set of traditional Chinese characters, widely 384 used in Taiwan and overseas. E-mail using Big5 characters is 385 sent in this way: 387 Big5 characters are used with ASCII characters. 389 Big5 is a two-byte coding, in which the first byte is 7-bit, and 390 the second byte is 8-bit. If the character is from Big5, the MSB 391 (bit-8) of the first byte is set to 1, and the byte therefore becomes an 8-bit 392 byte. Otherwise, the byte is interpreted as ASCII. (Big5 uses 393 the code space: [0xa1-0xfe,0x40-0x7e] and [0xa1-0xfe,0xa1-0xfe], and 394 two other user areas with the first byte in the range of [0x81-0xa0].) 396 To use this character scheme with MIME, CN-Big5 is used as the value 397 for the charset parameter: 399 Content-Type: text/plain; charset=cn-big5 401 3. Universal Multilingual Character Set: ISO/IEC-10646/Unicode 403 ISO/IEC 10646's BMP (code-to-code identical to Unicode) contains a 404 large repertoire of Chinese characters (it currently includes all 405 the characters of GB 2312-80, GB 12345-90, GB 8565-89, CNS 11643's 406 plane 1 and 2, and part of some other standards) and therefore can 407 be used to transport Chinese characters in the Internet community. 408 This document does not give any details on how to do this, as this has 409 been done elsewhere.
For details of using Unicode with MIME, refer to 410 RFC 1641 [RFC-1641] and RFC 1642 [RFC-1642]. For assigned names for 411 10646 sets, refer to STD 2--"Assigned Numbers", which is RFC 1700 412 [RFC-1700] currently. For more up-to-date assigned numbers, please 413 check: 415 ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets 417 A New MIME parameter -- "charset-variant" 419 Here, a new MIME parameter--"charset-variant" is defined as below: 421 This parameter is used after the MIME "charset" parameter, mainly in 422 the form of <product>-<version>, or any extension based on this form, 423 in which <product> is the product name and <version> indicates its 424 version number. It is case-insensitive and optional, and any value 425 of this parameter should be registered with IANA. 427 For example: 428 Content-Type: text/plain; charset=CN-Big5; charset-variant=ETen-2.00.03-DOS 430 This may indicate the ETen company's variant of Big5: ETen 2.00.03 for DOS. 432 This parameter is defined because some implementations may 433 want to identify a variant in order to handle it slightly 434 differently and so interoperate better. Although some features of a 435 variant may cause interoperability problems, variants will 436 continue to exist; moreover, a variant may be so popular 437 that it becomes a de facto industry standard, so indicating its 438 name can help communication software handle 439 its messages. 441 Background Information 443 1. Writing systems and their encodings in Chinese-speaking nations and regions 445 The mainland provinces of China use simplified Chinese characters in 446 daily life. GB is the standard electronic character set. It is the 447 main means of communication between people around the world who use simplified 448 Chinese characters. 450 Taiwan uses traditional Chinese characters in daily life.
CNS-11643 451 is the formal character set for information interchange in Taiwan; 452 however, Big5, a widely-used character set of traditional Chinese 453 characters, is the de facto industry standard in Taiwan. 455 Hong Kong uses traditional Chinese characters in daily life, but uses 456 both GB and Big5 in electronic form, because Hong Kong people often 457 communicate with people in all of China's provinces. 459 Singapore seldom uses Chinese characters, and uses the simplified 460 form when Chinese characters are used. In electronic form, Unicode 461 is more popular; however, GB is also used. 463 2. Miscellaneous notes on Chinese character sets 465 The GB 1988-80 character set is identical to ISO 646 [ISO-646] except 466 for the currency symbol and the tilde, which are 467 replaced by the Yuan sign and the overline. This set is GB's variant 468 of ISO 646. This character set and CNS 5205 [CNS-5205] are not 469 encouraged for use in the Internet, since ASCII combined with GB 2312 470 or CNS 11643-plane 1 and plane 2 comprises all characters in them. 472 The GB 2312-80 character set consists of simplified Chinese 473 characters, digits, Latin, Greek and Russian alphabets, and some 474 other symbols; in all, 7445 characters. Each character is represented 475 with two bytes. 477 GB 13000-95 [GB-13000] is GB's variant of ISO 10646. However, for 478 interoperability in the Internet, the use of the assigned names for ISO 10646 is 479 encouraged. 481 3. Miscellaneous implementation information 483 For maximum interoperability, implementations SHOULD at least support 484 sending and receiving ISO-2022-CN. Supporting all registered character 485 sets in ISO-2022-CN-EXT is greatly encouraged. 487 It is also essential to be able to support CN-GB (the status quo for 488 simplified Chinese e-mail) and CN-Big5 (the status quo for traditional 489 Chinese e-mail). However, sending ISO-2022-CN messages is encouraged 490 whenever possible.
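An implementation that receives CN-Big5 has to tell ASCII bytes from two-byte Big5 characters before it can display or convert the text. The following Python sketch, which is not part of the specification, splits a CN-Big5 byte string into units using only the code space given in section 2.2. The function name is hypothetical, and the sketch assumes that characters in the user areas (first byte 0x81-0xa0) use the same second-byte ranges as the main code space, which section 2.2 does not state.

```python
def split_cn_big5(data: bytes) -> list[bytes]:
    """Split CN-Big5 bytes into one-byte ASCII and two-byte Big5 units.

    Hypothetical helper based on the ranges in section 2.2:
    first byte 0xA1-0xFE (or 0x81-0xA0 for the user areas),
    second byte 0x40-0x7E or 0xA1-0xFE (assumed for user areas too).
    """
    units = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            units.append(data[i:i + 1])  # MSB clear: an ASCII byte
            i += 1
            continue
        if not 0x81 <= first <= 0xFE:
            raise ValueError(f"invalid Big5 first byte at offset {i}")
        if i + 1 >= len(data):
            raise ValueError("Big5 character split at end of data")
        second = data[i + 1]
        if not (0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE):
            raise ValueError(f"invalid Big5 second byte at offset {i + 1}")
        units.append(data[i:i + 2])  # keep both bytes together
        i += 2
    return units
```

Working on whole units rather than single bytes also avoids breaking a line in the middle of a two-byte character.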
492 To the maximum extent possible, implementations should be capable of 493 receiving messages in any of the encodings introduced in this document, 494 even if they only transmit messages in one form. Preferably the 495 implementation should display the characters with glyphs appropriate 496 to the typographic tradition that is implied in the encoding of the 497 received text. An implementation may also translate these encodings 498 to the encoding that its platform supports. 500 The human user (not implementor) should try to keep lines within 80 501 display columns, or, preferably, within 75 (or so) columns, to allow 502 insertion of ">" at the beginning of each line in excerpts. Each 503 Chinese character takes up two columns, and the shift sequences do 504 not take up any columns. The implementor is reminded that Chinese 505 characters take up two bytes and should not be split in the middle 506 when breaking lines for display, etc. 508 Freely available fonts of Chinese characters: 510 Beijing: 511 ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/fonts/ 512 Taiwan: 513 ftp://ftp.edu.tw/Chinese/ifcss/software/fonts/ 514 ftp://ftp.ntu.edu.tw/Chinese/ifcss/software/fonts/ 515 HongKong: 516 ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/fonts/ 517 Singapore: 518 ftp://ftp.technet.sg:/pub/chinese/fonts/ 519 US: 520 ftp://ftp.ifcss.org/pub/software/fonts/ 521 http://ccic.ifcss.org/www/pub/software/fonts/ 523 X.400 Considerations 525 X.400 can carry different character sets in a 526 message by using the body part "GeneralText" defined by ISO/IEC-10021-7 527 [ISO-10021].
529 The X.400 ASN.1 definition of the GeneralText body part is: 531 general-text-body-part EXTENDED-BODY-PART-TYPE 532 PARAMETERS GeneralTextParameters IDENTIFIED BY id-ep-general-text 533 DATA GeneralTextData 534 ::= id-et-general-text 536 GeneralTextParameters ::= SET OF CharacterSetRegistration 538 CharacterSetRegistration ::= INTEGER (1..32767) 540 GeneralTextData ::= GeneralString 542 Therefore, to use ISO-2022-CN, set the "CharacterSetRegistration" 543 part to { 6 58 171 172 }, and add the escape sequence ESC ( B (three bytes, 544 hexadecimal values: 1B 28 42) before the beginning of the ISO-2022-CN text. 546 Similarly, to use ISO-2022-CN-EXT, set the registered numbers of 547 all character sets in the "CharacterSetRegistration" part and add ESC 548 ( B at the beginning. For the registered numbers, please refer to the 549 ISO registry. In addition to the character sets supported by ISO-2022-CN, 550 currently registered numbers are: 552 GB 2312+GB 8565: 165 553 CNS 11643-plane 3: 183 554 CNS 11643-plane 4: 184 555 CNS 11643-plane 5: 185 556 CNS 11643-plane 6: 186 557 CNS 11643-plane 7: 187 559 176 is the registered number for the BASESET of ISO/IEC 10646-1:1993 560 UCS-2 with implementation level 3; the escape sequence ESC % / E 561 (four bytes, hexadecimal values 1B 25 2F 45) indicates the start of 562 this codeset. 564 For the CN-GB and CN-Big5 character sets, there is currently no formal 565 method for use in X.400. 567 For details about X.400 use of character sets, please refer to 568 RFC 1502 [RFC-1502]. 570 Formal Syntax of ISO-2022-CN and ISO-2022-CN-EXT 572 The notational conventions used here are identical to those used in 573 RFC 822. 575 1. 
Formal Syntax of ISO-2022-CN 577 body ::= * ( ascii_line / c_line ) 579 ascii_line ::= *char CRLF 581 c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF 583 designation ::= SOdesignation / SS2designation 585 SOdesignation ::= ESC "$" ")" finalchar_for_SO 587 SS2designation ::= ESC "$" "*" finalchar_for_SS2 589 finalchar_for_SO ::= "A" / "G" 591 finalchar_for_SS2 ::= "H" 593 c_text ::= 1* ( SO-SI-segment / SS2segment ) 595 SO-SI-segment ::= SO 1*c_char *designation *( c_segment / SO-segment ) SI 597 c_segment ::= 1* ( c_char / SS2segment ) 599 SO-segment ::= SO 1*c_char 601 SS2segment ::= SS2 c_char 603 c_char ::= one_of_94 one_of_94 605 ; ( Octal, Decimal.) 607 ESC ::= ; ( 33, 27.) 609 SI ::= ; ( 17, 15.) 611 SO ::= ; ( 16, 14.) 613 SS2 ::= ; ( 33 116, 27 78.) 615 SS3 ::= ; ( 33 117, 27 79.) 617 one_of_94 ::= ; (41-176, 33-126.) 619 char ::= ; (40-177, 30-127.) 621 2. Formal Syntax of ISO-2022-CN-EXT 623 body ::= * ( ascii_line / c_line ) 625 ascii_line ::= *char CRLF 627 c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF 629 designation ::= SOdesignation / SS2designation / SS3designation 631 SOdesignation ::= ESC "$" ")" finalchar_for_SO 633 SS2designation ::= ESC "$" "*" finalchar_for_SS2 635 SS3designation ::= ESC "$" "+" finalchar_for_SS3 637 finalchar_for_SO ::= "A" / / "G" / "E" 639 finalchar_for_SS2 ::= / / "H" 641 finalchar_for_SS3 ::= / / "I" / "J" / "K" / "L" / "M" 643 c_text ::= 1* ( SO-SI-segment / SS2segment / SS3segment ) 645 SO-SI-segment ::= SO 1*c_char *designation *( c_segment / SO-segment ) SI 647 c_segment ::= 1* ( c_char / SS2segment / SS3segment ) 649 SO-segment ::= SO 1*c_char 651 SS2segment ::= SS2 c_char 653 SS3segment ::= SS3 c_char 655 c_char ::= one_of_94 one_of_94 657 ; ( Octal, Decimal.) 659 ESC ::= ; ( 33, 27.) 661 SI ::= ; ( 17, 15.) 663 SO ::= ; ( 16, 14.) 665 SS2 ::= ; ( 33 116, 27 78.) 667 SS3 ::= ; ( 33 117, 27 79.) 669 one_of_94 ::= ; (41-176, 33-126.) 671 char ::= ; (40-177, 30-127.) 
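
   As an illustration only, the ISO-2022-CN productions above can be
   walked with a small scanner.  The sketch below is not part of this
   specification: the function name tokenize_line, the helper is_94,
   and the charset labels are hypothetical, and the scanner accepts a
   slightly looser language than the grammar.  It assumes that no
   designation carries over from a previous line, which is how the
   c_line production reads.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Final characters taken from the ISO-2022-CN grammar; the set names
# are labels chosen for this sketch only.
SO_SETS = {ord("A"): "GB2312", ord("G"): "CNS11643-1"}
SS2_SETS = {ord("H"): "CNS11643-2"}


def is_94(b):
    """True if b is one_of_94 (octal 41-176, decimal 33-126)."""
    return 0x21 <= b <= 0x7E


def tokenize_line(line):
    """Split one line (bytes, without CRLF) into (charset, bytes) tokens."""
    tokens = []
    so_set = ss2_set = None   # assumption: designations do not carry over
    shifted = False           # True between SO and SI
    i, n = 0, len(line)

    def c_char(pos):
        # c_char ::= one_of_94 one_of_94
        pair = line[pos:pos + 2]
        if len(pair) != 2 or not all(is_94(b) for b in pair):
            raise ValueError("bad c_char at offset %d" % pos)
        return pair

    while i < n:
        b = line[i]
        if b == ESC:
            seq = line[i:i + 4]
            if len(seq) == 4 and seq[1:3] == b"$)" and seq[3] in SO_SETS:
                so_set = SO_SETS[seq[3]]          # SOdesignation
                i += 4
            elif len(seq) == 4 and seq[1:3] == b"$*" and seq[3] in SS2_SETS:
                ss2_set = SS2_SETS[seq[3]]        # SS2designation
                i += 4
            elif line[i:i + 2] == b"\x1bN":       # SS2 is ESC N (33 116)
                if ss2_set is None:
                    raise ValueError("SS2 used before SS2designation")
                tokens.append((ss2_set, c_char(i + 2)))
                i += 4
            else:
                raise ValueError("unknown escape sequence at offset %d" % i)
        elif b == SO:
            if so_set is None:
                raise ValueError("SO used before SOdesignation")
            shifted = True
            i += 1
        elif b == SI:
            shifted = False
            i += 1
        elif shifted:
            tokens.append((so_set, c_char(i)))    # two bytes, never split
            i += 2
        else:
            tokens.append(("ASCII", bytes([b])))
            i += 1
    if shifted:
        raise ValueError("line ends without SI")
    return tokens
```

   Because the scanner consumes each c_char as a two-byte unit, the
   resulting token list is also a convenient place to break lines
   without splitting a Chinese character.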

Registration of New "charset"s and a New MIME Parameter

   1. This document defines the following MIME "charset" names for
      Chinese text:

         ISO-2022-CN, ISO-2022-CN-EXT
         CN-GB, CN-Big5
         CN-GB-12345-90
         CN-GB-2312-80+GB-8565-88

   2. This document defines a new MIME parameter:

         charset-variant

Acknowledgments

   This document is the result of cooperation in the APNG-CC, the
   Chinese Character sub-working group of the I18N/L10N
   (Internationalization and Localization) working group of APNG
   (Asia-Pacific Networking Group), coordinated by Zhu Haifeng.  The
   membership of APNG-CC consists of individuals from both sides of the
   Taiwan Strait, Hong Kong, Singapore, and other countries.  The
   authors wish to thank all members of APNG-CC.

   Prof. Yao Shiquan and Ms. Lin Ning of CITS (China Information
   Technology Standardization Technical Committee), and Prof. Zhao
   Jingrong, Prof. Li Xing, and Mr. YouYue of Tsinghua University gave
   much help in the course of the work.

   Many thanks to Mr. C.J. Cherng and Mr. C.K. Fan of III (Institute
   for Information Industry), and Mr. Chang JingShin from Tsinghua
   University in Hsinchu, Taiwan.

   In particular, Mr. Masataka Ohta, the coordinator of APNG-I18N, has
   contributed much effort to the work since the beginning of APNG-CC.

   The authors also wish to thank the following people, who contributed
   in many ways to this draft:

      Martin J. Duerst          Kenichi Handa
      Zhang Ling                Zhang ZhouCai
      Zhu Bin                   Nelson Chin
      Lu Chin                   Ding ZyKaan
      Chen Shuyi                Mao Yonggang
      Ken Lunde                 Lua Kim Teng
      Victor Cheng              Stephen G. Simpson
      Yuan Jiang                Liu HuiFang
      Harald T. Alvestrand      Feng Hui

Security Considerations

   Security issues are not discussed in this memo.

Authors' Addresses

   Zhu, Hai-feng (HF. Zhu)
   Dept. of Computer Science & Technology
   Tsinghua University
   Beijing, 100084
   China

   Tel: +86-1-2561144 ext. 3492
   Fax: +86-1-2564173
   Email: zhf@net.edu.cn, zhf@net.tsinghua.edu.cn

   Hu, Dao-yuan (DY. Hu)
   Tsinghua Networking Center
   Tsinghua University
   Beijing, 100084
   China

   Tel: +86-1-2594016
   Fax: +86-1-2564173
   Email: hdy@tsinghua.edu.cn

   Wang, Zhi-guan (ZG. Wang)
   SubCommittee 2 (SC2)
   China Information Technology Standardization Technical Committee
   (CITS)
   Beijing, 100083
   China

   Tel: +86-1-4012392
   Fax: +86-1-4010601

   Kao, Tien-cheu (TC. Kao)
   I.T. Promotion Division
   Institute for Information Industry (III)
   Taipei
   Taiwan

   Tel: +886-2-5631688
   Fax: +886-2-563-4209
   Email: tckao@iiidns.iii.org.tw

   Chang, Wen-chung (WC. Chang)
   Institute for Information Industry (III)
   Taipei
   Taiwan

   Tel: +886-2-7327771
   Fax: +886-2-7370188
   Email: chung@iiidns.iii.org.tw

   Mark R. Crispin
   Networks and Distributed Computing
   University of Washington
   4545 15th Avenue NE
   Seattle, WA 98105-4527
   USA

   Tel: +1 (206) 543-5762
   Fax: +1 (206) 685-4045
   Email: MRC@CAC.Washington.EDU

Appendix -- Conversion Table for CNS-11643 and Big5

   This is a conversion table for the Chinese characters in Big5 and
   CNS-11643, including some characters specific to the ETen variant of
   Big5.  Note that this list contains only Chinese characters; symbols
   are not included.  For a more complete table, please refer to [CJK]
   or the ftp sites listed in section 1.4, where conversion programs
   are available.

   1. Big5 Level 1 correspondence to CNS 11643-1992 Plane 1:

      0xA440-0xACFD <-> 0x4421-0x5322   # Level 1 Chinese start
      0xACFE        <-> 0x5753
      0xAD40-0xAFCF <-> 0x5323-0x5752
      0xAFD0-0xBBC7 <-> 0x5754-0x6B4F
      0xBBC8-0xBE51 <-> 0x6B51-0x6F5B
      0xBE52        <-> 0x6B50
      0xBE53-0xC1AA <-> 0x6F5C-0x7534
      0xC1AB-0xC2CA <-> 0x7536-0x7736
      0xC2CB        <-> 0x7535
      0xC2CC-0xC360 <-> 0x7737-0x782C
      0xC361-0xC3B8 <-> 0x782E-0x7863
      0xC3B9        <-> 0x7865
      0xC3BA        <-> 0x7864
      0xC3BB-0xC455 <-> 0x7866-0x7961
      0xC456        <-> 0x782D
      0xC457-0xC67E <-> 0x7962-0x7D4B   # Level 1 Chinese end

   2. Big5 Level 2 correspondence to CNS 11643-1992 Plane 2:

      0xC940-0xC949 <-> 0x2121-0x212A
      0xC94A         -> 0x4442          # duplicate of 0xA461
      0xC94B-0xC96B <-> 0x212B-0x214B
      0xC96C-0xC9BD <-> 0x214D-0x217C
      0xC9BE        <-> 0x214C
      0xC9BF-0xC9EC <-> 0x217D-0x224C
      0xC9ED-0xCAF6 <-> 0x224E-0x2438
      0xCAF7        <-> 0x224D
      0xCAF8-0xD6CB <-> 0x2439-0x376E
      0xD6CC        <-> 0x3E63
      0xD6CD-0xD779 <-> 0x3770-0x387D
      0xD77A        <-> 0x3F6A
      0xD77B-0xDADE <-> 0x387E-0x3E62
      0xDADF        <-> 0x376F
      0xDAE0-0xDBA6 <-> 0x3E64-0x3F69
      0xDBA7-0xDDFB <-> 0x3F6B-0x4423
      0xDDFC         -> 0x4176          # duplicate of 0xDCD1
      0xDDFD-0xE8A2 <-> 0x4424-0x554A
      0xE8A3-0xE975 <-> 0x554C-0x5721
      0xE976-0xEB5A <-> 0x5723-0x5A27
      0xEB5B-0xEBF0 <-> 0x5A29-0x5B3E
      0xEBF1        <-> 0x554B
      0xEBF2-0xECDD <-> 0x5B3F-0x5C69
      0xECDE        <-> 0x5722
      0xECDF-0xEDA9 <-> 0x5C6A-0x5D73
      0xEDAA-0xEEEA <-> 0x5D75-0x6038
      0xEEEB        <-> 0x642F
      0xEEEC-0xF055 <-> 0x6039-0x6242
      0xF056        <-> 0x5D74
      0xF057-0xF0CA <-> 0x6243-0x6336
      0xF0CB        <-> 0x5A28
      0xF0CC-0xF162 <-> 0x6337-0x642E
      0xF163-0xF16A <-> 0x6430-0x6437
      0xF16B        <-> 0x6761
      0xF16C-0xF267 <-> 0x6438-0x6572
      0xF268        <-> 0x6934
      0xF269-0xF2C2 <-> 0x6573-0x664C
      0xF2C3-0xF374 <-> 0x664E-0x6760
      0xF375-0xF465 <-> 0x6762-0x6933
      0xF466-0xF4B4 <-> 0x6935-0x6961
      0xF4B5        <-> 0x664D
      0xF4B6-0xF4FC <-> 0x6962-0x6A4A
      0xF4FD-0xF662 <-> 0x6A4C-0x6C51
      0xF663        <-> 0x6A4B
      0xF664-0xF976 <-> 0x6C52-0x7165
      0xF977-0xF9C3 <-> 0x7167-0x7233
      0xF9C4        <-> 0x7166
      0xF9C5        <-> 0x7234
      0xF9C6        <-> 0x7240
      0xF9C7-0xF9D1 <-> 0x7235-0x723F
      0xF9D2-0xF9D5 <-> 0x7241-0x7244

   3. Big5 Level 2 correspondence to CNS 11643-1992 Plane 3:

      0xF9D6        <-> 0x4337          # ETen-specific Chinese
      0xF9D7        <-> 0x4F50          # ETen-specific Chinese
      0xF9D8        <-> 0x444E          # ETen-specific Chinese
      0xF9D9        <-> 0x504A          # ETen-specific Chinese
      0xF9DA        <-> 0x2C5D          # ETen-specific Chinese
      0xF9DB        <-> 0x3D7E          # ETen-specific Chinese
      0xF9DC        <-> 0x4B5C          # ETen-specific Chinese

References

   [ASCII] American National Standards Institute, "Coded character set
      -- 7-bit American National Standard Code for Information
      Interchange", ANSI X3.4-1986.

   [BIG5] Institute for Information Industry, "Chinese Coded Character
      Set in Computer", March 1984.

   [CJK] Ken Lunde, On-line documentation of Chinese/Japanese/Korean
      Information Processing, 1995, available at:
      ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

   [CNS-5205] "Information processing -- 7-Bit Coded Character Set For
      Information Interchange", CNS-5205.

   [CNS-11643] "Chinese Standard Interchange Code", CNS-11643 version
      1992; "Standard Interchange Code for Generally-Used Chinese
      Characters", CNS 11643 version 1986.

   [GB-1988] "7-bit Coding Character Set for Information Interchange",
      GB 1988-80.

   [GB-2312] "Coding of Chinese Ideogram Set for Information
      Interchange Basic Set", GB 2312-80.

   [GB-7589] "Code of Chinese Ideograms Set for Information
      Interchange, the 2nd Supplementary Set", UDC 681.3.048,
      GB 7589-87.

   [GB-7590] "Code of Chinese Ideogram Set for Information Interchange,
      the 4th Supplementary Set", UDC 681.3.048, GB 7590-87.

   [GB-8565] "Information Processing Coded Character Sets for Text
      Communication", UDC 681.3, GB 8565-88.

   [GB-12345] "Code of Chinese Ideogram Set for Information Interchange
      Supplementary Set", GB/T 12345-90.

   [GB-13000] "Information technology -- Universal Multiple-Octet Coded
      Character Set (UCS) -- Part 1: Architecture and Basic
      Multilingual Plane", GB13000.1.

   [GB-13131] "Code of Chinese Ideogram Set for Information
      Interchange, the 3rd Supplementary Set", GB 13131-91.

   [GB-13132] "Code of Chinese Ideogram Set for Information
      Interchange, the 5th Supplementary Set", GB 13132-91.

   [ISO-646] International Organization for Standardization (ISO),
      "Information technology -- ISO 7-bit coded character set for
      information interchange", International Standard, Ref. No.
      ISO/IEC 646:1991.

   [ISO-2022] International Organization for Standardization (ISO),
      "Information processing -- ISO 7-bit and 8-bit coded character
      sets -- Code extension techniques", International Standard,
      Ref. No. ISO 2022-1986 (E).

   [ISO-10021] "Information Technology -- Text communication --
      Message-Oriented Text Interchange Systems (MOTIS)", ISO 10021,
      October 1988.

   [ISO-10646] ISO/IEC 10646-1:1993(E), "Information Technology --
      Universal Multiple-octet Coded Character Set (UCS) -- Part 1:
      Architecture and Basic Multilingual Plane".

   [ISOREG] International Organization for Standardization (ISO),
      "International Register of Coded Character Sets To Be Used With
      Escape Sequences".

   [MIME-1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
      Mail Extensions) Part One: Mechanisms for Specifying and
      Describing the Format of Internet Message Bodies", RFC 1521,
      Bellcore, Innosoft, September 1993.

   [MIME-2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
      Part Two: Message Header Extensions for Non-ASCII Text",
      RFC 1522, University of Tennessee, September 1993.

   [RFC-822] Crocker, D., "Standard for the Format of ARPA Internet
      Text Messages", STD 11, RFC 822, University of Delaware,
      August 1982.

   [RFC-1036] Horton, M., and Adams, R., "Standard for Interchange of
      USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
      Seismic Studies, December 1987.

   [RFC-1468] Murai, J., Crispin, M., and van der Poel, E., "Japanese
      Character Encoding for Internet Messages", RFC 1468, June 1993.

   [RFC-1557] Choi, U., Chon, K., and Park, H., "Korean Character
      Encoding for Internet Messages", RFC 1557, December 1993.

   [RFC-1641] Goldsmith, D., and Davis, M., "Using Unicode with MIME",
      RFC 1641, Taligent Inc., July 1994.

   [RFC-1642] Goldsmith, D., and Davis, M., "UTF-7, A Mail-Safe
      Transformation Format of Unicode", RFC 1642, July 1994.

   [RFC-1700] Reynolds, J., and Postel, J., "Assigned Numbers", STD 2,
      RFC 1700, ISI, October 1994.

   [SMTP] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821,
      USC/Information Sciences Institute, August 1982.

   [SMTPEXT] Klensin, J., Freed, N., Rose, M., Stefferud, E., and
      Crocker, D., "SMTP Service Extensions", RFC 1651, July 1994.

   [Unicode 1.1] "The Unicode Standard, Version 1.1", Addison-Wesley,
      Reading, MA (to be published; the contents of this standard are
      currently available by combining [Unicode92], [Unicode93], and
      [Unicode4]).

   [Unicode92] The Unicode Consortium, "The Unicode Standard --
      Worldwide Character Encoding -- Version 1.0", Volume 1,
      Addison-Wesley, Reading, MA, 1992 (ISBN 0-201-56788-1).

   [Unicode93] The Unicode Consortium, "The Unicode Standard --
      Worldwide Character Encoding -- Version 1.0", Volume 2,
      Addison-Wesley, Reading, MA, 1992 (ISBN 0-201-60845-6).

   [Unicode4] The Unicode Consortium, "The Unicode Standard --
      Version 1.1 (Prepublication Edition)", Unicode Technical
      Report #4 (available from the Unicode Consortium).
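
Illustrative Use of the Conversion Table

   The range pairs in the Appendix can be applied mechanically.  The
   sketch below is an illustration, not part of this specification.
   It assumes the usual Big5 layout, in which each lead byte has 157
   valid trail bytes (0x40-0x7E and 0xA1-0xFE), and the 94-cell rows
   of CNS 11643.  The function names and the LEVEL1_EXCERPT list are
   hypothetical, and only the first three rows of the Level 1 table
   are included.

```python
# Excerpt of the Level 1 table: (big5_first, big5_last, cns_first).
# A full converter would list every row of the Appendix.
LEVEL1_EXCERPT = [
    (0xA440, 0xACFD, 0x4421),
    (0xACFE, 0xACFE, 0x5753),
    (0xAD40, 0xAFCF, 0x5323),
]


def big5_index(code):
    """Linear position of a Big5 code (157 trail bytes per lead byte)."""
    lead, trail = code >> 8, code & 0xFF
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0xA1 <= trail <= 0xFE:
        offset = trail - 0xA1 + 63
    else:
        raise ValueError("invalid Big5 trail byte: 0x%02X" % trail)
    return lead * 157 + offset


def cns_index(code):
    """Linear position of a CNS 11643 code (94 cells per row)."""
    return ((code >> 8) - 0x21) * 94 + ((code & 0xFF) - 0x21)


def cns_from_index(index):
    """Inverse of cns_index."""
    row, cell = divmod(index, 94)
    return ((row + 0x21) << 8) | (cell + 0x21)


def big5_to_cns(code, table=LEVEL1_EXCERPT):
    """Return the CNS 11643 plane 1 code for a Big5 code, or None."""
    pos = big5_index(code)
    for first, last, cns_first in table:
        start = big5_index(first)
        if start <= pos <= big5_index(last):
            return cns_from_index(cns_index(cns_first) + pos - start)
    return None
```

   With this excerpt, big5_to_cns(0xACFE) yields 0x5753, matching the
   second row of the Level 1 table, and codes outside the listed
   ranges yield None.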