idnits 2.17.1 

draft-zhu-apng-cc-encoding-v2-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** Bad filename characters: the document name given in the document,
     'draft-apng-cc-encoding-03.8', contains other characters than digits,
     lowercase letters and dash.

  ** Missing revision: the document name given in the document,
     'draft-apng-cc-encoding-03.8', does not give the document revision number

  ~~ Missing draftname component: the document name given in the document,
     'draft-apng-cc-encoding-03.8', does not seem to contain all the document
     name components required ('draft' prefix, document source, document name,
     and revision) -- see https://www.ietf.org/id-info/guidelines#naming for
     more information.

  == Mismatching filename: the document gives the document name as
     'draft-apng-cc-encoding-03.8', but the file name used is
     'draft-zhu-apng-cc-encoding-v2-01'

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1042 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 72 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  ** The abstract seems to contain references ([ISO-10646], [RFC-1036],
     [RFC-822]), which it shouldn't.  Please replace those with straight
     textual mentions of the documents in question.

  == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 483: '... implementations SHOULD at least suppo...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 21 has weird spacing: '...-Drafts  are  ...'

  == Line 22 has weird spacing: '...F), its  areas...'

  == Line 23 has weird spacing: '...ay also  distr...'

  == Line 86 has weird spacing: '...similar  to  e...'

  == Line 107 has weird spacing: '...er sets  used ...'

  == (9 more instances...)

  == Couldn't figure out when the document was first submitted -- there may
     be comments or warnings related to the use of a disclaimer for pre-RFC5378
     work that could not be issued because of this.  Please check the Legal
     Provisions document at https://trustee.ietf.org/license-info to determine
     if you need the pre-RFC5378 disclaimer.

  -- The document date (July 1995) is 10513 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0xa1-0xfe' is mentioned on line 393, but not defined

  == Missing Reference: '0x40-0x7e' is mentioned on line 393, but not defined

  == Missing Reference: '0x81-0xa0' is mentioned on line 394, but not defined

  == Missing Reference: 'RFC-1502' is mentioned on line 568, but not defined

  == Unused Reference: 'GB-13132' is defined on line 927, but no explicit
     reference was found in the text

  == Unused Reference: 'MIME-1' is defined on line 952, but no explicit
     reference was found in the text

  == Unused Reference: 'MIME-2' is defined on line 957, but no explicit
     reference was found in the text

  == Unused Reference: 'SMTP' is defined on line 983, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode92' is defined on line 994, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode93' is defined on line 998, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode4' is defined on line 1002, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ASCII'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BIG5'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CJK'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CNS-5205'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CNS-11643'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-1988'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-2312'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-7589'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-7590'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-8565'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-12345'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13000'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13131'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GB-13132'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-2022'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10021'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISOREG'

  ** Obsolete normative reference: RFC 1521 (ref. 'MIME-1') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 1522 (ref. 'MIME-2') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC  822 (Obsoleted by RFC 2822)

  ** Obsolete normative reference: RFC 1036 (Obsoleted by RFC 5536, RFC 5537)

  ** Downref: Normative reference to an Informational RFC: RFC 1468

  ** Downref: Normative reference to an Informational RFC: RFC 1557

  ** Downref: Normative reference to an Experimental RFC: RFC 1641

  ** Obsolete normative reference: RFC 1642 (Obsoleted by RFC 2152)

  ** Obsolete normative reference: RFC 1700 (Obsoleted by RFC 3232)

  ** Obsolete normative reference: RFC  821 (ref. 'SMTP') (Obsoleted by RFC
     2821)

  ** Obsolete normative reference: RFC 1651 (ref. 'SMTPEXT') (Obsoleted by
     RFC 1869)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode92'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode4'


     Summary: 24 errors (**), 1 flaw (~~), 22 warnings (==), 23 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                            HF. Zhu
3	Internet Draft: Chinese Character Encoding                    Tsinghua U
4	Document: internet-drafts/draft-apng-cc-encoding-03.8.txt         DY. Hu
5	                                                              Tsinghua U
6	                                                                ZG. Wang
7	                                                                    CITS
8	                                                                 TC. Kao
9	                                                                     III
10	                                                               WC. Chang
11	                                                                     III
12	                                                              M. Crispin
13	                                                            U Washington

15	                                                               July 1995

17	            Chinese Character Encoding for Internet Messages

19	Status of this Memo

21	   This document is an Internet-Draft.  Internet-Drafts  are  working
22	   documents of the Internet Engineering Task Force (IETF), its  areas,
23	   and its working groups.  Note that other groups may also  distribute
24	   working documents as Internet-Drafts.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   To learn the current status of any Internet-Draft, please check the
32	   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
33	   Directories on ds.internic.net (US East Coast), nic.nordu.net
34	   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
35	   Rim).

37	   This is a draft document of APNG-CC, the Chinese Character
38	   sub-working group of the I18N/L10N (Internationalization and
39	   Localization) working group of APNG (Asia-Pacific Networking Group).
40	   A revised version of this draft document will be submitted to the RFC
41	   editor as an Informational RFC for the Internet Community.
42	   Discussion and suggestions for improvement are requested, and should
43	   be sent to apng-cc@apng.org or zhf@net.edu.cn (the coordinator).  This
44	   document will expire before February 30, 1996.  Distribution of this
45	   memo is unlimited.

47	Abstract

49	   This memo provides methods for transporting Chinese characters
50	   through, but not limited to, electronic mail [RFC-822] and network
51	   news [RFC-1036] in the Internet community.

53	Introduction

55	   As the Internet reaches more and more Chinese people around the
56	   world, the need to send documents containing Chinese characters
57	   over the Internet has increased.  The methods described in this
58	   document provide a means of transporting existing Chinese
59	   character sets while leaving sufficient room for future
60	   extension.

62	   This document describes three groups of encodings:
63	      1. ISO-2022-CN and ISO-2022-CN-EXT
64	      2. CN-GB and CN-Big5
65	      3. ISO/IEC 10646/Unicode

67	    The first group of encodings is designed with interoperability in
68	    mind and is encouraged in this document; these encodings are 7-bit,
69	    support both simplified and traditional characters using both GB and
70	    CNS/Big5, and impose no unusual quoting requirements on ASCII
71	    characters.  The second group of encodings describes current common
72	    domestic usage.  The third group of encodings refers to the
73	    universal multilingual character set defined by ISO.

75	    Note:  ISO/IEC 10646 [ISO-10646] defines a 32-bit character space
76	    with the intent to encode all characters in the world. Currently, only
77	    the lowest 16-bit plane of ISO 10646, the Basic Multilingual Plane (BMP),
78	    is defined. The BMP is code-by-code identical to Unicode [Unicode 1.1].

80	Specification

82	1. 7-bit Chinese encodings: ISO-2022-CN and ISO-2022-CN-EXT

84	 1.1  Description

86	   ISO-2022-CN is based upon ISO 2022 [ISO-2022],   similar  to  earlier
87	   work on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for Japanese
88	   and Korean languages.  It is 7-bit, and supports both simplified Chinese
89	   characters using GB 2312-80 [GB-2312] and traditional Chinese characters
90	   using the first two planes of CNS 11643 [CNS-11643], as well as  ASCII
91	   [ASCII] characters.

93	   ISO-2022-CN-EXT is a superset of ISO-2022-CN that additionally
94	   supports other GB character sets and planes of CNS 11643.

96	   Since ISO-2022-CN and ISO-2022-CN-EXT are 7-bit encodings, they do
97	   not require the 8-bit SMTP extensions.  ISO-2022-CN supports almost
98	   all the characters which appear in Big5 [BIG5] except for two duplicate
99	   characters which were mistakes in defining Big5.

101	 1.2  ISO-2022-CN

103	   The starting code of ISO-2022-CN is ASCII.  ASCII and Chinese characters
104	   are distinguished by the use of designations (ESC sequences) and shift
105	   functions.

107	   Designations define the Chinese character sets  used  in  the  text.
108	   There are three kinds of designations: SOdesignation, SS2designation
109	   and SS3designation.

111	   The SOdesignation is in the form ESC $ ) <F>, where <F> is the
112	   "final character" assigned to the character set by ISO (refer to the
113	   ISO registry [ISOREG] for more details).  The SS2designation is in
114	   the form ESC $ * <F>, and the SS3designation is in the form ESC $ +
115	   <F>.  A designation overrides any previous designation of the same
116	   kind for subsequent bytes in the text.

118	   There are four kinds of shifts: SI, SO, SS2 and SS3.

120	   The shift SI (one byte with hexadecimal value 0F) declares that
121	   subsequent bytes are interpreted in ASCII.

123	   The shift SO (one byte with hexadecimal value 0E) declares that
124	   subsequent bytes are interpreted in the character set defined by
125	   SOdesignation.

127	   The shift SS2 (two bytes with hexadecimal values 1B 4E) declares
128	   that the subsequent TWO bytes are interpreted in the character set
129	   defined by SS2designation, after which the previous interpretation
130	   (from SI or SO) is restored.

132	   The shift SS3 (two bytes with hexadecimal values 1B 4F) declares
133	   that the subsequent TWO bytes are interpreted in the character set
134	   defined by SS3designation, after which the previous interpretation
135	   (from SI or SO) is restored.

137	   For example, the sequence:
138	     ESC $ ) A SO c_char1 ... c_char1 ESC $ ) G c_char2 ... c_char2 SI
139	   transfers mixed simplified Chinese and traditional Chinese text, in
140	   which c_char1s are simplified Chinese characters from GB-2312 and
141	   c_char2s are traditional Chinese characters from CNS-11643-plane 1.
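
   The designations and shifts described above can be sketched as a
   small tokenizer.  This is only an illustration, not part of the
   specification: the function name tokenize_line and the (final, unit)
   tuple layout are assumptions made for this sketch, and the sketch
   handles a single line without its CRLF.  It does not map the units
   to characters, which would require the GB and CNS tables.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
DESIGNATORS = {ord(")"): "SO", ord("*"): "SS2", ord("+"): "SS3"}
SINGLE_SHIFTS = {ord("N"): "SS2", ord("O"): "SS3"}  # ESC N = 1B 4E, ESC O = 1B 4F


def tokenize_line(data: bytes):
    """Split one line of ISO-2022-CN text into (final, unit) pairs.

    final is None for an ASCII byte; otherwise it is the final character
    <F> of the designation that governs the two-byte unit.
    """
    designated = {"SO": None, "SS2": None, "SS3": None}
    shifted_out = False  # True between SO and SI
    tokens = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt == ord("$") and i + 3 < len(data) and data[i + 2] in DESIGNATORS:
                # ESC $ ) <F>, ESC $ * <F> or ESC $ + <F>
                designated[DESIGNATORS[data[i + 2]]] = chr(data[i + 3])
                i += 4
            elif nxt in SINGLE_SHIFTS and i + 3 < len(data):
                final = designated[SINGLE_SHIFTS[nxt]]
                if final is None:
                    raise ValueError("single shift used before its designation")
                tokens.append((final, data[i + 2:i + 4]))
                i += 4  # the SI/SO state in force is left untouched
            else:
                raise ValueError("unsupported or truncated escape sequence")
        elif byte == SO:
            if designated["SO"] is None:
                raise ValueError("SO used before an SOdesignation")
            shifted_out = True
            i += 1
        elif byte == SI:
            shifted_out = False
            i += 1
        elif shifted_out:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            tokens.append((designated["SO"], data[i:i + 2]))
            i += 2
        else:
            tokens.append((None, data[i:i + 1]))
            i += 1
    return tokens
```

   Because a single shift consumes exactly two bytes and leaves the
   shifted_out flag alone, the interpretation in force before SS2 or
   SS3 resumes automatically with the next byte.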

143	   The escape sequence, shift function and character set used in an
144	   ISO-2022-CN text are as follows:

146	   Character sets                                       Shift in with
147	  --------------------------------------------------------------------
148	    ASCII                                                    SI
149	    GB 2312, CNS 11643-plane-1                               SO
150	             CNS 11643-plane-2                               SS2

152	      ESC $ ) A   Indicates the bytes following SO are Chinese characters
153	                  as defined in GB 2312-80, until another SOdesignation
154	                  appears

156	      ESC $ ) G   Indicates the bytes following SO are as defined in
157	                  CNS 11643-plane-1, until another SOdesignation appears

159	      ESC $ * H   Indicates the two bytes immediately following SS2 are a
160	                  Chinese character as defined in CNS 11643-plane-2, until
161	                  another SS2designation appears

163	   If there are any GB or CNS characters on a line, a designation for
164	   the corresponding character set should be used so that each line has
165	   its own character set information and the text can be displayed
166	   correctly when scrolling back in a window.  Also, there must be a
167	   shift to ASCII (SI) before the end of the line (i.e., before the
168	   CRLF).  In other words, each line starts in ASCII, and ends in ASCII.
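
   The per-line rules above can be sketched as a minimal encoder.  It
   is an illustration under stated assumptions, not a reference
   implementation: it covers only ASCII and GB 2312-80 (no CNS 11643),
   the names encode_line and encode_text are invented for this sketch,
   and it relies on Python's "gb2312" codec, which produces the 8-bit
   form of each character, so the MSB is cleared to obtain the 7-bit
   code.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
DESIGNATE_GB2312 = ESC + b"$)A"  # SOdesignation for GB 2312-80


def encode_line(line: str) -> bytes:
    """Encode one line (without CRLF) of ASCII and GB 2312 text."""
    out = bytearray()
    designated = False   # has this line already carried its designation?
    shifted_out = False  # are we currently between SO and SI?
    for ch in line:
        if ord(ch) < 0x80:
            if shifted_out:
                out += SI
                shifted_out = False
            out += ch.encode("ascii")
            continue
        # The codec yields the 8-bit form; clearing the MSB of both bytes
        # gives the 7-bit GB 2312-80 code.  A character outside GB 2312
        # raises UnicodeEncodeError.
        pair = bytes(b & 0x7F for b in ch.encode("gb2312"))
        if not designated:
            out += DESIGNATE_GB2312
            designated = True
        if not shifted_out:
            out += SO
            shifted_out = True
        out += pair
    if shifted_out:
        out += SI  # every line must end in ASCII
    return bytes(out)


def encode_text(lines) -> bytes:
    """Encode each line on its own so that each carries its designation."""
    return b"".join(encode_line(line) + b"\r\n" for line in lines)
```

   Encoding line by line means the designation is repeated on every
   line that contains Chinese characters, which is what allows a
   display to render any single line without seeing the earlier ones.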

170	   The name given to this character encoding is "ISO-2022-CN". This name
171	   is intended to be used as the "charset" parameter in MIME [MIME-1,
172	   MIME-2] messages.

174	       Content-Type: text/plain; charset=iso-2022-cn

176	   The ISO-2022-CN encoding is already in 7-bit form, so it is not
177	   necessary to use a Content-Transfer-Encoding header.

179	   Other restrictions are given in the "Formal Syntax of ISO-2022-CN
180	   and ISO-2022-CN-EXT" part at the end of this document.

182	 1.3  ISO-2022-CN-EXT

184	   ISO-2022-CN-EXT supports all characters in existing GB, Big5 and CNS
185	   11643 character sets.

187	   The escape sequence, shift function and character set used in an
188	   ISO-2022-CN-EXT text are as follows:

190	   Character sets                                       Shift in with
191	  --------------------------------------------------------------------
192	    ASCII                                                    SI
193	    GB 2312, GB 12345, CNS 11643-plane-1, GB 2312+GB 8565    SO
194	    GB 7589, GB 13131, CNS 11643-plane-2                     SS2
195	    GB 7590, GB 13132 or other new GBs,CNS 11643-plane-3 or  SS3
196	     higher planes of CNS 11643

198	      Note: Currently, there are some GB sets that have not been
199	      registered in ISO. Here <X7589>, <X7590>, <X12345>, <X13131>
200	      and <X13132> represent the final character that will be assigned
201	      by ISO for those sets.

203	      ESC $ ) A   Indicates the bytes following SO are Chinese characters
204	                  as defined in GB 2312-80, until another SOdesignation
205	                  appears

207	      ESC $ * <X7589>
208	                  Indicates the two bytes immediately following SS2 are a
209	                  Chinese character as defined in GB 7589-87 [GB-7589],
210	                  until another SS2designation appears

212	      ESC $ + <X7590>
213	                  Indicates the two bytes immediately following SS3 are a
214	                  Chinese character as defined in GB 7590-87 [GB-7590],
215	                  until another SS3designation appears

217	      ESC $ ) <X12345>
218	                  Indicates the bytes following SO are as defined in
219	                  GB 12345-90 [GB-12345], until another SOdesignation
220	                  appears

222	      ESC $ * <X13131>
223	                  Indicates the two bytes immediately following SS2 are a
224	                  Chinese character as defined in GB 13131-91 [GB-13131],
225	                  until another SS2designation appears

227	      ESC $ + <X13132>
228	                  Indicates the two bytes immediately following SS3 are a
229	                  Chinese character as defined in GB 13132-91 [GB-13132],
230	                  until another SS3designation appears

232	      ESC $ ) E   Indicates the bytes following SO are as defined in GB 2312+
233	                  GB 8565 [GB-8565], until another SOdesignation appears

235	      ESC $ ) G   Indicates the bytes following SO are as defined in
236	                  CNS 11643-plane-1, until another SOdesignation appears
237	      ESC $ * H   Indicates the two bytes immediately following SS2 are a
238	                  Chinese character as defined in CNS 11643-plane-2, until
239	                  another SS2designation appears
240	      ESC $ + I   Indicates the two bytes immediately following SS3 are a
241	                  Chinese character as defined in CNS 11643-plane-3,
242	                  until another SS3designation appears
243	      ESC $ + J   Indicates the two bytes immediately following SS3 are a
244	                  Chinese character as defined in CNS 11643-plane-4,
245	                  until another SS3designation appears
246	      ESC $ + K   Indicates the two bytes immediately following SS3 are a
247	                  Chinese character as defined in CNS 11643-plane-5,
248	                  until another SS3designation appears
249	      ESC $ + L   Indicates the two bytes immediately following SS3 are a
250	                  Chinese character as defined in CNS 11643-plane-6,
251	                  until another SS3designation appears
252	      ESC $ + M   Indicates the two bytes immediately following SS3 are a
253	                  Chinese character as defined in CNS 11643-plane-7,
254	                  until another SS3designation appears

256	   As in ISO-2022-CN, each line should start in ASCII, and end in ASCII,
257	   and should have its own designation information before any Chinese
258	   characters appear.

260	   The name given to this character encoding is "ISO-2022-CN-EXT". This name
261	   is intended to be used as the "charset" parameter in MIME messages.

263	       Content-Type: text/plain; charset=ISO-2022-CN-EXT

265	   The ISO-2022-CN-EXT encoding is also in 7-bit form, so it is not
266	   necessary to use a Content-Transfer-Encoding header.

268	   Other restrictions are given in the "Formal Syntax of ISO-2022-CN and
269	   ISO-2022-CN-EXT" part at the end of this document.

271	 1.4  How to Support Big5 or other internal codesets with ISO-2022-CN
272	      and ISO-2022-CN-EXT

274	   There are many different Chinese internal coding systems [CJK], such
275	   as Big5, GB internal code, CCCII (an encoding for library systems in
276	   Taiwan), and XGB (the codepage for Microsoft simplified Chinese
277	   Windows 95).  ISO-2022-CN and ISO-2022-CN-EXT are 7-bit and do not lose
278	   information when text passes between different codesets, which
279	   increases interoperability; they are therefore ideal interchange
280	   encodings for internal Chinese codesets in international communication.

282	   For instance, ISO-2022-CN and ISO-2022-CN-EXT can be used to support
283	   Big5, because CNS-11643 planes 1 and 2 incorporate all Chinese
284	   characters in Big5 except two duplicate characters, which were
285	   mistakes made when Big5 was defined.

287	   Because Big5 and CNS-11643 order their codes differently, a
288	   conversion table is needed to convert Big5 to and from CNS-11643.
289	   The conversion table is attached as an appendix to this document.

291	   Public domain software (either binary or source in C) is also
292	   available at many sites on the Internet:

294	   1) Beijing:

296	    ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/
297	    (IP address: 166.111.1.11)

299	   2) Taiwan:

301	    ftp://tpdns.seed.net.tw/Pub/Chinese/DOS/code-convert/chcode.zip
302	    (IP address: 139.175.1.12)

304	   3) US:

306	    ftp://ftp.ifcss.org/pub/software/unix/convert/BeTTY-1.534.tar.gz
307	    (IP address: 128.123.1.55)

309	   4) Japan:
310	    ftp://etlport.etl.go.jp/pub/iso-2022-cn/
311	    (IP address: 192.31.197.99)

313	2. 8-bit Chinese encodings: CN-GB and CN-Big5

315	   The CN-GB and CN-Big5 charset names are given below.
316	   Among other things, these support current practice; specifically,
317	   CN-GB reflects the current usage for simplified Chinese e-mail,
318	   and CN-Big5 reflects the current usage for traditional Chinese e-mail.

320	     Note: the use of 8-bit character sets requires the use of
321	     either an 8-to-7 Content-Transfer-Encoding mechanism such as
322	     "BASE64" or "QUOTED-PRINTABLE" if the network is not 8-bit clean,
323	     or the 8-bit SMTP extensions [SMTPEXT] with the "8BIT"
324	     Content-Transfer-Encoding on 8-bit clean networks.  Otherwise,
325	     an 8-bit message which passes through a 7-bit mailer is likely
326	     to have the 8th bit truncated, resulting in an unreadable
327	     message.  Although "just send 8-bit data" has been common
328	     practice in the past, it is incorrect according to the
329	     Internet standards and causes interoperability problems.
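
   The note above can be illustrated with a short sketch that applies
   the two 8-to-7 encodings to an 8-bit body.  It assumes that Python's
   "gb2312" codec yields the 8-bit form described in this section; the
   helper name is_7bit and the sample text are chosen only for
   illustration.

```python
import base64
import quopri

# Two Chinese characters followed by ASCII, in 8-bit form.
body_8bit = "\u4e2d\u6587 mail".encode("gb2312")

as_base64 = base64.b64encode(body_8bit)    # "BASE64" transfer encoding
as_qp = quopri.encodestring(body_8bit)     # "QUOTED-PRINTABLE" transfer encoding


def is_7bit(data: bytes) -> bool:
    """True when no byte has its 8th bit set."""
    return all(b < 0x80 for b in data)


def strip_8th_bit(data: bytes) -> bytes:
    """What a 7-bit mailer may do to an unencoded 8-bit body."""
    return bytes(b & 0x7F for b in data)
```

   Both encoded forms survive a 7-bit path unchanged, whereas
   strip_8th_bit(body_8bit) no longer decodes to the original text.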

331	 2.1   CN-GB

333	   E-mail using GB characters is sent in this way:

335	   GB 2312-80 characters are used with ASCII characters,
336	   not GB 1988-80 [GB-1988].

338	   GB 2312-80 is itself a 7-bit set.  To avoid conflicting with ASCII,
339	   the MSB (bit-8) of each byte of a GB 2312-80 character is set to 1,
340	   so that the character becomes 8-bit.  Any byte whose MSB is 0 is
341	   interpreted as ASCII.  The resulting character set is named "GB
342	   Internal Code".

344	   This method is also used by the .gb files found on the Internet.
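
   The MSB rule can be sketched as follows.  The function names to_cn_gb
   and split_cn_gb are assumptions made for this sketch, and the range
   check reflects the fact that GB 2312-80 is a 94x94 set whose 7-bit
   bytes lie between 0x21 and 0x7E.

```python
def to_cn_gb(pair_7bit: bytes) -> bytes:
    """Set the MSB of both bytes of a 7-bit GB 2312-80 code."""
    if len(pair_7bit) != 2 or any(not 0x21 <= b <= 0x7E for b in pair_7bit):
        raise ValueError("expected a two-byte 7-bit GB 2312 code")
    return bytes(b | 0x80 for b in pair_7bit)


def split_cn_gb(data: bytes):
    """Split CN-GB text into ASCII bytes and two-byte Chinese characters."""
    units = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # MSB set: this byte and the next form one Chinese character.
            if i + 1 >= len(data) or not data[i + 1] & 0x80:
                raise ValueError("incomplete two-byte character at offset %d" % i)
            units.append(data[i:i + 2])
            i += 2
        else:
            # MSB clear: a single ASCII byte.
            units.append(data[i:i + 1])
            i += 1
    return units
```

   Because both bytes of a Chinese character carry the MSB, an ASCII
   byte can never be mistaken for half of a character.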

346	   To use this character scheme with MIME, CN-GB is used as the value
347	   for the charset parameter:

349	      Content-Type: text/plain; charset=cn-gb

351	   GB-12345 is the traditional form of GB-2312; the charset name given
352	   to this set is CN-GB-12345-90.

354	   There is also a kind of dependent character set that can only be used
355	   together with one of the above sets.  For example, GB 8565 can only be
356	   used with GB 2312 or GB 12345.  In this case, "+" is permitted to
357	   appear in the charset name, e.g. CN-GB-2312-80+GB-8565-88.

359	   Like CN-GB, CN-GB-12345-90 and CN-GB-2312-80+GB-8565-88 also support
360	   ASCII; the MSB of each byte of a Chinese character should be set to
361	   1 so that it can be distinguished from ASCII.

363	     Note: There are some supplementary character sets in GB, i.e. GB
364	     7589-87, GB 7590-87, GB 13131-91 and GB 13132-91.  Normally, they
365	     are not used independently of GB-2312 or GB-12345, so they do not
366	     necessarily need to be registered.  Characters in these standards
367	     can be supported with ISO-2022-CN and ISO-2022-CN-EXT.  If, in the
368	     future, they do need to be used with "charset" names in some cases,
369	     it is the responsibility of any interested third party (the
370	     standardization organization itself or anybody else) to write the
371	     necessary documents and do the IANA registration for them.  It is
372	     strongly encouraged that their charset names also take the form
373	     CN-GB-<number>-<edition>, as in CN-GB-12345-90.  Here, <number> is
374	     the GB standard number, and <edition> is the year of the edition,
375	     represented by the last two digits of the year.  They should be coded
376	     in 8-bit form, as CN-GB is.
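
   The naming form described in this section can be sketched as a small
   parser.  The function name parse_gb_charset and the returned list of
   (number, edition) pairs are assumptions made for this sketch; the
   match is case-insensitive because MIME charset names are.  The plain
   name CN-GB does not follow the <number>-<edition> form and is
   deliberately rejected here.

```python
import re

_PART = r"GB-(\d+)-(\d{2})"
# CN-GB-<number>-<edition>, optionally followed by "+GB-<number>-<edition>"
_NAME = re.compile(r"CN-%s(?:\+%s)?" % (_PART, _PART), re.IGNORECASE)


def parse_gb_charset(name: str):
    """Return [(number, edition), ...] for names such as CN-GB-12345-90."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError("not of the form CN-GB-<number>-<edition>: %r" % name)
    number, edition, dep_number, dep_edition = match.groups()
    parts = [(int(number), edition)]
    if dep_number is not None:
        # A dependent set joined with "+", e.g. GB 8565 used with GB 2312.
        parts.append((int(dep_number), dep_edition))
    return parts
```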

378	   To avoid hindering interoperability, the use of CN-GB is encouraged
379	   whenever possible.

381	 2.2   CN-Big5

383	   Big5 is a character set of traditional Chinese characters, widely
384	   used in Taiwan and overseas.  E-mail using Big5 characters is
385	   sent in this way:

387	   Big5 characters are used with ASCII characters.

389	   Big5 is a two-byte coding.  If the character is from Big5, the MSB
390	   (bit-8) of its first byte is set to 1, so the first byte is an 8-bit
391	   value; the second byte may be either 7-bit or 8-bit.  A byte that is
392	   not part of a Big5 character is interpreted as ASCII.  (Big5 uses
393	   the code space: [0xa1-0xfe,0x40-0x7e] and [0xa1-0xfe,0xa1-0xfe], and
394	   two other user areas with the first byte in the range of [0x81-0xa0].)
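
   The code space given above can be sketched as a splitter for CN-Big5
   text.  The function names are assumptions made for this sketch.  The
   text does not state the second-byte range of the user areas, so the
   sketch accepts any second byte there; that choice is an assumption.

```python
def is_standard_big5(pair: bytes) -> bool:
    """True for [0xa1-0xfe,0x40-0x7e] and [0xa1-0xfe,0xa1-0xfe]."""
    lead, trail = pair
    return 0xA1 <= lead <= 0xFE and (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE)


def is_user_area(pair: bytes) -> bool:
    """True when the first byte lies in the user-area range [0x81-0xa0]."""
    return 0x81 <= pair[0] <= 0xA0


def split_cn_big5(data: bytes):
    """Split CN-Big5 text into ASCII bytes and two-byte Big5 characters."""
    units = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            units.append(data[i:i + 1])  # MSB clear: ASCII
            i += 1
            continue
        pair = data[i:i + 2]
        if len(pair) < 2 or not (is_standard_big5(pair) or is_user_area(pair)):
            raise ValueError("not a Big5 character at offset %d" % i)
        units.append(pair)
        i += 2
    return units
```

   Note that a second byte in the range 0x40-0x7e looks like ASCII when
   seen alone, so CN-Big5 text has to be scanned from a known character
   boundary.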

396	   To use this character scheme with MIME, CN-Big5 is used as the value
397	   for the charset parameter:

399	      Content-Type: text/plain; charset=cn-big5

401	3. Universal Multilingual Character Set:  ISO/IEC-10646/Unicode

403	   ISO/IEC 10646's BMP (code-to-code identical to Unicode) contains a
404	   large repertoire of Chinese characters (it currently includes all
405	   the characters of GB 2312-80, GB 12345-90, GB 8565-89, CNS 11643's
406	   planes 1 and 2, and part of some other standards) and therefore can
407	   be used to transport Chinese characters in the Internet community.
408	   This document does not give any details on how to do this, as this has
409	   been done elsewhere.  For details of using Unicode with MIME, refer to
410	   RFC 1641 [RFC-1641] and RFC 1642 [RFC-1642].  For assigned names for
411	   10646 sets, refer to STD 2--"Assigned Numbers", which is currently RFC
412	   1700 [RFC-1700].  For more up-to-date assigned numbers, please
413	   check:

415	    ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

417	A New MIME parameter -- "charset-variant"

419	   A new MIME parameter, "charset-variant", is defined as follows:

421	   This parameter is used after the MIME "charset" parameter, mainly in
422	   the form of <variant>-<version>, or any extension based on this form,
423	   in which <variant> is the product name and <version> indicates its
424	   version number.  It is case-insensitive and optional, and any value
425	   of this parameter should be registered with IANA.

427	   For example:
428	    Content-Type: text/plain; charset=CN-Big5; charset-variant=ETen-2.00.03-DOS

430	   This may indicate the ETen company's variant of Big5: ETen 2.00.03 for DOS.

432	   This parameter is defined because some implementations may want to
433	   check for variants in order to handle them in slightly different
434	   ways and so gain better operability.  Although some features of
435	   certain variants may cause interoperability problems, variants
436	   will continue to exist; moreover, a variant may be so popular that
437	   it becomes a de facto industry standard, so indicating its name
438	   can improve the ability of communication implementations to handle
439	   its messages.
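
   A receiving implementation might read the parameter as sketched
   below, using Python's standard email package.  The function name
   charset_and_variant is an assumption made for this sketch, and the
   variant is lower-cased only because the parameter is described above
   as case-insensitive.

```python
from email import message_from_string

RAW_EXAMPLE = (
    "Content-Type: text/plain; charset=CN-Big5; "
    "charset-variant=ETen-2.00.03-DOS\n"
    "\n"
)


def charset_and_variant(raw_message: str):
    """Return (charset, variant); variant is None when the parameter is absent."""
    msg = message_from_string(raw_message)
    charset = msg.get_content_charset()          # lower-cased by the library
    variant = msg.get_param("charset-variant")   # None when not present
    if variant is not None:
        variant = variant.lower()                # compare case-insensitively
    return charset, variant
```

   Since the parameter is optional, a receiver should treat a missing
   value as "no variant information" rather than as an error.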

441	Background Information

443	1. Writing systems and their encodings in Chinese-speaking nations and regions

445	   The mainland provinces of China use simplified Chinese characters in
446	   daily life.  GB is the standard electronic character set.  It is the
447	   main means of communication between people around the world who
448	   share simplified Chinese characters.

450	   Taiwan uses traditional Chinese characters in daily life.  CNS-11643
451	   is the formal character set for information interchange in Taiwan;
452	   however, Big5, a widely-used character set of traditional Chinese
453	   characters, is the de-facto industrial standard in Taiwan.

455	   Hong Kong uses traditional Chinese characters in daily life, but uses
456	   both GB and Big5 in electronic form, because Hong Kong people often
457	   communicate with people in all of China's provinces.

459	   Singapore seldom uses Chinese characters, and uses the simplified
460	   form when Chinese characters are used.  In electronic form, Unicode
461	   is more popular, but GB is also used.

463	2. Miscellaneous notes on Chinese character sets

465	   The GB 1988-80 character set is identical to ISO 646 [ISO-646] except
466	   for the currency symbol and the tilde, which are replaced by the
467	   Yuan sign and the overline.  This set is GB's variant of ISO 646.
468	   This character set and CNS 5205 [CNS-5205] are not encouraged for
469	   use on the Internet, since ASCII combined with GB 2312 or CNS 11643
470	   planes 1 and 2 comprises all the characters in them.

472	   The GB 2312-80 character set consists of simplified Chinese
473	   characters, digits, Latin, Greek and Russian alphabets, and some
474	   other symbols; in all, 7445 characters.  Each character is represented
475	   with two bytes.

477	   GB 13000-95 [GB-13000] is GB's variant of ISO 10646.  However, for
478	   interoperability on the Internet, use of the names assigned for
479	   ISO 10646 is encouraged.

481	3. Miscellaneous implementation information

483	   For maximum interoperability, implementations SHOULD at least support
484	   sending and receiving ISO-2022-CN.  Supporting all registered character
485	   sets in ISO-2022-CN-EXT is greatly encouraged.

487	   It is also essential to be able to support CN-GB (the status quo for
488	   simplified Chinese e-mail) and CN-Big5 (the status quo for traditional
489	   Chinese e-mail).  However, sending ISO-2022-CN messages is always
490	   encouraged whenever possible.

492	   To the maximum extent possible, implementations should be capable of
493	   receiving messages in any of the encodings introduced in this document,
494	   even if they only transmit messages in one form.  Preferably the
495	   implementation should display the characters with glyphs appropriate
496	   to the typographic tradition that is implied in the encoding of the
497	   received text.  Implementations may also translate these encodings
498	   to an encoding that their platform supports.

500	   The human user (not implementor) should try to keep lines within 80
501	   display columns, or, preferably, within 75 (or so) columns, to allow
502	   insertion of ">" at the beginning of each line in excerpts. Each
503	   Chinese character takes up two columns, and the shift sequences do
504	   not take up any columns. The implementor is reminded that Chinese
505	   characters take up two bytes and should not be split in the middle to
506	   break lines for displaying, etc.
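
   The column rule above can be sketched in Python.  This is a minimal
   illustration, not part of the specification: the name display_columns
   is hypothetical, the input is assumed to be one well-formed
   ISO-2022-CN line without its CRLF, and only the SO, SI and SS2
   (ESC N) functions are handled.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F


def display_columns(line):
    """Count the display columns used by one ISO-2022-CN line (bytes)."""
    cols = 0
    shifted = False                      # True between SO and SI
    i = 0
    while i < len(line):
        b = line[i]
        if b == ESC and line[i + 1:i + 2] == b"$":
            i += 4                       # designation: no columns
        elif b == ESC and line[i + 1:i + 2] == b"N":
            cols += 2                    # SS2 plus one two-byte character
            i += 4
        elif b == SO:
            shifted = True               # shift sequence: no columns
            i += 1
        elif b == SI:
            shifted = False
            i += 1
        elif shifted:
            cols += 2                    # two-byte Chinese character
            i += 2
        else:
            cols += 1                    # single-byte ASCII character
            i += 1
    return cols


# "ab", a designation, SO, two 2-byte characters, SI, "c"
sample = b"ab\x1b$)A\x0eVPND\x0fc"
width = display_columns(sample)
```

   A line breaker built on this would advance by whole characters, so a
   two-byte character is never split across lines.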

508	   Freely available fonts of Chinese characters:

510	     Beijing:
511	       ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/fonts/
512	     Taiwan:
513	       ftp://ftp.edu.tw/Chinese/ifcss/software/fonts/
514	       ftp://ftp.ntu.edu.tw/Chinese/ifcss/software/fonts/
515	     HongKong:
516	       ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/fonts/
517	     Singapore:
518	       ftp://ftp.technet.sg:/pub/chinese/fonts/
519	     US:
520	       ftp://ftp.ifcss.org/pub/software/fonts/
521	       http://ccic.ifcss.org/www/pub/software/fonts/

523	X.400 Considerations

525	   X.400 can carry different character sets in a message by using
526	   the body part "GeneralText" defined by ISO/IEC 10021-7
527	   [ISO-10021].

529	   The X.400 ASN.1 definition of the GeneralText body part is:

531	      general-text-body-part EXTENDED-BODY-PART-TYPE
532	        PARAMETERS GeneralTextParameters IDENTIFIED BY id-ep-general-text
533	        DATA       GeneralTextData
534	        ::= id-et-general-text

536	      GeneralTextParameters ::= SET OF CharacterSetRegistration

538	      CharacterSetRegistration ::= INTEGER (1..32767)

540	      GeneralTextData ::= GeneralString

542	   Therefore, to use ISO-2022-CN, set the "CharacterSetRegistration"
543	   part as { 6 58 171 172 }, and add an ESC sequence of ESC ( B (three bytes,
544	   hexadecimal values: 1B 28 42) before the beginning of ISO-2022-CN text.

546	   Similarly, to use ISO-2022-CN-EXT, list the registered numbers of
547	   all character sets in the "CharacterSetRegistration" part and add ESC
548	   ( B at the beginning.  For the registered numbers, please refer to
549	   the ISO registry.  In addition to the character sets supported by
550	   ISO-2022-CN, the currently registered numbers are:

552	     GB 2312+GB 8565:   165
553	     CNS 11643-plane 3: 183
554	     CNS 11643-plane 4: 184
555	     CNS 11643-plane 5: 185
556	     CNS 11643-plane 6: 186
557	     CNS 11643-plane 7: 187

559	   176 is the registered number for the BASESET of ISO/IEC 10646-1:1993
560	   UCS-2 with implementation level 3.  The escape sequence ESC % / E
561	   (four bytes, hexadecimal values 1B 25 2F 45) indicates the start of
562	   this code set.

564	   For the CN-GB and CN-Big5 character sets, there is currently no
565	   formal method for use in X.400.

567	   For details about X.400 use of character sets, please refer to
568	   RFC 1502 [RFC-1502].

570	Formal Syntax of ISO-2022-CN and ISO-2022-CN-EXT

572	   The notational conventions used here are identical to those used in
573	   RFC 822.

575	1.  Formal Syntax of ISO-2022-CN

577	   body  ::= * ( ascii_line / c_line )

579	   ascii_line  ::= *char CRLF

581	   c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF

583	   designation  ::= SOdesignation / SS2designation

585	   SOdesignation  ::= ESC "$" ")" finalchar_for_SO

587	   SS2designation  ::= ESC "$" "*" finalchar_for_SS2

589	   finalchar_for_SO  ::= "A" / "G"

591	   finalchar_for_SS2  ::= "H"

593	   c_text  ::= 1* ( SO-SI-segment / SS2segment )

595	   SO-SI-segment ::= SO 1*c_char *designation *( c_segment / SO-segment ) SI

597	   c_segment  ::= 1* ( c_char / SS2segment )

599	   SO-segment ::= SO 1*c_char

601	   SS2segment  ::= SS2 c_char

603	   c_char  ::= one_of_94  one_of_94

605	                                                     ; ( Octal, Decimal.)

607	   ESC             ::= <ISO-646 ESC, escape>         ; ( 33, 27.)

609	   SI              ::= <ASCII SI, shift in>          ; ( 17, 15.)

611	   SO              ::= <ASCII SO, shift out>         ; ( 16, 14.)

613	   SS2             ::= <ISO 2022 Single_shift two>   ; ( 33 116, 27 78.)

615	   SS3             ::= <ISO 2022 Single_shift three> ; ( 33 117, 27 79.)

617	   one_of_94       ::= <any char in 94_char set>     ; (41-176, 33-126.)

619	   char            ::= <any char in 96_char_set>     ; (40-177, 32-127.)
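
   The grammar above can be exercised with a small Python sketch that
   encodes and decodes one line.  It is deliberately limited and is not
   a full implementation: it handles only the SO designation with final
   character "A", and it assumes that set is GB 2312 and that each
   c_char equals the corresponding byte pair of Python's "gb2312"
   (EUC-CN) codec with the high bits cleared.  The names encode_line
   and decode_line are hypothetical.

```python
ESC_SO_A = b"\x1b$)A"        # SOdesignation with final character "A"
SO, SI = b"\x0e", b"\x0f"


def encode_line(text):
    """Encode one line (no CRLF) of ASCII and GB 2312 characters."""
    out = bytearray()
    designated = shifted = False
    for ch in text:
        if ord(ch) < 0x80:
            if shifted:
                out += SI
                shifted = False
            out += ch.encode("ascii")
        else:
            euc = ch.encode("gb2312")            # two bytes, high bits set
            if not designated:
                out += ESC_SO_A
                designated = True
            if not shifted:
                out += SO
                shifted = True
            out += bytes(b & 0x7F for b in euc)  # c_char: two 7-bit bytes
    if shifted:
        out += SI                                # close the SO-SI-segment
    return bytes(out)


def decode_line(data):
    """Decode one line produced under the same assumptions."""
    out = []
    designated = shifted = False
    i = 0
    while i < len(data):
        if data.startswith(ESC_SO_A, i):
            designated = True
            i += 4
        elif data[i:i + 1] == SO:
            if not designated:
                raise ValueError("SO before any SOdesignation")
            shifted = True
            i += 1
        elif data[i:i + 1] == SI:
            shifted = False
            i += 1
        elif shifted:
            pair = bytes(b | 0x80 for b in data[i:i + 2])
            out.append(pair.decode("gb2312"))
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


encoded = encode_line("GB: \u4e2d\u6587 ok")
decoded = decode_line(encoded)
```

   The encoder emits the designation before the first SO on the line and
   always returns to ASCII with SI before the line ends, which matches
   the shape of c_line and SO-SI-segment.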

621	2.  Formal Syntax of ISO-2022-CN-EXT

623	   body  ::= * ( ascii_line / c_line )

625	   ascii_line  ::= *char CRLF

627	   c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF

629	   designation  ::= SOdesignation / SS2designation / SS3designation

631	   SOdesignation  ::= ESC "$" ")" finalchar_for_SO

633	   SS2designation  ::= ESC "$" "*" finalchar_for_SS2

635	   SS3designation  ::= ESC "$" "+" finalchar_for_SS3

637	   finalchar_for_SO  ::= "A" / <X12345> / "G" / "E"

639	   finalchar_for_SS2  ::= <X7589> / <X13131> / "H"

641	   finalchar_for_SS3  ::= <X7590> / <X13132> / "I" / "J" / "K" / "L" / "M"

643	   c_text  ::= 1* ( SO-SI-segment / SS2segment / SS3segment )

645	   SO-SI-segment ::= SO 1*c_char *designation *( c_segment / SO-segment ) SI

647	   c_segment  ::= 1* ( c_char / SS2segment / SS3segment )

649	   SO-segment ::= SO 1*c_char

651	   SS2segment  ::= SS2 c_char

653	   SS3segment  ::= SS3 c_char

655	   c_char  ::= one_of_94  one_of_94

657	                                                     ; ( Octal, Decimal.)

659	   ESC             ::= <ISO-646 ESC, escape>         ; ( 33, 27.)

661	   SI              ::= <ASCII SI, shift in>          ; ( 17, 15.)

663	   SO              ::= <ASCII SO, shift out>         ; ( 16, 14.)

665	   SS2             ::= <ISO 2022 Single_shift two>   ; ( 33 116, 27 78.)

667	   SS3             ::= <ISO 2022 Single_shift three> ; ( 33 117, 27 79.)

669	   one_of_94       ::= <any char in 94_char set>     ; (41-176, 33-126.)

671	   char            ::= <any char in 96_char_set>     ; (40-177, 32-127.)
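
   As a hedged illustration of the three designation rules, the Python
   sketch below classifies a four-byte designation by its intermediate
   character.  The name classify_designation is hypothetical, and only
   the single-letter final characters that appear in the grammar are
   accepted; the placeholders such as <X12345> have no concrete byte
   value here and are therefore left out.

```python
SHIFT_BY_INTERMEDIATE = {")": "SO", "*": "SS2", "+": "SS3"}

# Final characters copied from the grammar; placeholders are omitted.
FINALS = {
    "SO": {"A", "G", "E"},
    "SS2": {"H"},
    "SS3": {"I", "J", "K", "L", "M"},
}


def classify_designation(seq):
    """Return (shift_function, final_char) for an ESC $ designation."""
    if len(seq) != 4 or seq[:2] != b"\x1b$":
        raise ValueError("not an ESC $ designation")
    intermediate, final = chr(seq[2]), chr(seq[3])
    shift = SHIFT_BY_INTERMEDIATE.get(intermediate)
    if shift is None or final not in FINALS[shift]:
        raise ValueError("unknown designation: %r" % seq)
    return shift, final


so_a = classify_designation(b"\x1b$)A")
ss3_m = classify_designation(b"\x1b$+M")
```

   A decoder could use the returned shift function to decide which
   character set a later SO, SS2 or SS3 selects.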

673	Registration of New "charset"s and New MIME parameter

675	1. This document defines the following MIME "charset" names for Chinese
676	   text:

678	       ISO-2022-CN, ISO-2022-CN-EXT
679	       CN-GB, CN-Big5
680	       CN-GB-12345-90
681	       CN-GB-2312-80+GB-8565-88

683	2.  This document defines a new MIME parameter:

685	       charset-variant

687	Acknowledgments

689	   This document is the result of cooperation in the APNG-CC, the
690	   Chinese Character sub-working group of the I18N/L10N
691	   (Internationalization and Localization) working group of APNG
692	   (Asia-Pacific Networking Group), coordinator Zhu Haifeng
693	   <zhf@net.tsinghua.edu.cn>.  The membership of APNG-CC consists
694	   of individuals from both sides of the Taiwan Strait, HongKong,
695	   and from Singapore and other countries.  The authors wish to
696	   thank all members of APNG-CC.

698	   Prof. Yao Shiquan and Ms. Lin Ning of CITS (China Information Technology
699	   Standardization Technical Committee), and Prof. Zhao Jingrong, Prof. Li
700	   Xing, and Mr. YouYue of Tsinghua University gave much help in the
701	   course of the work.

703	   Many thanks to Mr. C.J.Cherng and Mr. C.K.Fan of III (Institute for
704	   Information Industry), and Mr. Chang JingShin from Tsinghua University
705	   in Hsinchu, Taiwan.

707	   In particular, Mr. Masataka Ohta, the coordinator of APNG-I18N,
708	   contributed a great deal to the work from the beginning of APNG-CC.

710	   The authors also wish to thank the following people who contributed
711	   in many ways towards this draft.

713	     Martin J. Duerst           Kenichi Handa
714	     Zhang Ling                 Zhang ZhouCai
715	     Zhu Bin                    Nelson Chin
716	     Lu Chin                    Ding ZyKaan
717	     Chen Shuyi                 Mao Yonggang
718	     Ken Lunde
719	     Lua Kim Teng               Victor Cheng
720	     Stephen G. Simpson         Yuan Jiang
721	     Liu HuiFang                Harald T. Alvestrand
722	     Feng Hui

724	Security Considerations

726	   Security issues are not discussed in this memo.

728	Authors' Addresses

730	   Zhu,Hai-feng  (HF. Zhu)
731	   Dept. of Computer Science & Technology
732	   Tsinghua University
733	   Beijing, 100084
734	   China

736	   Tel: +86-1-2561144 ext. 3492
737	   Fax: +86-1-2564173
738	   Email: zhf@net.edu.cn, zhf@net.tsinghua.edu.cn

740	   Hu,Dao-yuan  (DY. Hu)
741	   Tsinghua Networking Center
742	   Tsinghua University
743	   Beijing, 100084
744	   China

746	   Tel: +86-1-2594016
747	   Fax: +86-1-2564173
748	   Email: hdy@tsinghua.edu.cn

750	   Wang,Zhi-guan  (ZG. Wang)
751	   Subcommittee 2 (SC2)
752	   China Information Technology Standardization Technical Committee
753	   (CITS)
754	   Beijing, 100083
755	   China

757	   Tel: +86-1-4012392
758	   Fax: +86-1-4010601

760	   Kao,Tien-cheu (TC. Kao)
761	   I.T. Promotion Division
762	   Institute for Information Industry(III)
763	   Taipei
764	   Taiwan

766	   Tel: +886-2-5631688
767	   Fax: +886-2-563-4209
768	   Email: tckao@iiidns.iii.org.tw

770	   Chang,Wen-chung  (WC. Chang)
771	   Institute for Information Industry(III)
772	   Taipei
773	   Taiwan

775	   Tel: +886-2-7327771
776	   Fax: +886-2-7370188
777	   Email: chung@iiidns.iii.org.tw

779	   Mark R. Crispin
780	   Networks and Distributed Computing
781	   University of Washington
782	   4545 15th Avenue NE
783	   Seattle, WA  98105-4527
784	   USA

786	   Tel: +1 (206) 543-5762
787	   Fax: +1 (206) 685-4045
788	   Email: MRC@CAC.Washington.EDU

790	Appendix -- Conversion Table for CNS-11643 and Big5

792	  This is a conversion table for the Chinese characters in Big5 and
793	  CNS-11643, including some characters specific to the ETen variant of
794	  Big5.  Note that this list contains only Chinese characters; symbols
795	  are not provided.  For a more complete table, please refer to [CJK] or
796	  the ftp sites listed in section 1.4, where conversion programs are
797	  available.

799	 1. Big5 Level 1 correspondence to CNS 11643-1992 Plane 1:

801	  0xA440-0xACFD <-> 0x4421-0x5322  # Level 1 Chinese start
802	         0xACFE <-> 0x5753
803	  0xAD40-0xAFCF <-> 0x5323-0x5752
804	  0xAFD0-0xBBC7 <-> 0x5754-0x6B4F
805	  0xBBC8-0xBE51 <-> 0x6B51-0x6F5B
806	         0xBE52 <-> 0x6B50
807	  0xBE53-0xC1AA <-> 0x6F5C-0x7534
808	  0xC1AB-0xC2CA <-> 0x7536-0x7736
809	         0xC2CB <-> 0x7535
810	  0xC2CC-0xC360 <-> 0x7737-0x782C
811	  0xC361-0xC3B8 <-> 0x782E-0x7863
812	         0xC3B9 <-> 0x7865
813	         0xC3BA <-> 0x7864
814	  0xC3BB-0xC455 <-> 0x7866-0x7961
815	         0xC456 <-> 0x782D
816	  0xC457-0xC67E <-> 0x7962-0x7D4B  # Level 1 Chinese end
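
  The ranges above can be applied mechanically, as the Python sketch
  below suggests.  It is an illustration under stated assumptions, not
  a reference converter: it assumes Big5 trail bytes run over 0x40-0x7E
  and 0xA1-0xFE (157 positions per lead byte) and CNS bytes over
  0x21-0x7E (94 per row), so that each range maps position by position.
  The function names are hypothetical; the table entries are copied
  from the Level 1 list.

```python
# (Big5 start, Big5 end, CNS 11643 plane 1 start), from the list above.
LEVEL1 = [
    (0xA440, 0xACFD, 0x4421), (0xACFE, 0xACFE, 0x5753),
    (0xAD40, 0xAFCF, 0x5323), (0xAFD0, 0xBBC7, 0x5754),
    (0xBBC8, 0xBE51, 0x6B51), (0xBE52, 0xBE52, 0x6B50),
    (0xBE53, 0xC1AA, 0x6F5C), (0xC1AB, 0xC2CA, 0x7536),
    (0xC2CB, 0xC2CB, 0x7535), (0xC2CC, 0xC360, 0x7737),
    (0xC361, 0xC3B8, 0x782E), (0xC3B9, 0xC3B9, 0x7865),
    (0xC3BA, 0xC3BA, 0x7864), (0xC3BB, 0xC455, 0x7866),
    (0xC456, 0xC456, 0x782D), (0xC457, 0xC67E, 0x7962),
]


def big5_linear(code):
    """Position of a Big5 code when all valid codes are laid end to end."""
    hi, lo = code >> 8, code & 0xFF
    if 0x40 <= lo <= 0x7E:
        col = lo - 0x40
    elif 0xA1 <= lo <= 0xFE:
        col = lo - 0xA1 + 63             # 63 positions in 0x40-0x7E
    else:
        raise ValueError("invalid Big5 trail byte")
    return hi * 157 + col


def cns_linear(code):
    return ((code >> 8) - 0x21) * 94 + ((code & 0xFF) - 0x21)


def cns_from_linear(n):
    return ((n // 94 + 0x21) << 8) | (n % 94 + 0x21)


def big5_level1_to_cns_plane1(code):
    """Map a Big5 Level 1 code to its CNS 11643-1992 plane 1 code."""
    for start, end, cns_start in LEVEL1:
        if start <= code <= end:
            offset = big5_linear(code) - big5_linear(start)
            return cns_from_linear(cns_linear(cns_start) + offset)
    raise ValueError("not a Big5 Level 1 Chinese character")


first = big5_level1_to_cns_plane1(0xA440)
last = big5_level1_to_cns_plane1(0xC67E)
```

  The same approach would extend to the Level 2 list, with separate
  handling for the one-way "->" entries that mark duplicates.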

818	 2. Big5 Level 2 correspondence to CNS 11643-1992 Plane 2:

820	  0xC940-0xC949 <-> 0x2121-0x212A
821	         0xC94A  -> 0x4442         # duplicate of 0xA461
822	  0xC94B-0xC96B <-> 0x212B-0x214B
823	  0xC96C-0xC9BD <-> 0x214D-0x217C
824	         0xC9BE <-> 0x214C
825	  0xC9BF-0xC9EC <-> 0x217D-0x224C
826	  0xC9ED-0xCAF6 <-> 0x224E-0x2438
827	         0xCAF7 <-> 0x224D
828	  0xCAF8-0xD6CB <-> 0x2439-0x376E
829	         0xD6CC <-> 0x3E63
830	  0xD6CD-0xD779 <-> 0x3770-0x387D
831	         0xD77A <-> 0x3F6A
832	  0xD77B-0xDADE <-> 0x387E-0x3E62
833	         0xDADF <-> 0x376F
834	  0xDAE0-0xDBA6 <-> 0x3E64-0x3F69
835	  0xDBA7-0xDDFB <-> 0x3F6B-0x4423
836	         0xDDFC  -> 0x4176         # duplicate of 0xDCD1
837	  0xDDFD-0xE8A2 <-> 0x4424-0x554A
838	  0xE8A3-0xE975 <-> 0x554C-0x5721
839	  0xE976-0xEB5A <-> 0x5723-0x5A27
840	  0xEB5B-0xEBF0 <-> 0x5A29-0x5B3E
841	         0xEBF1 <-> 0x554B
842	  0xEBF2-0xECDD <-> 0x5B3F-0x5C69
843	         0xECDE <-> 0x5722
844	  0xECDF-0xEDA9 <-> 0x5C6A-0x5D73
845	  0xEDAA-0xEEEA <-> 0x5D75-0x6038
846	         0xEEEB <-> 0x642F
847	  0xEEEC-0xF055 <-> 0x6039-0x6242
848	         0xF056 <-> 0x5D74
849	  0xF057-0xF0CA <-> 0x6243-0x6336
850	         0xF0CB <-> 0x5A28
851	  0xF0CC-0xF162 <-> 0x6337-0x642E
852	  0xF163-0xF16A <-> 0x6430-0x6437
853	         0xF16B <-> 0x6761
854	  0xF16C-0xF267 <-> 0x6438-0x6572
855	         0xF268 <-> 0x6934
856	  0xF269-0xF2C2 <-> 0x6573-0x664C
857	  0xF2C3-0xF374 <-> 0x664E-0x6760
858	  0xF375-0xF465 <-> 0x6762-0x6933
859	  0xF466-0xF4B4 <-> 0x6935-0x6961
860	         0xF4B5 <-> 0x664D
861	  0xF4B6-0xF4FC <-> 0x6962-0x6A4A
862	  0xF4FD-0xF662 <-> 0x6A4C-0x6C51
863	         0xF663 <-> 0x6A4B
864	  0xF664-0xF976 <-> 0x6C52-0x7165
865	  0xF977-0xF9C3 <-> 0x7167-0x7233
866	         0xF9C4 <-> 0x7166
867	         0xF9C5 <-> 0x7234
868	         0xF9C6 <-> 0x7240
869	  0xF9C7-0xF9D1 <-> 0x7235-0x723F
870	  0xF9D2-0xF9D5 <-> 0x7241-0x7244

872	 3. Big5 Level 2 correspondence to CNS 11643-1992 Plane 3:

874	         0xF9D6 <-> 0x4337         # ETen-specific Chinese
875	         0xF9D7 <-> 0x4F50         # ETen-specific Chinese
876	         0xF9D8 <-> 0x444E         # ETen-specific Chinese
877	         0xF9D9 <-> 0x504A         # ETen-specific Chinese
878	         0xF9DA <-> 0x2C5D         # ETen-specific Chinese
879	         0xF9DB <-> 0x3D7E         # ETen-specific Chinese
880	         0xF9DC <-> 0x4B5C         # ETen-specific Chinese

882	References

884	   [ASCII] American National Standards Institute, "Coded character set
885	   -- 7-bit American National Standard Code for Information
886	   Interchange", ANSI X3.4-1986.

888	   [BIG5] Institute for Information Industry, "Chinese Coded
889	   Character Set in Computer", March 1984.

891	   [CJK] Ken Lunde, On-line documentation of Chinese/Japanese/Korean
892	   Information Processing, 1995, available at:
893	   ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

895	   [CNS-5205] "Information processing -- 7-Bit Coded Character Set For
896	   Information Interchange", CNS-5205.

898	   [CNS-11643] "Chinese Standard Interchange Code", CNS-11643 version
899	   1992; "Standard Interchange Code for Generally-Used Chinese
900	   Characters", CNS 11643 version 1986.

902	   [GB-1988] "7-bit Coding Character Set for Information Interchange",
903	   GB 1988-80.

905	   [GB-2312] "Coding of Chinese Ideogram Set for Information Interchange
906	   Basic Set",  GB 2312-80.

908	   [GB-7589] "Code of Chinese Ideograms Set for Information Interchange,
909	   the 2nd Supplementary Set", UDC 681.3.048, GB 7589-87.

911	   [GB-7590] "Code of Chinese Ideogram Set for Information Interchange,
912	   the 4th Supplementary Set", UDC 681.3.048, GB 7590-87.

914	   [GB-8565] "Information Processing Coded Character Sets for Text
915	   Communication", UDC 681.3,  GB 8565-88.

917	   [GB-12345] "Code of Chinese Ideogram Set for Information Interchange
918	   Supplementary Set", GB/T 12345-90.

920	   [GB-13000]  "Information technology--Universal Multiple-Octet Coded
921	   Character Set(UCS)---Part 1: Architecture and Basic Multilingual Plane",
922	   GB13000.1

924	   [GB-13131] "Code of Chinese Ideogram Set for Information Interchange,
925	   the 3rd Supplementary Set", GB 13131-91.

927	   [GB-13132] "Code of Chinese Ideogram Set for Information Interchange,
928	   the 5th Supplementary Set", GB 13132-91.

930	   [ISO-646] International Organization for Standardization (ISO),
931	   "Information technology -- ISO 7-bit coded character set for
932	   information interchange", International Standard, Ref. No. ISO/IEC
933	   646:1991.

935	   [ISO-2022] International Organization for Standardization (ISO),
936	   "Information processing -- ISO 7-bit and 8-bit coded character sets
937	   -- Code extension techniques", International Standard, Ref. No. ISO
938	   2022-1986 (E).

940	   [ISO-10021] Information Technology - Text communication -
941	   Message-Oriented Text Interchange Systems (MOTIS), ISO 10021,
942	   October 1988.

944	   [ISO-10646] ISO/IEC 10646-1:1993(E), "Information Technology--Universal
945	   Multiple-octet Coded Character Set (UCS)---Part 1: Architecture and
946	   Basic Multilingual Plane".

948	   [ISOREG] International Organization for Standardization (ISO),
949	   "International Register of Coded Character Sets To Be Used With
950	   Escape Sequences".

952	   [MIME-1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
953	   Mail Extensions) Part One: Mechanisms for Specifying and Describing
954	   the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
955	   September 1993.

957	   [MIME-2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
958	   Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
959	   University of Tennessee, September 1993.

961	   [RFC-822] Crocker, D., "Standard for the Format of ARPA Internet Text
962	   Messages", STD 11, RFC 822, University of Delaware, August 1982.

964	   [RFC-1036] Horton M., and Adams, R., "Standard for Interchange of
965	   USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
966	   Seismic Studies, December 1987.

968	   [RFC-1468] Murai J., Crispin M., and van der Poel, E., "Japanese
969	   Character Encoding for Internet Messages", RFC 1468, June 1993.

971	   [RFC-1557] Choi U., Chon K., and Park H., "Korean Character Encoding
972	   for Internet Messages", RFC 1557, December 1993.

974	   [RFC-1641] Goldsmith D., and Davis M., "Using Unicode with MIME", RFC
975	   1641, Taligent Inc., July 1994

977	   [RFC-1642] Goldsmith D., and Davis M., "UTF-7, A Mail-Safe Transformation
978	   Format of Unicode", RFC 1642, July 1994.

980	   [RFC-1700] Reynolds J., and Postel J., "Assigned Numbers", RFC 1700,
981	   STD 2, ISI, October 1994.

983	   [SMTP] Postel, Jonathan B. "Simple Mail Transfer Protocol", STD 10,
984	   RFC 821, USC/Information Sciences Institute, August 1982.

986	   [SMTPEXT] Klensin, J.; Freed, N.; Rose, M.; Stefferud, E.; and
987	   Crocker, D., "SMTP Service Extensions", RFC 1651, July 1994.

989	   [Unicode 1.1] "The Unicode Standard, Version 1.1",
990	   Addison-Wesley, Reading, MA (to be published; the contents
991	   of this standard are currently available by combining
992	   [Unicode92], [Unicode93], and [Unicode4]).

994	   [Unicode92] The Unicode Consortium, "The Unicode Standard -
995	   Worldwide Character Encoding - Version 1.0", Volume 1,
996	   Addison-Wesley, Reading, MA, 1992 (ISBN 0-201-56788-1).

998	   [Unicode93] The Unicode Consortium, "The Unicode Standard -
999	   Worldwide Character Encoding - Version 1.0", Volume 2,
1000	   Addison-Wesley, Reading, MA, 1992 (ISBN 0-201-60845-6).

1002	   [Unicode4] The Unicode Consortium, "The Unicode Standard -
1003	   Version 1.1 (Prepublication Edition)", Unicode Technical
1004	   Report #4 (available from the Unicode Consortium).