< draft-freed-charset-regist-02.txt   draft-freed-charset-regist-03.txt >
Network Working Group Ned Freed, Innosoft Network Working Group Ned Freed, Innosoft
Internet Draft Jon Postel, ISI Internet Draft Jon Postel, ISI
Obsoletes: 2278 <draft-freed-charset-regist-02.txt> Obsoletes: 2278 <draft-freed-charset-regist-03.txt>
IANA Charset IANA Charset
Registration Procedures Registration Procedures
May 2000 July 2000
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC 2026. with all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working groups. Note that other groups may also distribute working
documents as Internet-Drafts. documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
skipping to change at page 2, line ? skipping to change at page 2, line ?
documents at any time. It is inappropriate to use Internet- documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as Drafts as reference material or to cite them other than as
"work in progress." "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed The list of Internet-Draft Shadow Directories can be accessed
at http://www.ietf.org/shadow.html. at http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Copyright (C) The Internet Society (2000). All Rights
Reserved. Reserved.
1. Abstract 1. Abstract
MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various
other Internet protocols are capable of using many different other Internet protocols are capable of using many different
charsets. This in turn means that the ability to label charsets. This in turn means that the ability to label
different charsets is essential. different charsets is essential.
skipping to change at page 2, line ? skipping to change at page 2, line ?
2. Definitions and Notation 2. Definitions and Notation
The following sections define terms used in this document. The following sections define terms used in this document.
2.1. Requirements Notation 2.1. Requirements Notation
This document occasionally uses terms that appear in capital This document occasionally uses terms that appear in capital
letters. When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD letters. When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD
NOT", and "MAY" appear capitalized, they are being used to NOT", and "MAY" appear capitalized, they are being used to
indicate particular requirements of this specification. A indicate particular requirements of this specification. A
discussion of the meanings of these terms appears in [RFC- discussion of the meanings of these terms appears in
2119]. [RFC-2119].
2.2. Character 2.2. Character
A member of a set of elements used for the organisation, A member of a set of elements used for the organisation,
control, or representation of data. control, or representation of data.
2.3. Charset 2.3. Charset
The term "charset" (referred to as a "character set" in The term "charset" (referred to as a "character set" in
previous versions of this document) is used here to refer to a previous versions of this document) is used here to refer to a
skipping to change at page 3, line 8 skipping to change at page 3, line 8
Note that unconditional and unambiguous conversion in the Note that unconditional and unambiguous conversion in the
other direction is not required, in that not all characters other direction is not required, in that not all characters
may be representable by a given charset and a charset may may be representable by a given charset and a charset may
provide more than one sequence of octets to represent a provide more than one sequence of octets to represent a
particular sequence of characters. particular sequence of characters.
This definition is intended to allow charsets to be defined in This definition is intended to allow charsets to be defined in
a variety of different ways, from simple single-table mappings a variety of different ways, from simple single-table mappings
such as US-ASCII to complex table switching methods such as such as US-ASCII to complex table switching methods such as
those that use ISO 2022's techniques, to be used as charsets. those that use ISO 2022's techniques. However, the definition
However, the definition associated with a charset name must associated with a charset name must fully specify the mapping
fully specify the mapping to be performed. In particular, use to be performed. In particular, use of external profiling
of external profiling information to determine the exact information to determine the exact mapping is not permitted.
mapping is not permitted.
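
The note above on conversion in the other direction can be illustrated with a minimal Python sketch. It assumes Python's built-in codecs as stand-ins for charset definitions; the helper name can_represent is hypothetical and is not part of this document. UTF-7 is used only as a convenient example of a charset that offers more than one octet sequence for the same character.

```python
def can_represent(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Octets to characters is unambiguous: each octet sequence decodes one way.
# The reverse is not unique: in UTF-7 the letter "A" may be written directly
# or inside a base64 shift sequence.
direct = b"A".decode("utf-7")
shifted = b"+AEE-".decode("utf-7")

# Not every character is representable in every charset.
ascii_ok = can_represent("cafe", "us-ascii")
ascii_fails = can_represent("caf\u00e9", "us-ascii")
latin1_ok = can_represent("caf\u00e9", "iso-8859-1")
```

Here direct and shifted are the same string even though the octets differ, and ascii_fails is False because U+00E9 has no US-ASCII representation.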
HISTORICAL NOTE: The term "character set" was originally used HISTORICAL NOTE: The term "character set" was originally used
in MIME to describe such straightforward schemes as US-ASCII in MIME to describe such straightforward schemes as US-ASCII
and ISO-8859-1 which consist of a small set of characters and and ISO-8859-1 which consist of a small set of characters and
a simple one-to-one mapping from single octets to single a simple one-to-one mapping from single octets to single
characters. Multi-octet character encoding schemes and characters. Multi-octet character encoding schemes and
switching techniques make the situation much more complex. As switching techniques make the situation much more complex. As
such, the definition of this term was revised to emphasize such, the definition of this term was revised to emphasize
both the conversion aspect of the process, and the term itself both the conversion aspect of the process, and the term itself
has been changed to "charset" to emphasize that it is not, has been changed to "charset" to emphasize that it is not,
skipping to change at page 3, line 38 skipping to change at page 3, line 37
A Coded Character Set (CCS) is a one-to-one mapping from a set A Coded Character Set (CCS) is a one-to-one mapping from a set
of abstract characters to a set of integers. Examples of coded of abstract characters to a set of integers. Examples of coded
character sets are ISO 10646 [ISO-10646], US-ASCII [US-ASCII], character sets are ISO 10646 [ISO-10646], US-ASCII [US-ASCII],
and the ISO-8859 series [ISO-8859]. and the ISO-8859 series [ISO-8859].
2.5. Character Encoding Scheme 2.5. Character Encoding Scheme
A Character Encoding Scheme (CES) is a mapping from a Coded A Character Encoding Scheme (CES) is a mapping from a Coded
Character Set or several coded character sets to a set of Character Set or several coded character sets to a set of
octet sequences. A given CES is typically associated with a octet sequences. A given CES is sometimes associated with a
single CCS; for example, UTF-8 applies only to ISO 10646. single CCS; for example, UTF-8 applies only to ISO 10646.
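
As an illustration of the CCS/CES distinction, the following Python sketch treats ord() as the ISO 10646 coded character set lookup and str.encode() as the character encoding scheme. The function names ccs_integer and ces_octets are hypothetical and chosen only for this example.

```python
def ccs_integer(ch: str) -> int:
    """CCS step: map one abstract character to its ISO 10646 integer."""
    return ord(ch)


def ces_octets(ch: str, scheme: str) -> bytes:
    """CES step: map the character's integer to an octet sequence."""
    return ch.encode(scheme)


e_acute = "\u00e9"                             # LATIN SMALL LETTER E WITH ACUTE
code = ccs_integer(e_acute)                    # 0xE9 in ISO 10646
as_utf8 = ces_octets(e_acute, "utf-8")         # two octets
as_utf16be = ces_octets(e_acute, "utf-16-be")  # two different octets
```

The same integer yields different octet sequences under different encoding schemes, which is why a charset built this way has to identify both its CCS and its CES.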
3. Charset Registration Requirements 3. Charset Registration Requirements
Registered charsets are expected to conform to a number of Registered charsets are expected to conform to a number of
requirements as described below. requirements as described below.
3.1. Required Characteristics 3.1. Required Characteristics
Registered charsets MUST conform to the definition of a Registered charsets MUST conform to the definition of a
"charset" given above. In addition, charsets intended for use "charset" given above. In addition, charsets intended for use
in MIME content types under the "text" top-level type MUST in MIME content types under the "text" top-level type MUST
conform to the restrictions on that type described in RFC conform to the restrictions on that type described in RFC
2045. All registered charsets MUST note whether or not they 2045. All registered charsets MUST note whether or not they
are suitable for use in MIME text. are suitable for use in MIME text.
All charsets which are constructed as a composition of a CCS All charsets which are constructed as a composition of one or
and a CES MUST either include the CCS and CES they are based more CCS's and a CES MUST either include the CCS's and CES
on in their registration or else cite a definition of their they are based on in their registration or else cite a
CCS and CES that appears elsewhere. definition of their CCS's and CES that appears elsewhere.
All registered charsets MUST be specified in a stable, openly All registered charsets MUST be specified in a stable, openly
available specification. Registration of charsets whose available specification. Registration of charsets whose
specifications aren't stable and openly available is specifications aren't stable and openly available is
forbidden. forbidden.
3.2. New Charsets 3.2. New Charsets
This registration mechanism is not intended to be a vehicle This registration mechanism is not intended to be a vehicle
for the design and definition of entirely new charsets. This for the design and definition of entirely new charsets. This
skipping to change at page 5, line 21 skipping to change at page 5, line 21
MUST contain no more than 40 characters (including the "cs" MUST contain no more than 40 characters (including the "cs"
prefix) chosen from the printable subset of US-ASCII. prefix) chosen from the printable subset of US-ASCII.
Only one name beginning with "cs" may be assigned to a single Only one name beginning with "cs" may be assigned to a single
charset. If no name of this form is explicitly defined IANA charset. If no name of this form is explicitly defined IANA
will assign an alias consisting of "cs" prepended to the will assign an alias consisting of "cs" prepended to the
primary charset name. primary charset name.
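
The "cs" alias rules above can be sketched in Python as follows. This is a rough illustration, not a normative check: the function names are hypothetical, and "printable subset of US-ASCII" is assumed here to mean code points 0x21 through 0x7E (space excluded), which the text does not spell out.

```python
MAX_CS_ALIAS_LENGTH = 40  # limit includes the "cs" prefix


def is_valid_cs_alias(name: str) -> bool:
    """Check the prefix, length, and repertoire limits on a "cs" alias."""
    return (
        name.startswith("cs")
        and len(name) <= MAX_CS_ALIAS_LENGTH
        # Assumption: printable US-ASCII is taken as 0x21..0x7E.
        and all(0x21 <= ord(c) <= 0x7E for c in name)
    )


def default_cs_alias(primary_name: str) -> str:
    """Alias used when none is explicitly defined: "cs" + primary name."""
    return "cs" + primary_name
```

For example, default_cs_alias("UTF-8") yields "csUTF-8". A default alias built this way can still exceed the 40 character limit if the primary name is long, so the result should be passed through is_valid_cs_alias as well.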
Finally, charsets being registered for use with the "text" Finally, charsets being registered for use with the "text"
media type MUST have a primary name that conforms to the more media type MUST have a primary name that conforms to the more
restrictive syntax of the charset field in MIME encoded-words restrictive syntax of the charset field in MIME encoded-words
[RFC-2047, RFC-2184] and MIME extended parameter values [RFC- [RFC-2047, RFC-2184] and MIME extended parameter values
2184]. A combined ABNF definition for such names is as [RFC-2184]. A combined ABNF definition for such names is as
follows: follows:
mime-charset = 1*mime-charset-chars mime-charset = 1*mime-charset-chars
mime-charset-chars = ALPHA / DIGIT / mime-charset-chars = ALPHA / DIGIT /
"!" / "#" / "$" / "%" / "&" / "!" / "#" / "$" / "%" / "&" /
"'" / "+" / "-" / "^" / "_" / "'" / "+" / "-" / "^" / "_" /
"`" / "{" / "}" / "~" "`" / "{" / "}" / "~"
ALPHA = "A".."Z" ; Case insensitive ASCII Letter ALPHA = "A".."Z" ; Case insensitive ASCII Letter
DIGIT = "0".."9" ; Numeric digit DIGIT = "0".."9" ; Numeric digit
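
A name can be checked against the ABNF above with a short Python sketch. The function name is_mime_charset is hypothetical; the character class is transcribed directly from mime-charset-chars, with ALPHA expanded to both cases because the rule is case insensitive.

```python
import re

# One or more of: ALPHA / DIGIT / ! # $ % & ' + - ^ _ ` { } ~
_MIME_CHARSET = re.compile(r"[A-Za-z0-9!#$%&'+\-^_`{}~]+")


def is_mime_charset(name: str) -> bool:
    """Return True if name matches the mime-charset production."""
    return _MIME_CHARSET.fullmatch(name) is not None
```

Under this check "ISO-8859-1" and "utf-8" are accepted, while the empty string, "ISO 8859-1" (contains a space), and "x*y" (contains "*") are rejected.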
3.4. Functionality Requirement 3.4. Functionality Requirement
Charsets MUST function as actual charsets: Registration of Charsets MUST function as actual charsets: Registration of
things that are better thought of as a transfer encoding, as a things that are better thought of as a transfer encoding, as a
media type, or as a collection of separate entities of another media type, or as a collection of separate entities of another
type, is not allowed. For example, although HTML could type, is not allowed. For example, although HTML could
theoretically be thought of as a charset, it is really better theoretically be thought of as a charset, it is really better
skipping to change at page 7, line 18 skipping to change at page 7, line 18
Send the proposed charset registration to the "ietf- Send the proposed charset registration to the "ietf-
charsets@iana.org" mailing list. (Information about joining charsets@iana.org" mailing list. (Information about joining
this list is available on the IANA Website, this list is available on the IANA Website,
http://www.iana.org.) This mailing list has been established http://www.iana.org.) This mailing list has been established
for the sole purpose of reviewing proposed charset for the sole purpose of reviewing proposed charset
registrations. Proposed charsets are not formally registered registrations. Proposed charsets are not formally registered
and must not be used; the "x-" prefix specified in RFC 2045 and must not be used; the "x-" prefix specified in RFC 2045
can be used until registration is complete. can be used until registration is complete.
The posting of a charset to the list initiates a two week
public review process.
The intent of the public posting is to solicit comments and The intent of the public posting is to solicit comments and
feedback on the definition of the charset and the name chosen feedback on the definition of the charset and the name chosen
for it over a two week period. for it.
4.2. Charset Reviewer 4.2. Charset Reviewer
When the two week period has passed and the registration When the two week period has passed and the registration
proposer is convinced that consensus has been achieved, the proposer is convinced that consensus has been achieved, the
registration application should be submitted to IANA and the registration application should be submitted to IANA and the
charset reviewer. The charset reviewer, who is appointed by charset reviewer. The charset reviewer, who is appointed by
the IETF Applications Area Director(s), either approves the the IETF Applications Area Director(s), either approves the
request for registration or rejects it. Rejection may occur request for registration or rejects it. Rejection may occur
because of significant objections raised on the list or because of significant objections raised on the list or
objections raised externally. If the charset reviewer objections raised externally. If the charset reviewer
considers the registration sufficiently important and considers the registration sufficiently important and
controversial, a last call for comments may be issued to the controversial, a last call for comments may be issued to the
full IETF. The charset reviewer may also recommend standards full IETF. The charset reviewer may also recommend standards
track processing (before or after registration) when that track processing (before or after registration) when that
appears appropriate and the level of specification of the appears appropriate and the level of specification of the
charset is adequate. charset is adequate.
Decisions made by the reviewer must be posted to the ietf- The charset reviewer must reach a decision and post it to the
charsets mailing list within 14 days. Decisions made by the ietf-charsets mailing list within two weeks. Decisions made by
reviewer may be appealed to the IESG. the reviewer may be appealed to the IESG.
4.3. IANA Registration 4.3. IANA Registration
Provided that the charset registration has either passed Provided that the charset registration has either passed
review or has been successfully appealed to the IESG, the IANA review or has been successfully appealed to the IESG, the IANA
will register the charset, assign a MIBenum value, and make will register the charset, assign a MIBenum value, and make
its registration available to the community. its registration available to the community.
5. Location of Registered Charset List 5. Location of Registered Charset List
skipping to change at page 8, line 44 skipping to change at page 8, line 44
(All aliases must also be suitable for use as the value of (All aliases must also be suitable for use as the value of
a MIME content-type parameter.) a MIME content-type parameter.)
Suitability for use in MIME text: Suitability for use in MIME text:
Published specification(s): Published specification(s):
(A specification for the charset MUST be (A specification for the charset MUST be
openly available that accurately describes what openly available that accurately describes what
is being registered. If a charset is defined as is being registered. If a charset is defined as
a composition of a CCS and a CES then these definitions a composition of one or more CCS's and a CES then these
MUST either be included or referenced.) definitions MUST either be included or referenced.)
ISO 10646 equivalency table: ISO 10646 equivalency table:
(A URL to a specification of how to translate from (A URI to a specification of how to translate from
this charset to ISO 10646 and vice versa SHOULD be this charset to ISO 10646 and vice versa SHOULD be
provided.) provided.)
Additional information: Additional information:
Person & email address to contact for further information: Person & email address to contact for further information:
Intended usage: Intended usage:
(One of COMMON, LIMITED USE or OBSOLETE) (One of COMMON, LIMITED USE or OBSOLETE)
skipping to change at page 10, line 15 skipping to change at page 10, line 15
10. References 10. References
[ISO-2022] [ISO-2022]
International Standard -- Information Processing -- International Standard -- Information Processing --
Character Code Structure and Extension Techniques, Character Code Structure and Extension Techniques,
ISO/IEC 2022:1994, 4th ed. ISO/IEC 2022:1994, 4th ed.
[ISO-8859] [ISO-8859]
International Standard -- Information Processing -- 8-bit International Standard -- Information Processing -- 8-bit
Single-Byte Coded Graphic Character Sets Single-Byte Coded Graphic Character Sets
- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed. - Part 1: Latin Alphabet No. 1, ISO 8859-1:1998, 1st ed.
- Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed. - Part 2: Latin Alphabet No. 2, ISO 8859-2:1999, 1st ed.
- Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed. - Part 3: Latin Alphabet No. 3, ISO 8859-3:1999, 1st ed.
- Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed. - Part 4: Latin Alphabet No. 4, ISO 8859-4:1998, 1st ed.
- Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1999, 2nd
ed. ed.
- Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed. - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1999, 1st ed.
- Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed. - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
- Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed. - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1999, 1st ed.
- Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1999, 2nd
ed. ed.
International Standard -- Information Technology -- 8-bit International Standard -- Information Technology -- 8-bit
Single-Byte Coded Graphic Character Sets Single-Byte Coded Graphic Character Sets
- Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992, - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1998,
2nd ed.
International Standard -- Information Technology -- 8-bit
Single-Byte Coded Graphic Character Sets
- Part 13: Latin Alphabet No. 7, ISO/IEC 8859-13:1998,
1st ed.
International Standard -- Information Technology -- 8-bit
Single-Byte Coded Graphic Character Sets
- Part 14: Latin Alphabet No. 8 (Celtic), ISO/IEC
8859-14:1998, 1st ed.
International Standard -- Information Technology -- 8-bit
Single-Byte Coded Graphic Character Sets
- Part 15: Latin Alphabet No. 9, ISO/IEC 8859-15:1999,
1st ed. 1st ed.
[ISO-10646] [ISO-10646]
ISO/IEC 10646-1:1993(E), "Information technology -- ISO/IEC 10646-1:1993(E), "Information technology --
Universal Multiple-Octet Coded Character Set (UCS) -- Universal Multiple-Octet Coded Character Set (UCS) --
Part 1: Architecture and Basic Multilingual Plane", Part 1: Architecture and Basic Multilingual Plane",
JTC1/SC2, 1993. JTC1/SC2, 1993.
[RFC-1590] [RFC-1590]
Postel, J., "Media Type Registration Procedure", RFC Postel, J., "Media Type Registration Procedure", RFC
 End of changes. 22 change blocks. 
41 lines changed or deleted 54 lines changed or added
