<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc rfcedstyle="yes"?>
<?rfc subcompact="no"?>
<?rfc symrefs="yes"?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>

<rfc ipr="trust200902" category="bcp" obsoletes="2978" docName='draft-iana-charset-reg-procedure-01'>
  <front>
    <title abbrev="IANA Charset Registration">
      IANA Charset Registration Procedures
    </title>

    <author initials="M." surname="Mcfadden" fullname="Mark Mcfadden">
      <organization>IANA</organization>
      <address>
        <email>mark.mcfadden@icann.org</email>
      </address>
    </author>
    <author initials="A." surname="Melnikov" fullname="Alexey Melnikov" role="editor">
      <organization>Isode Ltd</organization>
      <address>
        <postal>
          <street>14 Castle Mews</street>
          <city>Hampton</city>
          <region>Middlesex</region>
          <code>TW12 2NP</code>
          <country>UK</country>
        </postal>
        <email>Alexey.Melnikov@isode.com</email>
      </address>
    </author>

    <date year="2015"/>

    <keyword>Charset</keyword>

    <abstract>
      <t>
        Multipurpose Internet Mail Extensions (MIME) (RFCs 2045, 2046,
        2047, and 2231) and various other Internet protocols are capable
        of using many different charsets.  This in turn means that the
        ability to label different charsets is essential.
      </t>

      <t>
        This document obsoletes the IANA Charset Registration Procedures
        originally defined in <xref target="RFC2978"/>.
        Specifically, this document completely revises the registration
        procedures and the charset registries.  The charset registry is now
        divided into three parts with separate registration procedures for
        each.
      </t>

      <t>
        Note: The charset registration procedure exists solely to associate a
        specific name or names with a given charset and to give an indication
        of whether or not a given charset can be used in MIME text objects.
        In particular, the general applicability and appropriateness of a
        given registered charset to a particular application is a protocol
        issue, not a registration issue, and is not dealt with by this
        registration procedure.
      </t>
    </abstract>

  </front>
  <middle>
    
    
    <section title="Definitions and Notation">

      <t>
      The following sections define terms used in this document.
      </t>

      <section title="Requirements Notation">

          <t>
          The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
          "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
          this document are to be interpreted as described in <xref target="RFC2119"/>.
          </t>
        
      </section>

      <section title="Character">

          <t>
          A member of a set of elements used for the organization, control, or
          representation of data.
          </t>

      </section>

      <section title="Charset">

          <t>
          The term "charset" (referred to as a "character set" in previous
          versions of this document) is used here to refer to a method of
          converting a sequence of octets into a sequence of characters.  This
          conversion may also optionally produce additional control information
          such as directionality indicators.
          </t>

          <t>
          Note that unconditional and unambiguous conversion in the other
          direction is not required, in that not all characters may be
          representable by a given charset and a charset may provide more than
          one sequence of octets to represent a particular sequence of
          characters.
          </t>

          <t>
          This definition is intended to allow charsets to be defined in a
          variety of different ways, from simple single-table mappings such as
          US-ASCII <xref target="RFC0020"/> to complex table switching methods such as those that use
          ISO 2022's <xref target="ISO-2022"/> techniques.  However, the definition associated with a
          charset name must fully specify the mapping to be performed.  In
          particular, use of external profiling information to determine the
          exact mapping is not permitted.
        </t>
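
          <t>
          The two properties described above (octets are converted to
          characters, while the reverse direction need be neither complete nor
          unique) can be illustrated with a minimal, non-normative Python
          sketch.  It relies on Python's built-in codecs; the helper names
          decode_octets and is_representable are hypothetical, and UTF-7 is
          used only as a convenient illustration, not because this document
          discusses it.
          </t>

<figure><artwork><![CDATA[
```python
def decode_octets(octets: bytes, charset: str) -> str:
    """Apply a charset: convert a sequence of octets to characters."""
    return octets.decode(charset)


def is_representable(text: str, charset: str) -> bool:
    """Report whether every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Two different octet sequences that UTF-7 converts to the same character.
direct = decode_octets(b"A", "utf-7")
shifted = decode_octets(b"+AEE-", "utf-7")

# LATIN SMALL LETTER E WITH ACUTE has no representation in US-ASCII.
e_acute_in_ascii = is_representable("\u00e9", "us-ascii")
```
]]></artwork></figure>

          <t>
          In this sketch both UTF-7 octet sequences decode to the letter "A",
          and the accented letter is reported as not representable in US-ASCII.
          </t>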

          <t>
          HISTORICAL NOTE: The term "character set" was originally used in MIME
          to describe such straightforward schemes as US-ASCII and ISO-8859-1 <xref target="ISO-8859"/>
          which consist of a small set of characters and a simple one-to-one
          mapping from single octets to single characters.  Multi-octet
          character encoding schemes and switching techniques make the
          situation much more complex.  As such, the definition of this term
          was revised to emphasize the conversion aspect of the process,
          and the term itself has been changed to "charset" to emphasize that
          it is not, after all, just a set of characters.  A discussion of
          these issues as well as specification of standard terminology for use
          in the IETF appears in <xref target="RFC2130"/>.
          </t>

      </section>
      
      <section title="Coded Character Set">

          <t>
          A Coded Character Set (CCS) is a one-to-one mapping from a set of
          abstract characters to a set of integers.  Examples of coded
          character sets are ISO 10646 <xref target="ISO-10646"/>, US-ASCII <xref target="RFC0020"/>, and
          the ISO-8859 series <xref target="ISO-8859"/>.
          </t>
        
      </section>

      <section title="Character Encoding Scheme">

          <t>
          A Character Encoding Scheme (CES) is a mapping from a Coded Character
          Set or several coded character sets to a set of octet sequences.  A
          given CES is sometimes associated with a single CCS; for example,
          UTF-8 <xref target="RFC3629"/> applies only to ISO 10646.
          </t>
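
          <t>
          As a non-normative illustration of the two layers, the following
          Python sketch treats the ISO 10646 code point as the CCS step and
          UTF-8 as the CES step.  The function names ccs_code_point and
          ces_utf8 are hypothetical; the sketch relies on Python strings being
          sequences of Unicode code points.
          </t>

<figure><artwork><![CDATA[
```python
def ccs_code_point(character: str) -> int:
    """CCS step: map one abstract character to its integer."""
    return ord(character)


def ces_utf8(code_point: int) -> bytes:
    """CES step: map an ISO 10646 integer to a UTF-8 octet sequence."""
    return chr(code_point).encode("utf-8")


# LATIN SMALL LETTER E WITH ACUTE, written as an escape to stay in ASCII.
code_point = ccs_code_point("\u00e9")
octets = ces_utf8(code_point)
```
]]></artwork></figure>

          <t>
          Here code_point is 0xE9 and octets is the two-octet sequence
          0xC3 0xA9.
          </t>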

      </section>

    </section>

    <section title="Charset Registration Requirements">

        <t>
        Registered charsets are expected to conform to a number of
        requirements as described below.
        </t>

      <section title="Required Characteristics">

        <t>
        Registered charsets MUST conform to the definition of a "charset"
        given above.  In addition, charsets intended for use in MIME content
        types under the "text" top-level media type MUST conform to the
        restrictions on that type described in <xref target="RFC2045"/>.
<!--////Was the above restriction relaxed/removed for HTTP?-->
          
        All registered
        charsets MUST note whether or not they are suitable for use in MIME
        text.
        </t>

        <t>
        All charsets which are constructed as a composition of one or more
        CCS's and a CES MUST either include the CCS's and CES they are based
        on in their registration or else cite a definition of their CCS's and
        CES that appears elsewhere.
        </t>

        <t>
        All registered charsets MUST be specified in a stable, openly
        available specification.  Registration of charsets whose
        specifications aren't stable and openly available is forbidden.
        </t>

      </section>

      <section title="New Charsets">

        <t>
        This registration mechanism is not intended to be a vehicle for the
        design and definition of entirely new charsets.  This is due to the
        fact that the registration process does NOT contain adequate review
        mechanisms for such undertakings.
        </t>

        <t>
        As such, only charsets defined by other processes and standards
        bodies, or specific profiles or combinations of such charsets, are
        eligible for registration.
        </t>

      </section>

      <section title="Naming Requirements" anchor="syntax">

        <t>
          One or more names MUST be assigned to all registered charsets.
          Multiple names for the same charset are permitted, but if multiple
          names are assigned a single primary name for the charset MUST be
          identified.  All other names are considered to be aliases for the
          primary name and use of the primary name is preferred over use of any
          of the aliases.
        </t>

        <t>
          Each assigned name MUST uniquely identify a single charset.  All
          charset names MUST be suitable for use as the value of a MIME content
          type charset parameter and hence MUST conform to MIME parameter value
          syntax (see Section 5.1 of RFC 2045).  This applies even if the specific charset being registered
          is not suitable for use with the "text" media type.  All charsets
          MUST be assigned a name that provides a display string for the
          associated "MIBenum" value defined below.  These "MIBenum" values are

<!--////
          [CHARMIB]  IANA Character Set MIB: http://www.iana.org/assignments/ianacharset-mib
-->

          defined by and used in the Printer MIB <xref target="RFC1759"/>.
          [[RFC 1759 got obsoleted by RFC 3805 and MIBEnum is no longer there.
          Should we point to http://www.iana.org/assignments/ianacharset-mib instead?]]

          Such names MUST
          begin with the letters "cs" and MUST contain no more than 40
          characters (including the "cs" prefix) chosen from the printable
          subset of US-ASCII.  Only one name beginning with "cs" may be
          assigned to a single charset.  If no name of this form is explicitly
          defined, IANA will assign an alias consisting of "cs" prepended to the
          primary charset name.
        </t>
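
        <t>
          The rules for names beginning with "cs" can be sketched in Python as
          follows.  This is a non-normative illustration: the function names
          are hypothetical, "printable subset of US-ASCII" is assumed to mean
          the visible characters 0x21 through 0x7E, the prefix is assumed to be
          the lowercase letters "cs", and "Example-1" is a made-up primary name.
        </t>

<figure><artwork><![CDATA[
```python
import string

MAX_CS_NAME_LENGTH = 40  # the limit includes the "cs" prefix

# Assumption: printable US-ASCII is taken as 0x21-0x7E (no space).
_PRINTABLE = set(
    string.ascii_letters + string.digits + string.punctuation
)


def is_valid_cs_name(name: str) -> bool:
    """Check the prefix, length, and repertoire rules for a "cs" name."""
    return (
        name.startswith("cs")
        and len(name) <= MAX_CS_NAME_LENGTH
        and all(ch in _PRINTABLE for ch in name)
    )


def default_cs_alias(primary_name: str) -> str:
    """Build the alias assigned when no "cs" name was supplied."""
    return "cs" + primary_name


alias = default_cs_alias("Example-1")
```
]]></artwork></figure>

        <t>
          Under these assumptions, alias is "csExample-1", which satisfies
          is_valid_cs_name.  A default alias built from a primary name longer
          than 38 characters would exceed the 40-character limit.
        </t>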

        <t>
          Finally, charsets being registered for use with the "text" media type
          MUST have a primary name that conforms to the more restrictive syntax
          of the charset field in MIME encoded-words <xref target="RFC2047"/> <xref target="RFC2231"/> and
          MIME extended parameter values <xref target="RFC2231"/>.  A combined ABNF <xref target="RFC5234"/>
          definition for such names is as follows:</t>

<figure><artwork type="ABNF">
<![CDATA[
    mime-charset = 1*mime-charset-chars
    mime-charset-chars = ALPHA / DIGIT /
               "!" / "#" / "$" / "%" / "&" /
               "+" / "-" / "^" / "_" / "`" / 
               "{" / "}" / "~"
    ALPHA = %x41-5A / %x61-7A  ; ASCII letter, either case (A-Z / a-z)
    DIGIT = %x30-39            ; Numeric digit (0-9)
]]>
</artwork></figure>
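
        <t>
          A candidate primary name can be checked against the "mime-charset"
          rule with a short regular expression.  The following Python sketch is
          non-normative; the function name is_mime_charset is hypothetical, and
          the character class is transcribed from the ABNF above, with letters
          accepted in either case.
        </t>

<figure><artwork><![CDATA[
```python
import re

# ASCII letters, digits, and the thirteen punctuation characters
# listed in the mime-charset-chars rule.
_MIME_CHARSET_RE = re.compile(r"[A-Za-z0-9!#$%&+\-^_`{}~]+")


def is_mime_charset(name: str) -> bool:
    """Return True if name matches 1*mime-charset-chars."""
    return _MIME_CHARSET_RE.fullmatch(name) is not None
```
]]></artwork></figure>

        <t>
          For example, is_mime_charset("ISO-8859-1") returns True, while a name
          containing a single quote, a vertical bar, a space, or no characters
          at all is rejected.
        </t>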

      </section>

      <section title="Functionality Requirement">

        <t>
        Charsets MUST function as actual charsets: Registration of things
        that are better thought of as a transfer encoding, as a media type <xref target="RFC2046"/>,
        or as a collection of separate entities of another type, is not
<!--////Add ref to HTML4.0-->
        allowed.  For example, although HTML could theoretically be thought
        of as a charset, it is really better thought of as a media type and
        as such it cannot be registered as a charset.
        </t>

      </section>

      <section title="Usage and Implementation Requirements">

        <t>
          Use of a large number of charsets in a given protocol may hamper
          interoperability.  However, the use of a large number of undocumented
          and/or unlabeled charsets hampers interoperability even more.
        </t>

        <t>
          A charset should therefore be registered ONLY if it adds significant
          functionality that is valuable to a large community, OR if it
          documents existing practice in a large community.  Note that charsets
          registered for the second reason should be explicitly marked as being
          of limited or specialized use and should only be used in Internet
          messages with prior bilateral agreement.
        </t>

      </section>

      <section title="Publication Requirements">

        <t>
          Charset registrations MAY be published in RFCs; however, RFC
          publication is not required to register a new charset.
        </t>

        <t>
          The registration of a charset does not imply endorsement, approval,
          or recommendation by the IANA, IESG, or IETF, or even certification
          that the specification is adequate.  It is expected that
          applicability statements for particular applications will be
          published from time to time that recommend implementation of, and
          support for, charsets that have proven particularly useful in those
          contexts.
        </t>

        <t>
          Charset registrations SHOULD include a specification of mapping from
          the charset into ISO 10646 (Unicode) <xref target="Unicode7.0"/> if specification of such a mapping is
          feasible.
        </t>

      </section>

      <section title="MIBenum Requirements">

        <t>
          Each registered charset MUST also be assigned a unique enumerated
          integer value.  These "MIBenum" values are defined by and used in the
<!--////Again, update the reference as above?-->
          Printer MIB <xref target="RFC1759"/>.</t>

        <t>
          A MIBenum value for each charset will be assigned by IANA at the time
          of registration.  MIBenum values are not assigned by the person
          registering the charset.
        </t>

      </section>
      
    </section>

    <section title="The Charset Registry" anchor="charset-reg">

        <t>
        The following procedure has been implemented by the IANA for review
        and approval of new charsets.  In <xref target="RFC2978"/> an Expert Review process
        was used to add new charsets into the registry.  This document
        changes that model by creating a new charset registry with three new
        subregistries.  For each of the new subregistries, the registration
        procedures and initial registrations are provided.
        </t>

      <section title="The Recommended charset registry">

        <t>
        The first subregistry of the full charset registry is the
        "recommended" charset registry.
        </t>

        <t>
        New registrations in the "recommended" charset registry require
        "Standards Action" as defined by <xref target="RFC5226"/>.  Specifically, the charset
        MUST have a standards track RFC that defines the charset itself and
        MUST also have a standards track RFC recommending its use.
        </t>

        <t>
        The RFC that defines the charset MUST specify a single
        recommended MIME charset label following the "mime-charset" syntax defined in <xref target="syntax"/>.
        It MUST also state whether the charset is suitable for MIME text and include a reference
        to a formal specification or translation table to Unicode <xref target="Unicode7.0"/>.
        </t>

        <t>
        There is one initial entry in the Recommended charset registry:
        UTF-8 <xref target="RFC3629"/>.
        </t>

      </section>

      <section title="The Widely-used Open Standard charset registry">

        <t>
        The second subregistry of the full charset registry is the "Widely-used Open Standard"
        charset registry.
        </t>

        <t>
        New registrations in the "Widely-used Open Standard" charset registry
        require "Expert Review" as defined by <xref target="RFC5226"/>.
        A template for proposing new charsets in this subregistry is
        provided in <xref target="reg-template"/> of this document.
        </t>

        <t>
        The template that describes the charset MUST provide
        a single recommended MIME charset label following the "mime-charset" syntax defined in <xref target="syntax"/>.
        It MUST also state whether the charset is suitable for MIME text and include a
        reference to a formal specification or translation table to Unicode.
        </t>

        <t>
        The following charsets are to be moved from the historic charset
        registry into the new "Widely-used Open Standard" subregistry:
<!--////TBD-->
        INSERT A LIST OF CHARSET NAMES HERE.  [[GUIDANCE IS REQUIRED FOR THIS ENTRY]]
        </t>

        <section title="Submitting &quot;Widely-used Open Standard&quot; charset Proposals to the IETF Community">

        <t>
        Send the "Widely-used Open Standard" charset proposal to the
        "ietf-charsets@iana.org" mailing list.  (Information about joining
        this list is available on the IANA Website, http://www.iana.org.)
        This mailing list has been established for the sole purpose of
        reviewing proposed charset registrations.  Proposed charsets are not
<!--////Is use of "x-" for this registry still a good idea?-->        
        formally registered and must not be used; the "x-" prefix specified
        in <xref target="RFC2045"/> can be used until registration is complete.
        </t>

        <t>
        The posting of a charset to the list initiates a two-week public
        review process.
        </t>

        <t>
        The intent of the public posting is to solicit comments and feedback
        on the definition of the charset and the name chosen for it.
        </t>

        </section>
        
<!--////Is this registration template generic for all 3 subregistries? -->

        <section title="IANA Charset Registration Template" anchor="reg-template">

        <t>
        To: ietf-charsets@iana.org
        </t>

        <t>
        Subject: Registration of new charset [names]
        </t>

        <t>
        Charset name:<vspace blankLines='1'/>

        (All names must be suitable for use as the value of a MIME Content-Type parameter, see Section 5.1 of RFC 2045.)

        </t>

        <t>
        Charset aliases:<vspace blankLines='1'/>

        (All aliases must also be suitable for use as the value of a MIME
        content-type parameter.)
        </t>

        <t>
        Suitability for use in MIME text:
        </t>

        <t>
        Published specification(s):<vspace blankLines='1'/>

        (A specification for the charset MUST be openly available that
        accurately describes what is being registered.  If a charset is
        defined as a composition of one or more CCS's and a CES then these
        definitions MUST either be included or referenced.)
        </t>

        <t>
        ISO 10646 equivalency table:<vspace blankLines='1'/>

        (A URI to a specification of how to translate from this charset to
        ISO 10646 and vice versa SHOULD be provided.)
        </t>

        <t>
        Additional information:
        </t>

        <t>
        Person &amp; email address to contact for further information:
        </t>

        <t>
        Intended usage:<vspace blankLines='1'/>

        (One of COMMON, LIMITED USE or OBSOLETE)
        </t>

        </section>
        
        <section title="Charset Reviewer">

        <t>
        When the two-week period has passed and the registration proposer is
        convinced that consensus has been achieved, the registration
        application should be submitted to IANA and the charset reviewer.
        The charset reviewer, who is appointed by the IETF Applications Area
        Director(s), either approves the request for registration or rejects
        it.  Rejection may occur because of significant objections raised on
        the list or objections raised externally.  If the charset reviewer
        considers the registration sufficiently important and controversial,
        a last call for comments may be issued to the full IETF.  The charset
        reviewer may also recommend standards track processing (before or
        after registration) when that appears appropriate and the level of
        specification of the charset is adequate.
        </t>

        <t>
        The charset reviewer must reach a decision and post it to the ietf-charsets
        mailing list within two weeks.  Decisions made by the
        reviewer may be appealed to the IESG.
        </t>

        </section>
        
        <section title="IANA Registration of &quot;Widely-used Open Standard&quot; charsets">
        
        <t>
        Provided that the charset registration has either passed review or
        has been successfully appealed to the IESG, the IANA will register
        the charset, assign a MIBenum value and make its registration
        available to the community.
        </t>

        </section>

      </section>

      <section title="The Other charset subregistry">
        
        <t>
        The third subregistry is for all other charsets.  Registration of
        charsets in the "other" charset subregistry is done on a
        "First Come, First Served" basis as defined by <xref target="RFC5226"/>.
        
<!--////Should the document say that all remaining currently registered charsets will be migrated to this subregistry?-->
          
<!--////Are new registrations in this subregistry required to use the same registration template?-->          

        </t>

      </section>

    </section>

    <section title="IANA Considerations" anchor="iana-cons">

      <t>
      This document requests that IANA completely revise the existing
      charset registry.
      The new registry should be divided into three subregistries.  These
      subregistries are: "Recommended charsets", "Widely-used Open Standard charsets"
      and "Other charsets".
      </t>

<!--////A lot of instructions are repeated earlier in this document. They shouldn't be!!!-->
      
      <t>
      The registration procedure for the "Recommended charset" subregistry
      is Standards Action.  IANA is directed to move the following
      entries from the <xref target="RFC2978"/> legacy registry to this subregistry:
      UTF-8 <xref target="RFC3629"/>.
      </t>
      
      <t>
<!--////Update the list as above. Or just reference the above text.-->        
      The registration procedure for the "Widely-used Open Standard
      charset" subregistry is Expert Review.  IANA is directed to move the
      following entries from the <xref target="RFC2978"/> legacy registry to this
      subregistry: INSERT A LIST OF CHARSET NAMES HERE.  [[GUIDANCE IS
      REQUIRED FOR THIS ENTRY]]
      </t>
      
      <t>
<!--////Chris Newman will help with these?-->        
      The registration procedure for the "Other charset" subregistry is
      First Come First Served.  IANA is directed to move the following entries from the <xref target="RFC2978"/>
      legacy registry to this subregistry: INSERT A LIST OF CHARSET
      NAMES HERE.  [[GUIDANCE IS REQUIRED FOR THIS ENTRY]]
      </t>

  <!--////Change this if this turns out not to be true (see question above)-->
      <t>In all cases the registration template specified in <xref target="reg-template"/> must be used.</t>
      
      
      <section title="Publication of Registered Charset List">

<!--///It might not be a good idea to tell IANA what to use for the format: it is IANA's internal issue.-->
    <t>
    This document directs IANA to create a new XML-based registry for
    charset registrations.  This registry will be divided into three
    subregistries as specified in <xref target="charset-reg"/> of this document.</t>
      
    <t>
    New charset registrations will be published in the new, XML-based
    registry.  The proposed charset will use the approval process
    appropriate for the intended subregistry.
    </t>
      
    <t>
    Legacy charset registrations will be converted to the new XML
    registry.  The instructions for converting the legacy registrations
    into entries in the new subregistries are documented in <xref target="iana-cons"/> of this document.
    </t>
      
    <t>
    HISTORICAL NOTE: Previously, charset registrations were posted in the
    anonymous FTP file
    "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
    registered charsets were listed in the periodically issued
    "Assigned Numbers" RFC.
    </t>
    
      </section>
    
    </section>

    <section title="Security Considerations" anchor="seccons">

      <t>
      The conversion of this IANA registry - and the changes made to the
      registration procedures for the new subregistries - introduces no
      known security considerations.  Security issues that relate to
      charsets are dealt with in the RFCs that describe the protocols that
      use those charsets.
      </t>

    </section>

    <section title="Acknowledgements">

      <t>This document is a revision of RFC 2978 by Ned Freed and Jon Postel
      and is largely based on their original text.
      </t>

    </section>

  </middle>
  <back>
    <references title="Normative References">

      <?rfc include="reference.RFC.0020"?>
      <?rfc include="reference.RFC.1759"?><!-- Printer MIB -->
      
      <?rfc include="reference.RFC.2119"?><!-- Keywords -->

      <!-- MIME: -->
      <?rfc include="reference.RFC.2045"?>
      <?rfc include="reference.RFC.2046"?>
      <?rfc include="reference.RFC.2047"?>
      
      <?rfc include="reference.RFC.2231"?>
      <?rfc include="reference.RFC.3629"?><!-- UTF-8 -->
      <?rfc include="reference.RFC.5226"?>
      <?rfc include="reference.RFC.5234"?>

<!--///Can we make this reference Unicode version independent?-->
<reference anchor="Unicode7.0" target="http://www.unicode.org/versions/Unicode7.0.0/">
  <front>
    <title>The Unicode Standard, Version 7.0.0</title>
    <author>
      <organization>The Unicode Consortium</organization>
    </author>
    <date year="2014" />
  </front>
</reference>
    
    </references>

    <references title="Informative References">

      <?rfc include="reference.RFC.2978"?>
      <?rfc include="reference.RFC.2130"?>

<reference anchor="ISO-2022">
<front>
<title>
Information technology - Character code structure and extension techniques
</title>
<author>
<organization>International Organization for Standardization</organization>
</author>
<date month="" year="1994"/>
</front>
<seriesInfo name="ISO" value="Standard 2022"/>
</reference>
      
<!--///Need to update the reference?-->
<reference anchor="ISO-10646">
<front>
<title>
Information Technology - Universal Multiple-octet coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane
</title>
<author>
<organization>International Organization for Standardization</organization>
</author>
<date month="May" year="1993"/>
</front>
<seriesInfo name="ISO" value="Standard 10646-1"/>
</reference>

<reference anchor="ISO-8859">
<front>
<title>
Information processing - 8-bit single-byte coded graphic character sets - Part 1: Latin alphabet No. 1 (1987) - Part 2: Latin alphabet No. 2 (1987) - Part 3: Latin alphabet No. 3 (1988) - Part 4: Latin alphabet No. 4 (1988) - Part 5: Latin/Cyrillic alphabet (1988) - Part 6: Latin/Arabic alphabet (1987) - Part 7: Latin/Greek alphabet (1987) - Part 8: Latin/Hebrew alphabet (1988) - Part 9: Latin alphabet No. 5 (1989) - Part 10: Latin alphabet No. 6 (1992)
</title>
<author>
<organization>International Organization for Standardization</organization>
</author>
<date month="" year="1992"/>
</front>
<seriesInfo name="ISO" value="Standard 8859"/>
</reference>
      
    </references>
 
    
    <section title="Changes Since RFC 2978">

<!--////What was in this Appendix which I never saw (due to file truncation)
      Appendix A.  Changes to RFC 2978 . . . . . . . . . . . . . . . . . 13
?      
-->
      <t>Created 3 new subregistries with different IANA registration procedures
      instead of a single existing one.</t>
      
      <t>
      Updated references, split them into Normative and Informative. Erratum 357.
      </t>

      <t>
      Disallowed single quotes in charset names (as per RFC 2231). Erratum 1912.
      Note that vertical bar and backslash characters were prohibited in RFC 2978
      (a change from RFC 2278), but the change was never noted in RFC 2978.
      </t>

    </section>

  </back>
</rfc>
