<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
]>


<rfc category="info" ipr="trust200902" docName="draft-sullivan-dns-zone-codepoint-pples-00">
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
  <?rfc toc="yes" ?>
  <?rfc symrefs="yes" ?>
  <?rfc sortrefs="yes"?>
  <?rfc iprnotified="no" ?>
  <?rfc strict="yes" ?>
  <?rfc compact="yes" ?>
  <?rfc comments="yes"?>
  <?rfc inline="yes"?>
  <front>
    <title abbrev="Root Zone Code Point Principles">Principles for
    Unicode Code Point Inclusion in Labels in the DNS Root</title>
    <author initials="A." surname="Sullivan" fullname="Andrew Sullivan">
      <organization>Dyn, Inc.</organization>
      <address>
        <postal>
          <street>150 Dow St</street>
          <city>Manchester</city>
          <region>NH</region>
          <code>03101</code>
          <country>U.S.A.</country>
        </postal>
        <email>asullivan@dyn.com</email>
      </address>
    </author>
    <author initials="D." surname="Thaler" fullname="Dave Thaler">
      <organization>Microsoft</organization>
      <address>
        <postal>
          <street>One Microsoft Way</street>
          <city>Redmond</city>
          <region>WA</region>
          <code>98052</code>
          <country>U.S.A.</country>
        </postal>
        <email>dthaler@microsoft.com</email>
      </address>
    </author>
    <author initials="O." surname="Kolkman" fullname="Olaf Kolkman">
      <organization>NLnet Labs</organization>
      <address>
        <postal>
          <street>Science Park 400</street>
          <city>Amsterdam</city>
          <code>1098 XH</code>
          <country>The Netherlands</country>
        </postal>
        <email>olaf@NLnetLabs.nl</email>
      </address>
    </author>
    <date year="2012"/>
    <abstract>
      <t>
        Traditionally, the management of the DNS root zone permitted
        only "alphabetic" labels.  As long as the root zone included
        only ASCII characters, and as long as there was only one form
        of a label, the restriction plainly meant that only the
        letters A-Z and a-z were permitted.  The advent of internationalized
        labels using IDNA2008 presents some complications for the
        restriction.  One of the complications is the meaning of the
        term "alphabetic" when applied to the Unicode code points in
        U-labels.  This memo presents a set of principles that can be
        used to determine whether a Unicode code point may be wisely
        included in the repertoire of permissible code points in a
        U-label in a zone.
      </t>
    </abstract>
  </front>
  <middle>
    <section title="Background and Introduction">
      <t>In recent communications (<xref target="IABCOMM1" /> and
      <xref target="IABCOMM2" />), the IAB has emphasized the
      importance of conservatism in allocating labels conforming to
      IDNA2008 (<xref target="RFC5890" />, <xref target="RFC5891" />,
      <xref target="RFC5892" />, <xref target="RFC5893" />, <xref
      target="RFC5894" />, <xref target="RFC5895" />) inside the root
      zone.  Traditional LDH-labels (see <xref target="RFC5890" /> for
      definitions of IDNA terms) in the root zone used only alphabetic
      characters (i.e., ASCII a-z or A-Z).  Matters are more
      complicated with U-labels, however.  The IAB communications
      recommended that U-labels permit only code points with a
      General_Category (gc) of Ll (Lowercase_Letter), Lo
      (Other_Letter), or Lm (Modifier_Letter), but noted that for
      practical considerations other code points might be permitted on
      a case-by-case basis.  In what follows we will use the Unicode
      notation; e.g., gc=Ll.</t>

      <t>The IAB recommendation does, however, present some problems
      that need to be addressed.  First, it is by no means clear that
      all of the code points with gc=Lo or gc=Lm and which are
      permitted under IDNA2008 are appropriate for the root zone.  To
      take but one example, the code point U+02BC MODIFIER LETTER
      APOSTROPHE has gc=Lm.  In practically every rendering (we are
      unaware of an exception), U+02BC is indistinguishable from
      U+2019 RIGHT SINGLE QUOTATION MARK, which has gc=Pf
      (Final_Punctuation).  U+02BC will also be read by large numbers
      of people as being the same character as U+0027 APOSTROPHE,
      which has gc=Po (Other_Punctuation).  U+02BC is PROTOCOL VALID
      (PVALID) under IDNA2008 (see <xref target="RFC5892" />), whereas
      both other code points are DISALLOWED.  So, to begin with, it is
      plain that not every code point with gc in {Ll, Lo, Lm} is
      consistent with any conservatism principle.</t>

      <t>To make matters worse, some languages are dependent on code
      points with gc=Mc (Spacing_Mark) or gc=Mn (Nonspacing_Mark).
      This dependency is particularly common in Indic languages,
      though not exclusive to them.  (At the risk of vastly
      oversimplifying, the overarching issue is mostly the interaction
      of complex writing systems and the way Unicode works.)  To
      restrict users of those languages only to code points with gc in
      {Ll, Lo, Lm} would be extremely limiting.  While DNS labels are
      not words, or sentences, or phrases (as noted in <xref
      target="RFC4690" />), they are intended as useful mnemonics.
      Mnemonics that diverge wildly from the usual conventions in a
      language are likely to attract strong objections, particularly
      in the root.  The objections might drag the discussion away from
      sound management of the shared DNS root zone and towards
      discussions of cultural hegemony.  That sort of discussion
      itself might present risks for the operation of the root
      zone.</t>

      <t>For reasons of sound management, it is not desirable to
      decide whether to permit a given code point only when an
      application containing that code point is pending.  That
      approach reduces predictability and is bound to appear subject
      to special pleas.  It is better instead to come up with a set of
      principles for guiding decisions about code points.  These
      principles can then function as meta-rules, determining the
      rules for inclusion of any code point (from those permitted by
      IDNA) in labels in the root.  The principles might also be
      adopted by other zones that are shared by much of the Internet.
      Such a set of principles follows in the sections below.  Each
      section includes remarks on the extent to which the principle
      could be wisely adopted by zones other than the root.</t>

      <section title="Terminology" anchor="terms">
        <t>Terms relevant to IDNA2008 can be found in <xref
        target="RFC5890" />.  Other relevant internationalization
        terms are defined in <xref target="RFC6365" />.</t>

        <t>This memo does not propose a protocol standard, and the use
        of words like "should" follow the ordinary English meaning,
        and not that laid out in <xref target="RFC2119" />.</t>
      </section>

    </section>

    <section title="Conservatism Principle" anchor="conservatism">
      <t>The root zone is, by definition, the one DNS zone that must
      be shared by everybody.  Therefore, any decision to permit a
      code point in the root zone should be as conservative as
      practicable.  Doubts should always be resolved in favor of
      rejecting a code point for inclusion rather than in favor of
      including it, in order to minimize risk.</t>

      <t>This principle is easily (and wisely) adoptable by any zone.  
      It is also the one that is most likely to yield the safest
      result.</t>  
    </section>
    
    <section title="Inclusion Principle" anchor="inclusion">
      <t>Just as IDNA2008 starts from the principle that the Unicode
      range is excluded, and then adds code points according to
      derived properties of the code points, so the root zone should
      only permit inclusion of a code point if it is known to be
      safe.  The default treatment of a code point should be that it
      is excluded.</t>

      <t>This principle is easily (and wisely) adoptable by any zone.</t>
    </section>

    <section title="Simplicity Principle" anchor="simplicity">
      <t>The rules for determining whether a code point is to be
      included should be simple enough that they are readily
      understood by someone with a moderate background in the DNS and
      Unicode issues.  This principle does not mean that a completely
      naive person needs to be able to understand the rationale for
      why a code point is included, but it does mean that the reason
      for inclusion of very peculiar code points, even if the code
      points are safe in themselves, will be too difficult to
      understand and will therefore be rejected.</t>

      <t>The meaning of "simple" or "readily understood" is context
      dependent.  For instance, the root zone has to serve everyone in
      the world; for practical purposes, this means that the reasons
      for including a code point need to be comprehensible even to
      people who cannot use the script where the code point is found.
      In a zone that permits a very small subset of Unicode characters
      (for instance, only those needed to write a single language) and
      that supports a clearly-delineated linguistic community (for
      instance, the speakers of a single language with well-understood
      written conventions), more complicated rules might be
      acceptable.</t>
    </section>

    <section title="Predictability Principle" anchor="predictability">
      <t>The rules for determining whether a code point is to be
      included should be predictable enough that those with the
      requisite understanding of DNS, IDNA, and Unicode would all
      generally reach the same conclusion.  This is not a requirement
      for algorithmic treatment of code points (the difficulties with
      the Unicode Letter and Mark categories illustrate why that would
      be too difficult).  It is rather to say that the consistent
      application of professional judgment is likely to yield the same
      results; combined with the principle in <xref target="conservatism"
      />, when results are not predictable the anomalous code point
      would not be included.</t>

      <t>Just as in <xref target="simplicity" />, this principle is not
      easily extended to zones lower than the root because what is
      predictable within a given language community is possibly very
      surprising across languages.</t>
    </section>

      <section title="Stability Principle" anchor="stability">
        <t>Once a code point is permitted, it is at least
        very hard to stop permitting that code point.  In general, the
        list of code points to be permitted should change very slowly,
        if at all, and usually only in the direction of permitting an
        addition as time and experience indicates that inclusion of
        such a code point is both safe and consistent with these
        principles.</t>

        <t>This principle likely extends to every delegation-centric
        domain: if one delegation is permitted to use a code point, it
        is very hard to see why others might not.</t>
      </section>

    <section title="Letter Principle" anchor="letters">
      <t>In keeping with the spirit of the note in <xref
      target="RFC1123" /> that top-level labels "will be alphabetic",
      the rules should not include code points that are not normally
      used to write words, or that are in some cases normally used for
      purposes other than writing words.  This is not the same as
      using Unicode's General_Category to include only letters.  But
      it is a restriction that expands the possible class of
      included code points beyond the Unicode letters, but only expands so
      far as to include the things that are normally used the way
      letters are.  Under this principle, code points with (for
      example) gc=Mn might be included -- but only those that are used
      to write words and not (for instance) musical symbols.  This
      principle should be applied as narrowly as possible; as <xref
      target="RFC4690" /> says, "While DNS labels may conveniently be
      used to express words in many circumstances, the goal is not to
      express words (or sentences or phrases), but to permit the
      creation of unambiguous labels with good mnemonic value."</t>
      
      <t>Because the root zone must be shared by everyone, this
      principle is more important in it than in zones that are
      intended for use by clearly-defined linguistic communities.</t>
    </section>


    <section title="Conclusion">

      <t>The foregoing principles could be applied generally when
      considering any range of Unicode code points for possible
      inclusion in the root zone.  It is worth observing that doing
      anything (especially in light of <xref target="stability" />)
      implicitly disadvantages communities with a writing system not
      yet well understood and not represented in the technical and
      policy communities involved in the discussion.  That
      disadvantage is to be guarded against as much as practical, but
      is effectively impossible to prevent (while still taking action)
      in light of imperfect human knowledge.</t>
    </section>

    <section title="Security Considerations">
      <t>The principles outlined in this memo are partly intended to
      reduce the possibility of confusion among different labels.
      While these principles may contribute to reduction of risk, they
      are not sufficient to provide a comprehensive
      internationalization policy for zone management.</t>
    </section>

    <section title="IANA Considerations">
      <t>None.  RFC Editor: this section may be removed on
      publication.</t>
    </section>

    <section title="Acknowledgements">
      <t>The authors thank the participants in the IAB
      Internationalization programme for the discussion of the ideas
      in this memo.</t>
    </section>
  </middle>
  <back>
    <references title="Informative References">

<!-- BEGIN INCLUDED references file draft-sullivan-dns-zone-codepoint-pples-00.xml-informative -->


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='IABCOMM1'>

<front>
<title>IAB Statement: 'The interpretation of rules in the ICANN gTLD Applicant Guidebook.'</title>
<author>
<organization>Internet Architecture Board</organization>
<address>
<email>iab@iab.org</email></address></author>
<date year='2012' month='February' /></front>

<format type='TXT' target='http://www.iab.org/documents/correspondence-reports-documents/2012-2/iab-statement-the-interpretation-of-rules-in-the-icann-gtld-applicant-guidebook/' />
</reference>

<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='IABCOMM2'>

<front>
<title>Response to ICANN questions concerning 'The interpretation of rules in the ICANN gTLD Applicant Guidebook'</title>
<author>
<organization>Internet Architecture Board</organization>
<address>
<email>iab@iab.org</email></address></author>
<date year='2012' month='March' /></front>

<format type='TXT' target='http://www.iab.org/documents/correspondence-reports-documents/2012-2/iab-statement-the-interpretation-of-rules-in-the-icann-gtld-applicant-guidebook/' />
</reference>

<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC1123'>

<front>
<title>Requirements for Internet Hosts - Application and Support</title>
<author initials='R.' surname='Braden' fullname='Robert Braden'>
<organization>University of Southern California (USC), Information Sciences Institute</organization>
<address>
<postal>
<street>4676 Admiralty Way</street>
<city>Marina del Rey</city>
<region>CA</region>
<code>90292-6695</code>
<country>US</country></postal>
<phone>+1 213 822 1511</phone>
<email>Braden@ISI.EDU</email></address></author>
<date year='1989' month='October' /></front>

<seriesInfo name='STD' value='3' />
<seriesInfo name='RFC' value='1123' />
<format type='TXT' octets='245503' target='http://www.rfc-editor.org/rfc/rfc1123.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC2119'>

<front>
<title abbrev='RFC Key Words'>Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials='S.' surname='Bradner' fullname='Scott Bradner'>
<organization>Harvard University</organization>
<address>
<postal>
<street>1350 Mass. Ave.</street>
<street>Cambridge</street>
<street>MA 02138</street></postal>
<phone>- +1 617 495 3864</phone>
<email>sob@harvard.edu</email></address></author>
<date year='1997' month='March' />
<area>General</area>
<keyword>keyword</keyword>
<abstract>
<t>
   In many standards track documents several words are used to signify
   the requirements in the specification.  These words are often
   capitalized.  This document defines these words as they should be
   interpreted in IETF documents.  Authors who follow these guidelines
   should incorporate this phrase near the beginning of their document:

<list>
<t>
      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in
      RFC 2119.
</t></list></t>
<t>
   Note that the force of these words is modified by the requirement
   level of the document in which they are used.
</t></abstract></front>

<seriesInfo name='BCP' value='14' />
<seriesInfo name='RFC' value='2119' />
<format type='TXT' octets='4723' target='http://www.rfc-editor.org/rfc/rfc2119.txt' />
<format type='HTML' octets='17491' target='http://xml.resource.org/public/rfc/html/rfc2119.html' />
<format type='XML' octets='5777' target='http://xml.resource.org/public/rfc/xml/rfc2119.xml' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC4690'>

<front>
<title>Review and Recommendations for Internationalized Domain Names (IDNs)</title>
<author initials='J.' surname='Klensin' fullname='J. Klensin'>
<organization /></author>
<author initials='P.' surname='Faltstrom' fullname='P. Faltstrom'>
<organization /></author>
<author initials='C.' surname='Karp' fullname='C. Karp'>
<organization /></author>
<author>
<organization>IAB</organization></author>
<date year='2006' month='September' />
<abstract>
<t>This note describes issues raised by the deployment and use of Internationalized Domain Names.  It describes problems both at the time of registration and for use of those names in the DNS.  It recommends that IETF should update the RFCs relating to IDNs and a framework to be followed in doing so, as well as summarizing and identifying some work that is required outside the IETF.  In particular, it proposes that some changes be investigated for the Internationalizing Domain Names in Applications (IDNA) standard and its supporting tables, based on experience gained since those standards were completed.  This memo provides information for the Internet community.</t></abstract></front>

<seriesInfo name='RFC' value='4690' />
<format type='TXT' octets='100929' target='http://www.rfc-editor.org/rfc/rfc4690.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5890'>

<front>
<title>Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework</title>
<author initials='J.' surname='Klensin' fullname='J. Klensin'>
<organization /></author>
<date year='2010' month='August' />
<abstract>
<t>This document is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA), superseding the earlier version.  It describes the document collection and provides definitions and other material that are common to the set. [STANDARDS-TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='5890' />
<format type='TXT' octets='54245' target='http://www.rfc-editor.org/rfc/rfc5890.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5891'>

<front>
<title>Internationalized Domain Names in Applications (IDNA): Protocol</title>
<author initials='J.' surname='Klensin' fullname='J. Klensin'>
<organization /></author>
<date year='2010' month='August' />
<abstract>
<t>This document is the revised protocol definition for Internationalized Domain Names (IDNs).  The rationale for changes, the relationship to the older specification, and important terminology are provided in other documents.  This document specifies the protocol mechanism, called Internationalized Domain Names in Applications (IDNA), for registering and looking up IDNs in a way that does not require changes to the DNS itself.  IDNA is only meant for processing domain names, not free text. [STANDARDS-TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='5891' />
<format type='TXT' octets='38105' target='http://www.rfc-editor.org/rfc/rfc5891.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5892'>

<front>
<title>The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)</title>
<author initials='P.' surname='Faltstrom' fullname='P. Faltstrom'>
<organization /></author>
<date year='2010' month='August' />
<abstract>
<t>This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN).&lt;/t>&lt;t> It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). [STANDARDS-TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='5892' />
<format type='TXT' octets='187370' target='http://www.rfc-editor.org/rfc/rfc5892.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5893'>

<front>
<title>Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)</title>
<author initials='H.' surname='Alvestrand' fullname='H. Alvestrand'>
<organization /></author>
<author initials='C.' surname='Karp' fullname='C. Karp'>
<organization /></author>
<date year='2010' month='August' />
<abstract>
<t>The use of right-to-left scripts in Internationalized Domain Names (IDNs) has presented several challenges.  This memo provides a new Bidi rule for Internationalized Domain Names for Applications (IDNA) labels, based on the encountered problems with some scripts and some shortcomings in the 2003 IDNA Bidi criterion. [STANDARDS-TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='5893' />
<format type='TXT' octets='38870' target='http://www.rfc-editor.org/rfc/rfc5893.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5894'>

<front>
<title>Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale</title>
<author initials='J.' surname='Klensin' fullname='J. Klensin'>
<organization /></author>
<date year='2010' month='August' />
<abstract>
<t>Several years have passed since the original protocol for Internationalized Domain Names (IDNs) was completed and deployed.  During that time, a number of issues have arisen, including the need to update the system to deal with newer versions of Unicode.  Some of these issues require tuning of the existing protocols and the tables on which they depend.  This document provides an overview of a revised system and provides explanatory material for its components.  This document is not an Internet Standards Track specification; it is published for informational purposes.</t></abstract></front>

<seriesInfo name='RFC' value='5894' />
<format type='TXT' octets='115174' target='http://www.rfc-editor.org/rfc/rfc5894.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC5895'>

<front>
<title>Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008</title>
<author initials='P.' surname='Resnick' fullname='P. Resnick'>
<organization /></author>
<author initials='P.' surname='Hoffman' fullname='P. Hoffman'>
<organization /></author>
<date year='2010' month='September' />
<abstract>
<t>In the original version of the Internationalized Domain Names in Applications (IDNA) protocol, any Unicode code points taken from user input were mapped into a set of Unicode code points that "made sense", and then encoded and passed to the domain name system (DNS).  The IDNA2008 protocol (described in RFCs 5890, 5891, 5892, and 5893) presumes that the input to the protocol comes from a set of "permitted" code points, which it then encodes and passes to the DNS, but does not specify what to do with the result of user input.  This document describes the actions that can be taken by an implementation between receiving user input and passing permitted code points to the new IDNA protocol.  This document is not an Internet Standards Track specification; it is published for informational purposes.</t></abstract></front>

<seriesInfo name='RFC' value='5895' />
<format type='TXT' octets='16556' target='http://www.rfc-editor.org/rfc/rfc5895.txt' />
</reference>


<?xml version='1.0' encoding='UTF-8'?>

<reference anchor='RFC6365'>

<front>
<title>Terminology Used in Internationalization in the IETF</title>
<author initials='P.' surname='Hoffman' fullname='P. Hoffman'>
<organization /></author>
<author initials='J.' surname='Klensin' fullname='J. Klensin'>
<organization /></author>
<date year='2011' month='September' />
<abstract>
<t>This document provides a list of terms used in the IETF when discussing internationalization.  The purpose is to help frame discussions of internationalization in the various areas of the IETF and to help introduce the main concepts to IETF participants.  This memo documents an Internet Best Current Practice.</t></abstract></front>

<seriesInfo name='BCP' value='166' />
<seriesInfo name='RFC' value='6365' />
<format type='TXT' octets='103155' target='http://www.rfc-editor.org/rfc/rfc6365.txt' />
</reference>

<!-- END INCLUDED references file draft-sullivan-dns-zone-codepoint-pples-00.xml-informative -->
    </references>
  </back>
</rfc>