Network Working Group                                        A. Sullivan
Internet-Draft                                                 Dyn, Inc.
Intended status: Informational                                 D. Thaler
Expires: December 7, 2012                                      Microsoft
                                                              O. Kolkman
                                                              NLnet Labs
                                                            June 5, 2012

 Principles for Unicode Code Point Inclusion in Labels in the DNS Root
               draft-sullivan-dns-zone-codepoint-pples-00

Abstract

   Traditionally, the management of the DNS root zone permitted only
   "alphabetic" labels.  As long as the root zone included only ASCII
   characters, and as long as there was only one form of a label, the
   restriction plainly meant that only the letters A-Z and a-z were
   permitted.  The advent of internationalized labels using IDNA2008
   presents some complications for the restriction.  One of the
   complications is the meaning of the term "alphabetic" when applied to
   the Unicode code points in U-labels.  This memo presents a set of
   principles that can be used to determine whether a Unicode code point
   may be wisely included in the repertoire of permissible code points
   in a U-label in a zone.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 7, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Background and Introduction
     1.1.  Terminology
   2.  Conservatism Principle
   3.  Inclusion Principle
   4.  Simplicity Principle
   5.  Predictability Principle
   6.  Stability Principle
   7.  Letter Principle
   8.  Conclusion
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Informative References
   Authors' Addresses

1.  Background and Introduction

   In recent communications ([IABCOMM1] and [IABCOMM2]), the IAB has
   emphasized the importance of conservatism in allocating labels
   conforming to IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893],
   [RFC5894], [RFC5895]) inside the root zone.  Traditional LDH-labels
   (see [RFC5890] for definitions of IDNA terms) in the root zone used
   only alphabetic characters (i.e., ASCII a-z or A-Z).  Matters are
   more complicated with U-labels, however.  The IAB communications
   recommended that U-labels permit only code points with a
   General_Category (gc) of Ll (Lowercase_Letter), Lo (Other_Letter), or
   Lm (Modifier_Letter), but noted that for practical considerations
   other code points might be permitted on a case-by-case basis.  In
   what follows we will use the Unicode notation; e.g., gc=Ll.
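
   As an illustration only, the General_Category test described above
   can be sketched in Python using the standard unicodedata module.
   The names LETTER_CATEGORIES and has_only_letter_gc are hypothetical
   and are not drawn from any specification, and the results depend on
   the Unicode version bundled with the interpreter.

```python
import unicodedata

# General_Category values named in the IAB recommendation.
LETTER_CATEGORIES = frozenset({"Ll", "Lo", "Lm"})


def has_only_letter_gc(label: str) -> bool:
    """Return True if the label is non-empty and every code point in it
    has gc in {Ll, Lo, Lm}."""
    return bool(label) and all(
        unicodedata.category(ch) in LETTER_CATEGORIES for ch in label
    )


if __name__ == "__main__":
    # "\u4f8b\u3048" is two code points with gc=Lo (a CJK ideograph and
    # a Hiragana letter); "-" has gc=Pd and "1" has gc=Nd.
    for label in ("example", "exam-ple", "a1", "\u4f8b\u3048"):
        categories = [unicodedata.category(ch) for ch in label]
        print(label, categories, has_only_letter_gc(label))
```

   The check looks only at General_Category; it says nothing about
   whether a code point is PVALID under IDNA2008.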

   The IAB recommendation does, however, present some problems that need
   to be addressed.  First, it is by no means clear that all of the code
   points that have gc=Lo or gc=Lm and are permitted under IDNA2008 are
   appropriate for the root zone.  To take but one example, the code
   point U+02BC MODIFIER LETTER APOSTROPHE has gc=Lm.  In practically
   every rendering (we are unaware of an exception), U+02BC is
   indistinguishable from U+2019 RIGHT SINGLE QUOTATION MARK, which has
   gc=Pf (Final_Punctuation).  U+02BC will also be read by large numbers
   of people as being the same character as U+0027 APOSTROPHE, which has
   gc=Po (Other_Punctuation).  U+02BC is PROTOCOL VALID (PVALID) under
   IDNA2008 (see [RFC5892]), whereas the other two code points are
   DISALLOWED.  So, to begin with, it is plain that permitting every
   code point with gc in {Ll, Lo, Lm} is not consistent with any
   conservatism principle.
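
   The categories cited above can be confirmed with Python's standard
   unicodedata module.  The sketch below is illustrative; the Summary
   type and the summarize function are hypothetical names.  It shows
   only that a rule based on gc admits U+02BC while excluding its two
   lookalikes; it does not evaluate the IDNA2008 derived properties.

```python
import unicodedata
from typing import NamedTuple

LETTER_CATEGORIES = frozenset({"Ll", "Lo", "Lm"})


class Summary(NamedTuple):
    name: str        # Unicode character name
    gc: str          # General_Category
    letter_gc: bool  # True if gc is in {Ll, Lo, Lm}


def summarize(ch: str) -> Summary:
    """Look up the name and General_Category of one code point."""
    gc = unicodedata.category(ch)
    return Summary(unicodedata.name(ch), gc, gc in LETTER_CATEGORIES)


if __name__ == "__main__":
    # The three visually similar code points discussed in the text.
    for ch in ("\u02bc", "\u2019", "\u0027"):
        print(f"U+{ord(ch):04X}", summarize(ch))
```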

   To make matters worse, some languages are dependent on code points
   with gc=Mc (Spacing_Mark) or gc=Mn (Nonspacing_Mark).  This
   dependency is particularly common in Indic languages, though not
   exclusive to them.  (At the risk of vastly oversimplifying, the
   overarching issue is mostly the interaction of complex writing
   systems and the way Unicode works.)  To restrict users of those
   languages only to code points with gc in {Ll, Lo, Lm} would be
   extremely limiting.  While DNS labels are not words, or sentences, or
   phrases (as noted in [RFC4690]), they are intended as useful
   mnemonics.  Mnemonics that diverge wildly from the usual conventions
   in a language are likely to attract strong objections, particularly
   in the root.  The objections might drag the discussion away from
   sound management of the shared DNS root zone and towards discussions
   of cultural hegemony.  That sort of discussion itself might present
   risks for the operation of the root zone.
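
   To make the dependency on marks concrete, the following sketch
   inspects the word "Hindi" as written in Devanagari, an example
   chosen here for illustration and not taken from the IAB
   communications.  The helper names are hypothetical.  Half of the
   code points in the word have gc=Mc or gc=Mn, so a rule restricted
   to {Ll, Lo, Lm} would reject it.

```python
import unicodedata
from typing import List

LETTER_CATEGORIES = frozenset({"Ll", "Lo", "Lm"})
MARK_CATEGORIES = frozenset({"Mc", "Mn"})

# "Hindi" in Devanagari, written with escapes so the source stays ASCII:
# HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(label: str) -> List[str]:
    """General_Category of each code point, in order."""
    return [unicodedata.category(ch) for ch in label]


def marks_in(label: str) -> List[str]:
    """Code points in the label whose gc is Mc or Mn."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) in MARK_CATEGORIES
    ]


def has_only_letter_gc(label: str) -> bool:
    return all(unicodedata.category(ch) in LETTER_CATEGORIES for ch in label)


if __name__ == "__main__":
    print(categories(HINDI))
    print(marks_in(HINDI))
    print(has_only_letter_gc(HINDI))
```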

   For reasons of sound management, it is not desirable to decide
   whether to permit a given code point only when an application
   containing that code point is pending.  That approach reduces
   predictability and is bound to appear subject to special pleas.  It
   is better instead to come up with a set of principles for guiding
   decisions about code points.  These principles can then function as
   meta-rules, determining the rules for inclusion of any code point
   (from those permitted by IDNA) in labels in the root.  The principles
   might also be adopted by other zones that are shared by much of the
   Internet.  Such a set of principles follows in the sections below.
   Each section includes remarks on the extent to which the principle
   could be wisely adopted by zones other than the root.

1.1.  Terminology

   Terms relevant to IDNA2008 can be found in [RFC5890].  Other relevant
   internationalization terms are defined in [RFC6365].

   This memo does not propose a protocol standard, and words like
   "should" carry their ordinary English meaning, not that laid out in
   [RFC2119].

2.  Conservatism Principle

   The root zone is, by definition, the one DNS zone that must be shared
   by everybody.  Therefore, any decision to permit a code point in the
   root zone should be as conservative as practicable.  Doubts should
   always be resolved in favor of rejecting a code point for inclusion
   rather than in favor of including it, in order to minimize risk.

   This principle is easily (and wisely) adoptable by any zone.  It is
   also the one that is most likely to yield the safest result.
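
   One way to picture this principle is as a three-valued assessment
   in which doubt is handled exactly like a negative finding.  The
   sketch below is a hypothetical model, not a procedure defined by
   this memo; the Assessment values and the permit function are
   invented for illustration.

```python
from enum import Enum


class Assessment(Enum):
    """Hypothetical outcome of reviewing a single code point."""
    SAFE = "safe"
    UNSAFE = "unsafe"
    DOUBTFUL = "doubtful"


def permit(assessment: Assessment) -> bool:
    """Permit a code point only on a positive finding of safety.

    DOUBTFUL is resolved in favor of rejection, the same as UNSAFE.
    """
    return assessment is Assessment.SAFE


if __name__ == "__main__":
    for assessment in Assessment:
        print(assessment.name, permit(assessment))
```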

3.  Inclusion Principle

   Just as IDNA2008 starts from the principle that the Unicode range is
   excluded, and then adds code points according to derived properties
   of the code points, so the root zone should only permit inclusion of
   a code point if it is known to be safe.  The default treatment of a
   code point should be that it is excluded.

   This principle is easily (and wisely) adoptable by any zone.
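
   In implementation terms, exclusion by default corresponds to an
   explicit allow-list: a code point that has never been reviewed is
   treated the same as one that has been rejected.  The following is a
   minimal sketch under that assumption.  REPERTOIRE here is a
   placeholder containing only ASCII a-z, not a proposed root zone
   repertoire, and the function names are hypothetical.

```python
from typing import AbstractSet, List

# Placeholder repertoire for illustration only: ASCII a-z.
REPERTOIRE: AbstractSet[int] = frozenset(range(ord("a"), ord("z") + 1))


def excluded_code_points(
    label: str, repertoire: AbstractSet[int] = REPERTOIRE
) -> List[int]:
    """Code points in the label that are not in the repertoire, in order."""
    return [ord(ch) for ch in label if ord(ch) not in repertoire]


def label_permitted(
    label: str, repertoire: AbstractSet[int] = REPERTOIRE
) -> bool:
    """A label is permitted only if every code point was explicitly added."""
    return bool(label) and not excluded_code_points(label, repertoire)


if __name__ == "__main__":
    for label in ("example", "caf\u00e9", "ex\u02bcample"):
        rejected = [f"U+{cp:04X}" for cp in excluded_code_points(label)]
        print(label_permitted(label), rejected)
```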

4.  Simplicity Principle

   The rules for determining whether a code point is to be included
   should be simple enough that they are readily understood by someone
   with a moderate background in the DNS and Unicode issues.  This
   principle does not mean that a completely naive person needs to be
   able to understand the rationale for why a code point is included,
   but it does mean that very peculiar code points will be rejected,
   even if they are safe in themselves, because the reasons for
   including them will be too difficult to understand.

   The meaning of "simple" or "readily understood" is context dependent.
   For instance, the root zone has to serve everyone in the world; for
   practical purposes, this means that the reasons for including a code
   point need to be comprehensible even to people who cannot use the
   script where the code point is found.  In a zone that permits a very
   small subset of Unicode characters (for instance, only those needed
   to write a single language) and that supports a clearly delineated
   linguistic community (for instance, the speakers of a single language
   with well-understood written conventions), more complicated rules
   might be acceptable.

5.  Predictability Principle

   The rules for determining whether a code point is to be included
   should be predictable enough that those with the requisite
   understanding of DNS, IDNA, and Unicode would all generally reach the
   same conclusion.  This is not a requirement for algorithmic treatment
   of code points (the difficulties with the Unicode Letter and Mark
   categories illustrate why that would be too difficult).  It is rather
   to say that the consistent application of professional judgment is
   likely to yield the same results; combined with the principle in
   Section 2, when results are not predictable, the anomalous code point
   would not be included.

   Just as in Section 4, this principle is not easily extended to zones
   lower than the root because what is predictable within a given
   language community is possibly very surprising across languages.
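
   The combination with Section 2 can be modelled as a unanimity rule
   over independent expert judgments.  This sketch does not classify
   code points algorithmically, which the text above explicitly does
   not require; it only models what happens once qualified reviewers
   have each reached a verdict.  The function name and the
   representation of verdicts as booleans are assumptions made for
   illustration.

```python
from typing import Iterable


def include_by_consensus(verdicts: Iterable[bool]) -> bool:
    """Include a code point only if every reviewer independently said yes.

    Any disagreement means the outcome was not predictable, so the
    code point is left out.  No verdicts at all also means exclusion.
    """
    collected = list(verdicts)
    return bool(collected) and all(collected)


if __name__ == "__main__":
    print(include_by_consensus([True, True, True]))   # unanimous
    print(include_by_consensus([True, False, True]))  # reviewers disagree
    print(include_by_consensus([]))                   # nobody has reviewed
```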

6.  Stability Principle

   Once a code point is permitted, it is very hard, at the least, to
   stop permitting that code point.  In general, the list of code points
   to be permitted should change very slowly, if at all, and usually
   only in the direction of permitting an addition as time and
   experience indicate that inclusion of such a code point is both safe
   and consistent with these principles.

   This principle likely extends to every delegation-centric domain: if
   one delegation is permitted to use a code point, it is very hard to
   see why others might not.
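
   A zone operator could check a proposed revision of its repertoire
   against this principle by confirming that nothing already permitted
   is being withdrawn.  The sketch below assumes repertoires are
   represented as sets of integer code points; the function names are
   hypothetical.

```python
from typing import AbstractSet, List


def removed_code_points(
    old: AbstractSet[int], new: AbstractSet[int]
) -> List[int]:
    """Code points permitted in the old repertoire but absent from the new."""
    return sorted(old - new)


def is_stable_update(old: AbstractSet[int], new: AbstractSet[int]) -> bool:
    """True if the revision only adds code points (or changes nothing)."""
    return not removed_code_points(old, new)


if __name__ == "__main__":
    current = frozenset(range(ord("a"), ord("z") + 1))
    with_addition = current | {0x00E9}   # adds one code point
    with_removal = current - {ord("z")}  # withdraws one code point
    print(is_stable_update(current, with_addition))
    print(is_stable_update(current, with_removal),
          [f"U+{cp:04X}" for cp in removed_code_points(current, with_removal)])
```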

7.  Letter Principle

   In keeping with the spirit of the note in [RFC1123] that top-level
   labels "will be alphabetic", the rules should not include code points
   that are not normally used to write words, or that are in some cases
   normally used for purposes other than writing words.  This is not the
   same as using Unicode's General_Category to include only letters.
   Rather, it expands the class of code points that may be included
   beyond the Unicode letters, but only so far as to include those that
   are normally used the way letters are.  Under this principle, code
   points with (for example) gc=Mn might be included -- but only those
   that are used to write words and not (for instance) musical symbols.
   This principle should be applied as narrowly as possible; as
   [RFC4690] says, "While DNS labels may conveniently be used to express
   words in many circumstances, the goal is not to express words (or
   sentences or phrases), but to permit the creation of unambiguous
   labels with good mnemonic value."

   Because the root zone must be shared by everyone, this principle is
   more important in it than in zones that are intended for use by
   clearly defined linguistic communities.
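
   The following sketch illustrates why General_Category alone cannot
   implement this principle.  The two code points were chosen here as
   examples and are not named in this memo: one is a mark used in
   writing Devanagari words, the other is a combining mark from the
   Musical Symbols block.  Both report gc=Mn, so telling them apart
   takes judgment about how each is used.  The gc_and_name helper is a
   hypothetical name.

```python
import unicodedata
from typing import Tuple

VIRAMA = "\u094d"       # a Devanagari sign used in writing words
TREMOLO = "\U0001d167"  # a combining mark used in musical notation


def gc_and_name(ch: str) -> Tuple[str, str]:
    """Return (General_Category, Unicode name) for one code point."""
    return unicodedata.category(ch), unicodedata.name(ch)


if __name__ == "__main__":
    for ch in (VIRAMA, TREMOLO):
        gc, name = gc_and_name(ch)
        print(f"U+{ord(ch):04X}", gc, name)
```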

8.  Conclusion

   The foregoing principles could be applied generally when considering
   any range of Unicode code points for possible inclusion in the root
   zone.  It is worth observing that doing anything (especially in light
   of Section 6) implicitly disadvantages communities with a writing
   system not yet well understood and not represented in the technical
   and policy communities involved in the discussion.  That disadvantage
   is to be guarded against as much as practical, but is effectively
   impossible to prevent (while still taking action) in light of
   imperfect human knowledge.

9.  Security Considerations

   The principles outlined in this memo are partly intended to reduce
   the possibility of confusion among different labels.  While these
   principles may contribute to reduction of risk, they are not
   sufficient to provide a comprehensive internationalization policy for
   zone management.

10.  IANA Considerations

   None.  RFC Editor: this section may be removed on publication.

11.  Acknowledgements

   The authors thank the participants in the IAB Internationalization
   programme for the discussion of the ideas in this memo.

12.  Informative References

   [IABCOMM1]
              Internet Architecture Board, "IAB Statement: 'The
              interpretation of rules in the ICANN gTLD Applicant
              Guidebook.'", February 2012.

   [IABCOMM2]
              Internet Architecture Board, "Response to ICANN questions
              concerning 'The interpretation of rules in the ICANN gTLD
              Applicant Guidebook'", March 2012.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
              Internationalization in the IETF", BCP 166, RFC 6365,
              September 2011.

Authors' Addresses

   Andrew Sullivan
   Dyn, Inc.
   150 Dow St
   Manchester, NH  03101
   U.S.A.

   Email: asullivan@dyn.com

   Dave Thaler
   Microsoft
   One Microsoft Way
   Redmond, WA  98052
   U.S.A.

   Email: dthaler@microsoft.com

   Olaf Kolkman
   NLnet Labs
   Science Park 400
   Amsterdam  1098 XH
   The Netherlands

   Email: olaf@NLnetLabs.nl