idnits 2.17.1 

draft-klensin-idna-rfc5891bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC5894, but the
     abstract doesn't seem to directly say this.  It does mention RFC5894
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC5890, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC5891, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5890, updated by this document, for
     RFC5378 checks: 2008-10-14)

     (Using the creation date from RFC5891, updated by this document, for
     RFC5378 checks: 2008-05-22)

     (Using the creation date from RFC5894, updated by this document, for
     RFC5378 checks: 2008-05-13)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 6, 2019) is 1749 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Unicode' is mentioned on line 457, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-LGR3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-MSR4'

  ** Downref: Normative reference to an Informational RFC: RFC 1591

  -- Duplicate reference: RFC5891, mentioned in 'RFC5891Erratum', was also
     mentioned in 'RFC5891'.

  ** Downref: Normative reference to an Informational RFC: RFC 5894

  -- No information found for draft-lgr-procedure-20mar13-en - is the name
     correct?

  -- Duplicate reference: RFC5890, mentioned in 'RFC-Editor-5890Errata', was
     also mentioned in 'RFC5890'.


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Klensin
3	Internet-Draft
4	Updates: 5890, 5891, 5894 (if approved)                       A. Freytag
5	Intended status: Standards Track                             ASMUS, Inc.
6	Expires: January 7, 2020                                    July 6, 2019

8	    Internationalized Domain Names in Applications (IDNA): Registry
9	                    Restrictions and Recommendations
10	                    draft-klensin-idna-rfc5891bis-02

12	Abstract

14	   The IDNA specifications for internationalized domain names combine
15	   rules that determine the labels that are allowed in the DNS without
16	   violating the protocol itself and an assignment of responsibility,
17	   consistent with earlier specifications, for determining the labels
18	   that are allowed in particular zones.  Conformance to IDNA by
19	   registries and other implementations requires both parts.  Experience
20	   strongly suggests that the language describing those responsibilities
21	   was insufficiently clear to promote safe and interoperable use of the
22	   specifications and that more details and discussion of circumstances
23	   would have been helpful.  Without making any substantive changes to
24	   IDNA, this specification updates two of the core IDNA documents (RFC
25	   5890 and 5891) and the IDNA explanatory document (RFC 5894) to
26	   provide that guidance and to correct some technical errors in the
27	   descriptions.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at https://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on January 7, 2020.

46	Copyright Notice

48	   Copyright (c) 2019 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (https://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	   2.  Registry Restrictions in IDNA2008
65	   3.  Progressive Subsets of Allowed Characters
66	   4.  Considerations for For-Profit Domains
67	   5.  Other corrections and updates
68	     5.1.  Updates to RFC 5890
69	     5.2.  Updates to RFC 5891
70	   6.  Related Discussions
71	   7.  Security Considerations
72	   8.  Acknowledgments
73	   9.  IANA Considerations
74	   10. References
75	     10.1.  Normative References
76	     10.2.  Informative References
77	   Appendix A.  Change Log
78	     A.1.  Changes from version -00 (2017-03-11) to -01
79	     A.2.  Changes from version -01 (2017-09-12) to -02
80	   Authors' Addresses

82	1.  Introduction

84	   Parts of the specifications for Internationalized Domain Names in
85	   Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively
86	   known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates
87	   to them, as "IDNA2008" (or just "IDNA")) impose a requirement that
88	   domain name system (DNS) registries restrict the characters they
89	   allow in domain name labels (see Section 2 below), and the contents
90	   and structure of those labels.  That requirement and restriction are
91	   consistent with the "trustee for the community" requirements of the
92	   original specification for DNS naming and authority [RFC1591].  The
93	   restrictions are intended to limit the permitted characters and
94	   strings to those for which the registries or their advisers have a
95	   thorough understanding and for which they are willing to take
96	   responsibility.

98	   That provision is centrally important because it recognized that
99	   historical relationships and variations among scripts and writing
100	   systems, the continuing evolution of those systems, differences in
101	   the uses of characters among languages (and locations) that use the
102	   same script, and so on make it impossible for a single list of
103	   characters and simple rules to be able to generate an "if we use
104	   these, we will be safe from confusion and various attacks" guideline.

106	   Instead, the algorithm and rules of RFC 5891 and 5892 eliminate many
107	   of the most dangerous and otherwise problematic cases, but cannot
108	   eliminate the need for registries and registrars to understand what
109	   they are doing and to take responsibility for the decisions they make.

111	   The way in which the IDNA2008 specifications expressed these
112	   requirements may have obscured the intention that they actually are
113	   requirements.  Section 2.3.2.3 of the Definitions document [RFC5890]
114	   mentions the need for the restrictions, indicates that they are
115	   mandatory, and points the reader to Section 4.3 of the Protocol
116	   document [RFC5891], which in turn points to Section 3.2 of the
117	   Rationale document [RFC5894], with each document providing further
118	   detail, discussion, and clarification.

120	   At the same time, the Internet has evolved significantly since the
121	   management assumptions for the DNS were established with RFC 1591 and
122	   earlier.  In particular, the management and use of domain names have
123	   gone through several transformations.  Recounting of those changes is
124	   beyond the scope of this document but one of them has had significant
125	   practical impact on the degree to which the requirement for registry
126	   knowledge and responsibility is observed in practice.  When RFC 1591
127	   was written, the assumption was that domains at all levels of the DNS
128	   would be operated in the best interest of the registrants in the
129	   domain and of the Internet as a whole.  There were no notions about
130	   domains being operated for a profit and with a business model that
131	   made them more profitable the more names that could be registered (or
132	   even, under some circumstances, reserved and not registered) or that
133	   domains would be considered more successful based on the number of
134	   names registered and delegated from them.  While rarely reflected in
135	   the DNS protocols, the distinction between domains operated in those
136	   ways and ones that are operated for, e.g., use within an enterprise
137	   or otherwise as a service has become very important today.  See
138	   Section 4 for a discussion on how those issues affect this
139	   specification.

141	   This specification is intended to unify and clarify these
142	   requirements for registry decisions and responsibility and to
143	   emphasize the importance of registry restrictions at all levels of
144	   the DNS.  It also makes a specific recommendation for character
145	   repertoire subsetting intermediate between the code points allowed by
146	   RFC 5891 and 5892 and those allowed by individual registries.  It
147	   does not alter the basic IDNA2008 protocols and rules themselves in
148	   any way.

150	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
151	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
152	   document are to be interpreted as described in RFC 2119 [RFC2119].

154	2.  Registry Restrictions in IDNA2008

156	   As mentioned above, IDNA2008 specifies that the registries for each
157	   zone in the DNS that supports IDN labels are required to develop and
158	   apply their own rules to restrict the allowable labels, including
159	   limiting characters they allow to be used in labels in that zone.
160	   The chosen list MUST be smaller than the collection of code points
161	   specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules
162	   established by the protocols themselves.  The latter two categories,
163	   and labels containing any characters that are normally part of a
164	   script written right to left [RFC5893], require that additional
165	   rules, specified in the protocols and known as "contextual rules" and
166	   "bidi rules", be applied.  The entire collection of rules and
167	   restrictions required by the IDNA2008 protocols themselves is known
168	   as "protocol restrictions".

170	   As mentioned above, registries may apply (and generally are required
171	   to apply) additional rules to further restrict the list of permitted
172	   code points, contextual rules (perhaps applied to normally PVALID
173	   code points) that apply additional restrictions, and/or restrictions
174	   on labels.  The most obvious of those restrictions include provisions
175	   for restricting suggested new registrations based on conflicts with
176	   labels already registered in the zone and specifications of what
177	   constitutes such conflicts based on the properties of the labels in
178	   question.  They further include prohibitions on code points and
179	   labels that are not consistent with the intended function of the zone
180	   or the subtree in which it is embedded (see Section 3) or limitations
181	   on where in a label allowable code points may be placed.

183	   These per-registry (or per-zone) rules are commonly known as
184	   "registry restrictions" to distinguish them from the protocol
185	   restrictions described above.  By necessity, the latter are somewhat
186	   generic, having to cater both to the union of the needs of all
187	   zones and to the most permissive zones.  In consequence,
188	   additional registry restrictions are essential to provide for the
189	   necessary security in the face of the tremendous variations and
190	   differences in writing systems, their ongoing evolution and
191	   development, as well as the human ability to recognize and
192	   distinguish characters in different scripts around the world and
193	   under different circumstances.

195	3.  Progressive Subsets of Allowed Characters

197	   The algorithm and rules of RFC 5891 and 5892 set an absolute upper
198	   bound on the code points that can be used in domain name labels;
199	   registries MUST NOT include code points unless they are allowed by
200	   those rules.  Each registry that intends to allow IDN registrations
201	   MUST then determine which code points will be allowed by that
202	   registry.  It SHOULD also consider additional rules, including
203	   contextual and whole label restrictions that provide further
204	   protection for registrants and users.  For example, the widely-used
205	   principle that bars labels containing characters from more than one
206	   script is not an IDNA2008 requirement.  It has been adopted by many
207	   registries but, as Section 4.4 of RFC 5890 indicates, there may be
208	   circumstances in which it is not required or appropriate.

210	   In formulating their own rules, registries SHOULD normally consult
211	   carefully-developed consensus recommendations about global maximum
212	   repertoires to be used such as the ICANN Maximal Starting Repertoire
213	   4 (MSR-4) for the Development of Label Generation Rules for the Root
214	   Zone [ICANN-MSR4] (or its successor documents).  Additional
215	   recommendations of similar quality about particular scripts or
216	   languages exist, including, but not limited to, the RFCs for Cyrillic
217	   [RFC5992] or Arabic Language [RFC5564] or script-based repertoires
218	   from the approved ICANN Root Zone Label Generation Rules (LGR-3)
219	   [ICANN-LGR3] (or its successor documents).  Many of these
220	   recommendations also cover rules about relationships among code
221	   points that may be particularly important for complex scripts and
222	   recommendations on how to deal with alternate representations of the
223	   same or apparently the same labels.

225	   It is the responsibility of the registry to determine which, if any,
226	   of those recommendations are applicable and to further subset or
227	   extend them as needed.  For example, several of the recommendations
228	   are designed for the root zone and therefore exclude digits and
229	   U+002D HYPHEN-MINUS; this restriction is not generally appropriate
230	   for other zones.  On the other hand, some zones may be designed to
231	   not cater for all users of a given script, but perhaps only for the
232	   needs of selected languages, in which case a more selective
233	   repertoire may be appropriate.
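
	   As a rough illustration of the subsetting and extension step
	   described above, the following Python sketch starts from a
	   recommended repertoire, adds digits and hyphen, removes code points
	   the zone does not need, and refuses anything outside the protocol
	   upper bound.  The code point sets and all names in it
	   (PROTOCOL_ALLOWED, MSR_LIKE, build_registry_repertoire, and so on)
	   are hypothetical stand-ins chosen for this example; they are not
	   taken from RFC 5892, MSR-4, or any LGR.

```python
# Hypothetical, tiny stand-ins for the real tables. A real registry would
# derive these from the IDNA2008 derived-property tables and from the
# recommendation it consults.
LOWERCASE_ASCII = set(range(0x61, 0x7B))
DIGITS_AND_HYPHEN = set(range(0x30, 0x3A)) | {0x2D}
EXTRA_LETTERS = {0xDF, 0xE4, 0xF6, 0xFC, 0x3B1}

PROTOCOL_ALLOWED = LOWERCASE_ASCII | DIGITS_AND_HYPHEN | EXTRA_LETTERS
MSR_LIKE = LOWERCASE_ASCII | EXTRA_LETTERS  # letters only, as for the root


def build_registry_repertoire(recommended, extensions, excluded, protocol_allowed):
    """Return (recommended + extensions - excluded), checked against the bound."""
    repertoire = (set(recommended) | set(extensions)) - set(excluded)
    outside = repertoire - set(protocol_allowed)
    if outside:
        # Registries may subset the protocol repertoire but never exceed it.
        listing = ", ".join(f"U+{cp:04X}" for cp in sorted(outside))
        raise ValueError(f"not allowed by the protocol: {listing}")
    return frozenset(repertoire)


def label_in_repertoire(label, repertoire):
    """True if every code point of the label is in the zone repertoire."""
    return all(ord(ch) in repertoire for ch in label)


# A zone that serves one language community: extend with digits and hyphen,
# drop the Greek letter that the recommendation would have permitted.
REGISTRY = build_registry_repertoire(
    MSR_LIKE, DIGITS_AND_HYPHEN, {0x3B1}, PROTOCOL_ALLOWED
)
```

	   A repertoire check of this kind covers only the code point list.
	   Contextual rules, whole-label rules, and conflict checks against
	   existing registrations would be separate steps.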

235	   In making these determinations, a registry SHOULD follow the IAB
236	   guidance in RFC 6912 [RFC6912].  Those guidelines include a number of
237	   principles for use in making decisions about allowable code points.
238	   In addition, that document notes that the closer a particular zone is
239	   to the root, the more restrictive the space of permitted labels
240	   should be.  RFC 5894 provides some suggestions for any registry that
241	   may decide to reduce opportunities for confusion or attacks by
242	   constructing policies that disallow characters used in historic
243	   writing systems (whether these be archaic scripts or extensions of
244	   modern scripts for historic or obsolete orthographies) or characters
245	   whose use is restricted to specialized, or highly technical contexts.
246	   These suggestions were among the principles guiding the design of
247	   ICANN's Maximal Starting Repertoires [LGR-Procedure].

249	   Particularly for a zone for which all labels to be delegated are not
250	   for the use of the same organization or enterprise, a registry
251	   decision to allow only those code points in the full repertoire of
252	   the MSR (plus digits and hyphen) would already avoid a number of
253	   issues inherent in a more permissive policy like "use anything
254	   permitted by IDNA2008", while still supporting the native languages
255	   and scripts for the vast majority of users today.  However, it is
256	   unlikely, by itself, to fully satisfy the mandate set out above for
257	   three reasons.

259	   1.  The MSR, like the set of code points permissible under IDNA2008
260	       itself, was conceived merely as an upper bound on permissible
261	       letter code points (it excludes digits and the hyphen).  It was
262	       always intended to be used as a starting point for setting
263	       registry policy, with the expectation that some of the code
264	       points in the MSR would not be included in the final registry
265	       policy, whether for lack of actual usage, or for being inherently
266	       problematic.

268	   2.  It was recognized that many scripts require contextual rules for
269	       many more code points than are covered by CONTEXTO or CONTEXTJ
270	       rules defined in IDNA2008.  This is particularly true for
271	       combining marks, typically used to encode diacritics, tone marks,
272	       vowel signs and the like.  While, theoretically, any combining
273	       mark may occur in any context in Unicode, in practice rendering
274	       and other software that users rely on in viewing or entering
275	       labels will not support arbitrary combining sequences, or indeed
276	       arbitrary combinations of code points, in the case of complex
277	       scripts.

279	       Contextual rules are required to limit allowable code point
280	       sequences to those that can be expected to be rendered reliably.
281	       Identifying those requires knowledge about the way code points
282	       are used in a script, whence the mandate for registries to only
283	       support code points they understand.  In this, some of the other
284	       recommendations, such as the Informational RFCs for specific
285	       scripts (e.g., Cyrillic [RFC5992]) or languages (e.g., Arabic
286	       [RFC5564] or Chinese [RFC4713]), or the Root Zone LGRs developed
287	       by ICANN, may provide useful guidance.

289	   3.  Because of the widely accepted practice of limiting any
290	       given label to a single script, a universal repertoire, such as
291	       the MSR, would have to be divided on a per script basis into
292	       subrepertoires to make it useful, with some of those repertoires
293	       overlapping, for example, in the case of East Asian shared usage
294	       of the Han ideographs.
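
	   The second reason above can be made concrete with a small sketch
	   of a registry-defined contextual rule that limits where combining
	   marks may appear.  It is written in Python and uses the standard
	   unicodedata module.  The rule table (one Devanagari vowel sign
	   allowed only directly after a consonant) and the function names
	   are assumptions made for illustration; they are not taken from
	   any published LGR or from the IDNA2008 contextual rules.

```python
import unicodedata

# Hypothetical registry rule table: each permitted combining mark maps to
# the set of code points it may directly follow.
DEVANAGARI_CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # KA..HA
ALLOWED_PRECEDING = {
    "\u093F": DEVANAGARI_CONSONANTS,  # DEVANAGARI VOWEL SIGN I
}


def is_combining_mark(ch):
    """True for General Category Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")


def contextual_rule_violations(label):
    """Return a list of (index, reason) for marks that break the rule table."""
    problems = []
    for i, ch in enumerate(label):
        if not is_combining_mark(ch):
            continue
        allowed = ALLOWED_PRECEDING.get(ch)
        if allowed is None:
            problems.append((i, "mark not in registry rule table"))
        elif i == 0 or label[i - 1] not in allowed:
            problems.append((i, "mark not permitted after this code point"))
    return problems
```

	   With this table, a consonant followed by the vowel sign passes,
	   while the same sign after an independent vowel, or two signs in a
	   row, is reported.  A production rule set would need script
	   expertise of the kind the text above calls for.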

296	   Registries choosing to make exceptions and allow code points that
297	   recommendations such as the MSR do not allow should make such
298	   decisions only with great care and only if they have considerable
299	   understanding of, and great confidence in, their appropriateness.
300	   The obvious exception from the MSR would be to allow digits and the
301	   hyphen.  Neither was allowed by the MSR, but only because they are
302	   not allowed in the Root Zone.

304	   Nothing in this document permits a registry to allow code points or
305	   labels that are disallowed or otherwise prohibited by IDNA2008.

307	4.  Considerations for For-Profit Domains

309	   As discussed in the Introduction (Section 1), the distributed
310	   administrative structure of the DNS today can be described by
311	   dividing zones into two categories depending on how they are
312	   administered and for whom.  These categories are not precise -- some
313	   zones may not fall neatly into one category or the other -- but are
314	   useful in understanding the practical applicability of this
315	   specification.  They are:

317	      Zones operating primarily or exclusively within an organization or
318	      enterprise and responsible to that organization or enterprise.
319	      DNS operations, including registrations and delegations, will
320	      typically occur in support of the purpose of that organization or
321	      enterprise rather than being its primary purpose.

323	      Zones operating primarily on a for-profit basis in which most
324	      delegations of subdomains are to entities with little or no
325	      affiliation with the registry operator other than contractual
326	      agreements about operation of those subdomains.  These zones are
327	      often known as "public domains" or with similar terms, but those
328	      terms often have other semantics and may not cover all cases.

330	   Rules requiring strict registry responsibility, including either
331	   thorough understanding of scripts and related issues in domain name
332	   labels being considered for registration or local naming rules that
333	   have the same effect, typically come naturally to registries for
334	   zones of the first type.  Registration of labels that would prove
335	   problematic for any reason hurts the relevant organization or
336	   enterprise or its customers.  More generally, there are strong
337	   incentives to be extremely conservative about labels that might be
338	   registered and few, if any, incentives favoring adventures into
339	   labels that might be considered clever, much less ones that are hard
340	   to type, render, or, where it is relevant to users, remember
341	   correctly.

343	   By contrast, in a for-profit zone in which the profits are limited to
344	   selling names, there may be perceived incentives to register whatever
345	   names would-be registrants "want" or fears that any restrictions will
346	   cut into the available namespace.  In such situations, restrictions
347	   are unlikely to be applied unless they meet at least one of two
348	   criteria: (i) they are easy to apply and can be applied
349	   algorithmically or otherwise automatically and/or (ii) there is clear
350	   evidence that the particular label would cause harm.

352	   As suggested above, the two categories above are not precise.  In
353	   particular, there may be domains that, despite being set up to
354	   operate at a profit, are sufficiently conservative about their
355	   operations to more closely resemble the first group in practice than
356	   the second one.

358	   The requirement of IDNA that is discussed at length elsewhere in this
359	   specification stands: IDNA (and IDNs generally) would work better and
360	   Internet users would be better protected and more secure if
361	   registries and registrars (of any type) confined their registrations
362	   to scripts and code point sequences that they understood thoroughly.
363	   While the IETF rarely gives advice to those who choose to violate
364	   IETF Standards, some advice to zones in the second category above may
365	   be in order.  That advice is that significant conservatism in what is
366	   allowed to be registered, even for reservation purposes, and even
367	   more conservatism about what labels are actually entered into zones
368	   and delegated, is the best option for the Internet and its users.  If
369	   practical considerations do not allow that much conservatism, then it
370	   is desirable to consult and utilize the many lists and tables that
371	   have been, and continue to be, developed to advise on what might be
372	   sensible for particular scripts (such as ICANN's efforts for script-
373	   specific "generation rules" [[CREF1: Reference??? ]]) and lists of
374	   code points or code point relationships that may be particularly
375	   problematic and that should be treated with extra caution or
376	   prohibited entirely such as the proposed "troublesome character" list
377	   [Freytag-troublesome].  See also Section 6 below.

379	5.  Other corrections and updates

381	   After the initial IDNA2008 documents were published (and RFC 5892 was
382	   updated for Unicode 6.0 by RFC 6452 [RFC6452]) several errors or
383	   instances of confusing text were noted.  For the convenience of the
384	   community, the relevant corrections for RFC 5890 and 5891 are noted
385	   below and update the corresponding documents.  There are no errata
386	   for RFC 5893 or 5894 as of the date this document was published.
387	   Because further updates to RFC 5892 would require addressing other
388	   pending issues, the outstanding erratum for that document is not
389	   considered here.  For consistency with the original documents,
390	   references to Unicode 5.0 are preserved in this document.

392	   Readers should note that an update to RFC 5892 that is primarily
393	   concerned with the review process for new versions of Unicode but
394	   that makes some additional patches
395	   [ID.draft-klensin-idna-unicode-review] is in progress.  Its status
396	   should be checked in conjunction with application of the present
397	   specification.

399	5.1.  Updates to RFC 5890

401	   The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823,
402	   and 4824 [RFC-Editor-5890Errata]) are all associated with the same
403	   issue, the number of Unicode characters that can be associated with a
404	   maximum-length (63 octet) A-label.  In retrospect and contrary to
405	   some of the suggestions in the errata, that value should not be
406	   expressed in octets because RFC 5890 and the other IDNA2008
407	   documents are otherwise careful to not specify Unicode encoding forms
408	   but, instead, work exclusively with Unicode code points.
409	   Consequently the relevant material in RFC 5890 should be corrected as
410	   follows:

412	   Section 2.3.2.1

414	      Old:  expansion of the A-label form to a U-label may produce
415	         strings that are much longer than the normal 63 octet DNS limit
416	         (potentially up to 252 characters).

418	      New:  expansion of the A-label form to a U-label may produce
419	         strings that are much longer than the normal 63 octet DNS limit
420	         (See Section 4.2).

422	      Comment:  If the length limit is going to be a source of confusion
423	         or careful calculations, it should appear in only one place.

425	   Section 4.2
426	      Old:  Because A-labels (the form actually used in the DNS) are
427	         potentially much more compressed than UTF-8 (and UTF-8 is, in
428	         general, more compressed that UTF-16 or UTF-32), U-labels that
429	         obey all of the relevant symmetry (and other) constraints of
430	         these documents may be quite a bit longer, potentially up to
431	         252 characters (Unicode code points).

433	      New:  A-labels (the form actually used in the DNS) and the
434	         Punycode algorithm used as part of the process to produce them
435	         [RFC3492] are strings that are potentially much more compressed
436	         than any standard Unicode Encoding Form.  [[CREF2: Do we need a
437	         reference for this here??]] A 63 octet A-label cannot represent
438	         more than 58 Unicode code points (four octet overhead and the
439	         requirement that at least one character lie outside the ASCII
440	         range) but implementations allocating buffer space for the
441	         conversion should allow significantly more space depending on
442	         the encoding form they are using.
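
	   The length relationship in the corrected Section 4.2 text can be
	   explored with a short Python sketch.  It relies on Python's
	   built-in "punycode" codec and prepends the "xn--" prefix by hand.
	   The function names and the choice of U+00FC as a test character
	   are assumptions for illustration, and the sketch does not check
	   whether a string is a valid U-label.

```python
ACE_PREFIX = b"xn--"
MAX_LABEL_OCTETS = 63


def a_label_length(u_label):
    """Octets in the A-label form: four-octet prefix plus Punycode output."""
    return len(ACE_PREFIX) + len(u_label.encode("punycode"))


def longest_repeat_that_fits(ch):
    """Largest n such that ch repeated n times still fits in 63 octets."""
    n = 0
    while a_label_length(ch * (n + 1)) <= MAX_LABEL_OCTETS:
        n += 1
    return n


def encoded_sizes(u_label):
    """Compare the A-label size with common Unicode encoding forms."""
    return {
        "code_points": len(u_label),
        "a_label_octets": a_label_length(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-be")),
        "utf32_octets": len(u_label.encode("utf-32-be")),
    }
```

	   For a label made only of U+00FC, the first code point costs three
	   Punycode digits and each repetition one more, so 57 code points
	   fill the 63 octets.  That is within the bound of 58 given above.
	   The same 57 code points occupy 114 octets in UTF-8 and 228 in
	   UTF-32, which is why conversion buffers need to be sized for the
	   encoding form in use rather than for the A-label.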

444	5.2.  Updates to RFC 5891

446	   Errata ID 3969: Improve reference for combining marks  There is only
447	      one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum].
448	      Combining marks are explained in the cited section, but not, as
449	      the text indicates, exactly defined.

451	      Old:  The Unicode string MUST NOT begin with a combining mark or
452	         combining character (see The Unicode Standard, Section 2.11
453	         [Unicode] for an exact definition).

455	      New:  The Unicode string MUST NOT begin with a combining mark or
456	         combining character (see The Unicode Standard, Section 2.11
457	         [Unicode] for an explanation and Section 3.6, definition D52,
458	         for an exact definition).

460	      Comment:  When RFC 5891 is actually updated, the references in the
461	         text should be updated to the current version of Unicode and
462	         the section numbers checked.
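
	   A minimal sketch of the leading-combining-mark check follows, in
	   Python with the standard unicodedata module.  It assumes that
	   "combining mark or combining character" can be tested as General
	   Category Mn, Mc, or Me, which is how definition D52 is commonly
	   read.  The function name is hypothetical, and the result depends
	   on the Unicode version bundled with the Python runtime.

```python
import unicodedata

COMBINING_CATEGORIES = ("Mn", "Mc", "Me")


def begins_with_combining_mark(s):
    """True if the first code point has General Category Mn, Mc, or Me."""
    return bool(s) and unicodedata.category(s[0]) in COMBINING_CATEGORIES
```

	   A label for which this function returns True would be rejected
	   under the rule quoted above.  The empty string is reported as not
	   beginning with a mark and would have to be rejected separately.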

464	6.  Related Discussions

466	   This document is one of a series of measures that have been suggested
467	   to address IDNA issues raised in other documents, including
468	   mechanisms for dealing with combining sequences and single-code point
469	   characters with the same appearance that normalization neither
470	   combines nor decomposes as IDNA2008 assumed [IDNA-Unicode], including
471	   the IAB response to that issue [IAB-2015], and to take a higher-level
472	   view of issues, demands, and proposals for new uses of the DNS.
473	   Those documents also include a discussion of issues with IDNA and
474	   character graphemes for which abstractions exist in Unicode in
475	   precomposed form but that can be generated from combining sequences
476	   and a suggested registry of code points known to be problematic
477	   [Freytag-troublesome].  The discussion of combining sequences and
478	   non-decomposing characters is intended to lay the foundation for an
479	   actual update to the IDNA code points document [RFC5892].  Such an
480	   update will presumably also address the existing errata against that
481	   document.

483	7.  Security Considerations

485	   As discussed in IAB recommendations about internationalized domain
486	   names [RFC4690], [RFC6912], and elsewhere, poor choices of strings
487	   for DNS labels can lead to opportunities for attacks, user confusion,
488	   and other issues less directly related to security.  This document
489	   clarifies the importance of registries carefully establishing design
490	   policies for the labels they will allow and makes clear that having
491	   such policies and taking responsibility for them is a requirement,
492	   not an option.  If that clarification is useful in practice, the
493	   result should be an improvement in security.

495	8.  Acknowledgments

497	   Many thanks to Patrik Faltstrom, who provided an important review of
498	   the initial version.

500	9.  IANA Considerations

502	   [[CREF3: RFC Editor: Please remove this section before publication.]]

504	   This memo includes no requests to or actions for IANA.  In
505	   particular, it does not contain any provisions that would alter any
506	   IDNA-related registries or tables.

508	10.  References

510	10.1.  Normative References

512	   [ICANN-LGR3]
513	              ICANN, "Root Zone Label Generation Rules (LGR-1)", July
514	              2019,
515	              <https://www.icann.org/news/announcement-2-2019-04-25-en>.

517	   [ICANN-MSR4]
518	              ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for
519	              the Development of Label Generation Rules for the Root
520	              Zone", January 2019,
521	              <https://www.icann.org/news/announcement-2019-02-07-en>.

523	   [RFC1591]  Postel, J., "Domain Name System Structure and Delegation",
524	              RFC 1591, DOI 10.17487/RFC1591, March 1994,
525	              <https://www.rfc-editor.org/info/rfc1591>.

527	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
528	              Requirement Levels", BCP 14, RFC 2119,
529	              DOI 10.17487/RFC2119, March 1997,
530	              <https://www.rfc-editor.org/info/rfc2119>.

532	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
533	              Applications (IDNA): Definitions and Document Framework",
534	              RFC 5890, DOI 10.17487/RFC5890, August 2010,
535	              <https://www.rfc-editor.org/info/rfc5890>.

537	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
538	              Applications (IDNA): Protocol", RFC 5891,
539	              DOI 10.17487/RFC5891, August 2010,
540	              <https://www.rfc-editor.org/info/rfc5891>.

542	   [RFC5891Erratum]
543	              "RFC 5891, "Internationalized Domain Names in Applications
544	              (IDNA): Protocol"", Errata ID 3969, April 2014,
545	              <http://www.rfc-editor.org/errata_search.php?rfc=5891>.

547	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
548	              Applications (IDNA): Background, Explanation, and
549	              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
550	              <https://www.rfc-editor.org/info/rfc5894>.

552	10.2.  Informative References

554	   [Freytag-troublesome]
555	              Freytag, A., Klensin, J., and A. Sullivan, "Those
556	              Troublesome Characters: A Registry of Unicode Code Points
557	              Needing Special Consideration When Used in Network
558	              Identifiers", June 2017, <draft-freytag-troublesome-
559	              characters-01>.

561	   [IAB-2015]
562	              Internet Architecture Board (IAB), "IAB Statement on
563	              Identifiers and Unicode 7.0.0", February 2015,
564	              <https://www.iab.org/documents/
565	              correspondence-reports-documents/2015-2/
566	              iab-statement-on-identifiers-and-unicode-7-0-0/>.

568	   [ID.draft-klensin-idna-unicode-review]
569	              Klensin, J. and P. Faltstrom, "IDNA Review for New Unicode
570	              Versions", June 2019, <https://datatracker.ietf.org/doc/
571	              draft-klensin-idna-unicode-review/>.

573	   [IDNA-Unicode]
574	              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
575	              7.0.0", September 2017, <draft-klensin-idna-5892upd-
576	              unicode70-05>.

578	   [LGR-Procedure]
579	              Internet Corporation for Assigned Names and Numbers
580	              (ICANN), "Procedure to Develop and Maintain the Label
581	              Generation Rules for the Root Zone in Respect of IDNA
582	              Labels", March 2013,
583	              <https://www.icann.org/en/system/files/files/
584	              draft-lgr-procedure-20mar13-en.pdf>.

586	   [RFC-Editor-5890Errata]
587	              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
588	              Domain Names for Applications (IDNA): Definitions and
589	              Document Framework", August 2010", Note to RFC
590	              Editor: Please figure out how you would like this
591	              referenced and make it so., Captured 2017-09-10, 2016,
592	              <https://www.rfc-editor.org/errata_search.php?rfc=5890>.

594	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
595	              for Internationalized Domain Names in Applications
596	              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003,
597	              <https://www.rfc-editor.org/info/rfc3492>.

599	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
600	              Recommendations for Internationalized Domain Names
601	              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006,
602	              <https://www.rfc-editor.org/info/rfc4690>.

604	   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
605	              "Registration and Administration Recommendations for
606	              Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713,
607	              October 2006, <https://www.rfc-editor.org/info/rfc4713>.

609	   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
610	              "Linguistic Guidelines for the Use of the Arabic Language
611	              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
612	              February 2010, <https://www.rfc-editor.org/info/rfc5564>.

614	   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
615	              Internationalized Domain Names for Applications (IDNA)",
616	              RFC 5892, DOI 10.17487/RFC5892, August 2010,
617	              <https://www.rfc-editor.org/info/rfc5892>.

619	   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
620	              for Internationalized Domain Names for Applications
621	              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,
622	              <https://www.rfc-editor.org/info/rfc5893>.

624	   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
625	              "Internationalized Domain Names Registration and
626	              Administration Guidelines for European Languages Using
627	              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010,
628	              <https://www.rfc-editor.org/info/rfc5992>.

630	   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
631	              Points and Internationalized Domain Names for Applications
632	              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
633	              November 2011, <https://www.rfc-editor.org/info/rfc6452>.

635	   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
636	              "Principles for Unicode Code Point Inclusion in Labels in
637	              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013,
638	              <https://www.rfc-editor.org/info/rfc6912>.

640	Appendix A.  Change Log

642	   RFC Editor: Please remove this appendix before publication.

644	A.1.  Changes from version -00 (2017-03-11) to -01

646	   o  Added Acknowledgments and adjusted references.

648	   o  Filled in Section 5 with updates to respond to errata.

650	   o  Added Section 6 to discuss relationships to other documents.

652	   o  Modified the Abstract to note specifically updated documents.

654	   o  Several small editorial changes and corrections.

656	A.2.  Changes from version -01 (2017-09-12) to -02

658	   After a pause of nearly 34 months due to inability to get this draft
659	   processed, including nearly a year waiting for a new directorate to
660	   actually do anything of substance about fundamental IDNA issues, the
661	   -02 version is being posted in the hope of getting a new start.
662	   Specific changes include:

664	   o  Added a new section, Section 4, and some introductory material to
665	      address the very practical issue that domains run on a for-profit
666	      basis are unlikely to follow the very strict "understand what you
667	      are registering" requirement if they support IDNs at all and
668	      expect to profit from them.

670	   o  Added a pointer to draft-klensin-idna-unicode-review to the
671	      discussion of other work.

673	   o  Editorial corrections and changes.

675	Authors' Addresses

677	   John C Klensin
678	   1770 Massachusetts Ave, Ste 322
679	   Cambridge, MA  02140
680	   USA

682	   Phone: +1 617 245 1457
683	   Email: john-ietf@jck.com

685	   Asmus Freytag
686	   ASMUS, Inc.

688	   Email: asmus@unicode.org