idnits 2.17.1 

draft-klensin-idna-rfc5891bis-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC5894, but the
     abstract doesn't seem to directly say this.  It does mention RFC5894
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC5890, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC5891, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5890, updated by this document, for
     RFC5378 checks: 2008-10-14)

     (Using the creation date from RFC5891, updated by this document, for
     RFC5378 checks: 2008-05-22)

     (Using the creation date from RFC5894, updated by this document, for
     RFC5378 checks: 2008-05-13)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 13, 2020) is 1382 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-LGR3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-MSR4'

  ** Downref: Normative reference to an Informational RFC: RFC 1591

  -- Duplicate reference: RFC5891, mentioned in 'RFC5891Erratum', was also
     mentioned in 'RFC5891'.

  ** Downref: Normative reference to an Informational RFC: RFC 5894

  ** Downref: Normative reference to an Informational RFC: RFC 6912

  -- Duplicate reference: RFC5890, mentioned in 'RFC-Editor-5890Errata', was
     also mentioned in 'RFC5890'.


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Klensin
3	Internet-Draft
4	Updates: 5890, 5891, 5894 (if approved)                       A. Freytag
5	Intended status: Standards Track                             ASMUS, Inc.
6	Expires: January 14, 2021                                  July 13, 2020

8	    Internationalized Domain Names in Applications (IDNA): Registry
9	                    Restrictions and Recommendations
10	                    draft-klensin-idna-rfc5891bis-06

12	Abstract

14	   The IDNA specifications for internationalized domain names combine
15	   rules that determine the labels that are allowed in the DNS without
16	   violating the protocol itself and an assignment of responsibility,
17	   consistent with earlier specifications, for determining the labels
18	   that are allowed in particular zones.  Conformance to IDNA by
19	   registries and other implementations requires both parts.  Experience
20	   strongly suggests that the language describing those responsibilities
21	   was insufficiently clear to promote safe and interoperable use of the
22	   specifications and that more details and discussion of circumstances
23	   would have been helpful.  Without making any substantive changes to
24	   IDNA, this specification updates two of the core IDNA documents (RFCs
25	   5890 and 5891) and the IDNA explanatory document (RFC 5894) to
26	   provide that guidance and to correct some technical errors in the
27	   descriptions.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at https://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on January 14, 2021.

46	Copyright Notice

48	   Copyright (c) 2020 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (https://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	   2.  Registry Restrictions in IDNA2008
65	   3.  Progressive Subsets of Allowed Characters
66	   4.  Considerations for Domains Operated Primarily for the
67	       Financial Benefit of the Registry Owner or Operator
68	       Organization
69	   5.  Other corrections and updates
70	     5.1.  Updates to RFC 5890
71	     5.2.  Updates to RFC 5891
72	   6.  Related Discussions
73	   7.  Security Considerations
74	   8.  Acknowledgments
75	   9.  IANA Considerations
76	   10. References
77	     10.1.  Normative References
78	     10.2.  Informative References
79	   Appendix A.  Change Log
80	     A.1.  Changes from version -00 (2017-03-11) to -01
81	     A.2.  Changes from version -01 (2017-09-12) to -02
82	     A.3.  Changes from version -02 (2019-07-06) to -03
83	     A.4.  Changes from version -03 (2019-07-22) to -04
84	     A.5.  Changes from version -04 (2019-08-02) to -05
85	     A.6.  Changes from version -05 (2019-08-29) to -06
86	   Authors' Addresses

88	1.  Introduction

90	   Parts of the specifications for Internationalized Domain Names in
91	   Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively
92	   known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates
93	   to them, as "IDNA2008" or just "IDNA") impose a requirement that
94	   domain name system (DNS) registries restrict the characters they
95	   allow in domain name labels (see Section 2 below), and the contents
96	   and structure of those labels.  That requirement and restriction are
97	   consistent with the "duty to serve the community" described in the
98	   original specification for DNS naming and authority [RFC1591].  The
99	   restrictions are intended to limit the permitted characters and
100	   strings to those for which the registries or their advisers have a
101	   thorough understanding and for which they are willing to take
102	   responsibility.

104	   That provision is centrally important because it recognized that
105	   historical relationships and variations among scripts and writing
106	   systems, the continuing evolution of those systems, differences in
107	   the uses of characters among languages (and locations) that use the
108	   same script, and so on make it impossible for a single list of
109	   characters and simple rules to be able to generate an "if we use
110	   these, we will be safe from confusion and various attacks" guideline.

112	   Instead, the algorithm and rules of RFCs 5891 and 5892 eliminate many
113	   of the most dangerous and otherwise problematic cases, but cannot
114	   eliminate the need for registries and registrars to understand what
115	   they are doing and take responsibility for the decisions they make.

117	   The way in which the IDNA2008 specifications expressed these
118	   requirements may have underemphasized the intention that they
119	   actually are requirements.  Section 2.3.2.3 of the Definitions
120	   document [RFC5890] mentions the need for the restrictions, indicates
121	   that they are mandatory, and points the reader to Section 4.3 of the
122	   Protocol document [RFC5891], which in turn points to Section 3.2 of
123	   the Rationale document [RFC5894], with each document providing
124	   further detail, discussion, and clarification.

126	   At the same time, the Internet has evolved significantly since the
127	   management assumptions for the DNS were established with RFC 1591 and
128	   earlier.  In particular, the management and use of domain names have
129	   gone through several transformations.  Recounting of those changes is
130	   beyond the scope of this document but one of them has had significant
131	   practical impact on the degree to which the requirement for registry
132	   knowledge and responsibility is observed in practice.  When RFC 1591
133	   was written, the assumption was that domains at all levels of the DNS
134	   would be operated in the best interest of the registrants in the
135	   domain and of the Internet as a whole.  There were no notions about
136	   domains being operated for a profit, much less with a business model
137	   that made them more profitable the more names that could be
138	   registered (or even, under some circumstances, reserved and not
139	   registered).  At the time RFC 1591 was written, there was also no
140	   notion that domains would be considered more successful based on the
141	   number of names registered and delegated from them.  While rarely
142	   reflected in the DNS protocols, the distinction between domains
143	   operated primarily as a revenue source of the organizations operating
144	   the registry and ones that are operated for, e.g., use within an
145	   enterprise or otherwise as a service has become very important
146	   today.  See Section 4 for a discussion on how those issues affect
147	   this specification.

149	   This specification is intended to unify and clarify these
150	   requirements for registry decisions and responsibility and to
151	   emphasize the importance of registry restrictions at all levels of
152	   the DNS.  It also makes a specific recommendation for character
153	   repertoire subsetting that is intermediate between the code points
154	   allowed by RFCs 5891 and 5892 and those allowed by individual
155	   registries.  It does not alter the basic IDNA2008 protocols and rules
156	   themselves in any way.

158	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
159	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
160	   document are to be interpreted as described in RFC 2119 [RFC2119].

162	2.  Registry Restrictions in IDNA2008

164	   As mentioned above, IDNA2008 specifies that the registries for each
165	   zone in the DNS that supports IDN labels are required to develop and
166	   apply their own rules to restrict the allowable labels, including
167	   limiting characters they allow to be used in labels in that zone.
168	   The chosen list MUST be a subset of the collection of code points
169	   specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules
170	   established by the protocols themselves.  Labels containing any
171	   characters from the two CONTEXT categories or any characters that are
172	   normally part of a script written right to left [RFC5893] require
173	   that additional rules, specified in the protocols and known as
174	   "contextual rules" and "bidi rules", be applied.  The entire
175	   collection of rules and restrictions required by the IDNA2008
176	   protocols themselves is known as "protocol restrictions".

178	   As mentioned above, registries may apply (and generally are required
179	   to apply) additional rules to further restrict the list of permitted
180	   code points, contextual rules (perhaps applied to normally PVALID
181	   code points) that apply additional restrictions, and/or restrictions
182	   on labels as distinct from code points.  The most obvious of those
183	   restrictions include provisions for restricting suggested new
184	   registrations based on conflicts with labels already registered in
185	   the zone, so as to avoid homograph attacks [Gabrilovich2002] and
186	   other issues.  The specifications of what constitutes such conflicts,
187	   as well as the definition of "conflict" based on the properties of
188	   the labels in question, are the responsibility of each registry.
189	   Other restrictions prohibit code points and labels that are not
190	   consistent with the intended function of the zone or of the subtree
191	   in which the zone is embedded (see Section 3), or limit where
192	   allowable code points may be placed in a label.

194	   These per-registry (or per-zone) rules are commonly known as
195	   "registry restrictions" to distinguish them from the protocol
196	   restrictions described above.  By necessity, protocol restrictions
197	   are somewhat generic, having to cater both to the union of the needs
198	   for all zones and to the desires of the most permissive zones.
199	   In consequence, additional registry restrictions are essential to
200	   provide for the necessary security in the face of the tremendous
201	   variations and differences in writing systems and their ongoing
202	   evolution and development, as well as the human ability to recognize
203	   and distinguish characters in different scripts around the world and
204	   under different circumstances.
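
   As a non-normative illustration of how registry restrictions layer on
   top of protocol restrictions, the Python sketch below checks that a
   registry repertoire is a subset of the protocol-valid set and then
   applies a per-label conflict test.  The two code point sets and the
   conflict_key() rule are assumptions invented for this example; a real
   registry would derive the protocol set from the RFC 5892 rules and
   would define "conflict" according to its own policy.

```python
# Illustrative stand-in for the IDNA2008 protocol-valid set (PVALID plus
# the CONTEXTJ and CONTEXTO code points). This small set is an assumption
# made for the example, not the real RFC 5892 derived table.
PROTOCOL_VALID = set("abcdefghijklmnopqrstuvwxyz0123456789-àáâäçèéêëñöüß")

# Hypothetical registry repertoire: deliberately smaller than the above.
REGISTRY_REPERTOIRE = set("abcdefghijklmnopqrstuvwxyz0123456789-äöüß")

# The chosen list MUST be a subset of what the protocol allows.
REPERTOIRE_OK = REGISTRY_REPERTOIRE <= PROTOCOL_VALID


def conflict_key(label):
    """Hypothetical registry-defined notion of conflict: labels that differ
    only in a/ä, o/ö or u/ü are treated as the same name."""
    return label.translate(str.maketrans("äöü", "aou"))


def may_register(label, registered):
    """Apply a code point restriction, then a whole-label restriction."""
    if not label or any(ch not in REGISTRY_REPERTOIRE for ch in label):
        return False
    taken = {conflict_key(existing) for existing in registered}
    return conflict_key(label) not in taken
```

   With these assumed tables, may_register("bucher", {"bücher"}) is
   rejected as a conflict, and "café" is rejected because U+00E9 is
   outside the registry repertoire even though the illustrative protocol
   set allows it.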

206	3.  Progressive Subsets of Allowed Characters

208	   The algorithm and rules of RFCs 5891 and 5892 determine the set of
209	   code points that are possible for inclusion in domain name labels;
210	   registries MUST NOT permit code points in labels unless they are part
211	   of that set.  Labels that contain code points that are normally
212	   written from right to left MUST also conform to the requirements of
213	   RFC 5893.  Each registry that intends to allow IDN registrations MUST
214	   then determine the strict subset of that set of code points that will
215	   be allowed by that registry.  It SHOULD also consider additional
216	   rules, including contextual and whole label restrictions that provide
217	   further protection for registrants and users.  For example, the
218	   widely-used principle that bars labels containing characters from
219	   more than one script is not an IDNA2008 requirement.  It has been
220	   adopted by many registries but there may be circumstances in which it
221	   is not required or appropriate.
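
   A whole-label restriction such as the single-script principle can be
   sketched as follows.  This is a rough approximation only: Python's
   standard library does not expose the Unicode Script property, so the
   first word of each character name is used as a stand-in, which is an
   assumption of this example and not how a production registry would
   determine scripts.

```python
import unicodedata


def scripts_in(label):
    """Approximate the set of scripts used in a label.

    Assumption: the first word of the Unicode character name (LATIN,
    CYRILLIC, GREEK, ...) is used in place of the real Script property.
    """
    scripts = set()
    for ch in label:
        if ch in "0123456789-":
            continue  # ASCII digits and hyphen are shared by all scripts
        if unicodedata.category(ch).startswith("M"):
            continue  # combining marks take the script of their base
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label):
    """True if the label draws on at most one (approximated) script."""
    return len(scripts_in(label)) <= 1
```

   For example, a label that replaces the second letter of "paypal" with
   U+0430 CYRILLIC SMALL LETTER A mixes two scripts under this
   approximation and would fail is_single_script().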

223	   In formulating their own rules, registries should normally consult
224	   carefully-developed consensus recommendations about global maximum
225	   repertoires to be used such as the ICANN Maximal Starting Repertoire
226	   4 (MSR-4) for the Development of Label Generation Rules for the Root
227	   Zone [ICANN-MSR4] (or its successor documents).  Additional
228	   recommendations of similar quality about particular scripts or
229	   languages exist, including, but not limited to, the RFCs for Cyrillic
230	   [RFC5992], Arabic Language [RFC5564], or script-based repertoires
231	   from the approved ICANN Root Zone Label Generation Rules (LGR-3)
232	   [ICANN-LGR3] (or its successor documents).  Many of these
233	   recommendations also cover rules about relationships among code
234	   points that may be particularly important for complex scripts.  They
235	   also interact with recommendations about how labels that appear to be
236	   the same should be handled.

238	   It is the responsibility of the registry to determine which, if any,
239	   of those recommendations are applicable and to further subset or
240	   extend them as needed.  For example, several of the recommendations
241	   are designed for the root zone and therefore exclude digits and
242	   U+002D HYPHEN-MINUS; this restriction is not generally appropriate
243	   for other zones.  On the other hand, some zones may be designed to
244	   not cater for all users of a given script, but perhaps only for the
245	   needs of selected languages, in which case a more selective
246	   repertoire may be appropriate.

248	   In making these determinations, a registry SHOULD follow the IAB
249	   guidance in RFC 6912 [RFC6912].  Those guidelines include a number of
250	   principles for use in making decisions about allowable code points.
251	   In addition, that document notes that the closer a particular zone is
252	   to the root, the more restrictive the space of permitted labels
253	   should be.  RFC 5894 provides some suggestions for any registry that
254	   may decide to reduce opportunities for confusion or attacks by
255	   constructing policies that disallow characters used in historic
256	   writing systems (whether these be archaic scripts or extensions of
257	   modern scripts for historic or obsolete orthographies) or characters
258	   whose use is restricted to specialized or highly technical contexts.
259	   These suggestions were among the principles guiding the design of
260	   ICANN's Maximal Starting Repertoires (MSR) [LGR-Procedure].

262	   A registry decision to allow only those code points in the full
263	   repertoire of the MSR (plus digits and hyphen) would already avoid a
264	   number of issues inherent in a more permissive policy such as "use
265	   anything permitted by IDNA2008", while still supporting the native
266	   languages and scripts for the vast majority of users today.  However,
267	   it is unlikely, by itself, to fully satisfy the mandate set out above
268	   for three reasons.

270	   1.  The MSR, like the set of code points permissible under IDNA2008
271	       itself, was conceived merely as a boundary condition on
272	       permissible letter code points (it excludes digits and the
273	       hyphen).  It was always intended to be used as a starting point
274	       for setting registry policy, with the expectation that some of
275	       the code points in the MSR would not be included in the final
276	       registry policy, whether for lack of actual usage, or for being
277	       inherently problematic.

279	   2.  It was recognized that many scripts require contextual rules for
280	       many more code points than are covered by CONTEXTO or CONTEXTJ
281	       rules defined in IDNA2008.  This is particularly true for
282	       combining marks, typically used to encode diacritics, tone marks,
283	       vowel signs and the like.  While, theoretically, any combining
284	       mark may occur in any context in Unicode, in practice rendering
285	       and other software that users rely on in viewing or entering
286	       labels will not support arbitrary combining sequences, or indeed
287	       arbitrary combinations of code points, in the case of complex
288	       scripts.

290	       Contextual rules are needed in order to limit allowable code
291	       point sequences to those that can be expected to be rendered
292	       reliably.  Identifying those requires knowledge about the way
293	       code points are used in a script, whence the mandate for
294	       registries to only support code points they understand.  In this,
295	       some of the other recommendations, such as the Informational RFCs
296	       for specific scripts (e.g., Cyrillic [RFC5992]) or languages
297	       (e.g., Arabic [RFC5564] or Chinese [RFC4713]), or the Root Zone
298	       LGRs developed by ICANN, may provide useful guidance.

300	   3.  Because of the widely accepted practice of limiting any
301	       given label to a single script, a universal repertoire, such as
302	       the MSR, would have to be divided on a per-script basis into
303	       subrepertoires to make it useful, with some of those repertoires
304	       overlapping, for example, in the case of East Asian shared usage
305	       of the Han ideographs.
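
   The kind of registry-defined contextual rule described in the second
   item above can be sketched as a table that lists, for each permitted
   combining mark, the base characters it may follow.  The table entries
   below are hypothetical and cover only two Devanagari vowel signs and
   two consonants; they are not taken from any published Label
   Generation Rules.

```python
import unicodedata

# Hypothetical rule table: each permitted combining mark maps to the
# base characters it may directly follow. Entries are illustrative only.
ALLOWED_AFTER = {
    "\u093e": {"\u0915", "\u092e"},  # VOWEL SIGN AA after KA or MA
    "\u093f": {"\u0915", "\u092e"},  # VOWEL SIGN I after KA or MA
}


def violates_mark_rules(label):
    """Return True if any combining mark appears in an unlisted context.

    Marks missing from the table are treated as not permitted at all.
    """
    previous = None
    for ch in label:
        if unicodedata.category(ch).startswith("M"):
            if previous not in ALLOWED_AFTER.get(ch, set()):
                return True
        previous = ch
    return False
```

   Under these assumed rules, a vowel sign at the start of a label or
   directly after another vowel sign is rejected, while the same sign
   after a listed consonant is accepted.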

307	   Registries choosing to make exceptions -- allow code points that
308	   recommendations such as the MSR do not allow -- should make such
309	   decisions only with great care and only if they have considerable
310	   understanding of, and great confidence in, their appropriateness.
311	   The obvious exception from the MSR would be to allow digits and the
312	   hyphen.  Neither was allowed by the MSR, but only because they are
313	   not allowed in the Root Zone.

315	   Nothing in this document permits a registry to allow code points or
316	   labels that are disallowed or otherwise prohibited by IDNA2008.
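
   The progression from the protocol-valid set, through a starting
   repertoire, to a zone's own repertoire can be sketched as below.  The
   function name build_repertoire() and all three sets are assumptions
   made for this example; the starting set only imitates the MSR in that
   it omits digits and the hyphen.

```python
# Illustrative stand-ins; none of these are the real IDNA2008 or MSR sets.
PROTOCOL_VALID = set("abcdefghijklmnopqrstuvwxyz0123456789-äöüß")
STARTING = set("abcdefghijklmnopqrstuvwxyzäöüß")  # letters only
DIGITS_AND_HYPHEN = set("0123456789-")


def build_repertoire(starting, additions, removals, protocol_valid):
    """Derive a zone repertoire from a starting repertoire.

    additions: code points added back, such as digits and hyphen
    removals:  code points the registry chooses not to support
    Raises ValueError if anything falls outside the protocol-valid set.
    """
    candidate = starting | additions
    outside = candidate - protocol_valid
    if outside:
        raise ValueError(f"not permitted by the protocol: {sorted(outside)}")
    return candidate - removals


# A zone that adds digits and hyphen but chooses not to support U+00DF.
ZONE = build_repertoire(STARTING, DIGITS_AND_HYPHEN, {"ß"}, PROTOCOL_VALID)
```

   The check against the protocol-valid set reflects the statement above
   that no registry policy may admit code points that IDNA2008
   disallows; an attempted addition such as U+005F LOW LINE raises an
   error in this sketch.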

318	4.  Considerations for Domains Operated Primarily for the Financial
319	    Benefit of the Registry Owner or Operator Organization

321	   As discussed in the Introduction (Section 1), the distributed
322	   administrative structure of the DNS today can be described by
323	   dividing zones into two categories depending on how they are
324	   administered and for whom.  These categories are not precise -- some
325	   zones may not fall neatly into one category or the other -- but are
326	   useful in understanding the practical applicability of this
327	   specification.  They are:

329	      Zones operating primarily or exclusively within a country,
330	      organization, or enterprise and responsible to the Internet users
331	      in that country or the management of the organization or
332	      enterprise.  DNS operations, including registrations and
333	      delegations, will typically occur in support of the purpose of
334	      that country, organization or enterprise rather than being its
335	      primary purpose.

337	      Zones operating primarily as all or part of a business of selling
338	      names for the financial benefit of entities responsible for the
339	      registry.  For these domains, most delegations of subdomains are
340	      to entities with little or no affiliation with the registry
341	      operator other than contractual agreements about operation of
342	      those subdomains.  These zones are often known as "public domains"
343	      or with similar terms, but those terms often have other semantics
344	      and may not cover all cases.  In particular, a country code domain
345	      operated primarily in the interest of registrants and Internet
346	      users and in service to the broader Internet community is often
347	      considered a "public domain" but would fall into the first
348	      category, not the second.

350	   Rules requiring strict registry responsibility, including either
351	   thorough understanding of scripts and related issues in domain name
352	   labels being considered for registration or local naming rules that
353	   have the same effect, typically come naturally to registries for
354	   zones of the first type.  Registration of labels that would prove
355	   problematic for any reason hurts the relevant organization or
356	   enterprise or its customers or users within the relevant country and
357	   more broadly.  More generally, there are strong incentives to be
358	   extremely conservative about labels that might be registered and few,
359	   if any, incentives favoring adventures into labels that might be
360	   considered clever, much less ones that are hard to type, render, or,
361	   where it is relevant to users, remember correctly.

363	   By contrast, in a zone in which the profits are derived exclusively,
364	   or almost exclusively, from selling or reserving (including
365	   "blocking") names, there may be perceived incentives to register
366	   whatever names would-be registrants "want" or fears that any
367	   restrictions will cut into the available namespace.  In such
368	   situations, restrictions are unlikely to be applied unless they meet
369	   at least one of two criteria: (i) they are easy to apply and can be
370	   applied algorithmically or otherwise automatically and/or (ii) there
371	   is clear evidence that the particular label would cause harm.

373	   As suggested above, the two categories above are not precise.  In
374	   particular, there may be domains that, despite being set up to
375	   operate to produce revenue above actual costs, are sufficiently
376	   conservative about their operations to more closely resemble the
377	   first group in practice than the second one.

379	   The requirement of IDNA that is discussed at length elsewhere in this
380	   specification stands: IDNA (and IDNs generally) would work better and
381	   Internet users would be better protected and more secure if
382	   registries and registrars (of any type) confined their registrations
383	   to scripts and code point sequences that they understood thoroughly.
384	   While the IETF rarely gives advice to those who choose to violate
385	   IETF Standards, some advice to zones in the second category above may
386	   be in order.  That advice is that significant conservatism in what is
387	   allowed to be registered, even for reservation purposes, and even
388	   more conservatism about what labels are actually entered into zones
389	   and delegated, is the best option for the Internet and its users.  If
390	   practical considerations do not allow that much conservatism, then it
391	   is desirable to consult and utilize the many lists and tables that
392	   have been, and continue to be, developed to advise on what might be
393	   sensible for particular scripts and languages.  These include ICANN's
394	   twin efforts of creating per-script Root Zone Label Generation Rules
395	   [RZ-LGR-3] and Second Level Reference Label Generation Rules
396	   [SL-REF-LGR] (the latter of which may be per language).  They also
397	   include other lists of code points or code point relationships that
398	   may be particularly problematic and that should be treated with extra
399	   caution or prohibited entirely such as the proposed "troublesome
400	   character" list [Freytag-troublesome].  See also Section 6 below.

402	5.  Other corrections and updates

404	   After the initial IDNA2008 documents were published (and RFC 5892 was
405	   updated for Unicode 6.0 by RFC 6452 [RFC6452]) several errors or
406	   instances of confusing text were noted.  For the convenience of the
407	   community, the relevant corrections for RFCs 5890 and 5891 are noted
408	   below and update the corresponding documents.  There are no errata
409	   for RFC 5893 or 5894 as of the date this document was published.
410	   Because further updates to RFC 5892 would require addressing other
411	   pending issues, the outstanding erratum for that document is not
412	   considered here.  For consistency with the original documents,
413	   references to Unicode 5.0 are preserved in this document.

415	5.1.  Updates to RFC 5890

417	   The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823,
418	   and 4824 [RFC-Editor-5890Errata]) are all associated with the same
419	   issue, the number of Unicode characters that can be associated with a
420	   maximum-length (63 octet) A-label.  In retrospect and contrary to
421	   some of the suggestions in the errata, that value should not be
422	   expressed in octets because RFC 5890 and the other IDNA2008
423	   documents are otherwise careful to not specify Unicode encoding forms
424	   but, instead, work exclusively with Unicode code points.
425	   Consequently the relevant material in RFC 5890 should be corrected as
426	   follows:

428	   Section 2.3.2.1
429	      Old:  expansion of the A-label form to a U-label may produce
430	         strings that are much longer than the normal 63 octet DNS limit
431	         (potentially up to 252 characters).

433	      New:  expansion of the A-label form to a U-label may produce
434	         strings that are much longer than the normal 63 octet DNS limit
435	         (See Section 4.2).

437	      Comment:  If the length limit is going to be a source of confusion
438	         or careful calculations, it should appear in only one place.

440	   Section 4.2

442	      Old:  Because A-labels (the form actually used in the DNS) are
443	         potentially much more compressed than UTF-8 (and UTF-8 is, in
444	         general, more compressed that UTF-16 or UTF-32), U-labels that
445	         obey all of the relevant symmetry (and other) constraints of
446	         these documents may be quite a bit longer, potentially up to
447	         252 characters (Unicode code points).

449	      New:  A-labels (the form actually used in the DNS) and the
450	         Punycode algorithm used as part of the process to produce them
451	         [RFC3492] are strings that are potentially much more compressed
452	         than any standard Unicode Encoding Form.  A 63 octet A-label
453	         cannot represent more than 58 Unicode code points (four octet
454	         overhead and the requirement that at least one character lie
455	         outside the ASCII range) but implementations allocating buffer
456	         space for the conversion should allow significantly more space
457	         (i.e., extra octets) depending on the encoding form they are
458	         using.
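
   The relationship between A-label length and the space a U-label needs
   in each encoding form can be explored with the sketch below.  It uses
   the "punycode" codec from Python's standard library and performs no
   IDNA2008 validity checking, so it illustrates the arithmetic only.
   The helper names are assumptions of this example; the figure of 58
   code points is the bound stated in the new text above, and the
   worst-case buffer size follows from the fact that UTF-8, UTF-16 and
   UTF-32 each use at most four octets per code point.

```python
MAX_A_LABEL_OCTETS = 63
MAX_CODE_POINTS = 58                     # bound stated in the text above
WORST_CASE_OCTETS = MAX_CODE_POINTS * 4  # at most 4 octets per code point


def to_a_label(u_label):
    """Prefix the Punycode (RFC 3492) form with "xn--".

    No IDNA2008 validation is done; this is not a conforming converter.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def octets_needed(u_label):
    """Octets the U-label occupies in each Unicode encoding form."""
    return {
        "utf-8": len(u_label.encode("utf-8")),
        "utf-16": len(u_label.encode("utf-16-be")),
        "utf-32": len(u_label.encode("utf-32-be")),
    }


def longest_fitting(ch, limit=MAX_A_LABEL_OCTETS):
    """Largest repeat count of ch whose A-label fits within the limit."""
    count = 0
    while len(to_a_label(ch * (count + 1))) <= limit:
        count += 1
    return count
```

   For the six code point label "bücher", to_a_label() gives
   "xn--bcher-kva" (13 octets), while the U-label takes 7, 12 and 24
   octets in UTF-8, UTF-16 and UTF-32 respectively.  Calling
   longest_fitting("ü") shows how many repetitions of a single
   non-ASCII character fit in a maximum-length A-label.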

460	5.2.  Updates to RFC 5891

462	   Errata ID 3969: Improve reference for combining marks.  There is only
463	      one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum].
464	      Combining marks are explained in the cited section, but not, as
465	      the text indicates, exactly defined.

467	      Old:  The Unicode string MUST NOT begin with a combining mark or
468	         combining character (see The Unicode Standard, Section 2.11
469	         [UnicodeA] for an exact definition).

471	      New:  The Unicode string MUST NOT begin with a combining mark or
472	         combining character (see The Unicode Standard, Section 2.11
473	         [UnicodeA] for an explanation and Section 3.6, definition D52
474	         [UnicodeB] for an exact definition).

476	      Comment:  When RFC 5891 is actually updated, the references in the
477	         text should be updated to the current version of Unicode and
478	         the section numbers checked.
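
   The rule in the corrected text can be checked mechanically.  The
   following Python sketch is one possible reading, not a normative
   implementation: it treats a combining mark as any character whose
   General_Category is a Mark (Mn, Mc, or Me), consistent with the
   definition cited above.  The function name is hypothetical, and the
   result depends on the Unicode version bundled with the Python
   runtime in use.

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Return True if the first code point has General_Category M*.

    Hypothetical helper; an empty string is reported as False.
    """
    if not label:
        return False
    # Categories Mn, Mc, and Me all start with "M".
    return unicodedata.category(label[0]).startswith("M")
```

   For example, a label that starts with U+0301 (COMBINING ACUTE
   ACCENT) would be rejected, while the same mark following a base
   letter would pass this particular check.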

480	6.  Related Discussions

482	   This document is one of a series of measures that have been suggested
483	   to address IDNA issues raised in other documents and discussions.
484	   Those other discussions and associated documents include suggested
485	   mechanisms for dealing with combining sequences and single-code point
486	   characters with the same appearance, ones that normalization neither
487	   combines nor decomposes as IDNA2008 assumed.  That topic was
488	   discussed further in [IDNA-Unicode] and in the IAB response to that
489	   issue [IAB-2015].  Those and other documents also discuss issues with
490	   IDNA and character graphemes for which abstractions exist in Unicode
491	   in precomposed form but that can be generated from combining
492	   sequences.  Another approach is a suggested registry of code points
493	   known to be problematic [Freytag-troublesome].  In combination, the
494	   various discussions of combining sequences and non-decomposing
495	   characters may lay the foundation for an actual update to the IDNA
496	   code points document [RFC5892].  Such an update would presumably also
497	   address the existing errata against that document.

499	   At a much higher level, discussions are ongoing to consider issues,
500	   demands, and proposals for new uses of the DNS.

502	7.  Security Considerations

504	   As discussed in IAB recommendations about internationalized domain
505	   names [RFC4690], [RFC6912], and elsewhere, poor choices of strings
506	   for DNS labels can lead to opportunities for attacks, user confusion,
507	   and other issues less directly related to security.  This document
508	   clarifies the importance of registries carefully establishing design
509	   policies for the labels they will allow, and it makes clear that
510	   having such policies and taking responsibility for them is a
511	   requirement, not an option.  If that clarification is useful in
512	   practice, the result should be an improvement in security.

514	8.  Acknowledgments

516	   Many thanks to Patrik Faltstrom who provided an important review on
517	   the initial version, to Jaap Akkerhuis, Don Eastlake, Barry Leiba,
518	   and Alessandro Vesely who did reviews that improved the text, and to
519	   Pete Resnick who acted as document shepherd and did an additional
520	   careful review.

522	9.  IANA Considerations

524	   [[CREF1: RFC Editor: Please remove this section before publication.]]

526	   This memo includes no requests to or actions for IANA.  In
527	   particular, it does not contain any provisions that would alter any
528	   IDNA-related registries or tables.

530	10.  References

532	10.1.  Normative References

534	   [ICANN-LGR3]
535	              ICANN, "Root Zone Label Generation Rules (LGR-1)", July
536	              2019,
537	              <https://www.icann.org/news/announcement-2-2019-04-25-en>.

539	   [ICANN-MSR4]
540	              ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for
541	              the Development of Label Generation Rules for the Root
542	              Zone", January 2019,
543	              <https://www.icann.org/news/announcement-2019-02-07-en>.

545	   [RFC1591]  Postel, J., "Domain Name System Structure and Delegation",
546	              RFC 1591, DOI 10.17487/RFC1591, March 1994,
547	              <https://www.rfc-editor.org/info/rfc1591>.

549	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
550	              Requirement Levels", BCP 14, RFC 2119,
551	              DOI 10.17487/RFC2119, March 1997,
552	              <https://www.rfc-editor.org/info/rfc2119>.

554	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
555	              Applications (IDNA): Definitions and Document Framework",
556	              RFC 5890, DOI 10.17487/RFC5890, August 2010,
557	              <https://www.rfc-editor.org/info/rfc5890>.

559	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
560	              Applications (IDNA): Protocol", RFC 5891,
561	              DOI 10.17487/RFC5891, August 2010,
562	              <https://www.rfc-editor.org/info/rfc5891>.

564	   [RFC5891Erratum]
565	              "RFC 5891, "Internationalized Domain Names in Applications
566	              (IDNA): Protocol"", Errata ID 3969, April 2014,
567	              <http://www.rfc-editor.org/errata_search.php?rfc=5891>.

569	   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
570	              for Internationalized Domain Names for Applications
571	              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,
572	              <https://www.rfc-editor.org/info/rfc5893>.

574	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
575	              Applications (IDNA): Background, Explanation, and
576	              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
577	              <https://www.rfc-editor.org/info/rfc5894>.

579	   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
580	              "Principles for Unicode Code Point Inclusion in Labels in
581	              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013,
582	              <https://www.rfc-editor.org/info/rfc6912>.

584	10.2.  Informative References

586	   [Freytag-troublesome]
587	              Freytag, A., Klensin, J., and A. Sullivan, "Those
588	              Troublesome Characters: A Registry of Unicode Code Points
589	              Needing Special Consideration When Used in Network
590	              Identifiers", June 2017, <draft-freytag-troublesome-
591	              characters-01>.

593	   [Gabrilovich2002]
594	              Gabrilovich, E. and A. Gontmakher, "The Homograph Attack",
595	              Communications of the ACM 45(2):128, February 2002.

597	   [IAB-2015]
598	              Internet Architecture Board (IAB), "IAB Statement on
599	              Identifiers and Unicode 7.0.0", February 2015,
600	              <https://www.iab.org/documents/correspondence-reports-
601	              documents/2015-2/iab-statement-on-identifiers-and-unicode-
602	              7-0-0/>.

604	   [IDNA-Unicode]
605	              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
606	              7.0.0", September 2017, <draft-klensin-idna-5892upd-
607	              unicode70-05>.

609	   [LGR-Procedure]
610	              Internet Corporation for Assigned Names and Numbers
611	              (ICANN), "Procedure to Develop and Maintain the Label
612	              Generation Rules for the Root Zone in Respect of IDNA
613	              Labels", March 2013,
614	              <https://www.icann.org/en/system/files/files/draft-lgr-
615	              procedure-20mar13-en.pdf>.

617	   [RFC-Editor-5890Errata]
618	              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
619	              Domain Names for Applications (IDNA): Definitions and
620	              Document Framework", August 2010", Note to RFC
621	              Editor: Please figure out how you would like this
622	              referenced and make it so., Captured 2017-09-10, 2016,
623	              <https://www.rfc-editor.org/errata_search.php?rfc=5890>.

625	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
626	              for Internationalized Domain Names in Applications
627	              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003,
628	              <https://www.rfc-editor.org/info/rfc3492>.

630	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
631	              Recommendations for Internationalized Domain Names
632	              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006,
633	              <https://www.rfc-editor.org/info/rfc4690>.

635	   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
636	              "Registration and Administration Recommendations for
637	              Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713,
638	              October 2006, <https://www.rfc-editor.org/info/rfc4713>.

640	   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
641	              "Linguistic Guidelines for the Use of the Arabic Language
642	              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
643	              February 2010, <https://www.rfc-editor.org/info/rfc5564>.

645	   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
646	              Internationalized Domain Names for Applications (IDNA)",
647	              RFC 5892, DOI 10.17487/RFC5892, August 2010,
648	              <https://www.rfc-editor.org/info/rfc5892>.

650	   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
651	              "Internationalized Domain Names Registration and
652	              Administration Guidelines for European Languages Using
653	              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010,
654	              <https://www.rfc-editor.org/info/rfc5992>.

656	   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
657	              Points and Internationalized Domain Names for Applications
658	              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
659	              November 2011, <https://www.rfc-editor.org/info/rfc6452>.

661	   [RZ-LGR-3]
662	              Internet Corporation for Assigned Names and Numbers, "Root
663	              Zone Label Generation Rules - LGR-3: Overview and Summary,
664	              Version 3", July 2019,
665	              <https://www.icann.org/sites/default/files/lgr/lgr-3-
666	              overview-10jul19-en.pdf>.

668	   [SL-REF-LGR]
669	              Internet Corporation for Assigned Names and Numbers
670	              (ICANN), "Second Level Label Generation Rules", 2019,
671	              <https://www.icann.org/resources/pages/second-level-lgr-
672	              2015-06-21-en>.

674	   [UnicodeA]
675	              The Unicode Consortium, "The Unicode Standard, Version
676	              12.1", May 2019.

678	              Section 2.11

680	   [UnicodeB]
681	              The Unicode Consortium, "The Unicode Standard, Version
682	              12.1", May 2019.

684	              Section 3.6, definition D52

686	Appendix A.  Change Log

688	   RFC Editor: Please remove this appendix before publication.

690	A.1.  Changes from version -00 (2017-03-11) to -01

692	   o  Added Acknowledgments and adjusted references.

694	   o  Filled in Section 5 with updates to respond to errata.

696	   o  Added Section 6 to discuss relationships to other documents.

698	   o  Modified the Abstract to note specifically updated documents.

700	   o  Several small editorial changes and corrections.

702	A.2.  Changes from version -01 (2017-09-12) to -02

704	   After a pause of nearly 22 months due to inability to get this draft
705	   processed, including nearly a year waiting for a new directorate to
706	   actually do anything of substance about fundamental IDNA issues, the
707	   -02 version was posted in the hope of getting a new start.  Specific
708	   changes include:

710	   o  Added a new section, Section 4, and some introductory material to
711	      address the very practical issue that domains run on a for-profit
712	      basis are unlikely to follow the very strict "understand what you
713	      are registering" requirement if they support IDNs at all and
714	      expect to profit from them.

716	   o  Added a pointer to draft-klensin-idna-unicode-review to the
717	      discussion of other work.

719	   o  Editorial corrections and changes.

721	A.3.  Changes from version -02 (2019-07-06) to -03

723	   o  Minor editorial changes in response to shepherd review.

725	   o  Additional references.

727	A.4.  Changes from version -03 (2019-07-22) to -04

729	   o  Editorial changes after AD review and some additional changes to
730	      improve clarity.

732	A.5.  Changes from version -04 (2019-08-02) to -05

734	   o  Small editorial corrections, many to correct glitches found during
735	      IETF Last Call.

737	   o  Updated acknowledgments, particularly to reflect reviews in Last
738	      Call.

740	A.6.  Changes from version -05 (2019-08-29) to -06

742	   Other than some small editorial adjustments, these changes were made
743	   after, and reflect, IESG post-last-call review and comments.  To the
744	   extent it was possible to do so without making this document
745	   inconsistent with the other IDNA documents, established IETF,
746	   Unicode, and ICANN community i18n terminology, or well-established
747	   IDNA or i18n practices, the first author believes that the document
748	   responds to all previously-outstanding IESG substantive comments.

750	   o  Fixed a remaining citation issue with a Unicode document.  This
751	      version has not been updated to reflect Unicode 13, but the
752	      document should be adjusted so that all references are
753	      contemporary at the time of publication.

755	   o  Added reference to homograph attacks, and slightly adjusted
756	      discussion of them, per discussion with IESG post-last-call.

758	   o  Removed pointer to RFC 5890 from discussion of mixed-script labels
759	      in Section 3.

761	   o  Rewrote parts of Section 4 to eliminate the term "for-profit" and
762	      clarify the issues.

764	   o  Removed pointer to draft-klensin-idna-unicode-review because RFC
765	      8753 has been published and is therefore no longer pending /
766	      parallel work.

768	   o  Rewrote Section 6 to make the relationships among various
769	      documents and efforts somewhat more clear.

771	   o  References to RFCs 5893 and 6912 moved from Informative to
772	      Normative.

774	Authors' Addresses

776	   John C Klensin
777	   1770 Massachusetts Ave, Ste 322
778	   Cambridge, MA  02140
779	   USA

781	   Phone: +1 617 245 1457
782	   Email: john-ietf@jck.com

784	   Asmus Freytag
785	   ASMUS, Inc.

787	   Email: asmus@unicode.org