idnits 2.17.1 

draft-klensin-idna-rfc5891bis-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC5894, but the
     abstract doesn't seem to directly say this.  It does mention RFC5894
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC5890, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC5891, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5890, updated by this document, for
     RFC5378 checks: 2008-10-14)

     (Using the creation date from RFC5891, updated by this document, for
     RFC5378 checks: 2008-05-22)

     (Using the creation date from RFC5894, updated by this document, for
     RFC5378 checks: 2008-05-13)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 2, 2019) is 1730 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Unicode' is mentioned on line 466, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-LGR3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-MSR4'

  ** Downref: Normative reference to an Informational RFC: RFC 1591

  -- Duplicate reference: RFC5891, mentioned in 'RFC5891Erratum', was also
     mentioned in 'RFC5891'.

  ** Downref: Normative reference to an Informational RFC: RFC 5894

  -- No information found for draft-lgr-procedure-20mar13-en - is the name
     correct?

  -- Duplicate reference: RFC5890, mentioned in 'RFC-Editor-5890Errata', was
     also mentioned in 'RFC5890'.


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Klensin
3	Internet-Draft
4	Updates: 5890, 5891, 5894 (if approved)                       A. Freytag
5	Intended status: Standards Track                             ASMUS, Inc.
6	Expires: February 3, 2020                                 August 2, 2019

8	    Internationalized Domain Names in Applications (IDNA): Registry
9	                    Restrictions and Recommendations
10	                   draft-klensin-idna-rfc5891bis-04

12	Abstract

14	   The IDNA specifications for internationalized domain names combine
15	   rules that determine the labels that are allowed in the DNS without
16	   violating the protocol itself and an assignment of responsibility,
17	   consistent with earlier specifications, for determining the labels
18	   that are allowed in particular zones.  Conformance to IDNA by
19	   registries and other implementations requires both parts.  Experience
20	   strongly suggests that the language describing those responsibilities
21	   was insufficiently clear to promote safe and interoperable use of the
22	   specifications and that more details and discussion of circumstances
23	   would have been helpful.  Without making any substantive changes to
24	   IDNA, this specification updates two of the core IDNA documents (RFC
25	   5890 and 5891) and the IDNA explanatory document (RFC 5894) to
26	   provide that guidance and to correct some technical errors in the
27	   descriptions.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at https://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on February 3, 2020.

46	Copyright Notice

48	   Copyright (c) 2019 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (https://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
64	   2.  Registry Restrictions in IDNA2008 . . . . . . . . . . . . . .   4
65	   3.  Progressive Subsets of Allowed Characters . . . . . . . . . .   5
66	   4.  Considerations for For-Profit Domains . . . . . . . . . . . .   7
67	   5.  Other corrections and updates . . . . . . . . . . . . . . . .   9
68	     5.1.  Updates to RFC 5890 . . . . . . . . . . . . . . . . . . .   9
69	     5.2.  Updates to RFC 5891 . . . . . . . . . . . . . . . . . . .  10
70	   6.  Related Discussions . . . . . . . . . . . . . . . . . . . . .  10
71	   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  11
72	   8.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  11
73	   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  11
74	   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .  11
75	     10.1.  Normative References . . . . . . . . . . . . . . . . . .  11
76	     10.2.  Informative References . . . . . . . . . . . . . . . . .  12
77	   Appendix A.  Change Log . . . . . . . . . . . . . . . . . . . . .  15
78	     A.1.  Changes from version -00 (2017-03-11) to -01  . . . . . .  15
79	     A.2.  Changes from version -01 (2017-09-12) to -02  . . . . . .  15
80	     A.3.  Changes from version -02 (2019-07-06) to -03  . . . . . .  15
81	     A.4.  Changes from version -03 (2019-07-22) to -04  . . . . . .  15
82	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  16

84	1.  Introduction

86	   Parts of the specifications for Internationalized Domain Names in
87	   Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively
88	   known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates
89	   to them, as "IDNA2008" (or just "IDNA")) impose a requirement that
90	   domain name system (DNS) registries restrict the characters they
91	   allow in domain name labels (see Section 2 below), and the contents
92	   and structure of those labels.  That requirement and restriction are
93	   consistent with the "duty to serve the community" described in the
94	   original specification for DNS naming and authority [RFC1591].  The
95	   restrictions are intended to limit the permitted characters and
96	   strings to those for which the registries or their advisers have a
97	   thorough understanding and for which they are willing to take
98	   responsibility.

100	   That provision is centrally important because it recognized that
101	   historical relationships and variations among scripts and writing
102	   systems, the continuing evolution of those systems, differences in
103	   the uses of characters among languages (and locations) that use the
104	   same script, and so on make it impossible for a single list of
105	   characters and simple rules to be able to generate an "if we use
106	   these, we will be safe from confusion and various attacks" guideline.

108	   Instead, the algorithm and rules of RFC 5891 and 5892 eliminate many
109	   of the most dangerous and otherwise problematic cases, but cannot
110	   eliminate the need for registries and registrars to understand what
111	   they are doing and to take responsibility for the decisions they make.

113	   The way in which the IDNA2008 specifications expressed these
114	   requirements may have underemphasized the intention that they
115	   actually are requirements.  Section 2.3.2.3 of the Definitions
116	   document [RFC5890] mentions the need for the restrictions, indicates
117	   that they are mandatory, and points the reader to section 4.3 of the
118	   Protocol document [RFC5891], which in turn points to Section 3.2 of
119	   the Rationale document [RFC5894], with each document providing
120	   further detail, discussion, and clarification.

122	   At the same time, the Internet has evolved significantly since the
123	   management assumptions for the DNS were established with RFC 1591 and
124	   earlier.  In particular, the management and use of domain names have
125	   gone through several transformations.  Recounting of those changes is
126	   beyond the scope of this document but one of them has had significant
127	   practical impact on the degree to which the requirement for registry
128	   knowledge and responsibility is observed in practice.  When RFC 1591
129	   was written, the assumption was that domains at all levels of the DNS
130	   would be operated in the best interest of the registrants in the
131	   domain and of the Internet as a whole.  There were no notions about
132	   domains being operated for a profit, much less with a business model
133	   that made them more profitable the more names that could be
134	   registered (or even, under some circumstances, reserved and not
135	   registered).  At the time RFC 1591 was written, there was also no
136	   notion that domains would be considered more successful based on the
137	   number of names registered and delegated from them.  While rarely
138	   reflected in the DNS protocols, the distinction between domains
139	   operated in those ways and ones that are operated for, e.g., use
140	   within an enterprise or otherwise as a service have become very
141	   important today.  See Section 4 for a discussion on how those issues
142	   affect this specification.

144	   This specification is intended to unify and clarify these
145	   requirements for registry decisions and responsibility and to
146	   emphasize the importance of registry restrictions at all levels of
147	   the DNS.  It also makes a specific recommendation for character
148	   repertoire subsetting intermediate between the code points allowed by
149	   RFC 5891 and 5892 and those allowed by individual registries.  It
150	   does not alter the basic IDNA2008 protocols and rules themselves in
151	   any way.

153	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
154	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
155	   document are to be interpreted as described in RFC 2119 [RFC2119].

157	2.  Registry Restrictions in IDNA2008

159	   As mentioned above, IDNA2008 specifies that the registries for each
160	   zone in the DNS that supports IDN labels are required to develop and
161	   apply their own rules to restrict the allowable labels, including
162	   limiting characters they allow to be used in labels in that zone.
163	   The chosen list MUST be a subset of the collection of code points
164	   specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules
165	   established by the protocols themselves.  Labels containing any
166	   characters from the two CONTEXT categories or any characters that are
167	   normally part of a script written right to left [RFC5893] require
168	   that additional rules, specified in the protocols and known as
169	   "contextual rules" and "bidi rules", be applied.  The entire
170	   collection of rules and restrictions required by the IDNA2008
171	   protocols themselves is known as "protocol restrictions".

173	   As mentioned above, registries may apply (and generally are required
174	   to apply) additional rules to further restrict the list of permitted
175	   code points, contextual rules (perhaps applied to normally PVALID
176	   code points) that apply additional restrictions, and/or restrictions
177	   on labels as distinct from code points.  The most obvious of those
178	   restrictions include provisions for restricting suggested new
179	   registrations based on conflicts with labels already registered in
180	   the zone and specifications of what constitutes such conflicts based
181	   on the properties of the labels in question.  The definition of
182	   "conflict" is outside the scope of this document.  Registry
183	   restrictions further include prohibitions on code points and labels
184	   that are not consistent with the intended function of the zone or
185	   of the subtree in which the zone is embedded (see Section 3), and
186	   limitations on where allowable code points may be placed in a label.

188	   These per-registry (or per-zone) rules are commonly known as
189	   "registry restrictions" to distinguish them from the protocol
190	   restrictions described above.  By necessity, the latter are somewhat
191	   generic, having to cater both to the union of the needs for all zones
192	   and to the desires of the most permissive zones.  In
193	   consequence, additional registry restrictions are essential to
194	   provide for the necessary security in the face of the tremendous
195	   variations and differences in writing systems, their ongoing
196	   evolution and development, as well as the human ability to recognize
197	   and distinguish characters in different scripts around the world and
198	   under different circumstances.

200	3.  Progressive Subsets of Allowed Characters

202	   The algorithm and rules of RFC 5891 and 5892 determine the set of
203	   code points that are possible for inclusion in domain name labels;
204	   registries MUST NOT permit code points in labels unless they are part
205	   of that set.  Labels that contain code points that are normally
206	   written from right to left MUST also conform to the requirements of
207	   RFC 5893.  Each registry that intends to allow IDN registrations MUST
208	   then determine the strict subset of that set of code points that will
209	   be allowed by that registry.  It SHOULD also consider additional
210	   rules, including contextual and whole label restrictions that provide
211	   further protection for registrants and users.  For example, the
212	   widely-used principle that bars labels containing characters from
213	   more than one script is not an IDNA2008 requirement.  It has been
214	   adopted by many registries but, as Section 4.4 of RFC 5890 indicates,
215	   there may be circumstances in which it is not required or
216	   appropriate.
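
   The progression described above (protocol-permitted set, then registry
   subset, then per-label check) can be sketched as follows.  This is
   only an illustration: the two tables are hypothetical miniature
   stand-ins, not real IDNA2008 or registry data, and the function names
   are invented for the example.

```python
# Hypothetical miniature tables; real ones come from RFC 5892 and registry policy.
PROTOCOL_ALLOWED = (
    set(range(0x61, 0x7B))        # a-z
    | set(range(0x30, 0x3A))      # 0-9
    | {0x2D, 0xE4, 0xF6, 0xFC}    # hyphen, a/o/u with diaeresis
    | {0x3B1, 0x3B2}              # Greek alpha, beta
)
# This example registry chooses not to offer the Greek letters.
REGISTRY_ALLOWED = PROTOCOL_ALLOWED - {0x3B1, 0x3B2}


def registry_repertoire_is_valid(registry, protocol):
    """True if the registry list stays inside the protocol-permitted set.

    Uses "<=" (subset); "<" would additionally require a proper subset.
    """
    return registry <= protocol


def label_permitted(label, registry):
    """True if every code point of the label is in the registry repertoire."""
    return all(ord(ch) in registry for ch in label)
```

   A check of this kind covers only the repertoire; contextual, bidi,
   and whole-label rules would still have to be applied separately.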

218	   In formulating their own rules, registries SHOULD normally consult
219	   carefully-developed consensus recommendations about global maximum
220	   repertoires to be used such as the ICANN Maximal Starting Repertoire
221	   4 (MSR-4) for the Development of Label Generation Rules for the Root
222	   Zone [ICANN-MSR4] (or its successor documents).  Additional
223	   recommendations of similar quality about particular scripts or
224	   languages exist, including, but not limited to, the RFCs for Cyrillic
225	   [RFC5992], Arabic Language [RFC5564], or script-based repertoires
226	   from the approved ICANN Root Zone Label Generation Rules (LGR-3)
227	   [ICANN-LGR3] (or its successor documents).  Many of these
228	   recommendations also cover rules about relationships among code
229	   points that may be particularly important for complex scripts.  They
230	   also interact with recommendations about how labels that appear to
231	   be the same or apparently the same should be handled.

233	   It is the responsibility of the registry to determine which, if any,
234	   of those recommendations are applicable and to further subset or
235	   extend them as needed.  For example, several of the recommendations
236	   are designed for the root zone and therefore exclude digits and
237	   U+002D HYPHEN-MINUS; this restriction is not generally appropriate
238	   for other zones.  On the other hand, some zones may be designed to
239	   not cater for all users of a given script, but perhaps only for the
240	   needs of selected languages, in which case a more selective
241	   repertoire may be appropriate.

243	   In making these determinations, a registry SHOULD follow the IAB
244	   guidance in RFC 6912 [RFC6912].  Those guidelines include a number of
245	   principles for use in making decisions about allowable code points.
246	   In addition, that document notes that the closer a particular zone is
247	   to the root, the more restrictive the space of permitted labels
248	   should be.  RFC 5894 provides some suggestions for any registry that
249	   may decide to reduce opportunities for confusion or attacks by
250	   constructing policies that disallow characters used in historic
251	   writing systems (whether these be archaic scripts or extensions of
252	   modern scripts for historic or obsolete orthographies) or characters
253	   whose use is restricted to specialized or highly technical contexts.
254	   These suggestions were among the principles guiding the design of
255	   ICANN's Maximal Starting Repertoires (MSR) [LGR-Procedure].

257	   A registry decision to allow only those code points in the full
258	   repertoire of the MSR (plus digits and hyphen) would already avoid a
259	   number of issues inherent in a more permissive policy such as "use
260	   anything permitted by IDNA2008", while still supporting the native
261	   languages and scripts for the vast majority of users today.  However,
262	   it is unlikely, by itself, to fully satisfy the mandate set out above
263	   for three reasons.

265	   1.  The MSR, like the set of code points permissible under IDNA2008
266	       itself, was conceived merely as a boundary condition on
267	       permissible letter code points (it excludes digits and the
268	       hyphen).  It was always intended to be used as a starting point
269	       for setting registry policy, with the expectation that some of
270	       the code points in the MSR would not be included in the final
271	       registry policy, whether for lack of actual usage, or for being
272	       inherently problematic.

274	   2.  It was recognized that many scripts require contextual rules for
275	       many more code points than are covered by CONTEXTO or CONTEXTJ
276	       rules defined in IDNA2008.  This is particularly true for
277	       combining marks, typically used to encode diacritics, tone marks,
278	       vowel signs and the like.  While, theoretically, any combining
279	       mark may occur in any context in Unicode, in practice rendering
280	       and other software that users rely on in viewing or entering
281	       labels will not support arbitrary combining sequences, or indeed
282	       arbitrary combinations of code points, in the case of complex
283	       scripts.

285	       Contextual rules are needed in order to limit allowable code
286	       point sequences to those that can be expected to be rendered
287	       reliably.  Identifying those requires knowledge about the way
288	       code points are used in a script, whence the mandate for
289	       registries to only support code points they understand.  In this,
290	       some of the other recommendations, such as the Informational RFCs
291	       for specific scripts (e.g., Cyrillic [RFC5992]) or languages
292	       (e.g., Arabic [RFC5564] or Chinese [RFC4713]), or the Root Zone
293	       LGRs developed by ICANN, may provide useful guidance.

295	   3.  Because of the widely accepted practice of limiting any
296	       given label to a single script, a universal repertoire, such as
297	       the MSR, would have to be divided on a per-script basis into
298	       subrepertoires to make it useful, with some of those repertoires
299	       overlapping, for example, in the case of East Asian shared usage
300	       of the Han ideographs.
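
   As an illustration of the registry-level contextual rules discussed in
   item 2, the sketch below accepts a combining mark only when it
   directly follows a base character that the registry has listed for
   that mark.  The table is hypothetical (one Devanagari vowel sign and
   three consonants chosen for the example); a real rule set would be
   derived from script expertise or from sources such as the Root Zone
   LGRs.

```python
import unicodedata

# Hypothetical registry table: combining mark -> base characters it may follow.
ALLOWED_AFTER = {
    "\u093F": {"\u0915", "\u0916", "\u0917"},  # VOWEL SIGN I after KA, KHA, GA
}


def marks_in_context(label):
    """True if every combining mark follows a base listed for it.

    Marks with no table entry are rejected, as is a mark at the start
    of the label.
    """
    prev = None
    for ch in label:
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc, or Me
            if prev is None or prev not in ALLOWED_AFTER.get(ch, set()):
                return False
        prev = ch
    return True
```

   Rejecting unlisted marks by default mirrors the conservative stance of
   the text: a sequence is allowed only if the registry understands it.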

302	   Registries choosing to make exceptions -- that is, to allow code
303	   points that recommendations such as the MSR do not allow -- should
304	   make such decisions only with great care and only if they have
305	   considerable understanding of, and great confidence in, their
306	   appropriateness.  The obvious exception from the MSR would be to
307	   allow digits and the hyphen.  Neither was allowed by the MSR, but
308	   only because they are not allowed in the Root Zone.

310	   Nothing in this document permits a registry to allow code points or
311	   labels that are disallowed or otherwise prohibited by IDNA2008.
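
   One way to picture how a zone repertoire might be assembled from a
   starting recommendation is sketched below.  The function and its
   parameter names are assumptions made for this example.  The final
   intersection reflects the rule that nothing may be added that
   IDNA2008 itself disallows.

```python
def build_zone_repertoire(starting, additions, exclusions, protocol_allowed):
    """Combine a recommended starting set with local additions and
    exclusions, then drop anything the protocol does not permit."""
    candidate = (set(starting) | set(additions)) - set(exclusions)
    return candidate & set(protocol_allowed)


# Hypothetical inputs: a letters-only starting set, plus digits and hyphen.
letters = set(range(0x61, 0x7B)) | {0xFC}
digits_and_hyphen = set(range(0x30, 0x3A)) | {0x2D}
protocol = letters | digits_and_hyphen          # stand-in for the RFC 5892 set

repertoire = build_zone_repertoire(
    starting=letters,
    additions=digits_and_hyphen | {0x41},       # U+0041 is not in the stand-in set
    exclusions={0xFC},                          # registry chooses to drop u-diaeresis
    protocol_allowed=protocol,
)
```

   Here the digits and hyphen are added, U+00FC is removed by registry
   choice, and U+0041 is silently dropped because the stand-in protocol
   set does not contain it.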

313	4.  Considerations for For-Profit Domains

315	   As discussed in the Introduction (Section 1), the distributed
316	   administrative structure of the DNS today can be described by
317	   dividing zones into two categories depending on how they are
318	   administered and for whom.  These categories are not precise -- some
319	   zones may not fall neatly into one category or the other -- but are
320	   useful in understanding the practical applicability of this
321	   specification.  They are:

323	      Zones operating primarily or exclusively within an organization or
324	      enterprise and responsible to that organization or enterprise.
325	      DNS operations, including registrations and delegations, will
326	      typically occur in support of the purpose of that organization or
327	      enterprise rather than being its primary purpose.

329	      Zones operating primarily on a for-profit basis in which most
330	      delegations of subdomains are to entities with little or no
331	      affiliation with the registry operator other than contractual
332	      agreements about operation of those subdomains.  These zones are
333	      often known as "public domains" or with similar terms, but those
334	      terms often have other semantics and may not cover all cases.

336	   Rules requiring strict registry responsibility, including either
337	   thorough understanding of scripts and related issues in domain name
338	   labels being considered for registration or local naming rules that
339	   have the same effect, typically come naturally to registries for
340	   zones of the first type.  Registration of labels that would prove
341	   problematic for any reason hurts the relevant organization or
342	   enterprise or its customers.  More generally, there are strong
343	   incentives to be extremely conservative about labels that might be
344	   registered and few, if any, incentives favoring adventures into
345	   labels that might be considered clever, much less ones that are hard
346	   to type, render, or, where it is relevant to users, remember
347	   correctly.

349	   By contrast, in a for-profit zone in which the profits are limited to
350	   selling names, there may be perceived incentives to register whatever
351	   names would-be registrants "want" or fears that any restrictions will
352	   cut into the available namespace.  In such situations, restrictions
353	   are unlikely to be applied unless they meet at least one of two
354	   criteria: (i) they are easy to apply and can be applied
355	   algorithmically or otherwise automatically and/or (ii) there is clear
356	   evidence that the particular label would cause harm.

358	   As suggested above, the two categories above are not precise.  In
359	   particular, there may be domains that, despite being set up to
360	   operate at a profit, are sufficiently conservative about their
361	   operations to more closely resemble the first group in practice than
362	   the second one.

364	   The requirement of IDNA that is discussed at length elsewhere in this
365	   specification stands: IDNA (and IDNs generally) would work better and
366	   Internet users would be better protected and more secure if
367	   registries and registrars (of any type) confined their registrations
368	   to scripts and code point sequences that they understood thoroughly.
369	   While the IETF rarely gives advice to those who choose to violate
370	   IETF Standards, some advice to zones in the second category above may
371	   be in order.  That advice is that significant conservatism in what is
372	   allowed to be registered, even for reservation purposes, and even
373	   more conservatism about what labels are actually entered into zones
374	   and delegated, is the best option for the Internet and its users.  If
375	   practical considerations do not allow that much conservatism, then it
376	   is desirable to consult and utilize the many lists and tables that
377	   have been, and continue to be, developed to advise on what might be
378	   sensible for particular scripts and languages.  These include ICANN's
379	   twin efforts of creating per-script Root Zone Label Generation Rules
380	   [RZ-LGR-3] and Second Level Reference Label Generation Rules
382	   [SL-REF-LGR] (the latter of which may be per language).  They also
383	   include other lists of code points or code point relationships that
384	   may be particularly problematic and that should be treated with extra
385	   caution or prohibited entirely such as the proposed "troublesome
386	   character" list [Freytag-troublesome].  See also Section 6 below.

388	5.  Other corrections and updates

390	   After the initial IDNA2008 documents were published (and RFC 5892 was
391	   updated for Unicode 6.0 by RFC 6452 [RFC6452]) several errors or
392	   instances of confusing text were noted.  For the convenience of the
393	   community, the relevant corrections for RFC 5890 and 5891 are noted
394	   below and update the corresponding documents.  There are no errata
395	   for RFC 5893 or 5894 as of the date this document was published.
396	   Because further updates to RFC 5892 would require addressing other
397	   pending issues, the outstanding erratum for that document is not
398	   considered here.  For consistency with the original documents,
399	   references to Unicode 5.0 are preserved in this document.

401	   Readers should note that an update to RFC 5892 that is primarily
402	   concerned with the review process for new versions of Unicode but
403	   that makes some additional patches
404	   [ID.draft-klensin-idna-unicode-review] is in progress.  Its status
405	   should be checked in conjunction with application of the present
406	   specification.

408	5.1.  Updates to RFC 5890

410	   The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823,
411	   and 4824 [RFC-Editor-5890Errata]) are all associated with the same
412	   issue, the number of Unicode characters that can be associated with a
413	   maximum-length (63 octet) A-label.  In retrospect and contrary to
414	   some of the suggestions in the errata, that value should not be
415	   expressed in octets because RFC 5890 and the other IDNA2008
416	   documents are otherwise careful to not specify Unicode encoding forms
417	   but, instead, work exclusively with Unicode code points.
418	   Consequently the relevant material in RFC 5890 should be corrected as
419	   follows:

421	   Section 2.3.2.1

423	      Old:  expansion of the A-label form to a U-label may produce
424	         strings that are much longer than the normal 63 octet DNS limit
425	         (potentially up to 252 characters).

427	      New:  expansion of the A-label form to a U-label may produce
428	         strings that are much longer than the normal 63 octet DNS limit
429	         (See Section 4.2).

431	      Comment:  If the length limit is going to be a source of confusion
432	         or careful calculations, it should appear in only one place.

434	   Section 4.2

436	      Old:  Because A-labels (the form actually used in the DNS) are
437	         potentially much more compressed than UTF-8 (and UTF-8 is, in
438	         general, more compressed that UTF-16 or UTF-32), U-labels that
439	         obey all of the relevant symmetry (and other) constraints of
440	         these documents may be quite a bit longer, potentially up to
441	         252 characters (Unicode code points).

443	      New:  A-labels (the form actually used in the DNS) and the
444	         Punycode algorithm used as part of the process to produce them
445	         [RFC3492] are strings that are potentially much more compressed
446	         than any standard Unicode Encoding Form.  A 63 octet A-label
447	         cannot represent more than 58 Unicode code points (four octet
448	         overhead and the requirement that at least one character lie
449	         outside the ASCII range) but implementations allocating buffer
450	         space for the conversion should allow significantly more space
451	         depending on the encoding form they are using.
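
   The point about buffer space can be made concrete with a small
   sketch that compares the size of one label in several forms.  It
   relies on Python's built-in "punycode" codec and simply prepends the
   "xn--" prefix.  It performs no IDNA2008 validation, and the helper
   names are invented for this example.

```python
def a_label(u_label):
    """Naive A-label: ACE prefix plus Punycode (no IDNA validity checks)."""
    return b"xn--" + u_label.encode("punycode")


def encoded_sizes(u_label):
    """Size of the same label as code points and in several encodings."""
    return {
        "code_points": len(u_label),
        "a_label_octets": len(a_label(u_label)),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-le")),
        "utf32_octets": len(u_label.encode("utf-32-le")),
    }


sizes = encoded_sizes("b\u00fccher")
```

   For a label made up mostly of repeated or closely spaced non-ASCII
   code points, the A-label is shorter than any of the Unicode encoding
   forms, so a buffer sized for 63 octets cannot be assumed to hold the
   converted U-label.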

453	5.2.  Updates to RFC 5891

455	   Errata ID 3969: Improve reference for combining marks.  There is only
456	      one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum].
457	      Combining marks are explained in the cited section, but not, as
458	      the text indicates, exactly defined.

460	      Old:  The Unicode string MUST NOT begin with a combining mark or
461	         combining character (see The Unicode Standard, Section 2.11
462	         [UnicodeA] for an exact definition).

464	      New:  The Unicode string MUST NOT begin with a combining mark or
465	         combining character (see The Unicode Standard, Section 2.11
466	         [Unicode] for an explanation and Section 3.6, definition D52,
467	         for an exact definition).

469	      Comment:  When RFC 5891 is actually updated, the references in the
470	         text should be updated to the current version of Unicode and
471	         the section numbers checked.
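
   A check for the requirement quoted above might look like the
   following sketch.  It assumes that definition D52 corresponds to
   General_Category M (Mn, Mc, Me), and it uses whatever Unicode version
   is bundled with the Python interpreter rather than the version cited
   in the RFC.  The function name is invented for the example.

```python
import unicodedata


def begins_with_combining_mark(s):
    """True if the first code point has General_Category Mn, Mc, or Me."""
    return bool(s) and unicodedata.category(s[0]) in ("Mn", "Mc", "Me")
```

   A string for which this returns True would be rejected under the
   quoted rule; an empty string is reported as False here and would need
   to be handled by other checks.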

473	6.  Related Discussions

475	   This document is one of a series of measures that have been suggested
476	   to address IDNA issues raised in other documents, including
477	   mechanisms for dealing with combining sequences and single-code point
478	   characters with the same appearance that normalization neither
479	   combines nor decomposes as IDNA2008 assumed [IDNA-Unicode], including
480	   the IAB response to that issue [IAB-2015], and to take a higher-level
481	   view of issues, demands, and proposals for new uses of the DNS.
482	   Those documents also include a discussion of issues with IDNA and
483	   character graphemes for which abstractions exist in Unicode in
484	   precomposed form but that can be generated from combining sequences
485	   and a suggested registry of code points known to be problematic
486	   [Freytag-troublesome].  The discussion of combining sequences and
487	   non-decomposing characters is intended to lay the foundation for an
488	   actual update to the IDNA code points document [RFC5892].  Such an
489	   update will presumably also address the existing errata against that
490	   document.

492	7.  Security Considerations

494	   As discussed in IAB recommendations about internationalized domain
495	   names [RFC4690], [RFC6912], and elsewhere, poor choices of strings
496	   for DNS labels can lead to opportunities for attacks, user confusion,
497	   and other issues less directly related to security.  This document
498	   clarifies the importance of registries carefully establishing design
499	   policies for the labels they will allow and emphasizes that having
500	   such policies and taking responsibility for them is a requirement,
501	   not an option.  If that clarification is useful in practice, the
502	   result should be an improvement in security.

504	8.  Acknowledgments

506	   Many thanks to Patrik Faltstrom, who provided an important review of
507	   the initial version.

509	9.  IANA Considerations

511	   [[CREF1: RFC Editor: Please remove this section before publication.]]

513	   This memo includes no requests to or actions for IANA.  In
514	   particular, it does not contain any provisions that would alter any
515	   IDNA-related registries or tables.

517	10.  References

519	10.1.  Normative References

521	   [ICANN-LGR3]
522	              ICANN, "Root Zone Label Generation Rules (LGR-1)", July
523	              2019,
524	              <https://www.icann.org/news/announcement-2-2019-04-25-en>.

526	   [ICANN-MSR4]
527	              ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for
528	              the Development of Label Generation Rules for the Root
529	              Zone", January 2019,
530	              <https://www.icann.org/news/announcement-2019-02-07-en>.

532	   [RFC1591]  Postel, J., "Domain Name System Structure and Delegation",
533	              RFC 1591, DOI 10.17487/RFC1591, March 1994,
534	              <https://www.rfc-editor.org/info/rfc1591>.

536	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
537	              Requirement Levels", BCP 14, RFC 2119,
538	              DOI 10.17487/RFC2119, March 1997,
539	              <https://www.rfc-editor.org/info/rfc2119>.

541	   [RFC5890]  Klensin, J., "Internationalized Domain Names for
542	              Applications (IDNA): Definitions and Document Framework",
543	              RFC 5890, DOI 10.17487/RFC5890, August 2010,
544	              <https://www.rfc-editor.org/info/rfc5890>.

546	   [RFC5891]  Klensin, J., "Internationalized Domain Names in
547	              Applications (IDNA): Protocol", RFC 5891,
548	              DOI 10.17487/RFC5891, August 2010,
549	              <https://www.rfc-editor.org/info/rfc5891>.

551	   [RFC5891Erratum]
552	              "RFC 5891, "Internationalized Domain Names in Applications
553	              (IDNA): Protocol"", Errata ID 3969, April 2014,
554	              <http://www.rfc-editor.org/errata_search.php?rfc=5891>.

556	   [RFC5894]  Klensin, J., "Internationalized Domain Names for
557	              Applications (IDNA): Background, Explanation, and
558	              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
559	              <https://www.rfc-editor.org/info/rfc5894>.

561	10.2.  Informative References

563	   [Freytag-troublesome]
564	              Freytag, A., Klensin, J., and A. Sullivan, "Those
565	              Troublesome Characters: A Registry of Unicode Code Points
566	              Needing Special Consideration When Used in Network
567	              Identifiers", June 2017, <draft-freytag-troublesome-
568	              characters-01>.

570	   [IAB-2015]
571	              Internet Architecture Board (IAB), "IAB Statement on
572	              Identifiers and Unicode 7.0.0", February 2015,
573	              <https://www.iab.org/documents/
574	              correspondence-reports-documents/2015-2/
575	              iab-statement-on-identifiers-and-unicode-7-0-0/>.

577	   [ID.draft-klensin-idna-unicode-review]
578	              Klensin, J. and P. Faltstrom, "IDNA Review for New Unicode
579	              Versions", June 2019, <https://datatracker.ietf.org/doc/
580	              draft-klensin-idna-unicode-review/>.

582	   [IDNA-Unicode]
583	              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
584	              7.0.0", September 2017, <draft-klensin-idna-5892upd-
585	              unicode70-05>.

587	   [LGR-Procedure]
588	              Internet Corporation for Assigned Names and Numbers
589	              (ICANN), "Procedure to Develop and Maintain the Label
590	              Generation Rules for the Root Zone in Respect of IDNA
591	              Labels", March 2013,
592	              <https://www.icann.org/en/system/files/files/
593	              draft-lgr-procedure-20mar13-en.pdf>.

595	   [RFC-Editor-5890Errata]
596	              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
597	              Domain Names for Applications (IDNA): Definitions and
598	              Document Framework", August 2010", Note to RFC
599	              Editor: Please figure out how you would like this
600	              referenced and make it so., Captured 2017-09-10, 2016,
601	              <https://www.rfc-editor.org/errata_search.php?rfc=5890>.

603	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
604	              for Internationalized Domain Names in Applications
605	              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003,
606	              <https://www.rfc-editor.org/info/rfc3492>.

608	   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
609	              Recommendations for Internationalized Domain Names
610	              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006,
611	              <https://www.rfc-editor.org/info/rfc4690>.

613	   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
614	              "Registration and Administration Recommendations for
615	              Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713,
616	              October 2006, <https://www.rfc-editor.org/info/rfc4713>.

618	   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
619	              "Linguistic Guidelines for the Use of the Arabic Language
620	              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
621	              February 2010, <https://www.rfc-editor.org/info/rfc5564>.

623	   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
624	              Internationalized Domain Names for Applications (IDNA)",
625	              RFC 5892, DOI 10.17487/RFC5892, August 2010,
626	              <https://www.rfc-editor.org/info/rfc5892>.

628	   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
629	              for Internationalized Domain Names for Applications
630	              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010,
631	              <https://www.rfc-editor.org/info/rfc5893>.

633	   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
634	              "Internationalized Domain Names Registration and
635	              Administration Guidelines for European Languages Using
636	              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010,
637	              <https://www.rfc-editor.org/info/rfc5992>.

639	   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
640	              Points and Internationalized Domain Names for Applications
641	              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
642	              November 2011, <https://www.rfc-editor.org/info/rfc6452>.

644	   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
645	              "Principles for Unicode Code Point Inclusion in Labels in
646	              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013,
647	              <https://www.rfc-editor.org/info/rfc6912>.

649	   [RZ-LGR-3]
650	              Internet Corporation for Assigned Names and Numbers, "Root
651	              Zone Label Generation Rules - LGR-3: Overview and Summary,
652	              Version 3", July 2019,
653	              <https://www.icann.org/sites/default/files/lgr/
654	              lgr-3-overview-10jul19-en.pdf>.

656	   [SL-REF-LGR]
657	              Internet Corporation for Assigned Names and Numbers
658	              (ICANN), "Second Level Label Generation Rules", 2019,
659	              <https://www.icann.org/resources/pages/
660	              second-level-lgr-2015-06-21-en>.

662	   [UnicodeA]
663	              The Unicode Consortium, "The Unicode Standard, Version
664	              12.1", May 2019.

666	              Section 2.11

668	Appendix A.  Change Log

670	   RFC Editor: Please remove this appendix before publication.

672	A.1.  Changes from version -00 (2017-03-11) to -01

674	   o  Added Acknowledgments and adjusted references.

676	   o  Filled in Section 5 with updates to respond to errata.

678	   o  Added Section 6 to discuss relationships to other documents.

680	   o  Modified the Abstract to note specifically updated documents.

682	   o  Several small editorial changes and corrections.

684	A.2.  Changes from version -01 (2017-09-12) to -02

686	   After a pause of nearly 22 months due to inability to get this draft
687	   processed, including nearly a year waiting for a new directorate to
688	   actually do anything of substance about fundamental IDNA issues, the
689	   -02 version was posted in the hope of getting a new start.  Specific
690	   changes include:

692	   o  Added a new section, Section 4, and some introductory material to
693	      address the very practical issue that domains run on a for-profit
694	      basis are unlikely to follow the very strict "understand what you
695	      are registering" requirement if they support IDNs at all and
696	      expect to profit from them.

698	   o  Added a pointer to draft-klensin-idna-unicode-review to the
699	      discussion of other work.

701	   o  Editorial corrections and changes.

703	A.3.  Changes from version -02 (2019-07-06) to -03

705	   o  Minor editorial changes in response to shepherd review.

707	   o  Additional references.

709	A.4.  Changes from version -03 (2019-07-22) to -04

711	   o  Editorial changes after AD review and some additional changes to
712	      improve clarity.

714	Authors' Addresses

716	   John C Klensin
717	   1770 Massachusetts Ave, Ste 322
718	   Cambridge, MA  02140
719	   USA

721	   Phone: +1 617 245 1457
722	   Email: john-ietf@jck.com

724	   Asmus Freytag
725	   ASMUS, Inc.

727	   Email: asmus@unicode.org