Network Working Group                                         J. Klensin
Internet-Draft
Updates: 5890, 5891, 5894 (if approved)                       A. Freytag
Intended status: Standards Track                             ASMUS, Inc.
Expires: February 3, 2020                                 August 2, 2019

     Internationalized Domain Names in Applications (IDNA): Registry
                    Restrictions and Recommendations
                    draft-klensin-idna-rfc5891bis-04

Abstract

   The IDNA specifications for internationalized domain names combine rules that determine the labels that are allowed in the DNS without violating the protocol itself and an assignment of responsibility, consistent with earlier specifications, for determining the labels that are allowed in particular zones. Conformance to IDNA by registries and other implementations requires both parts.
   Experience strongly suggests that the language describing those responsibilities was insufficiently clear to promote safe and interoperable use of the specifications and that more details and discussion of circumstances would have been helpful. Without making any substantive changes to IDNA, this specification updates two of the core IDNA documents (RFC 5890 and 5891) and the IDNA explanatory document (RFC 5894) to provide that guidance and to correct some technical errors in the descriptions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 3, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Registry Restrictions in IDNA2008
   3.  Progressive Subsets of Allowed Characters
   4.  Considerations for For-Profit Domains
   5.  Other corrections and updates
     5.1.  Updates to RFC 5890
     5.2.  Updates to RFC 5891
   6.  Related Discussions
   7.  Security Considerations
   8.  Acknowledgments
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Change Log
     A.1.  Changes from version -00 (2017-03-11) to -01
     A.2.  Changes from version -01 (2017-09-12) to -02
     A.3.  Changes from version -02 (2019-07-06) to -03
     A.4.  Changes from version -03 (2019-07-22) to -04
   Authors' Addresses

1.  Introduction

   Parts of the specifications for Internationalized Domain Names in Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893], and updates to them, as "IDNA2008" or just "IDNA") impose a requirement that domain name system (DNS) registries restrict the characters they allow in domain name labels (see Section 2 below), and the contents and structure of those labels. That requirement and restriction are consistent with the "duty to serve the community" described in the original specification for DNS naming and authority [RFC1591].
   The restrictions are intended to limit the permitted characters and strings to those for which the registries or their advisers have a thorough understanding and for which they are willing to take responsibility.

   That provision is centrally important because it recognizes that historical relationships and variations among scripts and writing systems, the continuing evolution of those systems, differences in the uses of characters among languages (and locations) that use the same script, and so on make it impossible for a single list of characters and simple rules to generate an "if we use these, we will be safe from confusion and various attacks" guideline.

   Instead, the algorithm and rules of RFC 5891 and 5892 eliminate many of the most dangerous and otherwise problematic cases, but they cannot eliminate the need for registries and registrars to understand what they are doing and to take responsibility for the decisions they make.

   The way in which the IDNA2008 specifications expressed these requirements may have underemphasized the intention that they actually are requirements. Section 2.3.2.3 of the Definitions document [RFC5890] mentions the need for the restrictions, indicates that they are mandatory, and points the reader to Section 4.3 of the Protocol document [RFC5891], which in turn points to Section 3.2 of the Rationale document [RFC5894], with each document providing further detail, discussion, and clarification.

   At the same time, the Internet has evolved significantly since the management assumptions for the DNS were established with RFC 1591 and earlier. In particular, the management and use of domain names have gone through several transformations. Recounting those changes is beyond the scope of this document, but one of them has had a significant practical impact on the degree to which the requirement for registry knowledge and responsibility is observed in practice. When RFC 1591 was written, the assumption was that domains at all levels of the DNS would be operated in the best interest of the registrants in the domain and of the Internet as a whole. There was no notion of domains being operated for a profit, much less with a business model that made them more profitable the more names could be registered (or even, under some circumstances, reserved and not registered). At the time RFC 1591 was written, there was also no notion that domains would be considered more successful based on the number of names registered and delegated from them. While rarely reflected in the DNS protocols, the distinction between domains operated in those ways and ones that are operated, e.g., for use within an enterprise or otherwise as a service has become very important today. See Section 4 for a discussion of how those issues affect this specification.

   This specification is intended to unify and clarify these requirements for registry decisions and responsibility and to emphasize the importance of registry restrictions at all levels of the DNS. It also makes a specific recommendation for character repertoire subsetting intermediate between the code points allowed by RFC 5891 and 5892 and those allowed by individual registries. It does not alter the basic IDNA2008 protocols and rules themselves in any way.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Registry Restrictions in IDNA2008

   As mentioned above, IDNA2008 specifies that the registries for each zone in the DNS that supports IDN labels are required to develop and apply their own rules to restrict the allowable labels, including limiting the characters they allow to be used in labels in that zone. The chosen list MUST be a subset of the collection of code points specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules established by the protocols themselves. Labels containing any characters from the two CONTEXT categories, or any characters that are normally part of a script written right to left [RFC5893], require that additional rules, specified in the protocols and known as "contextual rules" and "bidi rules", be applied. The entire collection of rules and restrictions required by the IDNA2008 protocols themselves is known as "protocol restrictions".

   Registries may apply (and generally are required to apply) additional rules that further restrict the list of permitted code points, contextual rules (perhaps applied to normally PVALID code points) that impose additional restrictions, and/or restrictions on labels as distinct from code points. The most obvious of those restrictions include provisions for restricting proposed new registrations based on conflicts with labels already registered in the zone, and specifications of what constitutes such conflicts based on the properties of the labels in question. The definition of "conflict" is outside the scope of this document. They further include prohibitions on code points and labels that are not consistent with the intended function of the zone or of the subtree in which the zone is embedded (see Section 3), and limitations on where allowable code points may be placed in a label.
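   The layering of protocol restrictions and registry restrictions can be illustrated with a short Python sketch. It is not part of IDNA2008 and is not a registry implementation: PROTOCOL_PROPERTY is a hypothetical, drastically abbreviated stand-in for the derived property values of RFC 5892, REGISTRY_REPERTOIRE is a hypothetical zone repertoire, and the contextual and bidi rules are omitted. It shows only that protocol restrictions are checked first and that a registry repertoire must lie inside the protocol-permitted set.

```python
import string
from typing import Iterable, Optional

# Hypothetical, heavily abbreviated stand-in for the RFC 5892 derived
# property values. Anything not listed is treated as DISALLOWED here.
PROTOCOL_PROPERTY = {cp: "PVALID" for cp in string.ascii_lowercase + string.digits + "-"}
PROTOCOL_PROPERTY.update({
    "\u00e9": "PVALID",    # LATIN SMALL LETTER E WITH ACUTE
    "\u00df": "PVALID",    # LATIN SMALL LETTER SHARP S
    "\u200d": "CONTEXTJ",  # ZERO WIDTH JOINER
    "\u00b7": "CONTEXTO",  # MIDDLE DOT
})
PROTOCOL_PROPERTY.update({cp: "DISALLOWED" for cp in string.ascii_uppercase})

PROTOCOL_ALLOWED = {"PVALID", "CONTEXTJ", "CONTEXTO"}

# Hypothetical repertoire for one zone (a registry restriction): it omits
# U+00DF, ZERO WIDTH JOINER, and MIDDLE DOT even though IDNA2008 permits them.
REGISTRY_REPERTOIRE = frozenset(string.ascii_lowercase + string.digits + "-\u00e9")


def protocol_property(cp: str) -> str:
    return PROTOCOL_PROPERTY.get(cp, "DISALLOWED")


def repertoire_within_protocol(repertoire: Iterable[str]) -> bool:
    """A registry repertoire MUST be a subset of PVALID, CONTEXTJ, and CONTEXTO."""
    return all(protocol_property(cp) in PROTOCOL_ALLOWED for cp in repertoire)


def check_label(label: str) -> Optional[str]:
    """Return None if the label passes both layers, else the reason it fails."""
    # Layer 1: protocol restrictions (contextual and bidi rules omitted).
    for cp in label:
        prop = protocol_property(cp)
        if prop not in PROTOCOL_ALLOWED:
            return f"protocol: U+{ord(cp):04X} is {prop}"
    # Layer 2: registry restrictions (repertoire only, in this sketch).
    for cp in label:
        if cp not in REGISTRY_REPERTOIRE:
            return f"registry: U+{ord(cp):04X} is not in the zone repertoire"
    return None
```

   For example, check_label("stra\u00dfe") passes the protocol layer (U+00DF is PVALID) but is rejected by this hypothetical registry, while check_label("Cafe") already fails at the protocol layer because upper-case letters are DISALLOWED.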
   These per-registry (or per-zone) rules are commonly known as "registry restrictions" to distinguish them from the protocol restrictions described above. By necessity, the latter are somewhat generic, having to cater both to the union of the needs of all zones and to the desires of the most permissive zones. Consequently, additional registry restrictions are essential to provide the necessary security in the face of the tremendous variations and differences in writing systems, their ongoing evolution and development, and the human ability to recognize and distinguish characters in different scripts around the world and under different circumstances.

3.  Progressive Subsets of Allowed Characters

   The algorithm and rules of RFC 5891 and 5892 determine the set of code points that are possible for inclusion in domain name labels; registries MUST NOT permit code points in labels unless they are part of that set. Labels that contain code points that are normally written from right to left MUST also conform to the requirements of RFC 5893. Each registry that intends to allow IDN registrations MUST then determine the strict subset of that set of code points that it will allow. It SHOULD also consider additional rules, including contextual and whole-label restrictions, that provide further protection for registrants and users. For example, the widely used principle that bars labels containing characters from more than one script is not an IDNA2008 requirement. It has been adopted by many registries but, as Section 4.4 of RFC 5890 indicates, there may be circumstances in which it is not required or appropriate.
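   As an illustration of a whole-label restriction that a registry may choose to adopt, the following Python sketch applies a single-script rule by testing whether a label fits entirely within one of several per-script subrepertoires. The subrepertoires, and the choice to treat digits and the hyphen as script-neutral, are hypothetical; real subrepertoires would come from the registry's own policy or from carefully developed recommendations such as the ICANN Label Generation Rules.

```python
import string
from typing import List

# Hypothetical per-script subrepertoires defined by a registry.
SUBREPERTOIRES = {
    "Latin": frozenset(string.ascii_lowercase + "\u00e9\u00fc"),
    # CYRILLIC SMALL LETTERs A, BE, VE, IE, O, ER, ES
    "Cyrillic": frozenset("\u0430\u0431\u0432\u0435\u043e\u0440\u0441"),
    # GREEK SMALL LETTERs ALPHA, BETA, GAMMA, OMICRON
    "Greek": frozenset("\u03b1\u03b2\u03b3\u03bf"),
}
# Code points this sketch treats as usable together with any script.
SCRIPT_NEUTRAL = frozenset(string.digits + "-")


def covering_subrepertoires(label: str) -> List[str]:
    """Names of every subrepertoire that, with SCRIPT_NEUTRAL, covers the label.

    Subrepertoires may overlap (e.g., Han ideographs shared by several East
    Asian repertoires), so more than one name can be returned.
    """
    needed = set(label) - SCRIPT_NEUTRAL
    return [name for name, rep in SUBREPERTOIRES.items() if needed <= rep]


def passes_single_script_rule(label: str) -> bool:
    return bool(covering_subrepertoires(label))
```

   The label "p\u0430ypal", which mixes Latin letters with U+0430 CYRILLIC SMALL LETTER A, is rejected even though each of its code points is individually acceptable to this hypothetical registry.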
   In formulating their own rules, registries SHOULD normally consult carefully developed consensus recommendations about global maximum repertoires, such as the ICANN Maximal Starting Repertoire 4 (MSR-4) for the Development of Label Generation Rules for the Root Zone [ICANN-MSR4] (or its successor documents). Additional recommendations of similar quality about particular scripts or languages exist, including, but not limited to, the RFCs for Cyrillic [RFC5992] and the Arabic language [RFC5564], and the script-based repertoires from the approved ICANN Root Zone Label Generation Rules (LGR-3) [ICANN-LGR3] (or its successor documents). Many of these recommendations also cover rules about relationships among code points that may be particularly important for complex scripts. They also interact with recommendations about how labels that are the same, or that appear to be the same, should be handled.

   It is the responsibility of the registry to determine which, if any, of those recommendations are applicable and to further subset or extend them as needed. For example, several of the recommendations are designed for the root zone and therefore exclude digits and U+002D HYPHEN-MINUS; this restriction is not generally appropriate for other zones. On the other hand, some zones may be designed not to cater to all users of a given script but only to the needs of selected languages, in which case a more selective repertoire may be appropriate.

   In making these determinations, a registry SHOULD follow the IAB guidance in RFC 6912 [RFC6912]. Those guidelines include a number of principles for use in making decisions about allowable code points. In addition, that document notes that the closer a particular zone is to the root, the more restrictive the space of permitted labels should be.
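   Subsetting and extending a recommended repertoire amount to simple set operations on code points. The Python sketch below starts from a hypothetical letters-only repertoire of the kind designed for the root zone, re-admits digits and U+002D HYPHEN-MINUS for a lower-level zone, and removes code points that the (equally hypothetical) zone does not need. It does not check the result against IDNA2008; a real registry would also confirm that every added code point is permitted by the protocol.

```python
import string
from typing import Iterable

# Hypothetical letters-only starting repertoire in the style of a root-zone
# recommendation: no digits and no U+002D HYPHEN-MINUS.
ROOT_STYLE = frozenset(string.ascii_lowercase + "\u00e0\u00e9\u00ea\u00fc\u00f0\u00fe")


def derive_zone_repertoire(start: Iterable[str],
                           add: Iterable[str] = (),
                           remove: Iterable[str] = ()) -> frozenset:
    """Extend a starting repertoire with `add`, then subset it by `remove`."""
    return (frozenset(start) | frozenset(add)) - frozenset(remove)


# A hypothetical second-level zone: digits and hyphen are re-admitted, and
# U+00F0 and U+00FE are dropped as unneeded for the zone's target languages.
ZONE = derive_zone_repertoire(ROOT_STYLE,
                              add=string.digits + "-",
                              remove="\u00f0\u00fe")
```

   Keeping the additions and removals explicit, rather than editing a copied list by hand, makes each registry decision to depart from the starting repertoire easy to review.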
   RFC 5894 provides some suggestions for any registry that may decide to reduce opportunities for confusion or attacks by constructing policies that disallow characters used in historic writing systems (whether these be archaic scripts or extensions of modern scripts for historic or obsolete orthographies) or characters whose use is restricted to specialized or highly technical contexts. These suggestions were among the principles guiding the design of ICANN's Maximal Starting Repertoires (MSR) [LGR-Procedure].

   A registry decision to allow only those code points in the full repertoire of the MSR (plus digits and hyphen) would already avoid a number of issues inherent in a more permissive policy such as "use anything permitted by IDNA2008", while still supporting the native languages and scripts of the vast majority of users today. However, such a decision is unlikely, by itself, to fully satisfy the mandate set out above, for three reasons:

   1.  The MSR, like the set of code points permissible under IDNA2008 itself, was conceived merely as a boundary condition on permissible letter code points (it excludes digits and the hyphen). It was always intended to be used as a starting point for setting registry policy, with the expectation that some of the code points in the MSR would not be included in the final registry policy, whether for lack of actual usage or for being inherently problematic.

   2.  It was recognized that many scripts require contextual rules for many more code points than are covered by the CONTEXTO or CONTEXTJ rules defined in IDNA2008. This is particularly true for combining marks, typically used to encode diacritics, tone marks, vowel signs, and the like. While, in theory, any combining mark may occur in any context in Unicode, in practice the rendering and other software that users rely on to view or enter labels will not support arbitrary combining sequences or, in the case of complex scripts, arbitrary combinations of code points.

       Contextual rules are needed to limit allowable code point sequences to those that can be expected to be rendered reliably. Identifying those sequences requires knowledge about the way code points are used in a script; hence the mandate for registries to support only code points they understand. In this respect, some of the other recommendations, such as the Informational RFCs for specific scripts (e.g., Cyrillic [RFC5992]) or languages (e.g., Arabic [RFC5564] or Chinese [RFC4713]), or the Root Zone LGRs developed by ICANN, may provide useful guidance.

   3.  Because of the widely accepted practice of limiting any given label to a single script, a universal repertoire such as the MSR would have to be divided on a per-script basis into subrepertoires to make it useful, with some of those subrepertoires overlapping, for example, in the case of East Asian shared usage of the Han ideographs.

   Registries choosing to make exceptions (that is, to allow code points that recommendations such as the MSR do not allow) should make such decisions only with great care and only if they have considerable understanding of, and great confidence in, their appropriateness. The obvious exception to the MSR would be to allow digits and the hyphen. Neither is allowed by the MSR, but only because they are not allowed in the root zone.

   Nothing in this document permits a registry to allow code points or labels that are disallowed or otherwise prohibited by IDNA2008.

4.  Considerations for For-Profit Domains

   As discussed in the Introduction (Section 1), the distributed administrative structure of the DNS today can be described by dividing zones into two categories depending on how they are administered and for whom. These categories are not precise (some zones may not fall neatly into one category or the other) but are useful in understanding the practical applicability of this specification. They are:

      Zones operating primarily or exclusively within an organization or enterprise and responsible to that organization or enterprise. DNS operations, including registrations and delegations, will typically occur in support of the purpose of that organization or enterprise rather than being its primary purpose.

      Zones operating primarily on a for-profit basis, in which most delegations of subdomains are to entities with little or no affiliation with the registry operator other than contractual agreements about the operation of those subdomains. These zones are often known as "public domains" or by similar terms, but those terms often have other semantics and may not cover all cases.

   Rules requiring strict registry responsibility, including either a thorough understanding of scripts and related issues in domain name labels being considered for registration or local naming rules that have the same effect, typically come naturally to registries for zones of the first type. Registration of labels that would prove problematic for any reason hurts the relevant organization or enterprise or its customers. More generally, there are strong incentives to be extremely conservative about labels that might be registered and few, if any, incentives favoring adventures into labels that might be considered clever, much less ones that are hard to type, to render, or, where it is relevant to users, to remember correctly.
   By contrast, in a for-profit zone in which the profits come only from selling names, there may be perceived incentives to register whatever names would-be registrants "want", or fears that any restrictions will cut into the available namespace. In such situations, restrictions are unlikely to be applied unless they meet at least one of two criteria: (i) they are easy to apply and can be applied algorithmically or otherwise automatically, and/or (ii) there is clear evidence that the particular label would cause harm.

   As suggested above, the two categories are not precise. In particular, there may be domains that, despite being set up to operate at a profit, are sufficiently conservative about their operations to more closely resemble the first group in practice than the second.

   The requirement of IDNA that is discussed at length elsewhere in this specification stands: IDNA (and IDNs generally) would work better, and Internet users would be better protected and more secure, if registries and registrars (of any type) confined their registrations to scripts and code point sequences that they understood thoroughly. While the IETF rarely gives advice to those who choose to violate IETF Standards, some advice to zones in the second category above may be in order. That advice is that significant conservatism in what is allowed to be registered, even for reservation purposes, and even more conservatism about what labels are actually entered into zones and delegated, is the best option for the Internet and its users. If practical considerations do not allow that much conservatism, then it is desirable to consult and utilize the many lists and tables that have been, and continue to be, developed to advise on what might be sensible for particular scripts and languages. These include ICANN's twin efforts of creating per-script Root Zone Label Generation Rules [RZ-LGR-3] and Second Level Reference Label Generation Rules [SL-REF-LGR] (the latter of which may be per language). They also include other lists of code points or code point relationships that may be particularly problematic and that should be treated with extra caution or prohibited entirely, such as the proposed "troublesome character" list [Freytag-troublesome]. See also Section 6 below.

5.  Other corrections and updates

   After the initial IDNA2008 documents were published (and RFC 5892 was updated for Unicode 6.0 by RFC 6452 [RFC6452]), several errors or instances of confusing text were noted. For the convenience of the community, the relevant corrections for RFC 5890 and 5891 are noted below and update the corresponding documents. There are no errata for RFC 5893 or 5894 as of the date this document was published. Because further updates to RFC 5892 would require addressing other pending issues, the outstanding erratum for that document is not considered here. For consistency with the original documents, references to Unicode 5.0 are preserved in this document.

   Readers should note that an update to RFC 5892 that is primarily concerned with the review process for new versions of Unicode, but that also makes some additional patches [ID.draft-klensin-idna-unicode-review], is in progress. Its status should be checked in conjunction with the application of the present specification.

5.1.  Updates to RFC 5890

   The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823, and 4824 [RFC-Editor-5890Errata]) are all associated with the same issue: the number of Unicode characters that can be associated with a maximum-length (63-octet) A-label.
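   The gap between counting code points and counting octets is easy to see with Python's built-in "punycode" codec, which implements the RFC 3492 algorithm but performs none of the IDNA2008 validity checks; the helper names below are illustrative only. A U-label consisting of 57 repetitions of U+00FC LATIN SMALL LETTER U WITH DIAERESIS produces a 63-octet A-label, yet occupies 114 octets in both UTF-8 and UTF-16, well beyond the 63-octet DNS label limit. This is consistent with the bound on code points given in the replacement text for Section 4.2 of RFC 5890.

```python
def to_a_label(u_label: str) -> str:
    """Prefix "xn--" to the RFC 3492 Punycode form of a non-ASCII U-label.

    Uses Python's built-in 'punycode' codec; no IDNA2008 validation is done.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def label_sizes(u_label: str) -> dict:
    """Size of the same label in code points and in several encodings."""
    return {
        "code_points": len(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-be")),
        "a_label_octets": len(to_a_label(u_label)),
    }


print(to_a_label("b\u00fccher"))   # xn--bcher-kva
print(label_sizes("\u00fc" * 57))
# {'code_points': 57, 'utf8_octets': 114, 'utf16_octets': 114, 'a_label_octets': 63}
```

   Adding one more U+00FC produces a 64-octet A-label, which is too long for the DNS; a buffer sized for the UTF-8 or UTF-16 form of such a U-label must hold well over 63 octets.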
   In retrospect, and contrary to some of the suggestions in the errata, that value should not be expressed in octets, because RFC 5890 and the other IDNA2008 documents are otherwise careful not to specify Unicode encoding forms but instead work exclusively with Unicode code points. Consequently, the relevant material in RFC 5890 should be corrected as follows:

   Section 2.3.2.1

      Old: expansion of the A-label form to a U-label may produce strings that are much longer than the normal 63 octet DNS limit (potentially up to 252 characters).

      New: expansion of the A-label form to a U-label may produce strings that are much longer than the normal 63 octet DNS limit (see Section 4.2).

      Comment: If the length limit is going to be a source of confusion or careful calculations, it should appear in only one place.

   Section 4.2

      Old: Because A-labels (the form actually used in the DNS) are potentially much more compressed than UTF-8 (and UTF-8 is, in general, more compressed that UTF-16 or UTF-32), U-labels that obey all of the relevant symmetry (and other) constraints of these documents may be quite a bit longer, potentially up to 252 characters (Unicode code points).

      New: A-labels (the form actually used in the DNS), which are produced in part by the Punycode algorithm [RFC3492], are strings that are potentially much more compressed than any standard Unicode Encoding Form. A 63 octet A-label cannot represent more than 58 Unicode code points (given the four-octet "xn--" prefix and the requirement that at least one character lie outside the ASCII range), but implementations allocating buffer space for the conversion should allow significantly more space, depending on the encoding form they are using.

5.2.  Updates to RFC 5891

   There is only one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum], which improves the reference for combining marks.
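   The rule at issue, that a U-label MUST NOT begin with a combining mark, can be checked with Python's standard unicodedata module. The sketch below treats a code point as a combining character when its General_Category is Mn, Mc, or Me, which is how definition D52 of the Unicode Standard characterizes combining characters. The function names are illustrative, and results depend on the Unicode version built into the interpreter (unicodedata.unidata_version), which may differ from the version used by a given IDNA implementation.

```python
import unicodedata


def is_combining_character(cp: str) -> bool:
    """True if the code point's General_Category is Mn, Mc, or Me."""
    return unicodedata.category(cp) in ("Mn", "Mc", "Me")


def begins_with_combining_mark(label: str) -> bool:
    """The condition RFC 5891 forbids: a label whose first code point is a combining mark."""
    return bool(label) and is_combining_character(label[0])
```

   Note that unicodedata.combining() is not a substitute for this test: it returns the canonical combining class, which is 0 for spacing marks such as U+0903 DEVANAGARI SIGN VISARGA even though their General_Category is Mc.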
   Combining marks are explained in the section of the Unicode Standard cited by RFC 5891 but, contrary to what the text indicates, are not exactly defined there.

      Old: The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [UnicodeA] for an exact definition).

      New: The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [UnicodeA] for an explanation and Section 3.6, definition D52, for an exact definition).

      Comment: When RFC 5891 is actually updated, the references in the text should be updated to the current version of Unicode and the section numbers checked.

6.  Related Discussions

   This document is one of a series of measures that have been suggested to address IDNA issues raised in other documents and to take a higher-level view of issues, demands, and proposals for new uses of the DNS. Those other documents include mechanisms for dealing with combining sequences and single-code-point characters with the same appearance that normalization neither combines nor decomposes as IDNA2008 assumed [IDNA-Unicode], and the IAB response to that issue [IAB-2015]. They also include a discussion of issues with IDNA and character graphemes for which abstractions exist in Unicode in precomposed form but that can also be generated from combining sequences, and a suggested registry of code points known to be problematic [Freytag-troublesome]. The discussion of combining sequences and non-decomposing characters is intended to lay the foundation for an actual update to the IDNA code points document [RFC5892]. Such an update will presumably also address the existing errata against that document.

7.  Security Considerations

   As discussed in IAB recommendations about internationalized domain names [RFC4690], [RFC6912], and elsewhere, poor choices of strings for DNS labels can lead to opportunities for attacks, user confusion, and other issues less directly related to security. This document clarifies the importance of registries carefully establishing design policies for the labels they will allow, and it makes clear that having such policies and taking responsibility for them is a requirement, not an option. If that clarification is useful in practice, the result should be an improvement in security.

8.  Acknowledgments

   Many thanks to Patrik Faltstrom, who provided an important review of the initial version.

9.  IANA Considerations

   [[CREF1: RFC Editor: Please remove this section before publication.]]

   This memo includes no requests to or actions for IANA. In particular, it does not contain any provisions that would alter any IDNA-related registries or tables.

10.  References

10.1.  Normative References

   [ICANN-LGR3]
              ICANN, "Root Zone Label Generation Rules (LGR-3)", July 2019.

   [ICANN-MSR4]
              ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for the Development of Label Generation Rules for the Root Zone", January 2019.

   [RFC1591]  Postel, J., "Domain Name System Structure and Delegation", RFC 1591, DOI 10.17487/RFC1591, March 1994.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, DOI 10.17487/RFC5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, DOI 10.17487/RFC5891, August 2010.
   [RFC5891Erratum]
              "RFC 5891, "Internationalized Domain Names in Applications
              (IDNA): Protocol"", Errata ID 3969, April 2014.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010.

10.2.  Informative References

   [Freytag-troublesome]
              Freytag, A., Klensin, J., and A. Sullivan, "Those
              Troublesome Characters: A Registry of Unicode Code Points
              Needing Special Consideration When Used in Network
              Identifiers", June 2017.

   [IAB-2015]
              Internet Architecture Board (IAB), "IAB Statement on
              Identifiers and Unicode 7.0.0", February 2015.

   [ID.draft-klensin-idna-unicode-review]
              Klensin, J. and P. Faltstrom, "IDNA Review for New Unicode
              Versions", June 2019.

   [IDNA-Unicode]
              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
              7.0.0", September 2017.

   [LGR-Procedure]
              Internet Corporation for Assigned Names and Numbers
              (ICANN), "Procedure to Develop and Maintain the Label
              Generation Rules for the Root Zone in Respect of IDNA
              Labels", March 2013.

   [RFC-Editor-5890Errata]
              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
              Domain Names for Applications (IDNA): Definitions and
              Document Framework", August 2010", Note to RFC
              Editor: Please figure out how you would like this
              referenced and make it so., Captured 2017-09-10, 2016.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006.

   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
              "Registration and Administration Recommendations for
              Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713,
              October 2006.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
              February 2010.

   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, DOI 10.17487/RFC5892, August 2010.

   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
              for Internationalized Domain Names for Applications
              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010.

   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
              "Internationalized Domain Names Registration and
              Administration Guidelines for European Languages Using
              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010.

   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
              Points and Internationalized Domain Names for Applications
              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
              November 2011.

   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
              "Principles for Unicode Code Point Inclusion in Labels in
              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013.

   [RZ-LGR-3]
              Internet Corporation for Assigned Names and Numbers, "Root
              Zone Label Generation Rules - LGR-3: Overview and Summary,
              Version 3", July 2019.

   [SL-REF-LGR]
              Internet Corporation for Assigned Names and Numbers
              (ICANN), "Second Level Label Generation Rules", 2019.

   [UnicodeA]
              The Unicode Consortium, "The Unicode Standard, Version
              12.1", May 2019.

              Section 2.11

Appendix A.  Change Log

   RFC Editor: Please remove this appendix before publication.

A.1.  Changes from version -00 (2017-03-11) to -01

   o  Added Acknowledgments and adjusted references.
   o  Filled in Section 5 with updates to respond to errata.

   o  Added Section 6 to discuss relationships to other documents.

   o  Modified the Abstract to note specifically updated documents.

   o  Several small editorial changes and corrections.

A.2.  Changes from version -01 (2017-09-12) to -02

   After a pause of nearly 22 months due to the inability to get this
   draft processed, including nearly a year spent waiting for a new
   directorate to actually do anything of substance about fundamental
   IDNA issues, the -02 version was posted in the hope of getting a new
   start.  Specific changes include:

   o  Added a new section, Section 4, and some introductory material to
      address the very practical issue that domains run on a for-profit
      basis are unlikely to follow the very strict "understand what you
      are registering" requirement if they support IDNs at all and
      expect to profit from them.

   o  Added a pointer to draft-klensin-idna-unicode-review to the
      discussion of other work.

   o  Editorial corrections and changes.

A.3.  Changes from version -02 (2019-07-06) to -03

   o  Minor editorial changes in response to shepherd review.

   o  Additional references.

A.4.  Changes from version -03 (2019-07-22) to -04

   o  Editorial changes after AD review and some additional changes to
      improve clarity.

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA 02140
   USA

   Phone: +1 617 245 1457
   Email: john-ietf@jck.com

   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org