idnits 2.17.1 draft-klensin-idna-rfc5891bis-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC5894, but the abstract doesn't seem to directly say this. It does mention RFC5894 though, so this could be OK. -- The draft header indicates that this document updates RFC5890, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC5891, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC5890, updated by this document, for RFC5378 checks: 2008-10-14) (Using the creation date from RFC5891, updated by this document, for RFC5378 checks: 2008-05-22) (Using the creation date from RFC5894, updated by this document, for RFC5378 checks: 2008-05-13) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) 
-- The document date (July 13, 2020) is 1382 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-LGR3' -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-MSR4' ** Downref: Normative reference to an Informational RFC: RFC 1591 -- Duplicate reference: RFC5891, mentioned in 'RFC5891Erratum', was also mentioned in 'RFC5891'. ** Downref: Normative reference to an Informational RFC: RFC 5894 ** Downref: Normative reference to an Informational RFC: RFC 6912 -- Duplicate reference: RFC5890, mentioned in 'RFC-Editor-5890Errata', was also mentioned in 'RFC5890'. Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Klensin 3 Internet-Draft 4 Updates: 5890, 5891, 5894 (if approved) A. Freytag 5 Intended status: Standards Track ASMUS, Inc. 6 Expires: January 14, 2021 July 13, 2020 8 Internationalized Domain Names in Applications (IDNA): Registry 9 Restrictions and Recommendations 10 draft-klensin-idna-rfc5891bis-06 12 Abstract 14 The IDNA specifications for internationalized domain names combine 15 rules that determine the labels that are allowed in the DNS without 16 violating the protocol itself and an assignment of responsibility, 17 consistent with earlier specifications, for determining the labels 18 that are allowed in particular zones. Conformance to IDNA by 19 registries and other implementations requires both parts. 
Experience 20 strongly suggests that the language describing those responsibilities 21 was insufficiently clear to promote safe and interoperable use of the 22 specifications and that more details and discussion of circumstances 23 would have been helpful. Without making any substantive changes to 24 IDNA, this specification updates two of the core IDNA documents (RFCs 25 5890 and 5891) and the IDNA explanatory document (RFC 5894) to 26 provide that guidance and to correct some technical errors in the 27 descriptions. 29 Status of This Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at https://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on January 14, 2021. 46 Copyright Notice 48 Copyright (c) 2020 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (https://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 
61 Table of Contents 63 1. Introduction 64 2. Registry Restrictions in IDNA2008 65 3. Progressive Subsets of Allowed Characters 66 4. Considerations for Domains Operated Primarily for the 67 Financial Benefit of the Registry Owner or Operator 68 Organization 69 5. Other corrections and updates 70 5.1. Updates to RFC 5890 71 5.2. Updates to RFC 5891 72 6. Related Discussions 73 7. Security Considerations 74 8. Acknowledgments 75 9. IANA Considerations 76 10. References 77 10.1. Normative References 78 10.2. Informative References 79 Appendix A. Change Log 80 A.1. Changes from version -00 (2017-03-11) to -01 81 A.2. Changes from version -01 (2017-09-12) to -02 82 A.3. Changes from version -02 (2019-07-06) to -03 83 A.4. Changes from version -03 (2019-07-22) to -04 84 A.5. Changes from version -04 (2019-08-02) to -05 85 A.6. Changes from version -05 (2019-08-29) to -06 86 Authors' Addresses 88 1. 
Introduction 90 Parts of the specifications for Internationalized Domain Names in 91 Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively 92 known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates 93 to them, as "IDNA2008" (or just "IDNA")) impose a requirement that 94 domain name system (DNS) registries restrict the characters they 95 allow in domain name labels (see Section 2 below), and the contents 96 and structure of those labels. That requirement and restriction are 97 consistent with the "duty to serve the community" described in the 98 original specification for DNS naming and authority [RFC1591]. The 99 restrictions are intended to limit the permitted characters and 100 strings to those for which the registries or their advisers have a 101 thorough understanding and for which they are willing to take 102 responsibility. 104 That provision is centrally important because it recognized that 105 historical relationships and variations among scripts and writing 106 systems, the continuing evolution of those systems, differences in 107 the uses of characters among languages (and locations) that use the 108 same script, and so on make it impossible for a single list of 109 characters and simple rules to be able to generate an "if we use 110 these, we will be safe from confusion and various attacks" guideline. 112 Instead, the algorithm and rules of RFCs 5891 and 5892 eliminate many 113 of the most dangerous and otherwise problematic cases, but cannot 114 eliminate the need for registries and registrars to understand what 115 they are doing and to take responsibility for the decisions they make. 117 The way in which the IDNA2008 specifications expressed these 118 requirements may have underemphasized the intention that they 119 actually are requirements. 
Section 2.3.2.3 of the Definitions 120 document [RFC5890] mentions the need for the restrictions, indicates 121 that they are mandatory, and points the reader to Section 4.3 of the 122 Protocol document [RFC5891], which in turn points to Section 3.2 of 123 the Rationale document [RFC5894], with each document providing 124 further detail, discussion, and clarification. 126 At the same time, the Internet has evolved significantly since the 127 management assumptions for the DNS were established with RFC 1591 and 128 earlier. In particular, the management and use of domain names have 129 gone through several transformations. Recounting those changes is 130 beyond the scope of this document but one of them has had significant 131 practical impact on the degree to which the requirement for registry 132 knowledge and responsibility is observed in practice. When RFC 1591 133 was written, the assumption was that domains at all levels of the DNS 134 would be operated in the best interest of the registrants in the 135 domain and of the Internet as a whole. There were no notions about 136 domains being operated for a profit, much less with a business model 137 that made them more profitable the more names that could be 138 registered (or even, under some circumstances, reserved and not 139 registered). At the time RFC 1591 was written, there was also no 140 notion that domains would be considered more successful based on the 141 number of names registered and delegated from them. While rarely 142 reflected in the DNS protocols, the distinction between domains 143 operated primarily as a revenue source for the organizations operating 144 the registry and ones that are operated for, e.g., use within an 145 enterprise or otherwise as a service has become very important 146 today. See Section 4 for a discussion of how those issues affect 147 this specification. 
149 This specification is intended to unify and clarify these 150 requirements for registry decisions and responsibility and to 151 emphasize the importance of registry restrictions at all levels of 152 the DNS. It also makes a specific recommendation for character 153 repertoire subsetting that is intermediate between the code points 154 allowed by RFCs 5891 and 5892 and those allowed by individual 155 registries. It does not alter the basic IDNA2008 protocols and rules 156 themselves in any way. 158 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 159 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 160 document are to be interpreted as described in RFC 2119 [RFC2119]. 162 2. Registry Restrictions in IDNA2008 164 As mentioned above, IDNA2008 specifies that the registries for each 165 zone in the DNS that supports IDN labels are required to develop and 166 apply their own rules to restrict the allowable labels, including 167 limiting the characters they allow to be used in labels in that zone. 168 The chosen list MUST be a subset of the collection of code points 169 specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules 170 established by the protocols themselves. Labels containing any 171 characters from the two CONTEXT categories or any characters that are 172 normally part of a script written right to left [RFC5893] require 173 that additional rules, specified in the protocols and known as 174 "contextual rules" and "bidi rules", be applied. The entire 175 collection of rules and restrictions required by the IDNA2008 176 protocols themselves is known as "protocol restrictions". 178 As mentioned above, registries may apply (and generally are required 179 to apply) additional rules to further restrict the list of permitted 180 code points, contextual rules (perhaps applied to normally PVALID 181 code points) that apply additional restrictions, and/or restrictions 182 on labels as distinct from code points. 
The most obvious of those 183 restrictions include provisions for restricting suggested new 184 registrations based on conflicts with labels already registered in 185 the zone, so as to avoid homograph attacks [Gabrilovich2002] and 186 other issues. The specification of what constitutes such conflicts, 187 as well as the definition of "conflict" based on the properties of 188 the labels in question, is the responsibility of each registry. Registry 189 restrictions further include prohibitions on code points and labels that are not 190 consistent with the intended function of the zone or of the subtree in 191 which the zone is embedded (see Section 3), and limitations on where 192 allowable code points may be placed in a label. 194 These per-registry (or per-zone) rules are commonly known as 195 "registry restrictions" to distinguish them from the protocol 196 restrictions described above. By necessity, protocol restrictions 197 are somewhat generic, having to cater both to the union of the needs 198 for all zones and to the desires of the most permissive zones. 199 In consequence, additional registry restrictions are essential to 200 provide for the necessary security in the face of the tremendous 201 variations and differences in writing systems and their ongoing 202 evolution and development, as well as the human ability to recognize 203 and distinguish characters in different scripts around the world and 204 under different circumstances. 206 3. Progressive Subsets of Allowed Characters 208 The algorithm and rules of RFCs 5891 and 5892 determine the set of 209 code points that are possible for inclusion in domain name labels; 210 registries MUST NOT permit code points in labels unless they are part 211 of that set. Labels that contain code points that are normally 212 written from right to left MUST also conform to the requirements of 213 RFC 5893. 
Each registry that intends to allow IDN registrations MUST 214 then determine the strict subset of that set of code points that will 215 be allowed by that registry. It SHOULD also consider additional 216 rules, including contextual and whole-label restrictions that provide 217 further protection for registrants and users. For example, the 218 widely used principle that bars labels containing characters from 219 more than one script is not an IDNA2008 requirement. It has been 220 adopted by many registries but there may be circumstances in which 221 it is not required or appropriate. 223 In formulating their own rules, registries should normally consult 224 carefully developed consensus recommendations about global maximum 225 repertoires to be used, such as the ICANN Maximal Starting Repertoire 226 4 (MSR-4) for the Development of Label Generation Rules for the Root 227 Zone [ICANN-MSR4] (or its successor documents). Additional 228 recommendations of similar quality about particular scripts or 229 languages exist, including, but not limited to, the RFCs for Cyrillic 230 [RFC5992], Arabic Language [RFC5564], or script-based repertoires 231 from the approved ICANN Root Zone Label Generation Rules (LGR-3) 232 [ICANN-LGR3] (or its successor documents). Many of these 233 recommendations also cover rules about relationships among code 234 points that may be particularly important for complex scripts. They 235 also interact with recommendations about how labels that appear to be 236 the same should be handled. 238 It is the responsibility of the registry to determine which, if any, 239 of those recommendations are applicable and to further subset or 240 extend them as needed. For example, several of the recommendations 241 are designed for the root zone and therefore exclude digits and 242 U+002D HYPHEN-MINUS; this restriction is not generally appropriate 243 for other zones. 
On the other hand, some zones may be designed to 244 not cater for all users of a given script, but perhaps only for the 245 needs of selected languages, in which case a more selective 246 repertoire may be appropriate. 248 In making these determinations, a registry SHOULD follow the IAB 249 guidance in RFC 6912 [RFC6912]. Those guidelines include a number of 250 principles for use in making decisions about allowable code points. 251 In addition, that document notes that the closer a particular zone is 252 to the root, the more restrictive the space of permitted labels 253 should be. RFC 5894 provides some suggestions for any registry that 254 may decide to reduce opportunities for confusion or attacks by 255 constructing policies that disallow characters used in historic 256 writing systems (whether these be archaic scripts or extensions of 257 modern scripts for historic or obsolete orthographies) or characters 258 whose use is restricted to specialized, or highly technical contexts. 259 These suggestions were among the principles guiding the design of 260 ICANN's Maximal Starting Repertoires (MSR) [LGR-Procedure]. 262 A registry decision to allow only those code points in the full 263 repertoire of the MSR (plus digits and hyphen) would already avoid a 264 number of issues inherent in a more permissive policy such as "use 265 anything permitted by IDNA2008", while still supporting the native 266 languages and scripts for the vast majority of users today. However, 267 it is unlikely, by itself, to fully satisfy the mandate set out above 268 for three reasons. 270 1. The MSR, like the set of code points permissible under IDNA2008 271 itself, was conceived merely as a boundary condition on 272 permissible letter code points (it excludes digits and the 273 hyphen). 
It was always intended to be used as a starting point 274 for setting registry policy, with the expectation that some of 275 the code points in the MSR would not be included in the final 276 registry policy, whether for lack of actual usage, or for being 277 inherently problematic. 279 2. It was recognized that many scripts require contextual rules for 280 many more code points than are covered by CONTEXTO or CONTEXTJ 281 rules defined in IDNA2008. This is particularly true for 282 combining marks, typically used to encode diacritics, tone marks, 283 vowel signs and the like. While, theoretically, any combining 284 mark may occur in any context in Unicode, in practice rendering 285 and other software that users rely on in viewing or entering 286 labels will not support arbitrary combining sequences, or indeed 287 arbitrary combinations of code points, in the case of complex 288 scripts. 290 Contextual rules are needed in order to limit allowable code 291 point sequences to those that can be expected to be rendered 292 reliably. Identifying those requires knowledge about the way 293 code points are used in a script, whence the mandate for 294 registries to only support code points they understand. In this, 295 some of the other recommendations, such as the Informational RFCs 296 for specific scripts (e.g., Cyrillic [RFC5992]) or languages 297 (e.g., Arabic [RFC5564] or Chinese [RFC4713]), or the Root Zone 298 LGRs developed by ICANN, may provide useful guidance. 300 3. Third, because of the widely accepted practice of limiting any 301 given label to a single script, a universal repertoire, such as 302 the MSR, would have to be divided on a per-script basis into 303 subrepertoires to make it useful, with some of those repertoires 304 overlapping, for example, in the case of East Asian shared usage 305 of the Han ideographs. 
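The third point above, dividing a universal repertoire into per-script subrepertoires and admitting a label only if it fits within one of them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a registry policy: the two subrepertoires, the COMMON set, and the function name matching_repertoires are all hypothetical and are not taken from the MSR, any LGR, or IDNA2008. A real registry would also need the contextual and whole-label rules discussed in the second point.

```python
# Hypothetical per-script subrepertoires chosen by an imaginary registry.
# They are illustrative only and are NOT taken from the MSR or any LGR.
SUBREPERTOIRES = {
    "latin-de": frozenset("abcdefghijklmnopqrstuvwxyzäöüß"),
    "cyrillic-ru": frozenset("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}

# Digits and U+002D HYPHEN-MINUS, which the MSR excludes but which a zone
# below the root may choose to allow alongside every subrepertoire.
COMMON = frozenset("0123456789-")


def matching_repertoires(label):
    """Return the names of all subrepertoires that cover every letter in label.

    An empty result means the label mixes scripts or uses a code point the
    registry has not adopted. A label made only of digits and hyphens
    matches every subrepertoire, since it contains no letters to check.
    """
    letters = set(label) - COMMON
    return [name for name, rep in SUBREPERTOIRES.items() if letters <= rep]


if __name__ == "__main__":
    # "p\u0430ypal" contains U+0430 CYRILLIC SMALL LETTER A among Latin letters.
    for candidate in ("paypal", "p\u0430ypal", "пример", "bücher-2"):
        print(repr(candidate), matching_repertoires(candidate))
```

In this sketch the mixed-script string is rejected because no single subrepertoire covers all of its letters, which is the effect the single-script practice is meant to achieve. Whether such a rule is appropriate for a given zone remains a registry decision.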
307 Registries choosing to make exceptions -- allow code points that 308 recommendations such as the MSR do not allow -- should make such 309 decisions only with great care and only if they have considerable 310 understanding of, and great confidence in, their appropriateness. 311 The obvious exception from the MSR would be to allow digits and the 312 hyphen. Neither were allowed by the MSR, but only because they are 313 not allowed in the Root Zone. 315 Nothing in this document permits a registry to allow code points or 316 labels that are disallowed or otherwise prohibited by IDNA2008. 318 4. Considerations for Domains Operated Primarily for the Financial 319 Benefit of the Registry Owner or Operator Organization 321 As discussed in the Introduction (Section 1), the distributed 322 administrative structure of the DNS today can be described by 323 dividing zones into two categories depending on how they are 324 administered and for whom. These categories are not precise -- some 325 zones may not fall neatly into one category or the other -- but are 326 useful in understanding the practical applicability of this 327 specification. They are: 329 Zones operating primarily or exclusively within a country, 330 organization, or enterprise and responsible to the Internet users 331 in that country or the management of the organization or 332 enterprise. DNS operations, including registrations and 333 delegations, will typically occur in support of the purpose of 334 that country, organization or enterprise rather than being its 335 primary purpose. 337 Zones operating primarily as all or part of a business of selling 338 names for the financial benefit of entities responsible for the 339 registry. For these domains, most delegations of subdomains are 340 to entities with little or no affiliation with the registry 341 operator other than contractual agreements about operation of 342 those subdomains. 
These zones are often known as "public domains" 343 or with similar terms, but those terms often have other semantics 344 and may not cover all cases. In particular, a country code domain 345 operated primarily in the interest of registrants and Internet 346 users and in service to the broader Internet community is often 347 considered a "public domain" but would fall into the first 348 category, not the second. 350 Rules requiring strict registry responsibility, including either 351 thorough understanding of scripts and related issues in domain name 352 labels being considered for registration or local naming rules that 353 have the same effect, typically come naturally to registries for 354 zones of the first type. Registration of labels that would prove 355 problematic for any reason hurts the relevant organization or 356 enterprise or its customers or users within the relevant country and 357 more broadly. More generally, there are strong incentives to be 358 extremely conservative about labels that might be registered and few, 359 if any, incentives favoring adventures into labels that might be 360 considered clever, much less ones that are hard to type, render, or, 361 where it is relevant to users, remember correctly. 363 By contrast, in a zone in which the profits are derived exclusively, 364 or almost exclusively, from selling or reserving (including 365 "blocking") names, there may be perceived incentives to register 366 whatever names would-be registrants "want" or fears that any 367 restrictions will cut into the available namespace. In such 368 situations, restrictions are unlikely to be applied unless they meet 369 at least one of two criteria: (i) they are easy to apply and can be 370 applied algorithmically or otherwise automatically and/or (ii) there 371 is clear evidence that the particular label would cause harm. 373 As suggested above, the two categories above are not precise. 
In 374 particular, there may be domains that, despite being set up to 375 operate to produce revenue above actual costs, are sufficiently 376 conservative about their operations to more closely resemble the 377 first group in practice than the second one. 379 The requirement of IDNA that is discussed at length elsewhere in this 380 specification stands: IDNA (and IDNs generally) would work better and 381 Internet users would be better protected and more secure if 382 registries and registrars (of any type) confined their registrations 383 to scripts and code point sequences that they understood thoroughly. 384 While the IETF rarely gives advice to those who choose to violate 385 IETF Standards, some advice to zones in the second category above may 386 be in order. That advice is that significant conservatism in what is 387 allowed to be registered, even for reservation purposes, and even 388 more conservatism about what labels are actually entered into zones 389 and delegated, is the best option for the Internet and its users. If 390 practical considerations do not allow that much conservatism, then it 391 is desirable to consult and utilize the many lists and tables that 392 have been, and continue to be, developed to advise on what might be 393 sensible for particular scripts and languages. These include ICANN's 394 twin efforts of creating per-script Root Zone Label Generation Rules 395 [RZ-LGR-3] and Second Level Reference Label Generation Rules 396 [SL-REF-LGR] (the latter of which may be per language). They also 397 include other lists of code points or code point relationships that 398 may be particularly problematic and that should be treated with extra 399 caution or prohibited entirely, such as the proposed "troublesome 400 character" list [Freytag-troublesome]. See also Section 6 below. 402 5. 
Other corrections and updates 404 After the initial IDNA2008 documents were published (and RFC 5892 was 405 updated for Unicode 6.0 by RFC 6452 [RFC6452]) several errors or 406 instances of confusing text were noted. For the convenience of the 407 community, the relevant corrections for RFCs 5890 and 5891 are noted 408 below and update the corresponding documents. There are no errata 409 for RFC 5893 or 5894 as of the date this document was published. 410 Because further updates to RFC 5892 would require addressing other 411 pending issues, the outstanding erratum for that document is not 412 considered here. For consistency with the original documents, 413 references to Unicode 5.0 are preserved in this document. 415 5.1. Updates to RFC 5890 417 The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823, 418 and 4824 [RFC-Editor-5890Errata]) are all associated with the same 419 issue, the number of Unicode characters that can be associated with a 420 maximum-length (63 octet) A-label. In retrospect and contrary to 421 some of the suggestions in the errata, that value should not be 422 expressed in octets because RFC 5890 and the other IDNA 2008 423 documents are otherwise careful to not specify Unicode encoding forms 424 but, instead, work exclusively with Unicode code points. 425 Consequently the relevant material in RFC 5890 should be corrected as 426 follows: 428 Section 2.3.2.1 429 Old: expansion of the A-label form to a U-label may produce 430 strings that are much longer than the normal 63 octet DNS limit 431 (potentially up to 252 characters). 433 New: expansion of the A-label form to a U-label may produce 434 strings that are much longer than the normal 63 octet DNS limit 435 (See Section 4.2). 437 Comment: If the length limit is going to be a source of confusion 438 or careful calculations, it should appear in only one place. 
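The length relationship behind this correction (a U-label can need more octets than its A-label form) can be illustrated with a short Python sketch. It assumes Python's built-in "punycode" codec as an implementation of the RFC 3492 algorithm and prepends the "xn--" prefix by hand. The helper names to_a_label and length_report are hypothetical, and the sketch performs no IDNA2008 validity checking, so it is an illustration of the arithmetic rather than a conforming conversion.

```python
ACE_PREFIX = b"xn--"      # four octets of overhead on every A-label
MAX_LABEL_OCTETS = 63     # DNS limit on the length of a single label


def to_a_label(u_label):
    """Punycode-encode one label and add the ACE prefix.

    Assumption: u_label contains at least one non-ASCII code point and has
    already passed whatever IDNA2008 checks apply; none are done here.
    """
    return ACE_PREFIX + u_label.encode("punycode")


def length_report(u_label):
    """Compare the size of a label counted in different ways."""
    a_label = to_a_label(u_label)
    return {
        "code_points": len(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "a_label_octets": len(a_label),
        "fits_dns_label": len(a_label) <= MAX_LABEL_OCTETS,
    }


if __name__ == "__main__":
    print(to_a_label("bücher"))          # b'xn--bcher-kva'
    print(length_report("bücher"))
    # Fifty copies of U+00FC need 100 octets in UTF-8, yet the A-label
    # form is short enough to fit in a single DNS label.
    print(length_report("ü" * 50))
```

Counting in code points, as the sketch does with len(), avoids tying the limit to any particular Unicode encoding form. Implementations that allocate buffers for the U-label still have to size them for the encoding form they actually use.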
440 Section 4.2 442 Old: Because A-labels (the form actually used in the DNS) are 443 potentially much more compressed than UTF-8 (and UTF-8 is, in 444 general, more compressed that UTF-16 or UTF-32), U-labels that 445 obey all of the relevant symmetry (and other) constraints of 446 these documents may be quite a bit longer, potentially up to 447 252 characters (Unicode code points). 449 New: A-labels (the form actually used in the DNS), which are 450 produced in part by the Punycode algorithm 451 [RFC3492], are strings that are potentially much more compressed 452 than any standard Unicode Encoding Form. A 63 octet A-label 453 cannot represent more than 58 Unicode code points (four octet 454 overhead and the requirement that at least one character lie 455 outside the ASCII range) but implementations allocating buffer 456 space for the conversion should allow significantly more space 457 (i.e., extra octets) depending on the encoding form they are 458 using. 460 5.2. Updates to RFC 5891 462 Errata ID 3969: Improve reference for combining marks. There is only 463 one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum]. 464 Combining marks are explained in the cited section, but not, as 465 the text indicates, exactly defined. 467 Old: The Unicode string MUST NOT begin with a combining mark or 468 combining character (see The Unicode Standard, Section 2.11 469 [UnicodeA] for an exact definition). 471 New: The Unicode string MUST NOT begin with a combining mark or 472 combining character (see The Unicode Standard, Section 2.11 473 [UnicodeA] for an explanation and Section 3.6, definition D52 474 [UnicodeB] for an exact definition). 476 Comment: When RFC 5891 is actually updated, the references in the 477 text should be updated to the current version of Unicode and 478 the section numbers checked. 480 6. 
Related Discussions 482 This document is one of a series of measures that have been suggested 483 to address IDNA issues raised in other documents and discussions. 484 Those other discussions and associated documents include suggested 485 mechanisms for dealing with combining sequences and single-code-point 486 characters with the same appearance, ones that normalization neither 487 combines nor decomposes as IDNA2008 assumed. That topic was 488 discussed further in [IDNA-Unicode] and in the IAB response to that 489 issue [IAB-2015]. Those and other documents also discuss issues with 490 IDNA and character graphemes for which abstractions exist in Unicode 491 in precomposed form but that can be generated from combining 492 sequences. Another approach is a suggested registry of code points 493 known to be problematic [Freytag-troublesome]. In combination, the 494 various discussions of combining sequences and non-decomposing 495 characters may lay the foundation for an actual update to the IDNA 496 code points document [RFC5892]. Such an update would presumably also 497 address the existing errata against that document. 499 At a much higher level, discussions are ongoing to consider issues, 500 demands, and proposals for new uses of the DNS. 502 7. Security Considerations 504 As discussed in IAB recommendations about internationalized domain 505 names [RFC4690], [RFC6912], and elsewhere, poor choices of strings 506 for DNS labels can lead to opportunities for attacks, user confusion, 507 and other issues less directly related to security. This document 508 clarifies the importance of registries carefully establishing design 509 policies for the labels they will allow and clarifies that having such policies 510 and taking responsibility for them is a requirement, not an option. 511 If that clarification is useful in practice, the result should be an 512 improvement in security. 514 8. 
Acknowledgments 516 Many thanks to Patrik Faltstrom who provided an important review on 517 the initial version, to Jaap Akkerhuis, Don Eastlake, Barry Leiba, 518 and Alessandro Vesely who did reviews that improved the text and to 519 Pete Resnick who acted as document shepherd and did an additional 520 careful review. 522 9. IANA Considerations 524 [[CREF1: RFC Editor: Please remove this section before publication.]] 526 This memo includes no requests to or actions for IANA. In 527 particular, it does not contain any provisions that would alter any 528 IDNA-related registries or tables. 530 10. References 532 10.1. Normative References 534 [ICANN-LGR3] 535 ICANN, "Root Zone Label Generation Rules (LGR-1)", July 536 2019, 537 . 539 [ICANN-MSR4] 540 ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for 541 the Development of Label Generation Rules for the Root 542 Zone", January 2019, 543 . 545 [RFC1591] Postel, J., "Domain Name System Structure and Delegation", 546 RFC 1591, DOI 10.17487/RFC1591, March 1994, 547 . 549 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 550 Requirement Levels", BCP 14, RFC 2119, 551 DOI 10.17487/RFC2119, March 1997, 552 . 554 [RFC5890] Klensin, J., "Internationalized Domain Names for 555 Applications (IDNA): Definitions and Document Framework", 556 RFC 5890, DOI 10.17487/RFC5890, August 2010, 557 . 559 [RFC5891] Klensin, J., "Internationalized Domain Names in 560 Applications (IDNA): Protocol", RFC 5891, 561 DOI 10.17487/RFC5891, August 2010, 562 . 564 [RFC5891Erratum] 565 "RFC 5891, "Internationalized Domain Names in Applications 566 (IDNA): Protocol"", Errata ID 3969, April 2014, 567 . 569 [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts 570 for Internationalized Domain Names for Applications 571 (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, 572 . 

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010.

   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
              "Principles for Unicode Code Point Inclusion in Labels in
              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013.

10.2. Informative References

   [Freytag-troublesome]
              Freytag, A., Klensin, J., and A. Sullivan, "Those
              Troublesome Characters: A Registry of Unicode Code Points
              Needing Special Consideration When Used in Network
              Identifiers", June 2017.

   [Gabrilovich2002]
              Gabrilovich, E. and A. Gontmakher, "The Homograph Attack",
              Communications of the ACM 45(2):128, February 2002.

   [IAB-2015]
              Internet Architecture Board (IAB), "IAB Statement on
              Identifiers and Unicode 7.0.0", February 2015.

   [IDNA-Unicode]
              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
              7.0.0", September 2017.

   [LGR-Procedure]
              Internet Corporation for Assigned Names and Numbers
              (ICANN), "Procedure to Develop and Maintain the Label
              Generation Rules for the Root Zone in Respect of IDNA
              Labels", March 2013.

   [RFC-Editor-5890Errata]
              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
              Domain Names for Applications (IDNA): Definitions and
              Document Framework", August 2010", Note to RFC
              Editor: Please figure out how you would like this
              referenced and make it so., Captured 2017-09-10, 2016.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006.

   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J.
              Klensin, "Registration and Administration
              Recommendations for Chinese Domain Names", RFC 4713,
              DOI 10.17487/RFC4713, October 2006.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
              February 2010.

   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, DOI 10.17487/RFC5892, August 2010.

   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
              "Internationalized Domain Names Registration and
              Administration Guidelines for European Languages Using
              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010.

   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
              Points and Internationalized Domain Names for Applications
              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
              November 2011.

   [RZ-LGR-3]
              Internet Corporation for Assigned Names and Numbers, "Root
              Zone Label Generation Rules - LGR-3: Overview and Summary,
              Version 3", July 2019.

   [SL-REF-LGR]
              Internet Corporation for Assigned Names and Numbers
              (ICANN), "Second Level Label Generation Rules", 2019.

   [UnicodeA]
              The Unicode Consortium, "The Unicode Standard, Version
              12.1", May 2019.

              Section 2.11

   [UnicodeB]
              The Unicode Consortium, "The Unicode Standard, Version
              12.1", May 2019.

              Section 3.6, definition D52

Appendix A. Change Log

   RFC Editor: Please remove this appendix before publication.

A.1. Changes from version -00 (2017-03-11) to -01

   o Added Acknowledgments and adjusted references.

   o Filled in Section 5 with updates to respond to errata.

   o Added Section 6 to discuss relationships to other documents.

   o Modified the Abstract to note specifically updated documents.

   o Several small editorial changes and corrections.

A.2. Changes from version -01 (2017-09-12) to -02

   After a pause of nearly 34 months due to inability to get this draft
   processed, including nearly a year waiting for a new directorate to
   actually do anything of substance about fundamental IDNA issues, the
   -02 version was posted in the hope of getting a new start. Specific
   changes include:

   o Added a new section, Section 4, and some introductory material to
      address the very practical issue that domains run on a for-profit
      basis are unlikely to follow the very strict "understand what you
      are registering" requirement if they support IDNs at all and
      expect to profit from them.

   o Added a pointer to draft-klensin-idna-unicode-review to the
      discussion of other work.

   o Editorial corrections and changes.

A.3. Changes from version -02 (2019-07-06) to -03

   o Minor editorial changes in response to shepherd review.

   o Additional references.

A.4. Changes from version -03 (2019-07-22) to -04

   o Editorial changes after AD review and some additional changes to
      improve clarity.

A.5. Changes from version -04 (2019-08-02) to -05

   o Small editorial corrections, many to correct glitches found during
      IETF Last Call.

   o Updated acknowledgments, particularly to reflect reviews in Last
      Call.

A.6. Changes from version -05 (2019-08-29) to -06

   Other than some small editorial adjustments, these changes were made
   after, and reflect, IESG post-last-call review and comments. To the
   extent it was possible to do so without making this document
   inconsistent with the other IDNA documents, established IETF,
   Unicode, and ICANN community i18n terminology, or well-established
   IDNA or i18n practices, the first author believes that the document
   responds to all previously-outstanding IESG substantive comments.

   o Fixed a remaining citation issue with a Unicode document.
      This version has not been updated to reflect Unicode 13, but the
      document should be adjusted so that all references are
      contemporary at the time of publication.

   o Added reference to homograph attacks, and slightly adjusted
      discussion of them, per discussion with IESG post-last-call.

   o Removed pointer to RFC 5890 from discussion of mixed-script labels
      in Section 3.

   o Rewrote parts of Section 4 to eliminate the term "for-profit" and
      clarify the issues.

   o Removed pointer to draft-klensin-idna-unicode-review because RFC
      8753 has been published and is therefore no longer pending /
      parallel work.

   o Rewrote Section 6 to make the relationships among various
      documents and efforts somewhat more clear.

   o References to RFCs 5893 and 6912 moved from Informative to
      Normative.

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA 02140
   USA

   Phone: +1 617 245 1457
   Email: john-ietf@jck.com


   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org