idnits 2.17.1 draft-klensin-idna-rfc5891bis-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC5894, but the abstract doesn't seem to directly say this. It does mention RFC5894 though, so this could be OK. -- The draft header indicates that this document updates RFC5890, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC5891, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC5890, updated by this document, for RFC5378 checks: 2008-10-14) (Using the creation date from RFC5891, updated by this document, for RFC5378 checks: 2008-05-22) (Using the creation date from RFC5894, updated by this document, for RFC5378 checks: 2008-05-13) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) 
-- The document date (July 6, 2019) is 1749 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Unicode' is mentioned on line 457, but not defined -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-LGR3' -- Possible downref: Non-RFC (?) normative reference: ref. 'ICANN-MSR4' ** Downref: Normative reference to an Informational RFC: RFC 1591 -- Duplicate reference: RFC5891, mentioned in 'RFC5891Erratum', was also mentioned in 'RFC5891'. ** Downref: Normative reference to an Informational RFC: RFC 5894 -- No information found for draft-lgr-procedure-20mar13-en - is the name correct? -- Duplicate reference: RFC5890, mentioned in 'RFC-Editor-5890Errata', was also mentioned in 'RFC5890'. Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Klensin 3 Internet-Draft 4 Updates: 5890, 5891, 5894 (if approved) A. Freytag 5 Intended status: Standards Track ASMUS, Inc. 6 Expires: January 7, 2020 July 6, 2019 8 Internationalized Domain Names in Applications (IDNA): Registry 9 Restrictions and Recommendations 10 draft-klensin-idna-rfc5891bis-02 12 Abstract 14 The IDNA specifications for internationalized domain names combine 15 rules that determine the labels that are allowed in the DNS without 16 violating the protocol itself and an assignment of responsibility, 17 consistent with earlier specifications, for determining the labels 18 that are allowed in particular zones. Conformance to IDNA by 19 registries and other implementations requires both parts. 
Experience 20 strongly suggests that the language describing those responsibilities 21 was insufficiently clear to promote safe and interoperable use of the 22 specifications and that more details and discussion of circumstances 23 would have been helpful. Without making any substantive changes to 24 IDNA, this specification updates two of the core IDNA documents (RFC 25 5890 and 5891) and the IDNA explanatory document (RFC 5894) to 26 provide that guidance and to correct some technical errors in the 27 descriptions. 29 Status of This Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at https://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on January 7, 2020. 46 Copyright Notice 48 Copyright (c) 2019 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (https://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 61 Table of Contents 63 1. Introduction . . . . . . . . . . . . . . 
. . . . . . . . . . 2 64 2. Registry Restrictions in IDNA2008 . . . . . . . . . . . . . . 4 65 3. Progressive Subsets of Allowed Characters . . . . . . . . . . 5 66 4. Considerations for For-Profit Domains . . . . . . . . . . . . 7 67 5. Other corrections and updates . . . . . . . . . . . . . . . . 9 68 5.1. Updates to RFC 5890 . . . . . . . . . . . . . . . . . . . 9 69 5.2. Updates to RFC 5891 . . . . . . . . . . . . . . . . . . . 10 70 6. Related Discussions . . . . . . . . . . . . . . . . . . . . . 10 71 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 72 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 11 73 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 74 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 75 10.1. Normative References . . . . . . . . . . . . . . . . . . 11 76 10.2. Informative References . . . . . . . . . . . . . . . . . 12 77 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 14 78 A.1. Changes from version -00 (2017-03-11) to -01 . . . . . . 14 79 A.2. Changes from version -01 (2017-09-12) to -02 . . . . . . 14 80 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 82 1. Introduction 84 Parts of the specifications for Internationalized Domain Names in 85 Applications (IDNA) [RFC5890] [RFC5891] [RFC5894] (collectively 86 known, along with RFC 5892 [RFC5892], RFC 5893 [RFC5893] and updates 87 to them, as "IDNA2008" (or just "IDNA")) impose a requirement that 88 domain name system (DNS) registries restrict the characters they 89 allow in domain name labels (see Section 2 below), and the contents 90 and structure of those labels. That requirement and restriction are 91 consistent with the "trustee for the community" requirements of the 92 original specification for DNS naming and authority [RFC1591]. 
The 93 restrictions are intended to limit the permitted characters and 94 strings to those for which the registries or their advisers have a 95 thorough understanding and for which they are willing to take 96 responsibility. 98 That provision is centrally important because it recognized that 99 historical relationships and variations among scripts and writing 100 systems, the continuing evolution of those systems, differences in 101 the uses of characters among languages (and locations) that use the 102 same script, and so on make it impossible for a single list of 103 characters and simple rules to be able to generate an "if we use 104 these, we will be safe from confusion and various attacks" guideline. 106 Instead, the algorithm and rules of RFC 5891 and 5892 eliminate many 107 of the most dangerous and otherwise problematic cases, but cannot 108 eliminate the need for registries and registrars to understand what 109 they are doing and to take responsibility for the decisions they make. 111 The way in which the IDNA2008 specifications expressed these 112 requirements may have obscured the intention that they actually are 113 requirements. Section 2.3.2.3 of the Definitions document [RFC5890] 114 mentions the need for the restrictions, indicates that they are 115 mandatory, and points the reader to section 4.3 of the Protocol 116 document [RFC5891], which in turn points to Section 3.2 of the 117 Rationale document [RFC5894], with each document providing further 118 detail, discussion, and clarification. 120 At the same time, the Internet has evolved significantly since the 121 management assumptions for the DNS were established with RFC 1591 and 122 earlier. In particular, the management and use of domain names have 123 gone through several transformations. 
Recounting of those changes is 124 beyond the scope of this document but one of them has had significant 125 practical impact on the degree to which the requirement for registry 126 knowledge and responsibility is observed in practice. When RFC 1591 127 was written, the assumption was that domains at all levels of the DNS 128 would be operated in the best interest of the registrants in the 129 domain and of the Internet as a whole. There were no notions about 130 domains being operated for a profit and with a business model that 131 made them more profitable the more names that could be registered (or 132 even, under some circumstances, reserved and not registered) or that 133 domains would be considered more successful based on the number of 134 names registered and delegated from them. While rarely reflected in 135 the DNS protocols, the distinction between domains operated in those 136 ways and ones that are operated for, e.g., use within an enterprise 137 or otherwise as a service has become very important today. See 138 Section 4 for a discussion on how those issues affect this 139 specification. 141 This specification is intended to unify and clarify these 142 requirements for registry decisions and responsibility and to 143 emphasize the importance of registry restrictions at all levels of 144 the DNS. It also makes a specific recommendation for character 145 repertoire subsetting intermediate between the code points allowed by 146 RFC 5891 and 5892 and those allowed by individual registries. It 147 does not alter the basic IDNA2008 protocols and rules themselves in 148 any way. 150 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 151 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 152 document are to be interpreted as described in RFC 2119 [RFC2119]. 154 2. 
Registry Restrictions in IDNA2008 156 As mentioned above, IDNA2008 specifies that the registries for each 157 zone in the DNS that supports IDN labels are required to develop and 158 apply their own rules to restrict the allowable labels, including 159 limiting characters they allow to be used in labels in that zone. 160 The chosen list MUST be smaller than the collection of code points 161 specified as "PVALID", "CONTEXTJ", and "CONTEXTO" by the rules 162 established by the protocols themselves. The latter two categories, 163 and labels containing any characters that are normally part of a 164 script written right to left [RFC5893], require that additional 165 rules, specified in the protocols and known as "contextual rules" and 166 "bidi rules", be applied. The entire collection of rules and 167 restrictions required by the IDNA2008 protocols themselves is known 168 as "protocol restrictions". 170 As mentioned above, registries may apply (and generally are required 171 to apply) additional rules to further restrict the list of permitted 172 code points, contextual rules (perhaps applied to normally PVALID 173 code points) that apply additional restrictions, and/or restrictions 174 on labels. The most obvious of those restrictions include provisions 175 for restricting suggested new registrations based on conflicts with 176 labels already registered in the zone and specifications of what 177 constitutes such conflicts based on the properties of the labels in 178 question. They further include prohibitions on code points and 179 labels that are not consistent with the intended function of the zone 180 or the subtree in which it is embedded (see Section 3) or limitations 181 on where in a label allowable code points may be placed. 183 These per-registry (or per-zone) rules are commonly known as 184 "registry restrictions" to distinguish them from the protocol 185 restrictions described above. 
By necessity, the latter are somewhat 186 generic, having to cater both to the union of the needs for all 187 zones, as well as to the most permissive zones. In consequence, 188 additional Registry restrictions are essential to provide for the 189 necessary security in the face of the tremendous variations and 190 differences in writing systems, their ongoing evolution and 191 development, as well as the human ability to recognize and 192 distinguish characters in different scripts around the world and 193 under different circumstances. 195 3. Progressive Subsets of Allowed Characters 197 The algorithm and rules of RFC 5891 and 5892 set an absolute upper 198 bound on the code points that can be used in domain name labels; 199 registries MUST NOT include code points unless they are allowed by 200 those rules. Each registry that intends to allow IDN registrations 201 MUST then determine which code points will be allowed by that 202 registry. It SHOULD also consider additional rules, including 203 contextual and whole label restrictions that provide further 204 protection for registrants and users. For example, the widely-used 205 principle that bars labels containing characters from more than one 206 script is not an IDNA2008 requirement. It has been adopted by many 207 registries but, as Section 4.4 of RFC 5890 indicates, there may be 208 circumstances in which it is not required or appropriate. 210 In formulating their own rules, registries SHOULD normally consult 211 carefully-developed consensus recommendations about global maximum 212 repertoires to be used such as the ICANN Maximal Starting Repertoire 213 4 (MSR-4) for the Development of Label Generation Rules for the Root 214 Zone [ICANN-MSR4] (or its successor documents). 
Additional 215 recommendations of similar quality about particular scripts or 216 languages exist, including, but not limited to, the RFCs for Cyrillic 217 [RFC5992] or Arabic Language [RFC5564] or script-based repertoires 218 from the approved ICANN Root Zone Label Generation Rules (LGR-3) 219 [ICANN-LGR3] (or its successor documents). Many of these 220 recommendations also cover rules about relationships among code 221 points that may be particularly important for complex scripts and 222 recommendations on how to deal with alternate representations of the 223 same or apparently the same labels. 225 It is the responsibility of the registry to determine which, if any, 226 of those recommendations are applicable and to further subset or 227 extend them as needed. For example, several of the recommendations 228 are designed for the root zone and therefore exclude digits and 229 U+002D HYPHEN-MINUS; this restriction is not generally appropriate 230 for other zones. On the other hand, some zones may be designed to 231 not cater for all users of a given script, but perhaps only for the 232 needs of selected languages, in which case a more selective 233 repertoire may be appropriate. 235 In making these determinations, a registry SHOULD follow the IAB 236 guidance in RFC 6912 [RFC6912]. Those guidelines include a number of 237 principles for use in making decisions about allowable code points. 238 In addition, that document notes that the closer a particular zone is 239 to the root, the more restrictive the space of permitted labels 240 should be. RFC 5894 provides some suggestions for any registry that 241 may decide to reduce opportunities for confusion or attacks by 242 constructing policies that disallow characters used in historic 243 writing systems (whether these be archaic scripts or extensions of 244 modern scripts for historic or obsolete orthographies) or characters 245 whose use is restricted to specialized, or highly technical contexts. 
246 These suggestions were among the principles guiding the design of 247 ICANN's Maximal Starting Repertoires [LGR-Procedure]. 249 Particularly for a zone for which all labels to be delegated are not 250 for the use of the same organization or enterprise, a registry 251 decision to allow only those code points in the full repertoire of 252 the MSR (plus digits and hyphen) would already avoid a number of 253 issues inherent in a more permissive policy like "use anything 254 permitted by IDNA2008", while still supporting the native languages 255 and scripts for the vast majority of users today. However, it is 256 unlikely, by itself, to fully satisfy the mandate set out above for 257 three reasons. 259 1. The MSR, like the set of code points permissible under IDNA2008 260 itself, was conceived merely as an upper bound on permissible 261 letter code points (it excludes digits and the hyphen). It was 262 always intended to be used as a starting point for setting 263 registry policy, with the expectation that some of the code 264 points in the MSR would not be included in the final registry 265 policy, whether for lack of actual usage, or for being inherently 266 problematic. 268 2. It was recognized that many scripts require contextual rules for 269 many more code points than are covered by CONTEXTO or CONTEXTJ 270 rules defined in IDNA2008. This is particularly true for 271 combining marks, typically used to encode diacritics, tone marks, 272 vowel signs and the like. While, theoretically, any combining 273 mark may occur in any context in Unicode, in practice rendering 274 and other software that users rely on in viewing or entering 275 labels will not support arbitrary combining sequences, or indeed 276 arbitrary combinations of code points, in the case of complex 277 scripts. 279 Contextual rules are required to limit allowable code point 280 sequences to those that can be expected to be rendered reliably. 
281 Identifying those requires knowledge about the way code points 282 are used in a script, whence the mandate for registries to only 283 support code points they understand. In this, some of the other 284 recommendations, such as the Informational RFCs for specific 285 scripts (e.g., Cyrillic [RFC5992]) or languages (e.g., Arabic 286 [RFC5564] or Chinese [RFC4713]), or the Root Zone LGRs developed 287 by ICANN, may provide useful guidance. 289 3. Third, because of the widely accepted practice of limiting any 290 given label to a single script, a universal repertoire, such as 291 the MSR, would have to be divided on a per script basis into 292 subrepertoires to make it useful, with some of those repertoires 293 overlapping, for example, in the case of East Asian shared usage 294 of the Han ideographs. 296 Registries choosing to make exceptions and allow code points that 297 recommendations such as the MSR do not allow should make such 298 decisions only with great care and only if they have considerable 299 understanding of, and great confidence in, their appropriateness. 300 The obvious exception from the MSR would be to allow digits and the 301 hyphen. Neither was allowed by the MSR, but only because they are 302 not allowed in the Root Zone. 304 Nothing in this document permits a registry to allow code points or 305 labels that are disallowed or otherwise prohibited by IDNA2008. 307 4. Considerations for For-Profit Domains 309 As discussed in the Introduction (Section 1), the distributed 310 administrative structure of the DNS today can be described by 311 dividing zones into two categories depending on how they are 312 administered and for whom. These categories are not precise -- some 313 zones may not fall neatly into one category or the other -- but are 314 useful in understanding the practical applicability of this 315 specification. 
They are: 317 Zones operating primarily or exclusively within an organization or 318 enterprise and responsible to that organization or enterprise. 319 DNS operations, including registrations and delegations, will 320 typically occur in support of the purpose of that organization or 321 enterprise rather than being its primary purpose. 323 Zones operating primarily on a for-profit basis in which most 324 delegations of subdomains are to entities with little or no 325 affiliation with the registry operator other than contractual 326 agreements about operation of those subdomains. These zones are 327 often known as "public domains" or with similar terms, but those 328 terms often have other semantics and may not cover all cases. 330 Rules requiring strict registry responsibility, including either 331 thorough understanding of scripts and related issues in domain name 332 labels being considered for registration or local naming rules that 333 have the same effect, typically come naturally to registries for 334 zones of the first type. Registration of labels that would prove 335 problematic for any reason hurts the relevant organization or 336 enterprise or its customers. More generally, there are strong 337 incentives to be extremely conservative about labels that might be 338 registered and few, if any, incentives favoring adventures into 339 labels that might be considered clever, much less ones that are hard 340 to type, render, or, where it is relevant to users, remember 341 correctly. 343 By contrast, in a for-profit zone in which the profits are limited to 344 selling names, there may be perceived incentives to register whatever 345 names would-be registrants "want" or fears that any restrictions will 346 cut into the available namespace. 
In such situations, restrictions 347 are unlikely to be applied unless they meet at least one of two 348 criteria: (i) they are easy to apply and can be applied 349 algorithmically or otherwise automatically and/or (ii) there is clear 350 evidence that the particular label would cause harm. 352 As suggested above, the two categories above are not precise. In 353 particular, there may be domains that, despite being set up to 354 operate at a profit, are sufficiently conservative about their 355 operations to more closely resemble the first group in practice than 356 the second one. 358 The requirement of IDNA that is discussed at length elsewhere in this 359 specification stands: IDNA (and IDNs generally) would work better and 360 Internet users would be better protected and more secure if 361 registries and registrars (of any type) confined their registrations 362 to scripts and code point sequences that they understood thoroughly. 363 While the IETF rarely gives advice to those who choose to violate 364 IETF Standards, some advice to zones in the second category above may 365 be in order. That advice is that significant conservatism in what is 366 allowed to be registered, even for reservation purposes, and even 367 more conservatism about what labels are actually entered into zones 368 and delegated, is the best option for the Internet and its users. If 369 practical considerations do not allow that much conservatism, then it 370 is desirable to consult and utilize the many lists and tables that 371 have been, and continue to be, developed to advise on what might be 372 sensible for particular scripts (such as ICANN's efforts for script- 373 specific "generation rules" [[CREF1: Reference??? ]]) and lists of 374 code points or code point relationships that may be particularly 375 problematic and that should be treated with extra caution or 376 prohibited entirely such as the proposed "troublesome character" list 377 [Freytag-troublesome]. See also Section 6 below. 
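As an illustration only, the progressive subsetting described in Section 3, and the kind of restriction that Section 4 calls easy to apply algorithmically, might be sketched as follows. The three repertoires are tiny, hypothetical stand-ins invented for this sketch. They are not the tables of RFC 5892, the MSR, or any registry, and the function names are likewise assumptions. A real registry would also need contextual, bidi, and conflict rules that this sketch does not attempt.

```python
# Hypothetical, deliberately tiny stand-ins for the real tables.
# Nothing here reproduces RFC 5892, the MSR, or any registry policy.
LETTERS = set("abcdefghijklmnopqrstuvwxyz")
DIGITS_AND_HYPHEN = set("0123456789-")

# Upper bound set by the protocol (stand-in for PVALID/CONTEXTJ/CONTEXTO).
PROTOCOL_ALLOWED = (LETTERS | DIGITS_AND_HYPHEN
                    | set("\u00e4\u00f6\u00fc\u00df\u00e9\u00e8\u00e7\u00f1"))

# Intermediate reference repertoire (stand-in for an MSR-like list,
# which covers letters only and excludes digits and the hyphen).
REFERENCE_REPERTOIRE = LETTERS | set("\u00e4\u00f6\u00fc\u00df\u00e9\u00e8\u00e7")

# What this imaginary registry actually permits: a further subset of
# the reference repertoire, with digits and hyphen added back.
REGISTRY_REPERTOIRE = (LETTERS | set("\u00e4\u00f6\u00fc\u00df")
                       | DIGITS_AND_HYPHEN)


def registry_within_protocol_bound():
    """A registry repertoire must never exceed the protocol upper bound."""
    return REGISTRY_REPERTOIRE <= PROTOCOL_ALLOWED


def label_problems(label):
    """Return the reasons a label fails this registry's repertoire check."""
    return ["U+%04X not in registry repertoire" % ord(ch)
            for ch in label
            if ch not in REGISTRY_REPERTOIRE]
```

A check of this kind is cheap to run at registration time, which is why it falls under criterion (i) above. It does not replace the understanding of scripts that the rest of this specification requires.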
379 5. Other corrections and updates 381 After the initial IDNA2008 documents were published (and RFC 5892 was 382 updated for Unicode 6.0 by RFC 6452 [RFC6452]) several errors or 383 instances of confusing text were noted. For the convenience of the 384 community, the relevant corrections for RFC 5890 and 5891 are noted 385 below and update the corresponding documents. There are no errata 386 for RFC 5893 or 5894 as of the date this document was published. 387 Because further updates to RFC 5892 would require addressing other 388 pending issues, the outstanding erratum for that document is not 389 considered here. For consistency with the original documents, 390 references to Unicode 5.0 are preserved in this document. 392 Readers should note that an update to RFC 5892 that is primarily 393 concerned with the review process for new versions of Unicode but 394 that makes some additional patches 395 [ID.draft-klensin-idna-unicode-review] is in progress. Its status 396 should be checked in conjunction with application of the present 397 specification. 399 5.1. Updates to RFC 5890 401 The outstanding errata against RFC 5890 (Errata ID 4695, 4696, 4823, 402 and 4824 [RFC-Editor-5890Errata]) are all associated with the same 403 issue, the number of Unicode characters that can be associated with a 404 maximum-length (63 octet) A-label. In retrospect and contrary to 405 some of the suggestions in the errata, that value should not be 406 expressed in octets because RFC 5890 and the other IDNA 2008 407 documents are otherwise careful to not specify Unicode encoding forms 408 but, instead, work exclusively with Unicode code points. 409 Consequently the relevant material in RFC 5890 should be corrected as 410 follows: 412 Section 2.3.2.1 414 Old: expansion of the A-label form to a U-label may produce 415 strings that are much longer than the normal 63 octet DNS limit 416 (potentially up to 252 characters). 
418 New: expansion of the A-label form to a U-label may produce 419 strings that are much longer than the normal 63 octet DNS limit 420 (See Section 4.2). 422 Comment: If the length limit is going to be a source of confusion 423 or careful calculations, it should appear in only one place. 425 Section 4.2 426 Old: Because A-labels (the form actually used in the DNS) are 427 potentially much more compressed than UTF-8 (and UTF-8 is, in 428 general, more compressed that UTF-16 or UTF-32), U-labels that 429 obey all of the relevant symmetry (and other) constraints of 430 these documents may be quite a bit longer, potentially up to 431 252 characters (Unicode code points). 433 New: A-labels (the form actually used in the DNS) and the 434 Punycode algorithm used as part of the process to produce them 435 [RFC3492] are strings that are potentially much more compressed 436 than any standard Unicode Encoding Form. [[CREF2: Do we need a 437 reference for this here??]] A 63 octet A-label cannot represent 438 more than 58 Unicode code points (four octet overhead and the 439 requirement that at least one character lie outside the ASCII 440 range) but implementations allocating buffer space for the 441 conversion should allow significantly more space depending on 442 the encoding form they are using. 444 5.2. Updates to RFC 5891 446 Errata ID 3969: Improve reference for combining marks There is only 447 one erratum for RFC 5891, Errata ID 3969 [RFC5891Erratum]. 448 Combining marks are explained in the cited section, but not, as 449 the text indicates, exactly defined. 451 Old: The Unicode string MUST NOT begin with a combining mark or 452 combining character (see The Unicode Standard, Section 2.11 453 [Unicode] for an exact definition). 455 New: The Unicode string MUST NOT begin with a combining mark or 456 combining character (see The Unicode Standard, Section 2.11 457 [Unicode] for an explanation and Section 3.6, definition D52, 458 for an exact definition). 
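As a minimal sketch of the rule in the corrected text, assuming that "combining mark or combining character" is read as a code point whose General_Category is Mn, Mc, or Me, the check can be written with Python's standard unicodedata module. The function name is hypothetical. The result depends on the Unicode version bundled with the Python runtime, which need not match the version a registry targets.

```python
import unicodedata


def begins_with_combining_mark(label):
    """True if the first code point has General_Category Mn, Mc, or Me.

    An empty string has no first code point and returns False.
    """
    if not label:
        return False
    return unicodedata.category(label[0]) in ("Mn", "Mc", "Me")
```

For example, a string starting with U+0301 COMBINING ACUTE ACCENT would be rejected, while the same mark following a base letter would not trigger this particular check.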
460 Comment: When RFC 5891 is actually updated, the references in the 461 text should be updated to the current version of Unicode and 462 the section numbers checked. 464 6. Related Discussions 466 This document is one of a series of measures that have been suggested 467 to address IDNA issues raised in other documents, including 468 mechanisms for dealing with combining sequences and single-code point 469 characters with the same appearance that normalization neither 470 combines nor decomposes as IDNA2008 assumed [IDNA-Unicode], including 471 the IAB response to that issue [IAB-2015], and to take a higher-level 472 view of issues, demands, and proposals for new uses of the DNS. 473 Those documents also include a discussion of issues with IDNA and 474 character graphemes for which abstractions exist in Unicode in 475 precomposed form but that can be generated from combining sequences 476 and a suggested registry of code points known to be problematic 477 [Freytag-troublesome]. The discussion of combining sequences and 478 non-decomposing characters is intended to lay the foundation for an 479 actual update to the IDNA code points document [RFC5892]. Such an 480 update will presumably also address the existing errata against that 481 document. 483 7. Security Considerations 485 As discussed in IAB recommendations about internationalized domain 486 names [RFC4690], [RFC6912], and elsewhere, poor choices of strings 487 for DNS labels can lead to opportunities for attacks, user confusion, 488 and other issues less directly related to security. This document 489 clarifies the importance of registries carefully establishing design 490 policies for the labels they will allow and that having such policies 491 and taking responsibility for them is a requirement, not an option. 492 If that clarification is useful in practice, the result should be an 493 improvement in security. 495 8. 
Acknowledgments 497 Many thanks to Patrik Faltstrom who provided an important review on 498 the initial version. 500 9. IANA Considerations 502 [[CREF3: RFC Editor: Please remove this section before publication.]] 504 This memo includes no requests to or actions for IANA. In 505 particular, it does not contain any provisions that would alter any 506 IDNA-related registries or tables. 508 10. References 510 10.1. Normative References 512 [ICANN-LGR3] 513 ICANN, "Root Zone Label Generation Rules (LGR-3)", July 514 2019, 515 . 517 [ICANN-MSR4] 518 ICANN, "Maximal Starting Repertoire Version 4 (MSR-4) for 519 the Development of Label Generation Rules for the Root 520 Zone", January 2019, 521 . 523 [RFC1591] Postel, J., "Domain Name System Structure and Delegation", 524 RFC 1591, DOI 10.17487/RFC1591, March 1994, 525 . 527 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 528 Requirement Levels", BCP 14, RFC 2119, 529 DOI 10.17487/RFC2119, March 1997, 530 . 532 [RFC5890] Klensin, J., "Internationalized Domain Names for 533 Applications (IDNA): Definitions and Document Framework", 534 RFC 5890, DOI 10.17487/RFC5890, August 2010, 535 . 537 [RFC5891] Klensin, J., "Internationalized Domain Names in 538 Applications (IDNA): Protocol", RFC 5891, 539 DOI 10.17487/RFC5891, August 2010, 540 . 542 [RFC5891Erratum] 543 "RFC 5891, "Internationalized Domain Names in Applications 544 (IDNA): Protocol"", Errata ID 3969, April 2014, 545 . 547 [RFC5894] Klensin, J., "Internationalized Domain Names for 548 Applications (IDNA): Background, Explanation, and 549 Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, 550 . 552 10.2. Informative References 554 [Freytag-troublesome] 555 Freytag, A., Klensin, J., and A. Sullivan, "Those 556 Troublesome Characters: A Registry of Unicode Code Points 557 Needing Special Consideration When Used in Network 558 Identifiers", June 2017, . 
   [IAB-2015]
              Internet Architecture Board (IAB), "IAB Statement on
              Identifiers and Unicode 7.0.0", February 2015.

   [ID.draft-klensin-idna-unicode-review]
              Klensin, J. and P. Faltstrom, "IDNA Review for New Unicode
              Versions", June 2019.

   [IDNA-Unicode]
              Klensin, J. and P. Faltstrom, "IDNA Update for Unicode
              7.0.0", September 2017.

   [LGR-Procedure]
              Internet Corporation for Assigned Names and Numbers
              (ICANN), "Procedure to Develop and Maintain the Label
              Generation Rules for the Root Zone in Respect of IDNA
              Labels", March 2013.

   [RFC-Editor-5890Errata]
              RFC Editor, "RFC Errata: RFC 5890, "Internationalized
              Domain Names for Applications (IDNA): Definitions and
              Document Framework", August 2010", Note to RFC
              Editor: Please figure out how you would like this
              referenced and make it so., Captured 2017-09-10, 2016.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, DOI 10.17487/RFC3492, March 2003.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006.

   [RFC4713]  Lee, X., Mao, W., Chen, E., Hsu, N., and J. Klensin,
              "Registration and Administration Recommendations for
              Chinese Domain Names", RFC 4713, DOI 10.17487/RFC4713,
              October 2006.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
              February 2010.

   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, DOI 10.17487/RFC5892, August 2010.

   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
              for Internationalized Domain Names for Applications
              (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010.

   [RFC5992]  Sharikov, S., Miloshevic, D., and J. Klensin,
              "Internationalized Domain Names Registration and
              Administration Guidelines for European Languages Using
              Cyrillic", RFC 5992, DOI 10.17487/RFC5992, October 2010.

   [RFC6452]  Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code
              Points and Internationalized Domain Names for Applications
              (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452,
              November 2011.

   [RFC6912]  Sullivan, A., Thaler, D., Klensin, J., and O. Kolkman,
              "Principles for Unicode Code Point Inclusion in Labels in
              the DNS", RFC 6912, DOI 10.17487/RFC6912, April 2013.

Appendix A.  Change Log

   RFC Editor: Please remove this appendix before publication.

A.1.  Changes from version -00 (2017-03-11) to -01

   o  Added Acknowledgments and adjusted references.

   o  Filled in Section 5 with updates to respond to errata.

   o  Added Section 6 to discuss relationships to other documents.

   o  Modified the Abstract to note specifically updated documents.

   o  Several small editorial changes and corrections.

A.2.  Changes from version -01 (2017-09-12) to -02

   After a pause of nearly 34 months due to inability to get this draft
   processed, including nearly a year waiting for a new directorate to
   actually do anything of substance about fundamental IDNA issues, the
   -02 version is being posted in the hope of getting a new start.
   Specific changes include:

   o  Added a new section, Section 4, and some introductory material to
      address the very practical issue that domains run on a for-profit
      basis are unlikely to follow the very strict "understand what you
      are registering" requirement if they support IDNs at all and
      expect to profit from them.
   o  Added a pointer to draft-klensin-idna-unicode-review to the
      discussion of other work.

   o  Editorial corrections and changes.

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA  02140
   USA

   Phone: +1 617 245 1457
   Email: john-ietf@jck.com


   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org