idnits 2.17.1

draft-ietf-idnabis-protocol-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices. See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC3490, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC3492, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC3492, updated by this document, for
     RFC5378 checks: 2002-01-10)

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008. The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust. If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 5, 2009) is 5529 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC1123' is defined on line 716, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode-PropertyValueAliases' is defined on line 726,
     but no explicit reference was found in the text

  == Unused Reference: 'Unicode-RegEx' is defined on line 731, but no
     explicit reference was found in the text

  == Unused Reference: 'Unicode-Scripts' is defined on line 736, but no
     explicit reference was found in the text

  == Unused Reference: 'ASCII' is defined on line 748, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2136' is defined on line 762, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2181' is defined on line 766, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2535' is defined on line 769, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2671' is defined on line 772, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDNA2008-BIDI'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'Unicode-PropertyValueAliases'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-RegEx'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-Scripts'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode-UAX15'

  -- Obsolete informational reference (is this intentional?): RFC 2535
     (Obsoleted by RFC 4033, RFC 4034, RFC 4035)

  -- Obsolete informational reference (is this intentional?): RFC 2671
     (Obsoleted by RFC 6891)

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 3491
     (Obsoleted by RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 4952
     (Obsoleted by RFC 6530)

     Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2   Network Working Group                                         J. Klensin
3   Internet-Draft                                             March 5, 2009
4   Obsoletes: 3490, 3491
5   (if approved)
6   Updates: 3492 (if approved)
7   Intended status: Standards Track
8   Expires: September 6, 2009

10      Internationalized Domain Names in Applications (IDNA): Protocol
11                     draft-ietf-idnabis-protocol-10.txt

13  Status of this Memo

15     This Internet-Draft is submitted to IETF in full conformance with the
16     provisions of BCP 78 and BCP 79. This document may contain material
17     from IETF Documents or IETF Contributions published or made publicly
18     available before November 10, 2008. The person(s) controlling the
19     copyright in some of this material may not have granted the IETF
20     Trust the right to allow modifications of such material outside the
21     IETF Standards Process. Without obtaining an adequate license from
22     the person(s) controlling the copyright in such materials, this
23     document may not be modified outside the IETF Standards Process, and
24     derivative works of it may not be created outside the IETF Standards
25     Process, except to format it for publication as an RFC or to
26     translate it into languages other than English.

28     Internet-Drafts are working documents of the Internet Engineering
29     Task Force (IETF), its areas, and its working groups. Note that
30     other groups may also distribute working documents as Internet-
31     Drafts.

33     Internet-Drafts are draft documents valid for a maximum of six months
34     and may be updated, replaced, or obsoleted by other documents at any
35     time. It is inappropriate to use Internet-Drafts as reference
36     material or to cite them other than as "work in progress."

38     The list of current Internet-Drafts can be accessed at
39     http://www.ietf.org/ietf/1id-abstracts.txt.

41     The list of Internet-Draft Shadow Directories can be accessed at
42     http://www.ietf.org/shadow.html.

44     This Internet-Draft will expire on September 6, 2009.

46  Copyright Notice

48     Copyright (c) 2009 IETF Trust and the persons identified as the
49     document authors. All rights reserved.

51     This document is subject to BCP 78 and the IETF Trust's Legal
52     Provisions Relating to IETF Documents in effect on the date of
53     publication of this document (http://trustee.ietf.org/license-info).
54     Please review these documents carefully, as they describe your rights
55     and restrictions with respect to this document.

57  Abstract

59     This document supplies the protocol definition for a revised and
60     updated specification for internationalized domain names (IDNs). The
61     rationale for these changes, the relationship to the older
62     specification, and important terminology are provided in other
63     documents. This document specifies the protocol mechanism, called
64     Internationalizing Domain Names in Applications (IDNA), for
65     registering and looking up IDNs in a way that does not require
66     changes to the DNS itself. IDNA is only meant for processing domain
67     names, not free text.

69  Table of Contents

71     1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5
72       1.1. Discussion Forum . . . . . . . . . . . . . . . . . . . . . 5
73     2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5
74     3. Requirements and Applicability . . . . . . . . . . . . . . . . 6
75       3.1. Requirements . . . . . . . . . . . . . . . . . . . . . . . 6
76       3.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 6
77         3.2.1. DNS Resource Records . . . . . . . . . . . . . . . . . 7
78         3.2.2. Non-domain-name Data Types Stored in the DNS . . . . . 7
79     4. Registration Protocol . . . . . . . . . . . . . . . . . . . . 7
80       4.1. Input to IDNA Registration Process . . . . . . . . . . . . 8
81       4.2. Permitted Character and Label Validation . . . . . . . . . 8
82         4.2.1. Input Format . . . . . . . . . . . . . . . . . . . . . 8
83         4.2.2. Rejection of Characters that are not Permitted . . . . 9
84         4.2.3. Label Validation . . . . . . . . . . . . . . . . . . . 9
85         4.2.4. Registration Validation Summary . . . . . . . . . . . 10
86       4.3. Registry Restrictions . . . . . . . . . . . . . . . . . . 10
87       4.4. Punycode Conversion . . . . . . . . . . . . . . . . . . . 11
88       4.5. Insertion in the Zone . . . . . . . . . . . . . . . . . . 11
89     5. Domain Name Lookup Protocol . . . . . . . . . . . . . . . . . 11
90       5.1. Label String Input . . . . . . . . . . . . . . . . . . . . 12
91       5.2. Conversion to Unicode . . . . . . . . . . . . . . . . . . 12
92       5.3. Character Changes in Preprocessing or the User
93            Interface . . . . . . . . . . . . . . . . . . . . . . . . 12
94       5.4. A-label Input . . . . . . . . . . . . . . . . . . . . . . 13
95       5.5. Validation and Character List Testing . . . . . . . . . . 14
96       5.6. Punycode Conversion . . . . . . . . . . . . . . . . . . . 15
97       5.7. DNS Name Resolution . . . . . . . . . . . . . . . . . . . 15
98     6. Security Considerations . . . . . . . . . . . . . . . . . . . 15
99     7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
100    8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 16
101    9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 16
102    10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
103      10.1. Normative References . . . . . . . . . . . . . . . . . . . 17
104      10.2. Informative References . . . . . . . . . . . . . . . . . . 18
105    Appendix A. Local Mapping Alternatives . . . . . . . . . . . . . 19
106      A.1. Transitional Mapping Model . . . . . . . . . . . . . . . . 19
107      A.2. Internationalized Resource Identifier (IRI) Mapping
108           Model . . . . . . . . . . . . . . . . . . . . . . . . . . 20
109    Appendix B. Summary of Major Changes from IDNA2003 . . . . . . . 21
110    Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 22
111      C.1. Changes between Version -00 and -01 of
112           draft-ietf-idnabis-protocol . . . . . . . . . . . . . . . 22
113      C.2. Version -02 . . . . . . . . . . . . . . . . . . . . . . . 22
114      C.3. Version -03 . . . . . . . . . . . . . . . . . . . . . . . 22
115      C.4. Version -04 . . . . . . . . . . . . . . . . . . . . . . . 22
116      C.5. Version -05 . . . . . . . . . . . . . . . . . . . . . . . 23
117      C.6. Version -06 . . . . . . . . . . . . . . . . . . . . . . . 23
118      C.7. Version -07 . . . . . . . . . . . . . . . . . . . . . . . 23
119      C.8. Version -08 . . . . . . . . . . . . . . . . . . . . . . . 23
120      C.9. Version -09 . . . . . . . . . . . . . . . . . . . . . . . 24
121      C.10. Version -10 . . . . . . . . . . . . . . . . . . . . . . . 24
122    Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 24

124 1. Introduction

126    This document supplies the protocol definition for a revised and
127    updated specification for internationalized domain names. Essential
128    definitions and terminology for understanding this document and a
129    road map of the collection of documents that make up IDNA2008 appear
130    in [IDNA2008-Defs]. Appendix B discusses the relationship between
131    this specification and the earlier version of IDNA (referred to here
132    as "IDNA2003"). The rationale for these changes, along with
133    considerable explanatory material and advice to zone administrators
134    who support IDNs, is provided in other documents, notably
135    [IDNA2008-Rationale].

137    IDNA works by allowing applications to use certain ASCII string
138    labels (beginning with a special prefix) to represent non-ASCII name
139    labels. Lower-layer protocols need not be aware of this; therefore
140    IDNA does not depend on changes to any infrastructure. In
141    particular, IDNA does not depend on any changes to DNS servers,
142    resolvers, or protocol elements, because the ASCII name service
143    provided by the existing DNS is entirely sufficient for IDNA.

145    IDNA is applied only to DNS labels. Standards for combining labels
146    into fully-qualified domain names and parsing labels out of those
147    names are covered in the base DNS standards [RFC1034] [RFC1035] and
148    their various updates. An application may, of course, apply locally-
149    appropriate conventions to the presentation forms of domain names as
150    discussed in [IDNA2008-Rationale].

152    While they share terminology, reference data, and some operations,
153    this document describes two separate protocols, one for IDN
154    registration (Section 4) and one for IDN lookup (Section 5).

156 1.1. Discussion Forum

158    [[anchor3: RFC Editor: please remove this section.]]

160    This work is being discussed in the IETF IDNABIS WG and on the
161    mailing list idna-update@alvestrand.no.

163 2. Terminology

165    General terminology applicable to IDNA, but with meanings familiar to
166    those who have worked with Unicode or other character set standards
167    and the DNS, appears in [IDNA2008-Defs]. Terminology that is an
168    integral, normative, part of the IDNA definition, including the
169    definitions of "ACE", appears in that document as well. Familiarity
170    with the terminology materials in that document is assumed for
171    reading this one. The reader of this document is assumed to be
172    familiar with DNS-specific terminology as defined in RFC 1034
173    [RFC1034].

175    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
176    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
177    document are to be interpreted as described in BCP 14, RFC 2119
178    [RFC2119].

180 3. Requirements and Applicability

182 3.1. Requirements

184    IDNA conformance means adherence to the following requirements:

186    1. Whenever a domain name is put into an IDN-unaware domain name
187       slot (see Section 2 and [IDNA2008-Defs]), it MUST contain only
188       ASCII characters (i.e., must be either an A-label or an NR-LDH-
189       label), or must be a label associated with a DNS application that
190       is not subject to either IDNA or the historical recommendations
191       for "hostname"-style names [RFC1034].

193    2. Comparison of labels MUST be done on equivalent forms: either
194       both A-label forms or both U-label forms. Because A-labels and
195       U-labels can be transformed into each other without loss of
196       information, these comparisons are equivalent. However, when a
197       pair of putative A-labels are compared, the comparison MUST use
198       an ASCII case-insensitive comparison (as with all comparisons of
199       ASCII DNS labels). Comparisons on putative U-labels must test
200       that the two strings are identical, without case-folding or other
201       intermediate steps. Note that it is not necessary to verify that
202       labels are valid in order to compare them. In many cases,
203       verification of validity (that the strings actually are A-labels
204       or U-labels) may be important for other reasons and SHOULD be
205       performed.

207    3. Labels being registered MUST conform to the requirements of
208       Section 4. Labels being looked up and the lookup process MUST
209       conform to the requirements of Section 5.

211 3.2. Applicability

213    IDNA is applicable to all domain names in all domain name slots
214    except where it is explicitly excluded. It is not applicable to
215    domain name slots that do not use the LDH syntax rules.

217    This implies that IDNA is applicable to many protocols that predate
218    IDNA. Note that IDNs occupying domain name slots in those older
219    protocols MUST be in A-label form until and unless those protocols
220    and implementations of them are upgraded to be IDN-aware. IDNs
221    actually appearing in DNS queries or responses MUST be A-labels.

223 3.2.1. DNS Resource Records

225    IDNA applies only to domain names in the NAME and RDATA fields of DNS
226    resource records whose CLASS is IN.

228    There are currently no other exclusions on the applicability of IDNA
229    to DNS resource records. Applicability depends entirely on the
230    CLASS, and not on the TYPE except as noted below. This will remain
231    true, even as new types are defined, unless there is a compelling
232    reason for a new type that requires type-specific rules. The special
233    naming conventions applicable to SRV records are examples of type-
234    specific rules that are incompatible with IDNA coding. Hence the
235    first two labels (the ones required to start in "_") on a record with
236    TYPE SRV MUST NOT be A-labels or U-labels (while it would be possible
237    to write a non-ASCII string with a leading underscore, conversion to
238    an A-label would be impossible without loss of information because
239    the underscore is not a letter, digit, or hyphen and is consequently
240    DISALLOWED in IDNs). Of course, those labels may be part of a domain
241    that uses IDN labels at higher levels in the tree.

243 3.2.2. Non-domain-name Data Types Stored in the DNS

245    Although IDNA enables the representation of non-ASCII characters in
246    domain names, that does not imply that IDNA enables the
247    representation of non-ASCII characters in other data types that are
248    stored in domain names, specifically in the RDATA field for types
249    that have structured RDATA format. For example, an email address
250    local part is stored in a domain name in the RNAME field as part of
251    the RDATA of an SOA record (hostmaster@example.com would be
252    represented as hostmaster.example.com). IDNA specifically does not
253    update the existing email standards, which allow only ASCII
254    characters in local parts. Even though work is in progress to define
255    internationalization for email addresses [RFC4952], changes to the
256    email address part of the SOA RDATA would require action in, or
257    updates to, other standards, specifically those that specify the
258    format of the SOA RR.

260 4. Registration Protocol

262    This section defines the procedure for registering an IDN. The
263    procedure is implementation independent; any sequence of steps that
264    produces exactly the same result for all labels is considered a valid
265    implementation.

267    Note that, while the registration and lookup protocols (Section 5)
268    are very similar in most respects, they are different and
269    implementers should carefully follow the steps they are implementing.

271 4.1. Input to IDNA Registration Process

273    [[anchor8: Note in Draft: This subsection is new in -09, based on
274    comments on the mailing list in January and February 2009. It
275    replaces the previous first two subsections of this section and
276    completely eliminates the discussion of local mapping for
277    registration.]]

279    Registration processes are outside the scope of these protocols and
280    may differ significantly depending on local needs. By the time a
281    string enters the IDNA registration process as described in this
282    specification, it is expected to be in Unicode and MUST be in Unicode
283    Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
284    zone files ("registries") are expected to accept only the exact
285    string for which registration is requested, free of any mappings or
286    local adjustments. They SHOULD avoid any possible ambiguity by
287    accepting registrations only for A-labels, possibly paired with the
288    relevant U-labels so that they can verify the correspondence.

290 4.2. Permitted Character and Label Validation

292 4.2.1. Input Format

294    The registry MAY permit submission of labels in A-label form and is
295    encouraged to accept both the A-label form and the U-label one. If
296    it does so, it MUST perform a conversion to a U-label, perform the
297    steps and tests described below, and verify that the A-label produced
298    by the step in Section 4.4 matches the one provided as input. In
299    addition, if a U-label was provided, that U-label and the one
300    obtained by conversion of the A-label MUST match exactly. If, for
301    some reason, these tests fail, the registration MUST be rejected. If
302    the conversion to a U-label is not performed, the registry MUST still
303    verify that the A-label is superficially valid, i.e., that it does
304    not violate any of the rules of Punycode [RFC3492] encoding such as
305    the prohibition on trailing hyphen-minus, appearance of non-basic
306    characters before the delimiter, and so on. Fake A-labels, i.e.,
307    invalid strings that appear to be A-labels but are not, MUST NOT be
308    placed in DNS zones that support IDNA.

310 4.2.2. Rejection of Characters that are not Permitted

312    The candidate Unicode string is checked to verify that characters
313    that IDNA does not permit do not appear in it. Those characters are
314    identified in the "DISALLOWED" and "UNASSIGNED" lists that are
315    specified in [IDNA2008-Tables] and described informally in
316    [IDNA2008-Rationale]. Characters that are either DISALLOWED or
317    UNASSIGNED MUST NOT be part of labels to be processed for
318    registration in the DNS.

320 4.2.3. Label Validation

322    The proposed label (in the form of a Unicode string, i.e., a putative
323    U-label) is then examined, performing tests that require examination
324    of more than one character.

326 4.2.3.1. Rejection of Hyphen Sequences in U-labels

328    The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
329    the third and fourth character positions when the label is considered
330    in "on the wire" order.

332 4.2.3.2. Leading Combining Marks

334    The first character of the string (when the label is considered in
335    "on the wire" order) is examined to verify that it is not a combining
336    mark (or combining character) (see The Unicode Standard, Section 2.11
337    [Unicode] for an exact definition). If it is a combining mark, the
338    string MUST NOT be registered.

340 4.2.3.3. Contextual Rules

342    Each code point is checked for its identification as a character
343    requiring contextual processing for registration (the list of
344    characters appears as the combination of CONTEXTJ and CONTEXTO in
345    [IDNA2008-Tables] as do the contextual rules themselves). If that
346    indication appears, the table of contextual rules is checked for a
347    rule for that character. If no rule is found, the proposed label is
348    rejected and MUST NOT be installed in a zone file. If one is found,
349    it is applied (typically as a test on the entire label or on adjacent
350    characters within the label). If the application of the rule does
351    not conclude that the character is valid in context, the proposed
352    label MUST be rejected. (See the IANA Considerations: IDNA Context
353    Registry section of [IDNA2008-Tables].)
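
As an illustration only, the multi-character tests of Section 4.2.3.1 through Section 4.2.3.3 might be sketched in Python as follows. The names CONTEXT_RULES, zwj_after_virama, and check_registration_label are hypothetical, and the two-entry rule table is a toy stand-in chosen for this sketch; the normative character lists and rules are those of [IDNA2008-Tables]. The sketch assumes the label is already a Unicode string in "on the wire" order and does not cover the DISALLOWED/UNASSIGNED test of Section 4.2.2.

```python
import unicodedata


def zwj_after_virama(label, i):
    # Illustrative rule (an assumption of this sketch): accept ZERO WIDTH
    # JOINER only directly after a character whose canonical combining
    # class is 9 (Virama).
    return i > 0 and unicodedata.combining(label[i - 1]) == 9


# Hypothetical stand-in for the CONTEXTJ/CONTEXTO data: code point -> rule
# function, or None when the character is contextual but no rule is defined.
CONTEXT_RULES = {
    0x200D: zwj_after_virama,  # ZERO WIDTH JOINER
    0x00B7: None,              # contextual in this toy table, no rule supplied
}


def check_registration_label(label):
    """Return None if the label passes these tests, else a reason string."""
    # Section 4.2.3.1: no "--" in the third and fourth positions.
    if label[2:4] == "--":
        return "hyphens in third and fourth positions"
    # Section 4.2.3.2: the first character must not be a combining mark
    # (general categories Mn, Mc, Me).
    if label and unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    # Section 4.2.3.3: every contextual character needs a rule that passes.
    for i, ch in enumerate(label):
        cp = ord(ch)
        if cp not in CONTEXT_RULES:
            continue
        rule = CONTEXT_RULES[cp]
        if rule is None:
            return "contextual character without a rule"
        if not rule(label, i):
            return "contextual rule failed"
    return None
```

Returning a reason string rather than a bare boolean is a design choice of the sketch; it makes it easier for a registry front end to tell the requester which test failed.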

355    These contextual rules are required to support the use of characters
356    that could be used, under other conditions, to produce misleading
357    labels or to cause unacceptable ambiguity in label matching and
358    interpretation. For example, labels containing invisible ("zero-
359    width") characters may be permitted in context with characters whose
360    presentation forms are significantly changed by the presence or
361    absence of the zero-width characters, while other labels in which
362    zero-width characters appear may be rejected.

364 4.2.3.4. Labels Containing Characters Written Right to Left

366    Special tests are required for strings containing characters that are
367    normally written from right to left. The criteria for classifying
368    characters in terms of directionality are identified in the "Bidi"
369    document [IDNA2008-BIDI] in this series. That document also
370    describes conditions for strings that contain one or more of those
371    characters to be U-labels. The tests for those conditions, specified
372    there, are applied. Strings that contain right to left characters
373    but that do not conform to the IDNA Bidi rules MUST NOT be inserted
374    as labels in zone files.

376 4.2.4. Registration Validation Summary

378    Strings that contain at least one non-ASCII character, have been
379    produced by the steps above, whose contents pass all of the tests in
380    Section 4.2, and are 63 or fewer characters long in ACE form (see
381    Section 4.4), are U-labels.

383    To summarize, tests are made in Section 4.2 for invalid characters,
384    invalid combinations of characters, for labels that are invalid even
385    if the characters they contain are valid individually, and for labels
386    that do not conform to the restrictions for strings containing right
387    to left characters.

389 4.3. Registry Restrictions

391    Registries at all levels of the DNS, not just the top level, are
392    expected to establish policies about the labels that may be
393    registered, and for the processes associated with that action. While
394    exact policies are not specified as part of IDNA2008 and it is
395    expected that different registries may specify different policies,
396    there SHOULD be policies. Even a trivial policy (e.g., "anything can
397    be registered in this zone that can be represented as an A-label -
398    U-label pair") has value because it provides notice to users and
399    applications implementers that the registry cannot be relied upon to
400    provide even minimal user-protection restrictions. These per-
401    registry policies and restrictions are an essential element of the
402    IDNA registration protocol even for registries (and corresponding
403    zone files) deep in the DNS hierarchy. As discussed in
404    [IDNA2008-Rationale], such restrictions have always existed in the
405    DNS. That document also contains a discussion and recommendations
406    about possible types of rules.

408    The string produced by the above steps is checked and processed as
409    appropriate to local registry restrictions. Application of those
410    registry restrictions may result in the rejection of some labels or
411    the application of special restrictions to others.

413 4.4. Punycode Conversion

415    The resulting U-label is converted to an A-label. The A-label, more
416    precisely defined elsewhere, is the encoding of the U-label according
417    to the Punycode algorithm [RFC3492] with the ACE prefix "xn--" added
418    at the beginning of the string. The resulting string must, of
419    course, conform to the length limits imposed by the DNS. This
420    document updates RFC 3492 only to the extent of replacing the
421    reference to the discussion of the ACE prefix. The ACE prefix is now
422    specified in this document rather than as part of RFC 3490 or
423    Nameprep [RFC3491] but is the same in both sets of documents.

425    The failure conditions identified in the Punycode encoding procedure
426    cannot occur if the input is a U-label as determined by the steps
427    above.

429 4.5. Insertion in the Zone

431    The A-label is registered in the DNS by insertion into a zone.

433 5. Domain Name Lookup Protocol

435    Lookup is conceptually different from registration and different
436    tests are applied on the client. Although some validity checks are
437    necessary to avoid serious problems with the protocol (see
438    Section 5.5ff.), the lookup-side tests are more permissive and rely
439    on the assumption that names that are present in the DNS are valid.
440    That assumption is, however, a weak one because the presence of wild
441    cards in the DNS might cause a string that is not actually registered
442    in the DNS to be successfully looked up.

444    For convenience in description, we introduce an extra concept, a
445    "C-label", to describe a string that has the same appearance as an
446    A-label but that has been verified only to meet the somewhat more
447    flexible lookup requirements.

449    [[anchor14: Note in Draft: Try to reorganize and renumber Section 5
450    (Lookup) so that it exactly parallels Section 4 (Registration). This
451    has not been done in draft -10 because the task will be much easier if
452    the local mapping material is pulled from here (and there is no point
453    trying to align the section numbers twice).]]

455 5.1. Label String Input

457    The user supplies a string in the local character set, typically by
458    typing it or clicking on, or copying and pasting, a resource
459    identifier, e.g., a URI [RFC3986] or IRI [RFC3987] from which the
460    domain name is extracted. Alternatively, some process not directly
461    involving the user may read the string from a file or obtain it in
462    some other way. Processing in this step and the next two is a local
463    matter, to be accomplished prior to actual invocation of IDNA, but
464    at least the two steps in Section 5.2 and Section 5.3 must be
465    accomplished in some way.

467 5.2. Conversion to Unicode

469    The string is converted from the local character set into Unicode, if
470    it is not already Unicode. The exact nature of this conversion is
471    beyond the scope of this document, but may involve normalization
472    identical to that discussed in Section 4.1. The result MUST be a
473    Unicode string in NFC form.

475 5.3. Character Changes in Preprocessing or the User Interface

477    [[anchor15: Note in Draft -10. As of the time this draft was posted,
478    the WG was continuing to discuss various alternatives to this
479    section, which is pragmatic relative to various options and behavior
480    but seems to make no one happy from a predictability or
481    transition standpoint. Please see the (temporary) first appendix to
482    this document for a first cut at possible alternate formulations.]]

484    The Unicode string MAY then be processed to prevent confounding of
485    user expectations. For instance, it might be reasonable, at this
486    step, to convert all upper case characters to lower case, if this
487    makes sense in the user's environment, but even this should be
488    approached with caution due to some edge cases: in the long term, it
489    is probably better for users to understand IDNs strictly in lower-
490    case, U-label, form. More generally, preprocessing may be useful to
491    smooth the transition from IDNA2003, especially for direct user
492    input, but with similar cautions. In general, IDNs appearing in
493    files and those transmitted across the network as part of protocols
494    are expected to be in either ASCII form (including A-labels) or to
495    contain U-labels, rather than being in forms requiring mapping or
496    other conversions.
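
The conversion of Section 5.2 and the optional case mapping mentioned above could look roughly like the following Python sketch. The function name to_lookup_unicode, its parameters, and the default of UTF-8 as the "local character set" are assumptions made for illustration; neither step is standardized by IDNA, and the lower-casing is off by default because of the edge cases noted above.

```python
import unicodedata


def to_lookup_unicode(raw, encoding="utf-8", lowercase=False):
    """Convert local input to a Unicode string in NFC form.

    raw may be bytes in the local character set (decoded with the given
    encoding) or an existing Unicode string.
    """
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    if lowercase:
        # Optional, purely local preprocessing (Section 5.3).
        text = text.lower()
    # Section 5.2: the result handed to IDNA must be in NFC form.
    return unicodedata.normalize("NFC", text)
```

Normalizing after the case mapping, rather than before it, keeps the NFC guarantee intact even if the mapping produces a sequence that is not itself normalized.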
498 Other examples of processing for localization might be applied, 499 especially to direct user input, at this point. They include 500 interpreting various characters as separating domain name components 501 from each other (label separators) because they either look like 502 periods or are used to separate sentences, mapping halfwidth or 503 fullwidth East Asian characters to the common form permitted in 504 labels, or giving special treatment to characters whose presentation 505 forms are dependent only on placement in the label. Such 506 localization changes are also outside the scope of this 507 specification. 509 Recommendations for preprocessing for global contexts (i.e., when 510 local considerations do not apply or cannot be used) and for maximum 511 interoperability with labels that might have been specified under 512 liberal readings of IDNA2003 are given in [IDNA2008-Rationale]. It 513 is important to note that the intent of these specifications is that 514 labels in application protocols, files, or links are intended to be 515 in U-label or A-label form. Preprocessing MUST NOT map a character 516 that is valid in a label as specified elsewhere in this document or 517 in [IDNA2008-Tables] into another character. Excessively liberal use 518 of preprocessing, especially to strings stored in files, poses a 519 threat to consistent and predictable behavior for the user even if 520 not to actual interoperability. 522 Because these transformations are local, it is important that domain 523 names that might be passed between systems (e.g., in IRIs) be 524 U-labels or A-labels and not forms that might be accepted locally as 525 a consequence of this step. This step is not standardized as part of 526 IDNA, and is not further specified here. 528 5.4. 
A-label Input 530 If the input to this procedure appears to be an A-label (i.e., it 531 starts with "xn--"), the lookup application MAY attempt to convert it 532 to a U-label and apply the tests of Section 5.5 and the conversion of 533 Section 5.6 to that form. If the label is converted to Unicode 534 (i.e., to U-label form) using the Punycode decoding algorithm, then 535 the processing specified in those two sections MUST be performed, and 536 the label MUST be rejected if the resulting label is not identical to 537 the original. See the Name Server Considerations section of 538 [IDNA2008-Rationale] for additional discussion on this topic. 540 That conversion and testing SHOULD be performed if the domain name 541 will later be presented to the user in native character form (this 542 requires that the lookup application be IDNA-aware). If those steps 543 are not performed, the lookup process SHOULD at least make tests to 544 determine that the string is actually an A-label, examining it for 545 the invalid formats specified in the Punycode decoding specification. 546 Applications that are not IDNA-aware will obviously omit that 547 testing; others MAY treat the string as opaque to avoid the 548 additional processing at the expense of providing less protection and 549 information to users. 551 5.5. Validation and Character List Testing 553 As with the registration procedure described in Section 4, the 554 Unicode string is checked to verify that all characters that appear 555 in it are valid as input to IDNA lookup processing. As discussed 556 above and in [IDNA2008-Rationale], the lookup check is more liberal 557 than the registration one. Putative labels with any of the following 558 characteristics MUST be rejected prior to DNS lookup: 560 o Labels containing code points that are unassigned in the version 561 of Unicode being used by the application, i.e., in the UNASSIGNED 562 category of [IDNA2008-Tables].
564 o Labels that are not in NFC form as defined in [Unicode-UAX15]. 566 o Labels containing prohibited code points, i.e., those that are 567 assigned to the "DISALLOWED" category in the permitted character 568 table [IDNA2008-Tables]. 570 o Labels containing code points that are identified in 571 [IDNA2008-Tables] as "CONTEXTJ", i.e., requiring exceptional 572 contextual rule processing on lookup, but that do not conform to 573 that rule. Note that this implies that a rule must be defined, 574 not null: a character that requires a contextual rule but for 575 which the rule is null is treated in this step as having failed to 576 conform to the rule. 578 o Labels containing code points that are identified in 579 [IDNA2008-Tables] as "CONTEXTO", but for which no such rule 580 appears in the table of rules. Applications resolving DNS names 581 or carrying out equivalent operations are not required to test 582 contextual rules for "CONTEXTO" characters, only to verify that a 583 rule is defined (although they MAY make such tests to give better 584 information to the user). 586 o Labels whose first character is a combining mark (see 587 Section 4.2.3.2). 589 In addition, the application SHOULD apply the following test. The 590 test may be omitted in special circumstances, such as when the lookup 591 application knows that the conditions are enforced elsewhere, because 592 an attempt to look up and resolve such strings will almost certainly 593 lead to a DNS lookup failure except when wildcards are present in the 594 zone. However, applying the test is likely to give much better 595 information about the reason for a lookup failure -- information that 596 may be usefully passed to the user when that is feasible -- than DNS 597 resolution failure information alone. In any event, lookup 598 applications should avoid attempting to resolve labels that are 599 invalid under that test.
601 o Verification that the string is compliant with the requirements 602 for right to left characters, specified in [IDNA2008-BIDI]. 604 For all other strings, the lookup application MUST rely on the 605 presence or absence of labels in the DNS to determine the validity of 606 those labels and the validity of the characters they contain. If 607 they are registered, they are presumed to be valid; if they are not, 608 their possible validity is not relevant. A lookup application that 609 declines to process a string that conforms to the rules above and 610 does not look it up in the DNS is not in conformance with this 611 protocol. 613 5.6. Punycode Conversion 615 The validated string, an apparent U-label, is converted to an 616 apparent A-label using the Punycode algorithm with the ACE prefix 617 added. These label forms are "apparent" U-labels and A-labels 618 because not all of the tests used in the Registration procedure 619 (Section 4) to effectively define those terms precisely are applied 620 in this lookup procedure. 621 [[anchor16: Note in Draft: As of -10, we are back to "apparent" (or 622 "putative" if the WG prefers) label forms. The previous text 623 asserted that these strings were A-labels and U-labels, which was 624 clearly wrong, since those terms are defined in terms of complete 625 validity and all of the registration tests. Mark suggested an 626 alternative, which was to introduce a new term, C-label, which was a 627 superset of A-labels but with fewer test conditions. I like the 628 idea, but could not figure out how to make it work without also 629 introducing a near-U-label term, and that started to become much too 630 terminology heavy to be followed easily. Suggestions of ways out of 631 this, preferably with specific text for this document and Defs, would 632 be welcome.]] 634 5.7. DNS Name Resolution 636 The resulting string (the apparent A-label) is looked up in the DNS, 637 using normal DNS resolver procedures. 639 6. 
Security Considerations 641 Security Considerations for this version of IDNA, except for the 642 special issues associated with right to left scripts and characters, 643 are described in [IDNA2008-Defs]. Specific issues for labels 644 containing characters associated with scripts written right to left 645 appear in [IDNA2008-BIDI]. 647 7. IANA Considerations 649 IANA actions for this version of IDNA are specified in 650 [IDNA2008-Tables] and discussed informally in [IDNA2008-Rationale]. 651 The components of IDNA described in this document do not require any 652 IANA actions. 654 8. Contributors 656 While the listed editor held the pen, the original versions of this 657 document represent the joint work and conclusions of an ad hoc design 658 team consisting of the editor and, in alphabetic order, Harald 659 Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document 660 draws significantly on the original version of IDNA [RFC3490] both 661 conceptually and for specific text. This second-generation version 662 would not have been possible without the work that went into that 663 first version and its authors, Patrik Faltstrom, Paul Hoffman, and 664 Adam Costello. While Faltstrom was actively involved in the creation 665 of this version, Hoffman and Costello were not and should not be held 666 responsible for any errors or omissions. 668 9. Acknowledgments 670 This revision to IDNA would have been impossible without the 671 accumulated experience since RFC 3490 was published and resulting 672 comments and complaints of many people in the IETF, ICANN, and other 673 communities, too many people to list here. Nor would it have been 674 possible without RFC 3490 itself and the efforts of the Working Group 675 that defined it. Those people whose contributions are acknowledged 676 in RFC 3490, [RFC4690], and [IDNA2008-Rationale] were particularly 677 important. 
679 Specific textual changes were incorporated into this document after 680 suggestions from the other contributors, Stephane Bortzmeyer, Vint 681 Cerf, Mark Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel, 682 Marcos Sanz, Andrew Sullivan, Ken Whistler, and other WG 683 participants. Special thanks are due to Paul Hoffman for permission 684 to extract material from his Internet-Draft to form the basis for 685 Appendix B. 687 10. References 688 10.1. Normative References 690 [IDNA2008-BIDI] 691 Alvestrand, H. and C. Karp, "An updated IDNA criterion for 692 right-to-left scripts", July 2008, . 695 [IDNA2008-Defs] 696 Klensin, J., "Internationalized Domain Names for 697 Applications (IDNA): Definitions and Document Framework", 698 February 2009, . 701 [IDNA2008-Tables] 702 Faltstrom, P., "The Unicode Codepoints and IDNA", 703 July 2008, . 706 A version of this document is available in HTML format at 707 http://stupid.domain.name/idnabis/ 708 draft-ietf-idnabis-tables-02.html 710 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 711 STD 13, RFC 1034, November 1987. 713 [RFC1035] Mockapetris, P., "Domain names - implementation and 714 specification", STD 13, RFC 1035, November 1987. 716 [RFC1123] Braden, R., "Requirements for Internet Hosts - Application 717 and Support", STD 3, RFC 1123, October 1989. 719 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 720 Requirement Levels", BCP 14, RFC 2119, March 1997. 722 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode 723 for Internationalized Domain Names in Applications 724 (IDNA)", RFC 3492, March 2003. 726 [Unicode-PropertyValueAliases] 727 The Unicode Consortium, "Unicode Character Database: 728 PropertyValueAliases", March 2008, . 731 [Unicode-RegEx] 732 The Unicode Consortium, "Unicode Technical Standard #18: 733 Unicode Regular Expressions", May 2005, 734 .
736 [Unicode-Scripts] 737 The Unicode Consortium, "Unicode Standard Annex #24: 738 Unicode Script Property", February 2008, 739 . 741 [Unicode-UAX15] 742 The Unicode Consortium, "Unicode Standard Annex #15: 743 Unicode Normalization Forms", 2006, 744 . 746 10.2. Informative References 748 [ASCII] American National Standards Institute (formerly United 749 States of America Standards Institute), "USA Code for 750 Information Interchange", ANSI X3.4-1968, 1968. 752 ANSI X3.4-1968 has been replaced by newer versions with 753 slight modifications, but the 1968 version remains 754 definitive for the Internet. 756 [IDNA2008-Rationale] 757 Klensin, J., Ed., "Internationalizing Domain Names for 758 Applications (IDNA): Issues, Explanation, and Rationale", 759 February 2009, . 762 [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, 763 "Dynamic Updates in the Domain Name System (DNS UPDATE)", 764 RFC 2136, April 1997. 766 [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS 767 Specification", RFC 2181, July 1997. 769 [RFC2535] Eastlake, D., "Domain Name System Security Extensions", 770 RFC 2535, March 1999. 772 [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", 773 RFC 2671, August 1999. 775 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 776 "Internationalizing Domain Names in Applications (IDNA)", 777 RFC 3490, March 2003. 779 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 780 Profile for Internationalized Domain Names (IDN)", 781 RFC 3491, March 2003. 783 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 784 Resource Identifier (URI): Generic Syntax", STD 66, 785 RFC 3986, January 2005. 787 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 788 Identifiers (IRIs)", RFC 3987, January 2005. 790 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and 791 Recommendations for Internationalized Domain Names 792 (IDNs)", RFC 4690, September 2006. 794 [RFC4952] Klensin, J. and Y. 
Ko, "Overview and Framework for 795 Internationalized Email", RFC 4952, July 2007. 797 [Unicode] The Unicode Consortium, "The Unicode Standard, Version 798 5.0", 2007. 800 Boston, MA, USA: Addison-Wesley. ISBN 0-321-48091-0 802 Appendix A. Local Mapping Alternatives 804 The subsections of this appendix are temporary and represent 805 different sketches of possible replacements for Section 5.3. They do 806 not represent an assertion of WG consensus or any assertion about the 807 possibility of including one of them as part of the WG's work 808 program. Instead, they are supplied only for purposes of comparison, 809 discussion, and, should it be relevant, refinement. 811 The first paragraph of each subsection describes how the material 812 would be placed relative to the existing main document text. 813 Subsequent paragraphs are the actual suggestions, although in 814 incomplete sketch form. 816 A.1. Transitional Mapping Model 818 If this subsection were adopted, Section 5.3 would be deleted and 819 this one would be inserted after, or integrated with, Section 5.7. 821 This specification does not support the extensive mappings from one 822 character to another, including Unicode Case Folding and 823 Compatibility Character mapping, of IDNA2003. It also changes the 824 interpretations of a small number of characters relative to IDNA2003. 825 Most applications, especially those with which IDNs have been used 826 for some time, will need to maintain reasonable compatibility with 827 files created under IDNA2003 and user interfaces designed for it. 828 This section specifies additional steps to be taken to provide 829 maximum IDNA2003 compatibility. 831 If an application requires IDNA2003 backward compatibility, it MUST 832 execute one of the two bulleted steps below. 
834 o If the resolution attempt in Section 5.7 fails, the apparent 835 U-label is processed through the ToASCII operation specified in 836 IDNA2003 [RFC3490] and, if the two apparent A-labels are not 837 identical, the result is looked up. If it is found, the relevant 838 values are handled as if the resolution attempt in Section 5.7 had 839 succeeded with that value. If the resolution attempt in 840 Section 5.7 is successful, this step simply produces that value. 842 o Once the resolution attempt in Section 5.7 is completed, the 843 apparent U-label is processed through the ToASCII operation 844 specified in IDNA2003 [RFC3490]. The two apparent A-labels are 845 compared to each other. If they are not identical, the second one 846 is looked up as well. If one of the two lookups is successful and 847 the other is not, that value is used as the result of the lookup. 848 If both are successful, the user is presented with a choice. If 849 neither is successful, the IDNA lookup fails. 851 Note that, if both interpretations of the name return values, the 852 lookup application has no practical way to tell whether the relevant 853 registry has applied "variant" or "bundling" techniques to ensure 854 that both domain names are under the same control or not. From that 855 perspective, the first of these approaches assumes that has been done 856 (if the IDNA2003-interpretation label is present at all) while the 857 second assumes that such bundling is unlikely to have occurred. 858 [[anchor24: Note in Draft: If this appendix is used, RFC3490 must be 859 moved from Informative to Normative.]] 861 A.2. Internationalized Resource Identifier (IRI) Mapping Model 863 This subsection is intended to be descriptive of an approach that 864 lies outside IDNA, rather than a normative component of it.
If it 865 were adopted, Section 5.3 would be deleted and the material below 866 would be referenced, either as a non-normative Appendix in Protocol 867 or, more reasonably, as a section of Rationale. 869 IDNA2003 supported extensive mappings from one character to another, 870 including Unicode Case Folding and Compatibility Character mapping. 871 Those mappings are no longer supported on registration and are 872 inconsistent with the "exact match" lookups that people expect from 873 the DNS. Some mapping should still be supported, both for 874 compatibility with applications that assume IDNA2003 and to avoid 875 confounding user expectations. The specific mappings involved are 876 not part of IDNA, but are expected to be specified as part of a 877 revision to the IRI specification [RFC3987] and the conversion from 878 IRI form to URI form. That change leaves mapping unspecified and 879 prohibited for actual domain names; however, in practice, most domain 880 names, especially in the web applications that appear to have been 881 most important for IDNs between the publication of IDNA2003 and the 882 release of this specification, are not interpreted as themselves but 883 as abbreviated forms of URIs or IRIs and hence subject to the 884 transformation rules of the latter. 886 Appendix B. Summary of Major Changes from IDNA2003 888 1. Update base character set from Unicode 3.2 to Unicode version- 889 agnostic. 891 2. Separate the definitions for the "registration" and "lookup" 892 activities. 894 3. Disallow symbol and punctuation characters except where special 895 exceptions are necessary. 897 4. Remove the mapping and normalization steps from the protocol and 898 have them instead done by the applications themselves, possibly 899 in a local fashion, before invoking the protocol. 901 5.
Change the way that the protocol specifies which characters are 902 allowed in labels from "humans decide what the table of 903 codepoints contains" to "decisions about codepoints are based on 904 Unicode properties plus a small exclusion list created by 905 humans". 907 6. Introduce the new concept of characters that can be used only in 908 specific contexts. 910 7. Allow typical words and names in languages such as Dhivehi and 911 Yiddish to be expressed. 913 8. Make bidirectional domain names (delimited strings of labels, 914 not just labels standing on their own) display in a less 915 surprising fashion whether they appear in obvious domain name 916 contexts or as part of running text in paragraphs. 918 9. Remove the dot separator from the mandatory part of the 919 protocol. 921 10. Make some currently-valid labels that are not actually IDNA 922 labels invalid. 924 Appendix C. Change Log 926 [[anchor27: RFC Editor: Please remove this appendix.]] 928 C.1. Changes between Version -00 and -01 of draft-ietf-idnabis-protocol 930 o Corrected discussion of SRV records. 932 o Several small corrections for clarity. 934 o Inserted more "open issue" placeholders. 936 C.2. Version -02 938 o Rewrote the "conversion to Unicode" text in Section 5.2 as 939 requested on-list. 941 o Added a comment (and reference) about EDNS0 to the "DNS Server 942 Conventions" section, which was also retitled. 944 o Made several editorial corrections and improvements in response to 945 various comments. 947 o Added several new discussion placeholder anchors and updated some 948 older ones. 950 C.3. Version -03 952 o Trimmed change log, removing information about pre-WG drafts. 954 o Incorporated a number of changes suggested by Marcos Sanz in his 955 note of 2008.07.17 and added several more placeholder anchors. 957 o Several minor editorial corrections and improvements. 959 o "Editor" designation temporarily removed because the automatic 960 posting machinery does not accept it. 962 C.4.
Version -04 964 o Removed Contextual Rule appendices for transfer to Tables. 966 o Several changes, including removal of discussion anchors, based on 967 discussions at IETF 72 (Dublin). 969 o Rewrote the preprocessing material (Section 5.3) somewhat. 971 C.5. Version -05 973 o Updated part of the A-label input explanation (Section 5.4) per 974 note from Erik van der Poel. 976 C.6. Version -06 978 o Corrected a few typographical errors. 980 o Incorporated the material (formerly in Rationale) on the 981 relationship between IDNA2003 and IDNA2008 as an appendix and 982 pointed to the new definitions document. 984 o Text modified in several places to recognize the dangers of 985 interaction between DNS wildcards and IDNs. 987 o Text added to be explicit about the handling of edge and failure 988 cases in Punycode encoding and decoding. 990 o Revised for consistency with the new Definitions document and to 991 make the text read more smoothly. 993 C.7. Version -07 995 o Multiple small textual and editorial changes and clarifications. 997 o Requirement for normalization clarified to apply to all cases and 998 conditions for preprocessing further clarified. 1000 o Substantive change to Section 4.2.1, turning a SHOULD to a MUST 1001 (see note from Mark Davis, 19 November, 2008 18:14 -0800). 1003 C.8. Version -08 1005 o Added some references and altered text to improve clarity. 1007 o Changed the description of CONTEXTJ/CONTEXTO to conform to that in 1008 Tables. In other words, these are now treated as distinct 1009 categories (again), rather than as specially-flagged subsets of 1010 PROTOCOL VALID. 1012 o The discussion of label comparisons has been rewritten to make it 1013 more precise and to clarify that one does not need to verify that 1014 a string is a [valid] A-label or U-label in order to test it for 1015 equality with another string. The WG should verify that the 1016 current text is what is desired.
1018 o Other changes to reflect post-IETF discussions or editorial 1019 improvements. 1021 C.9. Version -09 1023 o Removed Security Considerations material to Defs document. 1025 o Removed the Name Server Considerations material to Rationale. 1026 That material is not normative and not needed to implement the 1027 protocol itself. 1029 o Adjusted terminology to match new version of Defs. 1031 o Removed all discussion of local mapping and option for it from 1032 registration protocol. 1034 o Removed some old placeholders and inquiries because no comments 1035 have been received. 1037 o Small editorial corrections. 1039 C.10. Version -10 1041 o Rewrote the registration input material slightly to further 1042 clarify the "no mapping on registration" principle. 1044 o Added placeholder notes about several tasks, notably reorganizing 1045 Section 4 and Section 5 so that subsection numbers are parallel. 1047 o Cleaned up an incorrect use of the terms "A-label" and "U-label" 1048 in the lookup phase that was spotted by Mark Davis. Inserted a 1049 note there about alternate ways to deal with the resulting 1050 terminology problem. 1052 o Added a temporary appendix (above) to document alternate 1053 strategies for possible replacements for Section 5.3. 1055 Author's Address 1057 John C Klensin 1058 1770 Massachusetts Ave, Ste 322 1059 Cambridge, MA 02140 1060 USA 1062 Phone: +1 617 245 1457 1063 Email: john+ietf@jck.com