Network Working Group                                          D. Thaler
Internet-Draft                                                 Microsoft
Intended status: Informational                              July 6, 2009
Expires: January 7, 2010


      IAB Thoughts on Encodings for Internationalized Domain Names
                     draft-iab-idn-encoding-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 7, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document explores issues with Internationalized Domain Names
   (IDNs) that result from the use of various encoding schemes such as
   Punycode and UTF-8.

Table of Contents

   1.  Introduction
     1.1.  APIs
   2.  Use of Non-DNS Protocols
   3.  Use of Non-ASCII in DNS
   4.  Recommendations
   5.  Security Considerations
   6.  IANA Considerations
   7.  IAB Members at the time of this writing
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   The goal of this document is to explore what can be learned from the
   current difficulties in implementing Internationalized Domain Names
   (IDNs).
   Although some elements of this exploration may immediately
   feed back into current IETF work, it is explicitly not the intention
   for this document to influence any current working group charter.

   An Internationalized Domain Name (IDN) is a name that contains one or
   more non-ASCII characters.  An IDN can be encoded in various ways.

   Punycode [RFC3492] is a mechanism for encoding a Unicode string in
   ASCII characters using only letters, digits, and hyphens.  When an
   IDN is encoded with Punycode, it is prefixed with "xn--", which
   assumes that ASCII names do not start with this prefix.  While this
   assumption is not necessarily true, the limitation is considered
   acceptable.

   The term "ToASCII" refers to the combination of a non-reversible
   character mapping operation (e.g., converting upper case characters
   to lower case characters), plus a reversible Unicode-to-Punycode
   conversion.  Similarly, the term "ToUnicode" refers to the
   combination of a non-reversible character mapping operation, plus a
   reversible Punycode-to-Unicode conversion.

   ISO-2022-JP [RFC1468] is a mechanism for encoding a string of ASCII
   and Japanese characters, where an ASCII character is preserved as-is.

   UTF-8 [RFC3629] is a mechanism for encoding a Unicode character in a
   variable number of 8-bit octets, where an ASCII character is
   preserved as-is.  A UTF-8 string is thus a string of UTF-8
   characters.

   UTF-16 [RFC2781] is a mechanism for encoding a Unicode character in
   one or two 16-bit integers.  A UTF-16 string is thus a string of
   UTF-16 characters.

   UTF-32 (formerly UCS-4) ([UNICODE] section 3.10) is a mechanism for
   encoding a Unicode character in a single 32-bit integer.  A UTF-32
   string is thus a string of UTF-32 characters.

   Different applications, APIs, and protocols use different encoding
   schemes today.  Historically, many of them were originally defined to
   use only ASCII.
   Internationalizing Domain Names in Applications
   (IDNA) [RFC3490] defined a mechanism that required changes to
   applications, but not to APIs or servers, and specified that Punycode
   be used.

   [RFC3490] section 1.3 states:

      The IDNA protocol is contained completely within applications.  It
      is not a client-server or peer-to-peer protocol: everything is
      done inside the application itself.  When used with a DNS resolver
      library, IDNA is inserted as a "shim" between the application and
      the resolver library.  When used for writing names into a DNS
      zone, IDNA is used just before the name is committed to the zone.

   Figure 1 depicts a simplistic architecture that a naive reader might
   assume from the paragraph quoted above.  (A variant of this same
   picture appears in [RFC3490] section 6, further strengthening this
   assumption.)

   +-----------------------------------------+
   |Host                                     |
   |             +-------------+             |
   |             | Application |             |
   |             +------+------+             |
   |                    |                    |
   |               +----+----+               |
   |               |   DNS   |               |
   |               | Resolver|               |
   |               | Library |               |
   |               +----+----+               |
   |                    |                    |
   +-----------------------------------------+
                        |
               _________|_________
              /                   \
             /                     \
            /                       \
           |         Internet        |
            \                       /
             \                     /
              \___________________/

             Simplistic Architecture

                    Figure 1

   There are, however, two problems with this simplistic architecture
   that cause it to differ from reality.

   First, resolver APIs on operating systems today (MacOS, Windows,
   Linux, etc.) are not DNS-specific.  They typically provide a layer of
   indirection so that the application can work independently of the
   name resolution mechanism, which could be DNS, mDNS
   [I-D.cheshire-dnsext-multicastdns], LLMNR [RFC4795], NetBIOS-over-TCP
   [RFC1001][RFC1002], an /etc/hosts file [RFC0952], NIS [NIS], or
   anything else.
   For example, RFC 3493 [RFC3493] specifies the getaddrinfo()
   API and contains many phrases like "For example, when using the DNS"
   and "any type of name resolution service (for example, the DNS)".
   Importantly, DNS is mentioned only as an example, and the application
   has no knowledge as to whether DNS or some other protocol will be
   used.

   Second, even with the DNS protocol, private name spaces (sometimes
   referred to as "split DNS") do not necessarily use the same
   character set encoding scheme as the public Internet name space.

   We will discuss each of the above issues in subsequent sections.  For
   reference, Figure 2 depicts a more realistic architecture on typical
   hosts today.  More generally, the host may be multi-homed to one or
   more local networks, each of which may or may not be connected to the
   public Internet and may or may not have a private name space.

   +-----------------------------------------+
   |Host                                     |
   |             +-------------+             |
   |             | Application |             |
   |             +------+------+             |
   |                    |                    |
   |             +------+------+             |
   |             |   Sockets   |             |
   |             |   Library   |             |
   |             +------+------+             |
   |                    |                    |
   |   +-----+------+---+--+-------+-----+   |
   |   |     |      |      |       |     |   |
   | +-+-++--+--++--+-++---+---++--+--++-+-+ |
   | |DNS||LLMNR||mDNS||NetBIOS||hosts||...| |
   | +---++-----++----++-------++-----++---+ |
   |                                         |
   +-----------------------------------------+
                        |
                  ______|______
                 /             \
                /               \
               /      local      \
               \     network     /
                \               /
                 \_____________/
                        |
               _________|_________
              /                   \
             /                     \
            /                       \
           |         Internet        |
            \                       /
             \                     /
              \___________________/

             Realistic Architecture

                    Figure 2

1.1.  APIs

   [RFC3490] section 6.2 states:

      It is expected that new versions of the resolver libraries in the
      future will be able to accept domain names in other charsets than
      ASCII, and application developers might one day pass not only
      domain names in Unicode, but also in local script to a new API for
      the resolver libraries in the operating system.  Thus the ToASCII
      and ToUnicode operations might be performed inside these new
      versions of the resolver libraries.

   Resolver APIs such as getaddrinfo() and its predecessor
   gethostbyname() were defined to accept "char *" arguments, meaning
   they accept a string of bytes, terminated with a NUL (0) byte.  This
   is sufficient for ASCII strings, Punycode strings, and even
   ISO-2022-JP and UTF-8 strings (unless an implementation artificially
   precludes them), but not UTF-16 or UTF-32 strings.  Several operating
   systems historically used in Japan will accept (and expect)
   ISO-2022-JP strings in such APIs.  Some platforms used worldwide also
   have new versions of the APIs (e.g., GetAddrInfoW() on Windows) that
   accept other encoding schemes such as UTF-16.

   It is worth noting that an API using "char *" arguments can
   distinguish between ASCII, Punycode, ISO-2022-JP, and UTF-8 strings
   as follows:
   o  if the string contains an ESC (0x1B) byte, the string is
      ISO-2022-JP; otherwise,
   o  if any byte in the string has the high bit set, the string is
      UTF-8; otherwise,
   o  if the string starts with "xn--" then it is Punycode; otherwise,
   o  the string is ASCII.
   Again, this assumes that ASCII names never start with "xn--", and
   also that UTF-8 strings never contain an ESC character.

2.  Use of Non-DNS Protocols

   As noted earlier, typical name resolution libraries are not
   DNS-specific.  Furthermore, some protocols are defined to use
   encoding schemes other than Punycode.  For example, mDNS specifies
   that UTF-8 be used.
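
   As a rough illustration of how differently one name looks in these
   encodings, the Python sketch below applies the byte-level test
   listed in Section 1.1 to a single name encoded three ways.  The
   function name classify_name_encoding and the sample name are
   hypothetical choices made for this example, not part of any resolver
   API.  The sketch assumes Python's built-in "idna" codec (an
   implementation of the [RFC3490] ToASCII operation) and its
   "iso2022_jp" codec, and it inherits the assumptions stated in
   Section 1.1.

```python
def classify_name_encoding(name: bytes) -> str:
    """Guess the encoding of a "char *" style name (hypothetical helper).

    Applies the tests from Section 1.1 in order: an ESC byte means
    ISO-2022-JP, any byte with the high bit set means UTF-8, a leading
    "xn--" means Punycode, and anything else is treated as ASCII.
    """
    if b"\x1b" in name:
        return "ISO-2022-JP"
    if any(octet & 0x80 for octet in name):
        return "UTF-8"
    # As in Section 1.1, only the start of the whole string is examined.
    if name.startswith(b"xn--"):
        return "Punycode"
    return "ASCII"


# Sample name chosen for illustration: three Japanese characters
# (U+65E5 U+672C U+8A9E) followed by ".example".
sample = "\u65e5\u672c\u8a9e.example"

for codec in ("utf-8", "idna", "iso2022_jp"):
    encoded = sample.encode(codec)
    print(codec, encoded, classify_name_encoding(encoded))
```

   The three byte strings share no common form, which is why a name
   registered under one encoding cannot be found by a query that uses
   another.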
   Indeed, the IETF policy on character sets and languages
   [RFC2277] states:

      Protocols MUST be able to use the UTF-8 charset, which consists of
      the ISO 10646 coded character set combined with the UTF-8
      character encoding scheme, as defined in [10646] Annex R
      (published in Amendment 2), for all text.  Protocols MAY specify,
      in addition, how to use other charsets or other character encoding
      schemes for ISO 10646, such as UTF-16, but lack of an ability to
      use UTF-8 is a violation of this policy; such a violation would
      need a variance procedure ([BCP9] section 9) with clear and solid
      justification in the protocol specification document before being
      entered into or advanced upon the standards track.  For existing
      protocols or protocols that move data from existing datastores,
      support of other charsets, or even using a default other than
      UTF-8, may be a requirement.  This is acceptable, but UTF-8
      support MUST be possible.

   Applications that convert an IDN to Punycode before calling
   getaddrinfo() will encounter name resolution failures if the Punycode
   name is directly used in such protocols.  Having libraries or
   protocols convert from Punycode to the encoding scheme defined by
   the protocol (e.g., UTF-8) would require changes to APIs and/or
   servers, which IDNA was intended to avoid.

   As a result, applications that assume that non-ASCII names are
   resolved using the public DNS and blindly convert them to Punycode
   without knowledge of what protocol will be selected by the name
   resolution library have problems.  Furthermore, name resolution
   libraries often try multiple protocols, until one succeeds, because
   they are defined to use a common name space.  For example, the hosts
   file ([RFC0952] and [RFC1123] section 2.1), DNS ([RFC1034] section
   2.1), and NetBIOS-over-TCP ([RFC1001] section 11.1.1) are all defined
   by RFCs to be able to share a common name space.
   This means that when
   an application passes a name to be resolved, resolution may in fact
   be attempted using multiple protocols, each with a potentially
   different encoding scheme.  For this to work successfully, the name
   must be converted to the appropriate encoding scheme only after the
   choice is made to use that protocol.  In general, this cannot be done
   by the application since the choice of protocol is not made by the
   application.

3.  Use of Non-ASCII in DNS

   A common misconception is that DNS only supports names that can be
   expressed using letters, digits, and hyphens.

   This misconception originally stemmed from the definition of an
   "Internet host name" in [RFC0952], published in 1985, which defines
   the use of the hosts file.  An Internet host name was defined therein
   as including only letters, digits, and hyphens, where upper and lower
   case letters were to be treated as identical.  For DNS, [RFC1034]
   section 3.5 entitled "Preferred name syntax" then repeated this
   definition in 1987, saying that this "syntax will result in fewer
   problems with many applications that use domain names (e.g., mail,
   TELNET)".

   Confusion thus remained as to whether the "preferred" name syntax
   was a mandatory restriction, or merely "preferred".

   In 1989, [RFC1123] section 2.1 updated the definition of an Internet
   host name as defined in [RFC0952], to allow starting with a digit (to
   support IPv4 addresses in dotted-decimal form).  Section 6.1 of that
   RFC discusses the use of DNS (and the hosts file) for resolving host
   names to IP addresses and vice versa.  This led to confusion as to
   whether all names in DNS are "host names", or whether a "host name"
   is merely a special case of a DNS name.

   By 1997, things had progressed to a state where it was necessary to
   clarify these areas of confusion.  "Clarifications to the DNS
   Specification" [RFC2181] section 11 clarifies:

      The DNS itself places only one restriction on the particular
      labels that can be used to identify resource records.  That one
      restriction relates to the length of the label and the full name.
      The length of any one label is limited to between 1 and 63 octets.
      A full domain name is limited to 255 octets (including the
      separators).  The zero length full name is defined as representing
      the root of the DNS tree, and is typically written and displayed
      as ".".  Those restrictions aside, any binary string whatever can
      be used as the label of any resource record.  Similarly, any
      binary string can serve as the value of any record that includes a
      domain name as some or all of its value (SOA, NS, MX, PTR, CNAME,
      and any others that may be added).  Implementations of the DNS
      protocols must not place any restrictions on the labels that can
      be used.

   It thus clarified that the restriction to letters, digits, and
   hyphens does not apply to DNS names in general, nor to records that
   include "domain names".  Hence the "preferred" name syntax specified
   in [RFC1034] is indeed merely "preferred", not mandatory.

   Since there is no restriction even to ASCII, let alone letter-digit-
   hyphen use, DNS is in conformance with the requirement in [RFC2277]
   to allow UTF-8.

   However, this requirement is complicated by the fact that in an 8-bit
   clean protocol, one has to have some way of knowing whether a binary
   string is encoded in UTF-8, UTF-16, UTF-32, or some other encoding.

   While implementations of the DNS protocol must not place any
   restrictions on the labels that can be used, applications that use
   the DNS are free to impose whatever restrictions they like, and many
   have.
   The above rules permit a domain name label that contains
   unusual characters, such as embedded spaces, which many applications
   would consider a bad idea.  For example, the SMTP protocol [RFC5321]
   originally constrained the character set usable in email addresses
   and now has an effort underway to extend SMTP to support email
   address internationalization.

   Shortly after [RFC2181] and [RFC2277] were written, the need for
   internationalized names within private name spaces (i.e., within
   enterprises) arose.  The current (and past, predating Punycode)
   practice within enterprises that support other languages is to put
   UTF-8 names in their internal DNS servers in a private name space.
   For example, [I-D.skwan-utf8-dns-00] was first written in 1997, and
   was then widely deployed in Windows.  The use of UTF-8 names in DNS
   was similarly implemented and deployed in MacOS.  Within a private
   name space, and especially in light of [RFC2277], it was reasonable
   to assume that binary strings were encoded in UTF-8.

   [EDITOR'S NOTE: There are also normalization/mapping issues which the
   next version of this document may explore.  Currently we only explore
   encoding issues.]

   Five years after UTF-8 was already in use in private name spaces in
   DNS, Punycode began to be developed ([I-D.ietf-idn-punycode-00] began
   in 2002, culminating in the publication of [RFC3492] in 2003) for use
   in the public DNS name space.  This publication thus resulted in
   having to use different encodings for different name spaces (where
   UTF-8 for private name spaces was already deployed).  Hence,
   referring back to Figure 2, a different encoding scheme may be in use
   on the Internet vs. a local network.

   In general a host may be connected to zero or more networks using
   private name spaces, plus potentially the public name space.
   Applications that convert an IDN to Punycode before calling
   getaddrinfo() will encounter name resolution failures if the name is
   actually registered in a private name space in some other encoding
   (e.g., UTF-8).  Having libraries or protocols convert from Punycode
   to the encoding used by a private name space (e.g., UTF-8) would
   require changes to APIs and/or servers, which IDNA was intended to
   avoid.

   Some examples of cases that can happen in existing implementations
   today (where {non-ASCII} below represents some user-entered non-ASCII
   string) are:
   1.  User types in {non-ASCII}.{non-ASCII}.com, and the application
       passes it, in the form of a UTF-8 string, to getaddrinfo or
       gethostbyname or equivalent.
       *  The DNS resolver passes the (UTF-8) string unmodified to a DNS
          server.
   2.  User types in {non-ASCII}.{non-ASCII}.com, and the application
       passes it to a name resolution API that accepts strings in some
       other encoding such as UTF-16, e.g., GetAddrInfoW on Windows.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver converts the name from UTF-16 to UTF-8 and
          passes the query to a DNS server.
   3.  User types in {non-ASCII}.{non-ASCII}.com, but the application
       first converts it to Punycode such that the name that is passed
       to name resolution APIs is (say)
       xn--e1afmkfd.xn--80akhbyknj4f.com.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver passes the string unmodified to a DNS server.
       *  If the name is not found in DNS, the name resolution API
          decides to try another protocol, say mDNS.
       *  The query goes out in mDNS, but since mDNS specifies that
          names are to be registered in UTF-8, the name isn't found
          since it was Punycode encoded in the query.
   4.  User types in {non-ASCII}, and the application passes it, in the
       form of a UTF-8 string, to getaddrinfo or equivalent.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver will append suffixes in the suffix search
          list, which may contain UTF-8 characters if the local network
          uses a private name space.
       *  Each FQDN in turn will then be sent in a query to a DNS
          server, until one succeeds.
   5.  User types in {non-ASCII}, but the application first converts it
       to Punycode, such that the name that is passed to getaddrinfo or
       equivalent is (say) xn--e1afmkfd.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS stub resolver will append suffixes in the suffix
          search list, which may contain UTF-8 characters if the local
          network uses a private name space, resulting in (say)
          xn--e1afmkfd.{non-ASCII}.com.
       *  Each FQDN in turn will then be sent in a query to a DNS
          server, until one succeeds.
       *  Since the private name space in this case uses UTF-8, the
          above queries fail, since the Punycode version of the name was
          not registered in that name space.
   6.  User types in {non-ASCII1}.{non-ASCII2}.{non-ASCII3}.com, where
       {non-ASCII3}.com is a public name space using Punycode, but
       {non-ASCII2}.{non-ASCII3}.com is a private name space using
       UTF-8, which is accessible to the user.  The application passes
       the name, in the form of a UTF-8 string, to getaddrinfo or
       equivalent.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver tries to locate the authoritative server, but
          fails the lookup because it cannot find a server for the UTF-8
          encoding of {non-ASCII3}.com, even though it would have access
          to the private name space.
          (To make this work, the private
          name space would need to include the UTF-8 encoding of
          {non-ASCII3}.com.)

   When users use multiple applications, some of which do Punycode
   conversion prior to passing a name to name resolution APIs, and some
   of which do not, odd behavior can result which at best violates the
   principle of least surprise, and at worst can result in security
   vulnerabilities.

   First consider two competing applications, such as web browsers, that
   are designed to achieve the same task.  If the user types the same
   name into each browser, one may successfully resolve the name (and
   hence access the desired content) because the encoding scheme was
   correct, while the other may fail name resolution because the
   encoding scheme was incorrect.  Hence the issue can give users an
   incentive to switch to another application (which in some cases means
   switching to an IDNA application, and in other cases means switching
   away from an IDNA application).

   Next consider two separate applications where one is designed to be
   launched from the other, for example a web browser launching a media
   player application when the link to a media file is clicked.  If both
   types of content (web pages and media files in this example) are
   hosted at the same IDN in a private name space, but one application
   converts to Punycode before calling name resolution APIs and the
   other does not, the user may be able to access a web page and click
   on the media file, causing the media player to launch and attempt to
   retrieve the media file, which will then fail because the IDN
   encoding scheme was incorrect.  Or even worse, if an attacker was
   able to register the same name in the other encoding scheme, the user
   may get the content from the attacker's machine.  This is similar to
   a normal phishing attack, except that the two names represent exactly
   the same Unicode characters.

4.  Recommendations

   Taking into account the issues above, it would seem inappropriate for
   an application to convert a name to Punycode when it does not know
   whether DNS will be used by the name resolution library, or whether
   the name exists in a private name space that uses UTF-8, or in the
   global DNS that uses Punycode.

   Instead, conversion to Punycode, UTF-8, or whatever other encoding,
   should be done only by an entity that knows which protocol will be
   used (e.g., the DNS resolver, or getaddrinfo upon deciding to pass
   the name to DNS), rather than by general applications that call
   protocol-independent name resolution APIs.  Similarly, even when DNS
   is used, the conversion to Punycode should be done only by an entity
   that knows which name space will be used.

   That is, a more intelligent DNS resolver would be more liberal in
   what it would accept from an application and be able to query for
   both a Punycode name (e.g., over the Internet) and a UTF-8 name
   (e.g., over a corporate network with a private name space) in case
   the server only recognized one.  However, we might also take into
   account that the various resolution behaviors discussed earlier could
   also occur with record updates (e.g., with Dynamic Update [RFC2136]),
   resulting in some names being registered in a local network's private
   name space by applications doing Punycode conversion, and other names
   being registered using UTF-8.  Hence a name might have to be queried
   with both encodings to be sure to succeed without changes to DNS
   servers.

   Similarly, a more intelligent stub resolver would also be more
   liberal in what it would accept from a response as the value of a
   record (e.g., PTR) in that it would accept either UTF-8 or Punycode
   and convert them to whatever encoding is used by the application APIs
   to return strings to applications.
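
   The two "more liberal" behaviors described above can be sketched as
   follows.  This is a minimal illustration under stated assumptions,
   not a description of any deployed resolver: the helper names
   to_unicode and candidate_query_names are hypothetical, the input is
   assumed to be either UTF-8 or Punycode, and Python's built-in "idna"
   codec (which implements [RFC3490]) stands in for the ToASCII and
   ToUnicode operations.

```python
from typing import List


def to_unicode(name: bytes) -> str:
    """Accept a name in either UTF-8 or Punycode form (hypothetical helper).

    A byte with the high bit set is taken to mean UTF-8; otherwise the
    "idna" codec decodes any "xn--" labels and passes plain ASCII
    labels through unchanged.
    """
    if any(octet & 0x80 for octet in name):
        return name.decode("utf-8")
    return name.decode("idna")


def candidate_query_names(name: bytes) -> List[bytes]:
    """Return the distinct encodings a liberal resolver might query for.

    The order (UTF-8 first) is an arbitrary choice for this sketch.
    """
    unicode_name = to_unicode(name)
    utf8_form = unicode_name.encode("utf-8")
    punycode_form = unicode_name.encode("idna")
    candidates = [utf8_form]
    if punycode_form != utf8_form:
        candidates.append(punycode_form)
    return candidates


# The Punycode name used as an example in Section 3.
punycode_name = b"xn--e1afmkfd.xn--80akhbyknj4f.com"

for candidate in candidate_query_names(punycode_name):
    print(candidate)

# A plain ASCII name has a single form, so only one query is needed.
print(candidate_query_names(b"example.com"))
```

   Whichever form the application supplies, the same pair of candidates
   results, so a name registered under either encoding can be found.
   The same to_unicode step could be applied to names returned in
   responses before they are handed back to the application.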

   Indeed, the choice of conversion within the resolver libraries is
   consistent with the quote from [RFC3490] section 6.2 stating that
   Punycode conversion "might be performed inside these new versions of
   the resolver libraries".

   That said, some application-layer protocols may be defined to use
   Punycode rather than UTF-8 as recommended by [RFC2277].  In this
   case, an application may receive a Punycode name and want to pass it
   to name resolution APIs.  Again, the recommendation that a resolver
   library be more liberal in what it would accept from an application
   means that such a name would be accepted and re-encoded as
   needed, rather than requiring the application to do so.

   Finally, the question remains about what a DNS server should do to
   handle cases where some existing applications or hosts do Punycode
   queries within the local network using a private name space, and
   other existing applications or hosts send UTF-8 queries.  It is
   undesirable to store different records for different encodings of the
   same name, since this introduces the possibility of inconsistency
   between them.  Instead, a new DNS server could treat encoding
   conversion in the same way as case-insensitive comparison, which a
   DNS server is already required to do.  Two encodings are, in this
   sense, two representations of the same name, just as two
   case-different strings are.  However, whereas case comparison of
   non-ASCII characters is complicated by ambiguities (see [RFC4690]),
   encoding conversion between Punycode and UTF-8 is unambiguous.

   [EDITOR'S NOTE: There are also normalization/mapping issues which the
   next version of this document may explore.  Currently we only explore
   encoding issues.]

5.  Security Considerations

   Having applications convert names to Punycode before calling name
   resolution can result in security vulnerabilities.
   If the name is
   resolved by protocols or in zones for which records are registered
   using other encoding schemes, an attacker can claim the Punycode
   version of the same name and hence trick the victim into accessing a
   different destination.  This can be done for any non-ASCII name, even
   when there is no possible confusion due to case, language, or other
   issues.  Other types of confusion beyond those resulting simply from
   the choice of encoding scheme are discussed in [RFC4690].

6.  IANA Considerations

   [RFC Editor: please remove this section prior to publication.]

   This document has no IANA Actions.

7.  IAB Members at the time of this writing

   Marcelo Bagnulo
   Gonzalo Camarillo
   Stuart Cheshire
   Vijay Gill
   Russ Housley
   John Klensin
   Olaf Kolkman
   Gregory Lebovitz
   Andrew Malis
   Danny McPherson
   David Oran
   Jon Peterson
   Dave Thaler

8.  References

8.1.  Normative References

8.2.  Informative References

   [I-D.cheshire-dnsext-multicastdns]
              Cheshire, S. and M. Krochmal, "Multicast DNS",
              draft-cheshire-dnsext-multicastdns-07 (work in progress),
              September 2008.

   [I-D.ietf-idn-punycode-00]
              Costello, A., "Punycode version 0.3.3",
              draft-ietf-idn-punycode-00 (work in progress), July 2002.

   [I-D.skwan-utf8-dns-00]
              Kwan, S. and J. Gilroy, "Using the UTF-8 Character Set in
              the Domain Name System", draft-skwan-utf8-dns-00 (work in
              progress), November 1997.

   [NIS]      Sun Microsystems, "System and Network Administration",
              March 1990.

   [RFC0952]  Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet
              host table specification", RFC 952, October 1985.

   [RFC1001]  NetBIOS Working Group, "Protocol standard for a NetBIOS
              service on a TCP/UDP transport: Concepts and methods",
              STD 19, RFC 1001, March 1987.

   [RFC1002]  NetBIOS Working Group, "Protocol standard for a NetBIOS
              service on a TCP/UDP transport: Detailed specifications",
              STD 19, RFC 1002, March 1987.

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC1468]  Murai, J., Crispin, M., and E. van der Poel, "Japanese
              Character Encoding for Internet Messages", RFC 1468,
              June 1993.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
              RFC 2136, April 1997.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2492]  Armitage, G., Schulter, P., and M. Jork, "IPv6 over ATM
              Networks", RFC 2492, January 1999.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [RFC3493]  Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
              Stevens, "Basic Socket Interface Extensions for IPv6",
              RFC 3493, February 2003.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC4795]  Aboba, B., Thaler, D., and L. Esibov, "Link-local
              Multicast Name Resolution (LLMNR)", RFC 4795,
              January 2007.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              4.0.0, defined by: The Unicode Standard, Version 4.0",
              (Boston, MA, Addison-Wesley, 2003.  ISBN 0-321-18578-1).

Author's Address

   Dave Thaler
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA  98052
   USA

   Phone: +1 425 703 8835
   Email: dthaler@microsoft.com