idnits 2.17.1

draft-ietf-idn-icu-00.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 515 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 5 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.

  ** There is 1 instance of lines with control characters in the document.

  ** The abstract seems to contain references ([RFC1035]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (14 July 2000) is 8687 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Unicode' is mentioned on line 185, but not defined

  == Unused Reference: 'KWAN' is defined on line 419, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2535' is defined on line 442, but no explicit
     reference was found in the text

  == Unused Reference: 'UNICODE' is defined on line 445, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDN-REQ'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KWAN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Oscarsson'

  ** Downref: Normative reference to an Informational RFC: RFC 2130

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034,
     RFC 4035)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR15'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21'

     Summary: 10 errors (**), 0 flaws (~~), 7 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1    IETF IDN Working Group             Seungik Lee, Hyewon Shin, Dongman Lee
2    Internet Draft                                                       ICU
3    draft-ietf-idn-icu-00.txt                       Eunyong Park, Sungil Kim
4    Expires: 14 January 2001                                 KKU, Netpia.com
5                                                                14 July 2000

7              Architecture of Internationalized Domain Name System

9    Status of this Memo

11   This document is an Internet-Draft and is in full conformance with
12   all provisions of Section 10 of RFC2026.

14   Internet-Drafts are working documents of the Internet Engineering
15   Task Force (IETF), its areas, and its working groups. Note that other
16   groups may also distribute working documents as Internet-Drafts.

18   Internet-Drafts are draft documents valid for a maximum of six months
19   and may be updated, replaced, or obsoleted by other documents at any
20   time. It is inappropriate to use Internet-Drafts as reference
21   material or to cite them other than as "work in progress."

23   The list of current Internet-Drafts can be accessed at
24   http://www.ietf.org/ietf/1id-abstracts.txt

26   The list of Internet-Draft Shadow Directories can be accessed at
27   http://www.ietf.org/shadow.html.

29   1. Abstract

31   Because the Domain Name System (DNS) restricts domain names to
32   alphanumeric characters, a way is needed to find an Internet host
33   using multi-lingual domain names: the Internationalized Domain Name
34   System (IDNS).

36   This document describes, from an architectural view, how multi-lingual
37   domain names are handled in a new protocol scheme for IDNS servers and
38   resolvers. It updates [RFC1035] but still preserves backward
39   compatibility with the current DNS protocol.

41   2. Conventions used in this document

43   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
44   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
45   document are to be interpreted as described in [RFC2119].

47   "IDNS" (Internationalized Domain Name System) is used here to
48   indicate a new system designed for a domain name service that
49   supports multi-lingual domain names.

51   "The current/conventional DNS" or "DNS" (Domain Name System) is used
52   here to indicate the domain name systems currently in use. They
53   conform to [RFC1034, RFC1035], but their implementations and
54   functional operations may differ from one another.

56   The "alphanumeric" character data used here is the character set that
57   is allowed for a domain name in DNS query format, [a-zA-Z0-9-].

59   3. Introduction

61   The Domain Name System (DNS) has eliminated the difficulty of
62   remembering IP addresses. As the Internet spreads to more people,
63   it becomes more likely that people who are not familiar with
64   alphanumeric characters will use it. Domain names in alphanumeric
65   characters are difficult to remember or use for people who have not
66   been educated in English. Therefore, a way is needed to find an
67   Internet host using multi-lingual domain names: the Internationalized
68   Domain Name System.

70   3.1 The current issues of IDNS

72   IDNS maps a name to an IP address as the typical DNS does, but it
73   allows domain names to contain multi-lingual characters. The multi-
74   lingual characters need to be encoded/decoded into one standardized
75   format, which requires changes to the conventional DNS protocol
76   described in [RFC1034] and [RFC1035]. However, the changes to the
77   present DNS protocol must be minimized in order to guarantee
78   backward compatibility.

80   The IDNS issues have been discussed in the IETF IDN Working Group.
81   These issues are well described in [IDN-REQ]. The main issues are:

83   - Compatibility and interoperability. The DNS protocol is widely
84   used in the Internet. Even if a new protocol is introduced for DNS,
85   the current protocol may still be used unchanged. Therefore, IDNS,
86   as a new design for the DNS protocol, must provide backward
87   compatibility and interoperability with the current DNS.

89   - Internationalization. The purpose of IDNS is to use multi-lingual
90   domain names. International character data must be represented in
91   one standardized format in domain names.

93   - Canonicalization. DNS indexes and matches domain names to look up a
94   domain name in zone data. In the conventional DNS, canonicalization
95   applies to US-ASCII only. However, all multi-lingual character data
96   must be canonicalized by its own rules under a standardized DNS
97   matching policy, e.g. a case-insensitive matching rule.

99   - Operational issues. IDNS uses international character data for
100  domain names. Normalization and canonicalization of domain names are
101  needed in addition to the current DNS operations. IDNS also needs an
102  operation for interoperability with the current DNS. Therefore,
103  operational guidelines for IDNS need to be specified.

105  3.2 Overview of the proposed scheme

107  Our proposed scheme for IDNS also addresses the issues described
108  earlier in order to fulfill the requirements of IDN [IDN-REQ].

110  The proposed scheme can be summarized as follows:

112  - The IN bit, which is reserved and currently unused in the DNS
113  query/response format header, is used to distinguish between the
114  queries generated by IDNS servers or resolvers and those of non-IDNS
115  ones [Oscarsson]. This mechanism is also needed to indicate whether
116  or not the query has been through the appropriate IDNS operations
117  for canonicalization and normalization.

119  - The multi-lingual domain names are encoded in UTF-8 as the wire
120  format. [RFC2130] recommends UTF-8 as the default character encoding
121  scheme (CES) for new protocols that transmit text. This scheme
122  allows the IDNS server to handle DNS queries from non-IDNS servers
123  or resolvers because ASCII characters are encoded unchanged in
124  UTF-8.

126  - The UTF-8 domain names must be case-folded before transmission.
127  This minimizes the server's overhead for case-insensitive name
128  matching. It also guarantees that cached query results can be
129  used without any further normalization and canonicalization. If
130  an IDNS server gets a non-IDNS query that is not case-folded, it
131  case-folds the query before transmitting it to other
132  servers.

134  4. Design considerations

136  Our proposed scheme is designed to fulfill the requirements of the
137  IETF IDN WG [IDN-REQ]. All methods for IDNS schemes must satisfy
138  the requirements document. The design described in this document
139  is based on these requirements.

141  4.1 Protocol Extensions

143  To indicate an IDNS query format, we use an unallocated bit in the
144  current DNS query format header, named the 'IN' bit [Oscarsson]. All
145  IDNS queries set the IN bit to 1. Without this bit set to 1, we cannot
146  guarantee that the query is in the appropriate format for IDNS.

148  The 'IN' bit indicates whether or not the query is from IDNS
149  resolvers/servers. It also reduces the overhead of canonicalization
150  at the IDNS server, as described further in <4.4.
151  Canonicalization>.

153  We devise new operations and new structures for resolvers and name
154  servers to add multi-lingual domain name handling features to the
155  DNS. All DNS servers and resolvers must therefore be changed to use
156  multi-lingual domain names. The new architectures for resolvers and
157  servers are described in <5. Architectures>.

159  4.2 Compatibility and interoperability

161  The 'IN' bit occupies a valid bit location in the query that the
162  conventional DNS protocol sets to zero [RFC1035]. The operations and
163  structures of IDNS preserve the conventional rules of DNS to
164  guarantee interoperability with conventional DNS servers and
165  resolvers, so the changes are optional. Together these make this
166  scheme for IDNS compatible with the current protocol.

168  Although the current DNS protocol uses 7-bit ASCII characters only,
169  the query format of the current DNS protocol is 8-bit clean.
170  Therefore, we can guarantee backward compatibility and
171  interoperability with the current DNS when using UTF-8, because
172  ASCII is preserved unchanged in UTF-8.

174  Note: There are also implementations in use that are compatible with
175  the current DNS but extend their operations to use UTF-8 domain names.
176  The IDNS described here interoperates well with these implementations.
177  Interoperability with these implementations is described in
178  <5.4 Interoperability with the current DNS>.

180  4.3 Internationalization

182  All international character data must be represented in one
183  standardized format and the standardized format must be compatible
184  with the current ASCII-based protocols. Therefore, the coded
185  character set (CCS) for IDNS protocol must be Unicode [Unicode], and
186  be encoded using the UTF-8 [RFC2279] character encoding scheme (CES).

188  The client-side interface may allow domain names encoded in any
189  local character set, Unicode, ASCII and so on, but they must be
190  converted to Unicode before being used in the IDNS resolver. The IDNS
191  resolver accepts Unicode character data only, and finally converts
192  it to UTF-8 for transmission.

194  4.4 Canonicalization

196  In the current DNS protocol, domain names are matched case-
197  insensitively. Therefore, the domain names in a query and zone file
198  must be case-folded before the equivalence test.

200  The case-folding issue has been discussed for a long time in the IETF
201  IDN WG. The main problem is that case folding is locale-dependent:
202  the same character may case-fold differently in different locales.
203  For example, Latin capital letter I (U+0049) case-folded to lower
204  case in the Turkish context will become Latin small letter dotless i
205  (U+0131). But in the English context, it will become Latin small
206  letter i (U+0069).

208  Therefore, our new IDNS design case-folds domain names in a locale-
209  independent way, using a method defined in [UTR21].

211  Multi-lingual domain names should be case-folded in IDNS resolvers or
212  IDNS servers before transmission to other IDNS/DNS servers. That is,
213  an IDNS resolver should case-fold the domain name and convert it to
214  UTF-8 before transmission. If an IDNS server gets a query with the
215  IN bit set to 1, it does not need to canonicalize the multi-lingual
216  domain name again. If the IDNS server gets a query with the IN bit
217  set to 0, it cannot determine whether the query is in the
218  appropriate canonicalized format for an IDNS server, so it case-
219  folds the multi-lingual domain name in the query and sets the 'IN'
220  bit to 1.

222  Current DNS queries carry domain names in their original case so
223  that the case is preserved. To be consistent with this rule, the
224  original multi-lingual domain names should be stored by IDNS
225  resolvers or servers before case-folding, and should be restored
226  before a response is sent.

228  To case-fold UTF-8 data using the case-folding method in [UTR21],
229  the UTF-8 should first be converted to Unicode and then mapped
230  through the mapping table. Of course, if we could make a
231  case-folding mapping table for UTF-8 character data, this overhead
232  could be reduced.

234  However, some overhead for canonicalization in IDNS servers cannot
235  be avoided, because the canonicalization of international
236  character data is complicated.

238  To minimize this overhead, we use the 'IN' bit to indicate that
239  canonicalization of the query has already been handled, which means
240  no further canonicalization operation is needed. The detailed
241  operations according to the 'IN' bit are described later in <5.
242  Architectures>.

244  With international character data, canonicalization (e.g. case-
245  folding) is much more complicated than with US-ASCII, and
246  differs according to locale context.

248  This document does not specify any method or recommendation beyond
249  case-folding. For canonicalization of international character
250  data, [UTR15] is a good start. It must be discussed further and
251  specified in the IDNS protocol specification.

253  4.5 Operational issues

255  The current DNS scheme uses only ASCII as its wire format, whereas
256  our new IDNS scheme uses UTF-8 as its wire format. All IDNS
257  resolvers must transmit queries encoded in UTF-8 and case-folded.
258  This format can be verified by checking the IN bit: if the IN bit is
259  set to 1, the query is encoded in UTF-8 and case-folded. Otherwise
260  the IDNS server cannot be sure that the query is encoded in UTF-8 and
261  case-folded, so in this case it needs additional operations for
262  encoding to UTF-8, case-folding, etc.

264  Current DNS resolvers transmit queries in ASCII. This needs no
265  special consideration in IDNS servers because ASCII is
266  preserved unchanged in UTF-8.

268  Some applications and resolvers, e.g. Microsoft's DNS servers,
269  transmit queries in UTF-8 although they do not follow the new IDNS
270  resolver structure. We cannot guarantee that those queries are
271  case-folded correctly. Therefore, in that case the IDNS servers,
272  rather than the IDNS resolver, should convert them to appropriate
273  IDNS queries.

275  All detailed operations of IDNS servers and resolvers are described
276  in <5. Architectures>.

278  5. Architectures

280  5.1 New header format

282  New IDNS servers and resolvers must interoperate with those of the
283  current DNS. Therefore, we need a way to determine whether or not a
284  query is for IDN. For this reason, we use a new header format as
285  proposed in [Oscarsson].

287                                  1  1  1  1  1  1
288    0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
289  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
290  |                      ID                       |
291  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
292  |QR|   Opcode  |AA|TC|RD|RA|IN|AD|CD|   RCODE   |
293  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
294  |                    QDCOUNT                    |
295  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
296  |                    ANCOUNT                    |
297  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
298  |                    NSCOUNT                    |
299  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
300  |                    ARCOUNT                    |
301  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

303  The IDNS resolvers and servers identify themselves in a query or a
304  response by setting the 'IN' bit to 1 in the DNS query/response
305  format header. This bit is defined to be zero by default in the
306  current DNS servers and resolvers.

308  5.2 Structures of IDNS resolvers

310  To use multi-lingual domain names with IDNS servers, all IDNS/DNS
311  resolvers must generate queries in UTF-8 or ASCII format. Resolver
312  designs may differ from one another according to the local
313  operating system or application. We propose new design
314  guidelines for a resolver for a new standard.

316  The IDNS resolver accepts Unicode from the user interface for domain
317  names. Other character sets should be rejected. It encodes all
318  such character data into UTF-8 for transmission to name servers.

320  The operating procedure of an IDNS resolver is as follows:

322  <1>. If the resolver gets a domain name in Unicode or ASCII, it
323  stores the original domain name query. Otherwise the request for
324  lookup is rejected. In the current DNS protocol, the original case of
325  the domain name should be preserved. Therefore, the resolver must
326  store the original case of the domain name before canonicalization
327  (e.g. case-folding).

329  <2>. Case-fold the domain name with the locale-independent case-
330  mapping table defined in [UTR21].

332  <3>. Convert it to UTF-8.

334  <4>. Set the IN bit to 1. This indicates that the query is from an
335  IDNS resolver and that the format is UTF-8, case-folded.

337  <5>. Send the request query to name servers.

339  <6>. Restore the original domain name query into the response query
340  format.

342  <7>. Send the response to the application.

344  5.3 Structures of IDNS servers

346  The operation of an IDNS server is similar to that of a current DNS
347  server, but the IDNS server additionally accepts UTF-8 queries and
348  converts them to the appropriate format.

350  The IDNS server distinguishes between IDNS queries and DNS
351  queries by checking the IN bit in the query/response format header.
352  It operates differently according to the 'IN' bit.

354  The operating procedure of an IDNS server is as follows:

356  <1>. If the IN bit in the query/response format header is set to 1,
357  the server matches the domain name against zone file data or forwards
358  the request query for resolution. It operates in the same way as
359  current DNS servers but handles UTF-8 data. In this case, it does
360  not need to canonicalize the domain name because the domain name
361  was already canonicalized in the previous procedures of IDNS resolvers
362  or IDNS servers. Go to step <7>.

364  <2>. Set the IN bit to 1.

366  <3>. Store the original domain name query.

368  <4>. Case-fold the domain name with the locale-independent case-
369  mapping table defined in [UTR21].

371  <5>. Match the domain name against zone file data or send a request
372  query to look it up.

374  <6>. Restore the original domain name query into the response query
375  format.

377  <7>. Send the response for the query to the resolver or other server
378  that requested it.

380  5.4 Interoperability with the current DNS

382  DNS servers and resolvers accept domain names in ASCII only, whereas
383  IDNS servers and resolvers accept domain names in UTF-8. Therefore,
384  queries from DNS to IDNS components are handled well because
385  UTF-8 is a superset of ASCII. But queries from IDNS to DNS
386  components will be rejected because UTF-8 goes beyond the
387  range of ASCII.

389  Note: Some implementations, e.g. Microsoft's DNS server and
390  resolvers, can handle UTF-8 domain names although they follow the
391  DNS protocol specification rather than this IDNS specification. In
392  this case, we cannot guarantee that the queries from these
393  3rd-party implementations are encoded in UTF-8 and properly
394  canonicalized. But these queries have the 'IN' bit set to 0, so
395  the IDNS checks whether or not the domain name is within the range
396  of UTF-8, converts it to UTF-8, and finally canonicalizes it.

398  6. Security Considerations

400  This IDNS architecture uses 8-bit clean queries for transmission
401  and handles UTF-8 instead of ASCII. The DNS protocol has
402  already allocated an 8-bit query format for domain names. Therefore,
403  the IDNS protocol inherits the security issues of the current DNS.

405  Canonicalization for IDNS is defined in [UTR15] and case folding in
406  [UTR21]. All security issues related to canonicalization or
407  normalization are inherited from those described in [UTR15, UTR21].

409  As always with data, if software does not check for data that can be
410  a problem, security may be affected. As more characters than ASCII are
411  allowed, software that expects only ASCII and performs no checks may
412  now have security problems.

414  7. References

416  [IDN-REQ]   James Seng, "Requirements of Internationalized Domain
417              Names," Internet Draft, June 2000

419  [KWAN]      Stuart Kwan, "Using the UTF-8 Character Set in the
420              Domain Name System," Internet Draft, February 2000

422  [Oscarsson] Dan Oscarsson, "Internationalisation of the Domain Name
423              Service," Internet Draft, February 2000

425  [RFC1034]   Mockapetris, P., "Domain Names - Concepts and
426              Facilities," STD 13, RFC 1034, USC/ISI, November 1987

428  [RFC1035]   Mockapetris, P., "Domain Names - Implementation and
429              Specification," STD 13, RFC 1035, USC/ISI, November
430              1987

432  [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
433              Requirement Levels," RFC 2119, March 1997

435  [RFC2130]   C. Weider et al., "The Report of the IAB Character Set
436              Workshop held 29 February - 1 March 1996," RFC 2130,
437              Apr 1997.

439  [RFC2279]   F. Yergeau, "UTF-8, a transformation format of ISO
440              10646," RFC 2279, January 1998

442  [RFC2535]   D. Eastlake, "Domain Name System Security Extensions,"
443              RFC 2535, March 1999

445  [UNICODE]   The Unicode Consortium, "The Unicode Standard - Version
446              3.0," http://www.unicode.org/unicode/

448  [UTR15]     M. Davis and M. Duerst, "Unicode Normalization Forms",
449              Unicode Technical Report #15, Nov 1999,
450              http://www.unicode.org/unicode/reports/tr15/

452  [UTR21]     Mark Davis, "Case Mappings," Unicode Technical Report
453              #21, May 2000,
454              http://www.unicode.org/unicode/reports/tr21

456  8. Acknowledgments

458  Kyoungseok Kim
459  Chinhyun Bae

461  9. Authors' Addresses

463  Seungik Lee
464  Email: silee@icu.ac.kr

466  Hyewon Shin
467  Email: hwshin@icu.ac.kr

469  Dongman Lee
470  Email: dlee@icu.ac.kr

472  Information & Communications University
473  58-4 Whaam-dong Yuseong-gu Taejon, 305-348 Korea

475  Eunyong Park
476  Email: eunyong@eunyong.pe.kr
477  Konkuk University
478  93-1 Mojindong, Kwangjin-ku Seoul, 143-701 Korea

480  Sungil Kim
481  Email: clicky@netpia.com
482  Netpia.com
483  35-1 8-ga Youngdeungpo-dong Youngdeungpo-gu Seoul, 150-038 Korea