idnits 2.17.1

draft-ietf-idn-uname-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 472 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** There are 7 instances of too long lines in the document, the longest one
     being 54 characters in excess of 72.

  ** There are 23 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([NAMEPREP]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 333 has weird spacing: '...ue name v ...'

  == Line 347 has weird spacing: '...ue name v ...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'RFC2119' on line 35 looks like a reference

  -- Missing reference section? 'NAMEPREP' on line 438 looks like a reference

  -- Missing reference section? 'HAN' on line 441 looks like a reference

  -- Missing reference section? 'UTR15' on line 455 looks like a reference

  -- Missing reference section? 'UTR21' on line 458 looks like a reference

  -- Missing reference section? 'LDAP' on line 444 looks like a reference

  -- Missing reference section? 'CNRP' on line 447 looks like a reference

  -- Missing reference section? 'DNS' on line 450 looks like a reference

  -- Missing reference section? 'TSCONV' on line 460 looks like a reference

  -- Missing reference section? 'IDNA' on line 464 looks like a reference

  -- Missing reference section? 'IDNREQ' on line 435 looks like a reference

  -- Missing reference section? 'CJKV' on line 453 looks like a reference

     Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

  1  Internet Draft                                Authors: Li Ming TSENG
  2                                                         Jan Ming HO
  3  13 Jul 2001                                            Hua Lin QIAN
  4  Expires 13 Jan 2002                                    Kenny HUANG
                                                    Editor: James SENG
  6      Internationalized Domain Names and Unique Identifiers/Names

  8  Status of this Memo

 10     This document is an Internet-Draft and is in full conformance
 11     with all provisions of Section 10 of RFC2026.

 13     Internet-Drafts are working documents of the Internet
 14     Engineering Task Force (IETF), its areas, and its working
 15     groups. Note that other groups may also distribute working
 16     documents as Internet-Drafts.

 18     Internet-Drafts are draft documents valid for a maximum of
 19     six months and may be updated, replaced, or obsoleted by other
 20     documents at any time. It is inappropriate to use Internet-
 21     Drafts as reference material or to cite them other than as
 22     "work in progress."

 24     The list of current Internet-Drafts can be accessed at
 25     http://www.ietf.org/ietf/1id-abstracts.txt

 27     The list of Internet-Draft Shadow Directories can be accessed at
 28     http://www.ietf.org/shadow.html

 30  Terminology

 32     The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
 33     "MAY" in this document are to be interpreted as described in RFC 2119
 34     [RFC2119].

 36  Abstract

 38     One of the biggest technical challenges of Internationalized Domain
 39     Names (IDN) is determining whether two given domain names match.
 40     The current approach to this problem is a process known as
 41     [NAMEPREP].

 43     This document describes an alternative view of, and solution to,
 44     the IDN matching problem. It can be treated as a further stage of
 45     NAMEPREP and it is compatible with the IDNA approach.

 47     A practical case indicates that implementing UNAME with CNAME is a
 48     workable way for Internet applications to fetch a unique name.

 50  1. Introduction

 52     The Chinese Domain Name Consortium (CDNC) has taken a keen
 53     interest in IDN, in particular the use of Chinese script in
 54     domain names. CDNC was formed by the regional registries (CNNIC,
 55     TWNIC, HKNIC and MONIC), which have experimented with Chinese
 56     domain name systems for many months.

 58     The primary motivation for this proposal is the lack of
 59     support for Traditional and Simplified Chinese in NAMEPREP. See [HAN]
 60     for a discussion of Traditional/Simplified Han Ideograph problems.

 62     In addition, given the registries' operational experience, this
 63     proposal will reduce operational and deployment costs from a TLD
 64     manager's perspective, based on the examination and development
 65     work done in CDNC.

 67     Backward compatibility, interoperability, scalability, security,
 68     operation and deployment must all be considered as design
 69     criteria for an internationalized domain name system.

 71  2. Background on Legacy Encoding

 73     The most popular Chinese character set used in Taiwan is the
 74     industry standard "BIG5" and the corresponding one in China is
 75     "GBK". BIG5 has primarily Traditional Chinese characters and GBK has
 76     Simplified Chinese.
 77     In addition, the Chinese government has mandated that all Chinese
 78     software in China must support GB18030, a new standard that
 79     supersedes GBK.

 81     Both BIG5 and GBK are widely used in China, Taiwan, Hong Kong and
 82     Macao and are supported by many operating systems, including Windows.
 83     Thus, supporting these encodings in IDN is essential from a
 84     geographical perspective.

 86  3. An overview of current proposals and their problems

 88  3.1. ASCII Compatible Encoding (ACE)

 90     The need to support ACE in IDN has been extensively discussed in
 91     the IDN Working Group. Backward compatibility is the strongest
 92     advantage of ACE.
        The deployment of ACE neither affects the existing
 93     naming infrastructure nor risks damage to current
 94     Internet applications. To move the current Internet to a multilingual
 95     infrastructure, ACE is clearly the most appropriate bridging
 96     solution.

 98     Although ACE has the advantages mentioned above, most users'
 99     systems support local encodings. Users do not want to download
100     any special software or upgrade their software in order to handle
101     a multilingual domain name system. Supporting native encodings
102     without altering users' software has become an important issue for
103     TLD managers.

105  3.2. NAMEPREP

107     The design goal of [NAMEPREP] is to allow users to enter host names
108     in applications and have the highest chance of getting the name
109     correct. The NAMEPREP process comprises three basic steps, namely
110     "MAP", "NORMALIZATION" and "PROHIB".

112     The MAP and NORMALIZATION steps aim to reduce the number of possible
113     representations of domain names that should be equivalent. These are
114     based upon Unicode Technical Reports [UTR15] and [UTR21]. However,
115     when the same domain name has multiple representations and
116     matching depends on language and context, NAMEPREP fails.
117     Of particular interest to us, Traditional and Simplified
118     Chinese ideographs cannot be handled by NAMEPREP.

120  4. Alternative view of the problem space

122     While the IDN WG has been working very hard on ACE and
123     NAMEPREP for IDN, there is apparently another view of these
124     problems that may give us a different approach and solution.

126     First, there is an assumption that the IDN given to NAMEPREP is an
127     ISO10646/Unicode string. In reality, an IDN is often encoded in a
128     legacy encoding and an additional step has to be taken to convert it
        to ISO10646/Unicode.

130     Besides providing backward compatibility, ACE is also an
131     identifier string for an IDN.
        The NAMEPREP process, in turn, unifies the
132     various possible representations of an IDN into a single "unique name"
133     for matching purposes.

135     In other words, we have the following conceptual model.

137  +-------+     +---------+ (ISO10646)
138  |XYZ.COM|-->--|Transcode|-->------------+
139  +-------+     +---------+          +----------------+     +---------------+
140      :         (Legacy)       ...---|NAMEPREP/Unified|-->--|ACE/unique name|
141  +-------+     +---------+          +----------------+     +---------------+
142  |xyz.com|-->--|Transcode|-->------------+
143  +-------+     +---------+ (ISO10646)

145  5. Proposal

147     Given this alternative view of IDN, we can derive another
148     set of solutions using a directory concept.

150  +-------+       +---------+
151  |XYZ.COM|-->----|         |
152  +-------+       |         |     +---------------+
153      :   (Legacy)|Directory|-->--|ACE/unique name|
154  +-------+       |         |     +---------------+
155  |xyz.com|-->----|         |
156  +-------+       +---------+

158     Section 3.2 noted that some ideographs cannot be handled by
159     NAMEPREP's "MAP", "NORMALIZATION" and "PROHIB" steps alone.
160     Building a directory system acts as a further
161     NAMEPREP process. This further process solves the matching problem,
162     for example one-to-many and many-to-one mappings.

164     The purpose of this directory system is to list all the possible
165     representations of an IDN and unify them into a unique name. This
166     unique name could be the ACE of the most common representation or a
167     NAMEPREPPED ACE.

169     The content of the directory is built up at registration, whereby
170     registrants have to provide a list of equivalent representations
171     of the domain names they register.

173     However, there is still the question of which directory we should use.
174     In this document, we shall examine a couple of different solutions.

176  5.1. LDAP as Directory

178     Lightweight Directory Access Protocol [LDAP] is one of the most
179     widely used directory protocols.
        In LDAP, there is a concept of
180     hierarchy similar to the DNS hierarchy. Hence, it is possible to
181     distribute the content of the directory across various LDAP servers
182     for scalability and authority control. For example, each registry
183     that wishes to deploy IDN may set up an LDAP server and register
184     it with a "root" LDAP server.

186     The IDN query process would then look something like this:
187     a. User inputs an IDN into an application
188     b. Application does an LDAP query to look for the unique name
189     c. Application uses the unique name to do a DNS lookup

191     Advantages:
192     - encapsulates the problem in the representation layer and
193       at registration time
194     - able to handle unification problems

196     Disadvantages:
197     - requires all applications to be upgraded
198     - additional LDAP lookup overhead
199     - policy issues with the "root" LDAP server
200     - requires access to LDAP servers to function, i.e. cannot work
201       offline

203  5.2. CNRP as Directory

205     Common Name Resolution Protocol [CNRP] is a newly developed IETF
206     protocol for common name resolution. In CNRP, there is no
207     concept of hierarchy but there is a referral scheme. Hence, it is
208     possible to build a distributed directory system whose servers refer
209     to one another.

211     The IDN query process would then look something like this:
212     a. User inputs an IDN into an application
213     b. Application does a CNRP query to look for the unique name
214     c. Application uses the unique name to do a DNS lookup

216     Advantages:
217     - encapsulates the problem in the representation layer and
218       at registration time
219     - able to handle unification problems
220     - no policy issues with a "root" CNRP server

222     Disadvantages:
223     - requires all applications to be upgraded
224     - additional CNRP lookup overhead and no assurance that the unique
225       name can be located
226     - requires access to CNRP servers to function, i.e. cannot work
227       offline

229  5.3. DNS as Directory

231     The Domain Name System [DNS] is a widely established distributed
232     lookup directory. There is an existing hierarchical structure and
233     resource records are distributed. In theory, the DNS is able to
234     handle 8-bit binary strings.

236     The IDN query process would then look something like this:
237     a. User inputs an IDN into an application
238     b. Application does a DNS query to look for the unique name, which
239        also returns the Resource Records of the unique name

241     Advantages:
242     - encapsulates the problem in the representation layer and
243       at registration time
244     - able to handle unification problems
245     - existing "root" DNS server with existing hierarchy
246     - does not require all applications to be upgraded

248     Disadvantages:
249     - unknown behavior of applications which cannot handle 8-bit
250     - unknown behavior of servers/caching software which cannot handle
251       8-bit

253  6. Solution

255     CDNC's operational experience is that it is difficult to get
256     application developers to upgrade, difficult to get users to
257     download new applications, and so on. Given this, using DNS as a
258     directory would be the fastest way to deploy IDN for our users.

260  6.1. Zone file

262     Because there are multiple encodings, and multiple representations
263     of the same name even within the same encoding, a single domain
264     name has multiple binary strings (e.g. ML1, ML2, ML3,
265     ML4).

267     Hence, we would create the following Resource Records within the
268     name server. They would look like this:

270        ML1     UNAME   ACE1
271        ML2     UNAME   ACE1
272        ML3     UNAME   ACE1
273        ML4     UNAME   ACE1

275        ACE1    IN A    1.2.3.4.
276                IN A    1.2.3.4.

278     A "UNAME" Resource Record is shown here. In practice, it could be
279     a CNAME (except that CNAME is unable to handle MX).

281  6.2. The practical case of implementing UNAME with CNAME

283     Until the UNAME protocol is defined, the TWNIC IDN testbed
284     implements the IDN unique name with CNAME. Registering
285     a Traditional Chinese domain name (TCDN) also yields the
286     corresponding Simplified Chinese domain name (SCDN). The Traditional
287     and Simplified Chinese conversion is defined in [TSCONV].

289     The Resource Records look like this:

291        TCDN1   CNAME   EDN1
292        SCDN1   CNAME   EDN1
293        EDN1    IN A    IP-of-EDN1

295     If EDN1 is not in the same domain as TCDN1 and SCDN1, the Resource
296     Record of EDN1 would be in a different zone file. The left side of
297     the CNAME Resource Records holds all of the equivalent names (ML1,
298     ML2, ...); here, TCDN1 and SCDN1 are equivalent. The right side is
299     an ACE-compatible unique name. EDN1 (English Domain Name 1) is one
300     kind of ACE-compatible unique name; it could be substituted with any
301     other kind of ACE-compatible unique name, such as an xACE encoding or
302     a random number. Once the xACE is decided by the IETF IDN WG, the
303     implementation would adopt the standard. The unique name also
304     remains compatible with the [IDNA] approach.

306     In order to get the unique name EDN1 rather than the destination
307     IP-of-EDN1, an intermediate server has to be constructed. In the TWNIC
308     testbed, a Web DNS or a DNS proxy acts as the intermediate server. Any
309     application can pass TCDN1 or SCDN1 to the intermediate server. The
310     intermediate server asks the DNS for the corresponding right side,
311     which is the unique name, and passes the unique name EDN1 back to the
312     application, which then proceeds with the current DNS infrastructure.
313     Once the UNAME protocol is defined, the intermediate server is no
        longer needed.
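
The lookup performed by the intermediate server in section 6.2 can be sketched as follows. This is only an illustration under stated assumptions, not the TWNIC testbed code: the in-memory ZONE table, the function names unique_name and address_of, the hop limit, and the documentation address 192.0.2.1 (standing in for IP-of-EDN1) are all hypothetical.

```python
# Hypothetical in-memory stand-in for the zone data shown in section 6.2.
# Keys are (owner name, record type); 192.0.2.1 is a placeholder address
# standing in for "IP-of-EDN1".
ZONE = {
    ("TCDN1", "CNAME"): "EDN1",
    ("SCDN1", "CNAME"): "EDN1",
    ("EDN1", "A"): "192.0.2.1",
}


def unique_name(name, zone=ZONE, max_hops=8):
    """Follow CNAME records and return the final name, not its address.

    This mirrors what the intermediate server hands back to the
    application: the right side of the CNAME record.
    """
    current = name
    seen = {current}
    for _ in range(max_hops):
        target = zone.get((current, "CNAME"))
        if target is None:
            # No further alias: `current` is the unique name.
            return current
        if target in seen:
            raise ValueError("CNAME loop at " + target)
        seen.add(target)
        current = target
    raise ValueError("CNAME chain longer than max_hops")


def address_of(name, zone=ZONE):
    """Resolve a name to its A record via its unique name (or None)."""
    return zone.get((unique_name(name, zone), "A"))


if __name__ == "__main__":
    for query in ("TCDN1", "SCDN1"):
        print(query, "->", unique_name(query), "->", address_of(query))
```

Both TCDN1 and SCDN1 map to the same unique name, so the application can continue with an ordinary DNS lookup of EDN1. A real intermediate server would query the DNS instead of a dictionary.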
315     The process could be represented as follows:

317                              +------+
318                              | User |
319                              +------+
320                               |    ^
321                 Request to AP |    | Response from AP
322                      with MDN |    |                   End system
323  +----------------------------|----|----------------------------+
324  |                            v    |                            |
325  |  +--------------------------------------------------------+  |
326  |  |                   Application Client                   |  |
327  |  +--------------------------------------------------------+  |
328  |      |  ^ Nameprepped           |  ^             |  ^        |
329  | MDN  |  | ACE compatible        |  |             |  |        |
330  |      |  | unique name           |  | IP of       |  |        |
331  |      v  |        Nameprepped    |  | unique      |  |        |
332  | +--------------+ ACE compatible |  | name        |  |        |
333  | | intermediate | unique name    v  |             |  |        |
334  | |    server    |            +----------+         |  |        |
335  | +--------------+            | Resolver |         |  |        |
336  |      |  ^ Nameprepped       +----------+         |  |        |
337  | MDN  |  | ACE compatible        |  ^             |  |        |
338  |      |  | unique name           |  |             |  |        |
339  | +--------------+                |  |     Request |  |Response|
340  | | Directory of |                |  |     for     |  |from    |
341  | |     DNS      |                |  |     service |  |server  |
342  | +--------------+                |  |             |  |        |
343  |                                 |  |             |  |        |
344  +---------------------------------|--|-------------|--|--------+
345                     Nameprepped ACE|  | IP of       |  |
346                     compatible     |  | unique      |  |
347                     unique name    v  | name        v  |
348                            +-------------+ +---------------------+
349                            | DNS servers | | Application servers |
350                            +-------------+ +---------------------+

352  6.3. Advantages

354     The strongest advantages of this solution are:
355     a. It does not require our users to download any special software
356     or upgrade their software, since it handles the user's native
357     encoding directly.

359     b. It will work immediately for ccTLDs that wish to offer ML.ccTLD
360     services, without any changes at the user client.

362     c. It remains compatible with the IDNA approach so long as we keep
363     the unique name equivalent to the NAMEPREPPED ACE.

365     d. It uses the existing DNS hierarchy.

367  6.4. Potential Loopholes

369     There are several loopholes in this solution that we need to
370     note:

372     a. Some "smart" localized browsers will send out the "wrong" binary
373     string because localized versions behave differently. For example,
374     English Internet Explorer will not be able to handle Chinese
375     double-byte legacy encodings properly. If double-byte encodings are
376     required, the appropriate application environment is necessary.

378     b. While Chinese has a handful (usually 2 to 3) of representation
379     forms for a single IDN, other languages may have much more
380     complicated representations which may not be suited to this
381     approach. For example, if case-folding for Latin characters is done
382     using this solution, a string of 32 letters
383     requires 2^32 entries in the DNS. This could, however, be solved by
384     other means.

386     c. It might be possible to construct a binary string in some legacy
387     encoding which gives the same binary representation as another
388     domain name (a.k.a. binary collision). Binary collisions within the
389     same zone could be avoided by the registration system and policy. If
390     the left sides of UNAME records (such as ML1, ML2, ML3, ML4 or TCDN1,
391     SCDN1) are not in the same zone, binary collisions between them do not
392     occur. The intermediate server would be able to decide which zone of
393     the DNS directory it should access.
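
Loopholes b and c above can be made concrete with a short sketch. It is an illustration only: the helper names case_variants, entries_needed and decode_under are hypothetical, and the byte string 0xA4 0xA4 was chosen for the example on the assumption that Python's standard "big5" and "gbk" codecs both accept it.

```python
from itertools import product


def case_variants(label):
    """Return every upper/lower-case spelling of an ASCII label.

    Each ASCII letter doubles the number of spellings, which is why
    listing them all in the DNS does not scale (loophole b).
    """
    choices = []
    for ch in label:
        if ch.isascii() and ch.isalpha():
            choices.append((ch.lower(), ch.upper()))
        else:
            choices.append((ch,))
    return ["".join(combo) for combo in product(*choices)]


def entries_needed(letter_count):
    """Number of directory entries for a label with that many letters."""
    return 2 ** letter_count


def decode_under(raw, encodings=("big5", "gbk")):
    """Decode the same bytes under several legacy encodings (loophole c).

    Returns a dict mapping encoding name to the decoded text, or None
    where the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


if __name__ == "__main__":
    print(len(case_variants("abc")), "spellings of 'abc'")
    print(entries_needed(32), "entries for a 32-letter label")
    print(decode_under(b"\xa4\xa4"))
```

A three-letter label already has 8 spellings, and 32 letters give 2^32 = 4294967296. The last call shows one byte string being read as different text under BIG5 and GBK, which is the situation the registration system and policy have to rule out within a zone.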
395  Acknowledgement

397  Author(s)

399     Li Ming Tseng, Prof
400     National Central University, TWNIC
401     Email: tsenglm@cc.ncu.edu.tw
402     Tel: +886-3-490-4421

404     Jan Ming Ho, Prof
405     Academia Sinica, TWNIC
406     Email: hoho@iis.sinica.edu.tw
407     Tel: +886-2-2788-3799 x 1803

409     Hua Lin Qian, Prof
410     Chinese Academy of Sciences, CNNIC
411     Email: hlqian@ns.cnc.ac.cn
412     Tel: +86-10-6256-9960

414     Kenny Huang
415     Asia Infra International Ltd, TWNIC
416     Email: huangk@alum.sinica.edu
417     Tel: +886-2-2658-6510

419     Editor: James SENG
420     i-DNS.net International
421     8 Temasek Boulevard
422     Suntec Tower Three #24-02
423     Singapore 038988
424     Email: jseng@i-dns.net
425     Tel: +65-2486-188

427     Editor: Erin Chen
428     Taiwan Network Information Center (TWNIC)
429     4F-2, No. 9, Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan.
430     Email: erin@twnic.net.tw
431     Tel: +886-2-23411313#502

433  References

435     [IDNREQ]   Requirements of Internationalized Domain Names, Zita Wenzel,
436                James Seng, draft-ietf-idn-requirements

438     [NAMEPREP] Preparation of Internationalized Host Names, P. Hoffman,
439                M. Blanchet, draft-ietf-idn-nameprep

441     [HAN]      Han Ideograph (CJK) for Internationalized Domain Names,
442                J. Seng, Y. Yoneya, K. Huang, K. Kim, draft-ietf-idn-cjk

444     [LDAP]     Lightweight Directory Access Protocol (v3), M. Wahl,
445                T. Howes, S. Kille, RFC 2251

447     [CNRP]     Common Name Resolution Protocol, N. Popp, M. Mealling,
448                M. Moseley, draft-ietf-cnrp

450     [DNS]      Domain Names - Implementation and Specification,
451                P. Mockapetris, RFC 1035

453     [CJKV]     CJKV Information Processing ISBN 1-56592-224-7

455     [UTR15]    Unicode Normalization Forms, Mark Davis and Martin Duerst,
456                Unicode Technical Report 15.

458     [UTR21]    Case Mappings, Mark Davis, Unicode Technical Report 21.

460     [TSCONV]   Traditional and Simplified Chinese Conversion, XiaoDong LEE,
461                HSU NAI-WEN, Erin Chen, GuoNian SUN, CNNIC, TWNIC, CDNC,
462                draft-ietf-idn-tsconv

464     [IDNA]     Internationalizing Host Names In Applications (IDNA),
465                Patrik Faltstrom, Paul Hoffman, draft-ietf-idn-idna Cisco