idnits 2.17.1 draft-ietf-idn-requirements-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 643 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 2 instances of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. 
If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (23 November 2001) is 8188 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 340 -- Looks like a reference, but probably isn't: '2' on line 346 -- Looks like a reference, but probably isn't: '3' on line 350 -- Looks like a reference, but probably isn't: '4' on line 356 -- Looks like a reference, but probably isn't: '5' on line 360 -- Looks like a reference, but probably isn't: '6' on line 365 -- Looks like a reference, but probably isn't: '7' on line 371 -- Looks like a reference, but probably isn't: '8' on line 379 -- Looks like a reference, but probably isn't: '9' on line 384 -- Looks like a reference, but probably isn't: '10' on line 388 -- Looks like a reference, but probably isn't: '11' on line 392 -- Looks like a reference, but probably isn't: '12' on line 398 -- Looks like a reference, but probably isn't: '13' on line 403 -- Looks like a reference, but probably isn't: '14' on line 408 -- Looks like a reference, but probably isn't: '15' on line 411 -- Looks like a reference, but probably isn't: '16' on line 415 -- Looks like a reference, but probably isn't: '17' on line 421 -- Looks like a reference, but probably isn't: '18' on line 426 -- Looks like a reference, but probably isn't: '19' on line 444 -- Looks like a reference, but probably isn't: '20' on line 451 -- Looks like a reference, but probably isn't: '21' on line 454 -- Looks like a reference, but 
probably isn't: '22' on line 461 -- Looks like a reference, but probably isn't: '23' on line 468 -- Looks like a reference, but probably isn't: '24' on line 472 -- Looks like a reference, but probably isn't: '25' on line 476 == Missing Reference: 'UTR15' is mentioned on line 477, but not defined -- Looks like a reference, but probably isn't: '26' on line 479 -- Looks like a reference, but probably isn't: '27' on line 484 -- Looks like a reference, but probably isn't: '28' on line 486 -- Looks like a reference, but probably isn't: '29' on line 492 -- Looks like a reference, but probably isn't: '30' on line 496 == Unused Reference: 'RFC2119' is defined on line 539, but no explicit reference was found in the text == Unused Reference: 'RFC2279' is defined on line 551, but no explicit reference was found in the text == Unused Reference: 'RFC2825' is defined on line 560, but no explicit reference was found in the text == Unused Reference: 'IDNCOMP' is defined on line 567, but no explicit reference was found in the text == Unused Reference: 'UNICODE30' is defined on line 577, but no explicit reference was found in the text == Unused Reference: 'UAX15' is defined on line 586, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'CHARREQ' -- Possible downref: Non-RFC (?) normative reference: ref. 'DNSEXT' ** Downref: Normative reference to an Unknown state RFC: RFC 952 ** Obsolete normative reference: RFC 2278 (Obsoleted by RFC 2978) ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629) ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034, RFC 4035) ** Obsolete normative reference: RFC 2553 (Obsoleted by RFC 3493) ** Downref: Normative reference to an Informational RFC: RFC 2825 ** Downref: Normative reference to an Informational RFC: RFC 2826 == Outdated reference: A later version (-01) exists of draft-ietf-idn-compare-00 -- Possible downref: Normative reference to a draft: ref. 
'IDNCOMP' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE30' -- Possible downref: Non-RFC (?) normative reference: ref. 'US-ASCII' -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX15' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR17' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21' Summary: 12 errors (**), 0 flaws (~~), 11 warnings (==), 43 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IETF IDN Working Group Editors Zita Wenzel, James Seng 2 Internet Draft draft-ietf-idn-requirements-07.txt 3 23 May 2001 Expires 23 November 2001 5 Requirements of Internationalized Domain Names 7 Status of this Memo 9 This document is an Internet-Draft and is in full conformance with 10 all provisions of Section 10 of RFC2026. 12 Internet-Drafts are working documents of the Internet Engineering 13 Task Force (IETF), its areas, and its working groups. Note that 14 other groups may also distribute working documents as 15 Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six 18 months and may be updated, replaced, or obsoleted by other 19 documents at any time. It is inappropriate to use Internet- 20 Drafts as reference material or to cite them other than as 21 "work in progress." 23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/ietf/1id-abstracts.txt 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 Intended Scope 31 The intended scope of this document is to explore requirements for the 32 internationalization of domain names on the Internet. It is not 33 intended to document user requirements. 
It is recommended that 34 solutions not necessarily be within the DNS itself, but could be a layer 35 interjected between the application and the DNS. Proposals SHOULD 36 fulfill most, if not all, of the requirements. This document MAY be 37 updated based on clinical trials. 39 Abstract 41 This document describes the requirement for encoding international 42 characters into DNS names and records. This document is guidance for 43 developing protocols for internationalized domain names. 45 1. Introduction 47 At present, the encoding of Internet domain names is restricted to a 48 subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many 49 other text based items on the Internet have already been at least 50 partially internationalized. It is important for domain names to be 51 similarly internationalized or for an equivalent solution to be found. 52 This document assumes that the most effective solution involves putting 53 non-ASCII names inside some parts of the overall DNS system although 54 such assumption may not be the consensus of the IETF community. 56 This document is being discussed on the "idn" mailing list. To join the 57 list, send a message to with the words 58 "subscribe idn" in the body of the message. Archives of the mailing 59 list can also be found at ftp://ops.ietf.org/pub/lists/idn*. 61 1.1 Definitions and Conventions 63 A language is a way that humans interact. In computerised form, a text 64 in a written language can be expressed as a string of characters. 65 The same set of characters can often be used for many written languages, 66 and many written languages can be expressed using different scripts. 67 The same characters are often shown with somewhat different glyphs 68 (shapes) for display of a text depending on the font used, the 69 automatic shaping applied, or the automatic formation of ligatures. 
In 70 addition, the same characters can be shown with somewhat different 71 glyphs (shapes) for display of a text depending on the language being 72 used, even within the same font or through automatic font change. 74 A character is a member of a set of elements used for organization, 75 control, or representation of textual data. 77 A graphic character is a character, other than a control function, 78 that has a visual representation normally handwritten, printed, or 79 displayed. 81 Characters mentioned in this document are identified by their position 82 in the Unicode [UNICODE] character set. This character set is also 83 known as the UCS [ISO10646]. The notation U+12AB, for example, indicates 84 the character at position 12AB (hexadecimal) in the Unicode character 85 set. Note that the use of this notation is not an indication of a 86 requirement to use Unicode. 88 Examples quoted in this document should be considered as a method to 89 further explain the meanings and principles adopted by the document. It 90 is not a requirement for the protocol to satisfy the examples. 92 Unicode Technical Report 17 [UTR17] defines a character encoding 93 model in several levels (much of the text below is quoted from 94 Unicode Technical Report 17 [UTR17]): 96 1. An abstract character repertoire (ACR) is defined as the set of 97 abstract characters to be encoded, normally a familiar alphabet 98 or symbol set. The word abstract just means that these objects 99 are defined by convention (such as the 26 letters of the English 100 alphabet, uppercase and lowercase forms). Examples: the ASCII 101 repertoire, the Latin-15 repertoire, the JIS X 0208 repertoire, 102 the UCS repertoire (of a particular version). 104 2. A coded character set (CCS) is defined to be a mapping from a 105 set of abstract characters to the set of non-negative integers. 106 This range of integers need not be contiguous. 
An abstract 107 character is defined to be in a coded character set if the coded 108 character set maps from it to an integer. That integer is said 109 to be the code point for the abstract character. That abstract 110 character is then an encoded character. Examples: ASCII, Latin-15, 111 JIS X 0208, the UCS. 113 3. A character encoding form (CEF) is a mapping from the set of integers 114 used in a CCS to the set of sequences of code units. A code unit 115 is an integer occupying a specified binary width in a computer 116 architecture, such as a septet, an octet, or a 16-bit unit. The 117 encoding form enables character representation as actual data in 118 a computer. The sequences of code units do not necessarily have the 119 same length. Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8. 121 4. A character encoding scheme (CES) is a mapping of code units into 122 serialized octet sequences. Character encoding schemes are relevant 123 to the issue of cross-platform persistent data involving code units 124 wider than a byte, where byte-swapping may be required to put data 125 into the byte polarity canonical for a particular platform. 127 The CES may involve two or more CCS's, and may include code units 128 (e.g. single shifts, SI/SO, or escape sequences) that are not part 129 of the CCS per se, but which are defined by the character encoding 130 architecture and which may require an external registry of particular 131 values (as for the ISO 2022 escape sequences). In such a case, the 132 CES is called a compound CES. (A CES that only involves a single 133 CCS is called a simple CES.) 135 Examples: ASCII, Latin-15, Shift-JIS, UTF-16BE, UTF-16LE, UTF-8. 137 5. The mapping from an abstract character repertoire (ACR) to a 138 serialised sequence of octets is called a Character Map (CM). A simple 139 character map thus implicitly includes a CCS, a CEF, and a CES, 140 mapping from abstract characters to code units to octets. 
A compound 141 character map includes a compound CES, and thus includes more than one 142 CCS and CEF. In that case, the abstract character repertoire for the 143 character map is the union of the repertoires covered by the coded 144 character sets involved. 146 Character Maps are the things that in the IAB architecture get IANA 147 charset identifiers. A sequence of encoded characters must be 148 unambiguously mapped onto a sequence of octets by the charset. The 149 charset must be specified in all instances, as in Internet 150 protocols, where textual content is treated as an ordered sequence 151 of octets, and where the textual content must be reconstructible 152 from that sequence of octets. Charset names are registered by the 153 IANA according to procedures documented in [RFC2278]. In many cases, 154 the same name is used for both a character map and for a character 155 encoding scheme, such as UTF-16BE. Typically this is done for simple 156 character maps when such usage is clear from context. 158 6. A transfer encoding syntax (TES) is a reversible transform of encoded 159 data which may (or may not) include textual data represented in 160 one or more character encoding schemes. Examples: 8bit, 161 Quoted-Printable, BASE64, UTF-7 (defunct), (UTF-5, and RACE). 163 1.2 Description of the Domain Name System 165 The Domain Name System is defined by [RFC1034] and [RFC1035], with 166 clarifications, extensions and modifications given in [RFC1123], 167 [RFC1996], [RFC2181], and others. Of special importance here are the 168 security extensions described in [RFC2535] and companions. 170 Over the years, many different words have been used to describe the 171 components of resource naming on the Internet (e.g., URI, URN); to make 172 certain that the set of terms used in this document is well-defined and 173 non-ambiguous, the definitions are given here. 175 A master server for a zone holds the main copy of that zone. This copy 176 is sometimes stored in a zone file. 
A slave server for a zone holds a 177 complete copy of the records for that zone. Slave servers MAY be either 178 authorized by the zone owner (secondary servers) or unauthorized 179 (so-called "stealth secondaries"). Master and authorized slave servers 180 are listed in the NS records for the zone, and are termed 181 "authoritative" servers. In many contexts outside this document, the 182 term "primary" is used interchangeably with "master" and "secondary" is 183 used interchangeably with "slave". 185 A caching server holds temporary copies of DNS records; it uses records 186 to answer queries about domain names. Further explanation of these terms 187 can be found in [RFC1034] and [RFC1996]. 189 DNS names can be represented in multiple forms, with different 190 properties for internationalization. The most important ones are: 192 - Domain name: The binary representation of a name used internally in 193 the DNS protocol. This consists of a series of components of 1-63 194 octets, with an overall length limited to 255 octets (including the 195 length fields). 197 - Master file format domain name: This is a representation of the name 198 as a sequence of characters in some character sets; the common 199 convention (derived from [RFC1035] section 5.1) is to represent the 200 octets of the name as ASCII characters where the octet is in the set 201 corresponding to the ASCII values for [a-zA-Z0-9-], using an escape 202 mechanism (\x or \NNN) where not, and separating the components of the 203 name by the dot character ("."). 205 The form specified for most protocols using the DNS is a limited form of 206 the master file format domain name. This limited form is defined in 207 [RFC1034] Section 3.5 and [RFC1123]. In most implementations of 208 applications today, domain names in the Internet have been limited to 209 the much more restricted forms used, e.g., in email. 
Those names are 210 limited to the upper- and lower-case letters a-z (interpreted in a 211 case-independent fashion), the digits, and the hyphen-minus, all in 212 ASCII. 214 1.3 Definition of "hostname" and "Internationalized Domain Name" 216 In the DNS protocols, a name is referred to as a sequence of octets. 217 However, when discussing requirements for internationalized domain 218 names, what we are looking for are ways to represent characters that 219 are meaningful for humans. 221 In this document, this is referred to as a "hostname". While this term 222 has been used for many different purposes over the years, it is used 223 here in the sense of sequence of characters (not octets) representing a 224 domain name conforming to the limited hostname syntax [RFC952]. 226 This document attempts to define the requirements for an 227 "Internationalized Domain Name" (IDN). This is defined as a sequence of 228 characters that can be used in the context of functions where a hostname 229 is used today, but contains one or more characters that are outside the 230 set of characters specified as legal characters for host names 231 [RFC1123]. 233 1.4 A multilayer model of the DNS function 235 The DNS can be seen as a multilayer function: 237 - The bottom layer is where the packets are passed across the Internet 238 in a DNS query and a DNS response. At this level, what matters is 239 the format and meaning of bits and octets in a DNS packet. 241 - Above that is the "DNS service", created by an infrastructure of DNS 242 servers, NS records that point to those DNS servers, that is 243 pointed to by the root servers (listed in the "root cache file" on 244 each DNS server often called "named.cache"). It is at this level 245 that the statement "the DNS has a single root" [RFC2826] makes 246 sense, but still, what are being transferred are octets, not 247 characters. 
249 - Interfacing to the user is a service layer, often called "the resolver 250 library", and often embedded in the operating system or system 251 libraries of the client machines. It is at the top of this layer that 252 the API calls commonly known as "gethostbyname" and "gethostbyaddr" 253 reside. These calls are modified to support IPv6 [RFC2553]. A 254 conceptually similar layer exists in authoritative DNS servers, 255 comprising the parts that generate "meaningful" strings in DNS files. 256 Due to the popularity of the "master file" format, this layer often 257 exists only in the administrative routines of the service maintainers. 259 - The users of this layer (resolver library) are the application programs 260 that use the DNS, such as mailers, mail servers, Web clients, Web 261 servers, Web caches, IRC clients, FTP clients, distributed file 262 systems, distributed databases, and almost all other applications on 263 TCP/IP. 265 Graphically, one can illustrate it like this: 267 +---------------+ +---------------------+ 268 | Application | | (Base data) | 269 +---------------+ +---------------------+ 270 | Application service interface | 271 | For ex. 
GethostbyXXXX interface | (no standard) 272 +---------------+ +---------------------+ 273 | Resolver | | Auth DNS server | 274 +---------------+ +---------------------+ 275 | <----- DNS service interface -----> | 276 +------------------------------------------------------------------+ 277 | DNS service | 278 | +-----------------------+ +--------------------+ | 279 | | Forwarding DNS server | | Caching DNS server | | 280 | +-----------------------+ +--------------------+ | 281 | | 282 | +-------------------------+ | 283 | | Parent-zone DNS servers | | 284 | +-------------------------+ | 285 | | 286 | +-------------------------+ | 287 | | Root DNS servers | | 288 | +-------------------------+ | 289 | | 290 +------------------------------------------------------------------+ 292 1.5 Service model of the DNS 294 The Domain Name Service is used for multiple purposes, each of which is 295 characterized by what it puts into the system (the query) and what it 296 expects as a result (the reply). 298 The most used ones in the current DNS are: 300 - Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get 301 back an IPv4 or IPv6 address. 303 - Hostname-to-Mail server service (MX): As above, but the expected 304 return value is a hostname and a priority for SMTP servers. 306 - Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in 307 in-addr.arpa or ip6.arpa form respectively) and get back a hostname. 309 - Domain delegation service (NS). Enter a domain name and get back 310 nameserver records (designated hosts which provide authoritative 311 nameservice) for the domain. 313 New services are being defined, either as entirely new services (IPv6 to 314 hostname mapping using binary labels) or as embellishments to other 315 services (DNSSEC returning information about whether a given DNS service 316 is performed securely or not). 318 These services exist, conceptually, at the Application/Resolver 319 interface, NOT at the DNS-service interface. 
This document attempts to 320 set requirements for an equivalent of the "used services" given above, 321 where "hostname" is replaced by "Internationalized Domain Name". This 322 doesn't preclude the fact that IDN should work with any kind of DNS 323 queries. IDN is a new service. Since existing protocols like SMTP or 324 HTTP use the old service, it is a matter of great concern how the new 325 and old services work together, and how other protocols can take 326 advantage of the new service. 328 2. General Requirements 330 These requirements address two concerns: The service offered to the 331 users (the application service), and the protocol extensions, if needed, 332 added to support this service. 334 In the requirements, we attempt to use the term "service" whenever a 335 requirement concerns the service, and "protocol" whenever a requirement 336 is believed to constrain the possible implementation. 338 2.1 Compatibility and Interoperability 340 [1] The DNS is essential to the entire Internet. Therefore, the service 341 MUST NOT damage present DNS protocol interoperability. It MUST make the 342 minimum number of changes to existing protocols on all layers of the 343 stack. It MUST continue to allow any system anywhere that implements 344 the IDN specification to resolve any internationalized domain name. 346 [2] The service MUST preserve the basic concept and facilities of domain 347 names as described in [RFC1034]. It MUST maintain a single, global, 348 universal, and consistent hierarchical namespace. 350 [3] The DNS protocol (the packet formats that go on the wire) MUST 351 NOT limit the codepoints that can be used. A service defined on top of 352 the DNS, for instance the IDN-to-address function, MAY limit the 353 codepoints that can be used. The service descriptions MUST describe 354 what limitations are imposed. 356 [4] The protocol MUST work for all features of DNS, IPv4, and 357 IPv6. 
The protocol MUST NOT allow an IDN to be returned to a requestor 358 that requests the IP-to-(old)-domain-name mapping service. 360 [5] The same name resolution request MUST generate the same response, 361 regardless of the location or localization settings in the resolver, in 362 the master server, and in any slave servers involved in the resolution 363 process. 365 [6] The protocol MUST NOT require that the current DNS cache 366 servers be modified to support IDN. If a cache server can have 367 additional functionality to support IDN better, this additional 368 functionality MUST NOT cause problems for resolving correctly 369 functioning current domain names. 371 [7] A caching server MUST NOT return data in response to a query that 372 would not have been returned if the same query had been presented to an 373 authoritative server. This applies fully for the cases when: 375 - The caching server does not know about IDN 376 - The caching server implements the whole specification 377 - The caching server implements a valid subset of the specification 379 [8] The service MAY modify the DNS protocol [RFC1035] and other related 380 work undertaken by the [DNSEXT] WG. However, these changes SHOULD be as 381 small as possible and any changes SHOULD be coordinated with the 382 [DNSEXT] WG. 384 [9] The protocol supporting the service SHOULD be as simple as possible 385 from the user's perspective. Ideally, users SHOULD NOT realize that IDN 386 was added on to the existing DNS. 388 [10] The best solution is one that maintains maximum feasible 389 compatibility with current DNS standards as long as it meets the other 390 requirements in this document. 392 [11] The protocol should handle with care new revisions of the CCS. 393 Undefined codepoints should not be allowed unless a new revision of 394 the protocol can handle it. Protocol revisions should be tagged. 
396 2.2 Internationalization 398 [12] Internationalized characters MUST be allowed to be represented and 399 used in DNS names and records. The protocol MUST specify what charset is 400 used when resolving domain names and how characters are encoded in DNS 401 records. 403 [13] Codepoints SHOULD be from the Universal Set as defined in 404 ISO-10646 or Unicode. The specifics of versions MUST be defined in the 405 proposed solution. If multiple charsets are allowed, each charset MUST 406 be tagged and conform to [RFC2277]. 408 [14] The protocol MUST NOT reject any non-IDN characters (to be 409 defined) in any DNS queries or responses. 411 [15] The protocol SHOULD NOT invent a new CCS for the purpose of IDN 412 only and SHOULD use existing CES. The charset(s) chosen SHOULD also be 413 non-ambiguous. 415 [16] The protocol SHOULD NOT make any assumptions about the location 416 in a domain name where internationalization might appear. In other 417 words, it SHOULD NOT differentiate between any part of a domain name 418 because this MAY impose restrictions on future internationalization 419 efforts. For example, the TLDs can be internationalized. 421 [17] The protocol also SHOULD NOT make any localized restrictions in the 422 protocol. For example, an IDN implementation which only allows domain 423 names to use a single local script would immediately restrict 424 multinational organization. 426 [18] While there are a wide range of devices that use the DNS and a wide 427 range of characteristics of international scripts and methods of 428 domain name input and display, IDN is only concerned with the 429 protocol. Therefore, there MUST be a single way of encoding an 430 internationalized domain name within the DNS. 432 2.3 Canonicalization 434 Matching rules are a complicated process for IDN. Canonicalization 435 of characters MUST follow precise and predictable rules to ensure 436 consistency. [CHARREQ] is RECOMMENDED as a guide on canonicalization. 
438 The DNS has to match a host name in a request with a host name held 439 in one or more zones. It also needs to sort names into order. It is 440 expected that some sort of canonicalization algorithm will be used as 441 the first step of this process. This section discusses some of the 442 properties which will be REQUIRED of that algorithm. 444 [19] To achieve interoperability, canonicalization MUST be done at a 445 single well-defined place in the DNS resolution process. The protocol 446 MUST specify canonicalization; it MUST specify exactly where in the 447 DNS that canonicalization happens and does not happen; it MUST specify 448 how additions to ISO 10646 will affect the stability of the DNS and 449 the amount of work done on the root DNS servers. 451 [20] The canonicalization algorithm MAY specify operations for case, 452 ligature, and punctuation folding. 454 [21] In order to retain backwards compatibility with the current DNS, 455 the service MUST retain the case-insensitive comparison for [US-ASCII] 456 as specified in [RFC1035]. For example, Latin capital letter A (U+0041) 457 MUST match Latin small letter a (U+0061). [UTR21] describes some of 458 the issues with case mapping. Case-insensitivity for non [US-ASCII] 459 MUST be discussed in the protocol proposal. 461 [22] Case folding MUST be locale independent. If it were 462 locale-dependent, then different clients would get different results. 463 For example, Latin capital letter I (U+0049) case folded to lower case 464 in the Turkish context will become Latin small letter dotless i 465 (U+0131). But in the English context, it will become Latin small 466 letter i (U+0069). 468 [23] If other canonicalization is done, it MUST be done before the 469 domain name is resolved. Further, the canonicalization MUST be easily 470 upgradable as new languages and writing systems are added. 
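Requirements [21] and [22] can be sketched in Python as follows. This is a minimal illustration with hypothetical helper names; it folds only the ASCII letters A-Z and deliberately leaves every other character alone, which is one possible reading of "retain the case-insensitive comparison for [US-ASCII]" and not a proposal for non-ASCII folding.

```python
def fold_ascii(name: str) -> str:
    """Fold A-Z to a-z and leave all other characters unchanged.

    No locale is consulted, so every client computes the same result.
    """
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def same_ascii_name(a: str, b: str) -> bool:
    """Compare two names case-insensitively over the ASCII letters only."""
    return fold_ascii(a) == fold_ascii(b)
```

Under this sketch U+0049 always folds to U+0069, whatever the user's language settings, while U+0130 and U+0131 pass through untouched; how such non-ASCII characters are to be folded is left to the protocol proposal, as [21] requires.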
472 [24] Any conversion (case, ligature folding, punctuation folding, etc.)
473 from what the user enters into a client to what the client asks for
474 resolution MUST be done identically on any request from any client.

476 [25] If the charset can be normalized, then it SHOULD be normalized
477 before it is used in IDN. Normalization SHOULD follow [UAX15].

479 [26] The protocol SHOULD avoid inventing a new normalization form
480 provided a technically sufficient one is available.

482 2.4 Operational Issues

484 [27] Zone files SHOULD remain easily editable.

486 [28] An IDN-capable resolver or server SHALL NOT generate more traffic
487 than a non-IDN-capable resolver or server would when resolving an
488 ASCII-only domain name. The amount of traffic generated when resolving
489 an IDN SHALL be similar to that generated when resolving an ASCII-only
490 name.

492 [29] The service SHOULD NOT add new centralized administration for the
493 DNS. A domain administrator SHOULD be able to create internationalized
494 names as easily as adding current domain names.

496 [30] The protocol MUST work with DNSSEC. The protocol MAY break
497 language sort order.

499 3. Security Considerations

501 Any solution that meets the requirements in this document MUST NOT be
502 less secure than the current DNS. Specifically, the mapping of
503 internationalized host names to and from IP addresses MUST have the
504 same characteristics as the mapping of today's host names.

506 Specifying requirements for internationalized domain names does not
507 itself raise any new security issues. However, any change to the DNS MAY
508 affect the security of any protocol that relies on the DNS or on
509 DNS names. A thorough evaluation of those protocols for security
510 concerns will be needed when they are developed. In particular, IDNs
511 MUST be compatible with DNSSEC and, if multiple charsets or
512 representation forms are permitted, the name-spoofing implications
513 MUST be thoroughly understood.

515 4. References

517 [CHARREQ] "Requirements for string identity matching and String
518 Indexing", http://www.w3.org/TR/WD-charreq, July 1998,
519 World Wide Web Consortium.

521 [DNSEXT] "IETF DNS Extensions Working Group",
522 namedroppers@ops.ietf.org, Olafur Gudmundsson, Randy Bush.

524 [RFC952] "DoD Internet Host Table Specification", rfc952.txt,
525 October 1985, K. Harrenstien, M.K. Stahl, E.J. Feinler.

527 [RFC1034] "Domain Names - Concepts and Facilities", rfc1034.txt,
528 November 1987, P. Mockapetris.

530 [RFC1035] "Domain Names - Implementation and Specification",
531 rfc1035.txt, November 1987, P. Mockapetris.

533 [RFC1123] "Requirements for Internet Hosts -- Application and
534 Support", rfc1123.txt, October 1989, R. Braden.

536 [RFC1996] "A Mechanism for Prompt Notification of Zone Changes
537 (DNS NOTIFY)", rfc1996.txt, August 1996, P. Vixie.

539 [RFC2119] "Key words for use in RFCs to Indicate Requirement
540 Levels", rfc2119.txt, March 1997, S. Bradner.

542 [RFC2181] "Clarifications to the DNS Specification", rfc2181.txt,
543 July 1997, R. Elz, R. Bush.

545 [RFC2277] "IETF Policy on Character Sets and Languages",
546 rfc2277.txt, January 1998, H. Alvestrand.

548 [RFC2278] "IANA Charset Registration Procedures", rfc2278.txt,
549 January 1998, N. Freed and J. Postel.

551 [RFC2279] "UTF-8, a transformation format of ISO 10646",
552 rfc2279.txt, January 1998, F. Yergeau.

554 [RFC2535] "Domain Name System Security Extensions", rfc2535.txt,
555 March 1999, D. Eastlake.

557 [RFC2553] "Basic Socket Interface Extensions for IPv6", rfc2553.txt,
558 March 1999, R. Gilligan et al.

560 [RFC2825] "A Tangled Web: Issues of I18N, Domain Names, and the
561 Other Internet protocols", rfc2825.txt, May 2000,
562 L. Daigle et al.
564 [RFC2826] "IAB Technical Comment on the Unique DNS Root",
565 rfc2826.txt, May 2000, Internet Architecture Board.

567 [IDNCOMP] "Comparison of Internationalized Domain Name Proposals",
568 draft-ietf-idn-compare-00.txt, June 2000, P. Hoffman.

570 [ISO10646] ISO/IEC 10646-1:2000 (note that an amendment 1 is in
571 preparation), ISO/IEC 10646-2 (in preparation), plus
572 corrigenda and amendments to these standards.

574 [UNICODE] The Unicode Consortium, "The Unicode Standard". Described at
575 http://www.unicode.org/unicode/standard/versions/.

577 [UNICODE30] The Unicode Consortium, "The Unicode Standard -- Version
578 3.0", ISBN 0-201-61633-5. Same repertoire as ISO/IEC
579 10646-1:2000. Described at http://www.unicode.org/unicode/
580 standard/versions/Unicode3.0.html.

582 [US-ASCII] Coded Character Set -- 7-bit American Standard Code for
583 Information Interchange, ANSI X3.4-1986; also: ISO/IEC
584 646 (IRV).

586 [UAX15] "Unicode Normalization Forms", Unicode Standard Annex #15,
587 http://www.unicode.org/unicode/reports/tr15/, 2000-08-31,
588 M. Davis and M. Duerst, Unicode Consortium.

590 [UTR17] "Character Encoding Model", Unicode Technical Report #17,
591 http://www.unicode.org/unicode/reports/tr17/, 2000-08-31,
592 K. Whistler and M. Davis, Unicode Consortium.

594 [UTR21] "Case Mappings", Unicode Technical Report #21,
595 http://www.unicode.org/unicode/reports/tr21/, 2000-09-12,
596 M. Davis, Unicode Consortium.

598 5. Editors' Contact

600 Zita Wenzel, Ph.D.
601 Information Sciences Institute
602 University of Southern California
603 4676 Admiralty Way
604 Marina del Rey, CA
605 90292 USA
606 Tel: +1 310 448 8462
607 Fax: +1 310 823 6714
608 zita@isi.edu

610 James Seng
611 i-DNS.net International Pte Ltd.
612 8 Temasek Boulevard
613 #24-02 Suntec Tower 3
614 Singapore 038988
615 Tel: +65 248 6208
616 Fax: +65 248 6198
617 Email: jseng@pobox.org.sg

619 6. Acknowledgements

621 The editors gratefully acknowledge the contributions of:

623 Harald Tveit Alvestrand
624 Mark Andrews
625 RJ Atkinson
626 Alan Barret
627 Marc Blanchet
628 Randy Bush
629 Andrew Draper
630 Martin Duerst
631 Patrik Faltstrom
632 Ned Freed
633 Olafur Gudmundsson
634 Paul Hoffman
635 Simon Josefsson
636 Kent Karlsson
637 John Klensin
638 Tan Juay Kwang
639 Dongman Lee
640 Bill Manning
641 Dan Oscarsson
642 J. William Semich
643 Yoshiro Yoneda