idnits 2.17.1 draft-ietf-idn-requirements-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 638 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (04 March 2001) is 8454 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 332 -- Looks like a reference, but probably isn't: '2' on line 338 -- Looks like a reference, but probably isn't: '3' on line 352 -- Looks like a reference, but probably isn't: '4' on line 357 -- Looks like a reference, but probably isn't: '5' on line 363 -- Looks like a reference, but probably isn't: '7' on line 371 -- Looks like a reference, but probably isn't: '8' on line 376 -- Looks like a reference, but probably isn't: '10' on line 380 -- Looks like a reference, but probably isn't: '11' on line 386 -- Looks like a reference, but probably isn't: '12' on line 391 -- Looks like a reference, but probably isn't: '14' on line 399 -- Looks like a reference, but probably isn't: '15' on line 403 -- Looks like a reference, but probably isn't: '16' on line 409 -- Looks like a reference, but probably isn't: '17' on line 414 -- Looks like a reference, but probably isn't: '22' on line 432 -- Looks like a reference, but probably isn't: '23' on line 439 -- Looks like a reference, but probably isn't: '24' on line 442 -- Looks like a reference, but probably isn't: '25' on line 449 -- Looks like a reference, but probably isn't: '26' on line 454 -- Looks like a reference, but probably isn't: '27' on line 458 -- Looks like a reference, but probably isn't: '30' on line 462 == Missing Reference: 'UTR15' is mentioned on line 463, but not defined -- Looks like a 
reference, but probably isn't: '31' on line 466 -- Looks like a reference, but probably isn't: '32' on line 471 -- Looks like a reference, but probably isn't: '33' on line 473 -- Looks like a reference, but probably isn't: '34' on line 479 -- Looks like a reference, but probably isn't: '35' on line 483 -- Looks like a reference, but probably isn't: '36' on line 491 == Unused Reference: 'RFC2119' is defined on line 530, but no explicit reference was found in the text == Unused Reference: 'RFC2279' is defined on line 542, but no explicit reference was found in the text == Unused Reference: 'RFC2825' is defined on line 551, but no explicit reference was found in the text == Unused Reference: 'IDNCOMP' is defined on line 558, but no explicit reference was found in the text == Unused Reference: 'UNICODE30' is defined on line 568, but no explicit reference was found in the text == Unused Reference: 'UAX15' is defined on line 578, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'CHARREQ' -- Possible downref: Non-RFC (?) normative reference: ref. 'DNSEXT' ** Obsolete normative reference: RFC 2278 (Obsoleted by RFC 2978) ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629) ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034, RFC 4035) ** Obsolete normative reference: RFC 2553 (Obsoleted by RFC 3493) ** Downref: Normative reference to an Informational RFC: RFC 2825 ** Downref: Normative reference to an Informational RFC: RFC 2826 == Outdated reference: A later version (-01) exists of draft-ietf-idn-compare-00 -- Possible downref: Normative reference to a draft: ref. 'IDNCOMP' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE30' -- Possible downref: Non-RFC (?) normative reference: ref. 
'US-ASCII' -- Possible downref: Non-RFC (?) normative reference: ref. 'UAX15' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR17' -- Possible downref: Non-RFC (?) normative reference: ref. 'UTR21' Summary: 10 errors (**), 0 flaws (~~), 11 warnings (==), 40 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IETF IDN Working Group Editors Zita Wenzel, James Seng 2 Internet Draft draft-ietf-idn-requirements-04.txt 3 04 October 2000 Expires 04 March 2001 5 Requirements of Internationalized Domain Names 7 Status of this Memo 9 This document is an Internet-Draft and is in full conformance with 10 all provisions of Section 10 of RFC2026. 12 Internet-Drafts are working documents of the Internet Engineering 13 Task Force (IETF), its areas, and its working groups. Note that 14 other groups may also distribute working documents as 15 Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six 18 months and may be updated, replaced, or obsoleted by other 19 documents at any time. It is inappropriate to use Internet- 20 Drafts as reference material or to cite them other than as 21 "work in progress." 23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/ietf/1id-abstracts.txt 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 Abstract 31 This document describes the requirement for encoding international 32 characters into DNS names and records. This document is guidance for 33 developing protocols for internationalized domain names. 35 1. Introduction 37 At present, the encoding of Internet domain names is restricted to a 38 subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many 39 other text based items on the Internet have already been at least 40 partially internationalized. 
It is important for domain names to be 41 similarly internationalized or for an equivalent solution to be found. 42 This document assumes that the most effective solution involves putting 43 non-ASCII names inside some parts of the overall DNS system. 45 This document is being discussed on the "idn" mailing list. To join the 46 list, send a message to with the words 47 "subscribe idn" in the body of the message. Archives of the mailing 48 list can also be found at ftp://ops.ietf.org/pub/lists/idn*. 50 1.1 Definitions and Conventions 52 A language is a way that humans interact. In computerised form, a text 53 in a written language can be expressed as a string of characters. 54 The same set of characters can often be used for many written languages, 55 and many written languages can be expressed using different scripts. 56 The same characters are often shown with somewhat different glyphs 57 (shapes) 58 for display of a text depending on the font used, the automatic shaping 59 applied, or the automatic formation of ligatures. In addition, the same 60 characters can be shown with somewhat different glyphs (shapes) for 61 display 62 of a text depending on the language being used, even within the same 63 font 64 or through automatic font change. 66 A character is a member of a set of elements used for organization, 67 control, or representation of textual data. 69 A graphic character is a character, other than a control function, 70 that has a visual representation normally handwritten, printed, or 71 displayed. 73 Characters mentioned in this document are identified by their position 74 in the Unicode [UNICODE] character set. This character set is also 75 known as the UCS [ISO10646]. The notation U+12AB, for example, indicates 76 the character at position 12AB (hexadecimal) in the Unicode character 77 set. Note that the use of this notation is not an indication of a 78 requirement to use Unicode.
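The U+ notation described above can be illustrated with a short sketch. The helper names parse_u_notation and to_u_notation are hypothetical and exist only for this illustration; the sketch assumes Python's built-in Unicode support and is not part of any proposed protocol.

```python
import unicodedata


def parse_u_notation(token: str) -> str:
    """Return the one-character string at the position named by 'U+XXXX'."""
    if not token.upper().startswith("U+"):
        raise ValueError(f"not U+ notation: {token!r}")
    # The digits after 'U+' are the hexadecimal position in the character set.
    return chr(int(token[2:], 16))


def to_u_notation(char: str) -> str:
    """Format a single character as 'U+XXXX' with at least four hex digits."""
    return f"U+{ord(char):04X}"


print(to_u_notation("A"))                            # U+0041
print(unicodedata.name(parse_u_notation("U+0131")))  # LATIN SMALL LETTER DOTLESS I
```

As the text notes, writing a character this way only identifies it; it does not imply that a solution has to carry Unicode on the wire.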
80 Examples quoted in this document should be considered as a method to 81 further explain the meanings and principles adopted by the document. It 82 is not a requirement for the protocol to satisfy the examples. 84 Unicode Technical Report 17 [UTR17] defines a character encoding 85 model in several levels (much of the text below is quoted from 86 Unicode Technical Report 17 [UTR17]): 88 1. An abstract character repertoire (ACR) is defined as the set of 89 abstract characters to be encoded, normally a familiar alphabet 90 or symbol set. The word abstract just means that these objects 91 are defined by convention (such as the 26 letters of the English 92 alphabet, uppercase and lowercase forms). Examples: the ASCII 93 repertoire, the Latin-15 repertoire, the JIS X 0208 repertoire, 94 the UCS repertoire (of a particular version). 96 2. A coded character set (CCS) is defined to be a mapping from a 97 set of abstract characters to the set of non-negative integers. 98 This range of integers need not be contiguous. An abstract 99 character is defined to be in a coded character set if the coded 100 character set maps from it to an integer. That integer is said 101 to be the code point for the abstract character. That abstract 102 character is then an encoded character. Examples: ASCII, Latin-15, 103 JIS X 0208, the UCS. 105 3. A character encoding form (CEF) is a mapping from the set of integers 106 used in a CCS to the set of sequences of code units. A code unit 107 is an integer occupying a specified binary width in a computer 108 architecture, such as a septet, an octet, or a 16-bit unit. The 109 encoding form enables character representation as actual data in 110 a computer. The sequences of code units do not necessarily have the 111 same length. Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8. 113 4. A character encoding scheme (CES) is a mapping of code units into 114 serialized octet sequences.
Character encoding schemes are relevant 115 to the issue of cross-platform persistent data involving code units 116 wider than a byte, where byte-swapping may be required to put data 117 into the byte polarity canonical for a particular platform. 119 The CES may involve two or more CCS's, and may include code units 120 (e.g. single shifts, SI/SO, or escape sequences) that are not part 121 of the CCS per se, but which are defined by the character encoding 122 architecture and which may require an external registry of particular 123 values (as for the ISO 2022 escape sequences). In such a case, the 124 CES is called a compound CES. (A CES that only involves a single 125 CCS is called a simple CES.) 127 Examples: ASCII, Latin-15, Shift-JIS, UTF-16BE, UTF-16LE, UTF-8. 129 5. The mapping from an abstract character repertoire (ACR) to a 130 serialised 131 sequence of octets is called a Character Map (CM). A simple character 132 map thus implicitly includes a CCS, a CEF, and a CES, mapping from 133 abstract characters to code units to octets. A compound character 134 map includes a compound CES, and thus includes more than one CCS 135 and CEF. In that case, the abstract character repertoire for the 136 character map is the union of the repertoires covered by the coded 137 character sets involved. 139 Character Maps are the things that in the IAB architecture get IANA 140 charset identifiers. A sequence of encoded characters must be 141 unambiguously mapped onto a sequence of octets by the charset. The 142 charset must be specified in all instances, as in Internet 143 protocols, where textual content is treated as an ordered sequence 144 of octets, and where the textual content must be reconstructible 145 from that sequence of octets. Charset names are registered by the 146 IANA according to procedures documented in [RFC2278]. In many cases, 147 the same name is used for both a character map and for a character 148 encoding scheme, such as UTF-16BE.
Typically this is done for simple 149 character maps when such usage is clear from context. 151 6. A transfer encoding syntax (TES) is a reversible transform of encoded 152 data which may (or may not) include textual data represented in 153 one or more character encoding schemes. Examples: 8bit, 154 Quoted-Printable, BASE64, UTF-7 (defunct), (UTF-5, and RACE). 156 1.2 Description of the Domain Name System 158 The Domain Name System is defined by [RFC1034] and [RFC1035], with 159 clarifications, extensions and modifications given in [RFC1123], 160 [RFC1996], [RFC2181], and others. Of special importance here are the 161 security extensions described in [RFC2535] and companions. 163 Over the years, many different words have been used to describe the 164 components of resource naming on the Internet (e.g., URI, URN); to make 165 certain that the set of terms used in this document is well-defined and 166 non-ambiguous, the definitions are given here. 168 A master server for a zone holds the main copy of that zone. This copy 169 is sometimes stored in a zone file. A slave server for a zone holds a 170 complete copy of the records for that zone. Slave servers MAY be either 171 authorized by the zone owner (secondary servers) or unauthorized 172 (so-called "stealth secondaries"). Master and authorized slave servers 173 are listed in the NS records for the zone, and are termed 174 "authoritative" servers. In many contexts outside this document, the 175 term "primary" is used interchangeably with "master" and "secondary" is 176 used interchangeably with "slave". 178 A caching server holds temporary copies of DNS records; it uses records 179 to answer queries about domain names. Further explanation of these terms 180 can be found in [RFC1034] and [RFC1996]. 182 DNS names can be represented in multiple forms, with different 183 properties for internationalization.
The most important ones are: 185 - Domain name: The binary representation of a name used internally in 186 the DNS protocol. This consists of a series of components of 1-63 187 octets, with an overall length limited to 255 octets (including the 188 length fields). 190 - Master file format domain name: This is a representation of the name 191 as a sequence of characters in some character sets; the common 192 convention (derived from [RFC1035] section 5.1) is to represent the 193 octets of the name as ASCII characters where the octet is in the set 194 corresponding to the ASCII values for [a-zA-Z0-9-], using an escape 195 mechanism (\x or \NNN) where not, and separating the components of the 196 name by the dot character ("."). 198 The form specified for most protocols using the DNS is a limited form of 199 the master file format domain name. This limited form is defined in 200 [RFC1034] Section 3.5 and [RFC1123]. In most implementations of 201 applications today, domain names in the Internet have been limited to 202 the much more restricted forms used, e.g., in email. Those names are 203 limited to the upper- and lower-case letters a-z (interpreted in a 204 case-independent fashion), the digits, and the hyphen-minus, all in 205 ASCII. 207 1.3 Definition of "hostname" and "Internationalized Domain Name" 209 In the DNS protocols, a name is referred to as a sequence of octets. 210 However, when discussing requirements for internationalized domain 211 names, what we are looking for are ways to represent characters that 212 are meaningful for humans. 214 In this document, this is referred to as a "hostname". While this term 215 has been used for many different purposes over the years, it is used 216 here in the sense of "sequence of characters (not octets) representing a 217 domain name conforming to the limited hostname syntax". 219 This document attempts to define the requirements for an 220 "Internationalized Domain Name" (IDN). 
This is defined as a sequence of 221 characters that can be used in the context of functions where a hostname 222 is used today, but contains one or more characters that are outside the 223 set of characters specified as legal characters for host names. 225 1.4 A multilayer model of the DNS function 227 The DNS can be seen as a multilayer function: 229 - The bottom layer is where the packets are passed across the Internet 230 in a DNS query and a DNS response. At this level, what matters is 231 the format and meaning of bits and octets in a DNS packet. 233 - Above that is the "DNS service", created by an infrastructure of DNS 234 servers and NS records that point to those DNS servers, which is 235 pointed to by the root servers (listed in the "root cache file" on 236 each DNS 237 server, often called "named.cache"). It is at this level that the 238 statement "the DNS has a single root" [RFC2826] makes sense, but 239 still, what is being transferred is octets, not characters. 241 - Interfacing to the user is a service layer, often called "the resolver 242 library", and often embedded in the operating system or system 243 libraries of the client machines. It is at the top of this layer that 244 the API calls commonly known as "gethostbyname" and "gethostbyaddr" 245 reside. These calls are modified to support IPv6 [RFC2553]. A 246 conceptually similar layer exists in authoritative DNS servers, 247 comprising the parts that generate "meaningful" strings in DNS files. 248 Due to the popularity of the "master file" format, this layer often 249 exists only in the administrative routines of the service maintainers. 251 - The users of this layer (resolver library) are the application programs 252 that use the DNS, such as mailers, mail servers, Web clients, Web 253 servers, Web caches, IRC clients, FTP clients, distributed file 254 systems, distributed databases, and almost all other applications on 255 TCP/IP.
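To make the octet-level view of the bottom layer concrete, the following minimal sketch encodes a name into the binary form described in section 1.2: a series of length-prefixed components of 1-63 octets, with an overall limit of 255 octets including the length fields. The function name encode_domain_name is hypothetical, and restricting input to ASCII is an assumption of this sketch (it mirrors today's hostnames); the wire format itself carries arbitrary octets.

```python
MAX_LABEL_OCTETS = 63   # limit for one component
MAX_NAME_OCTETS = 255   # limit for the whole name, length fields included


def encode_domain_name(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed components."""
    if name.endswith("."):
        name = name[:-1]            # accept a single trailing dot
    labels = name.split(".") if name else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # raises for non-ASCII input (sketch assumption)
        if not 1 <= len(raw) <= MAX_LABEL_OCTETS:
            raise ValueError(f"component length {len(raw)} outside 1-63 octets")
        out.append(len(raw))         # one length octet, then the component
        out += raw
    out.append(0)                    # zero-length component terminates the name
    if len(out) > MAX_NAME_OCTETS:
        raise ValueError("name exceeds 255 octets")
    return bytes(out)


print(encode_domain_name("www.example.com"))
# b'\x03www\x07example\x03com\x00'
```

Nothing at this level says how the octets of a component map to characters, which is exactly the gap that IDN requirements address.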
257 Graphically, one can illustrate it like this:
259  +---------------+                            +---------------------+
260  | Application   |                            |   (Base data)       |
261  +---------------+                            +---------------------+
262          |  Application service interface                |
263          |  For ex. GethostbyXXXX interface              | (no standard)
264  +---------------+                            +---------------------+
265  |   Resolver    |                            |   Auth DNS server   |
266  +---------------+                            +---------------------+
267          |     <-----   DNS service interface ----->     |
268  +------------------------------------------------------------------+
269  |                         DNS service                              |
270  |  +-----------------------+      +--------------------+           |
271  |  | Forwarding DNS server |      | Caching DNS server |           |
272  |  +-----------------------+      +--------------------+           |
273  |                                                                  |
274  |                  +-------------------------+                     |
275  |                  | Parent-zone DNS servers |                     |
276  |                  +-------------------------+                     |
277  |                                                                  |
278  |                  +-------------------------+                     |
279  |                  |    Root DNS servers     |                     |
280  |                  +-------------------------+                     |
281  |                                                                  |
282  +------------------------------------------------------------------+
284 1.5 Service model of the DNS 286 The Domain Name Service is used for multiple purposes, each of which is 287 characterized by what it puts into the system (the query) and what it 288 expects as a result (the reply). 290 The most used ones in the current DNS are: 292 - Hostname-to-address service (A, AAAA, A6): Enter a hostname, and get 293 back an IPv4 or IPv6 address. 295 - Hostname-to-Mail server service (MX): As above, but the expected 296 return value is a hostname and a priority for SMTP servers. 298 - Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address (in 299 in-addr.arpa or ip6.int form respectively) and get back a hostname. 301 - Domain delegation service (NS): Enter a domain name and get back 302 nameserver records (designated hosts that provide authoritative 303 name service) for the domain.
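As an illustration of the address-to-hostname service listed above, the query name for an IPv4 address is formed by reversing its octets under in-addr.arpa. The sketch below uses Python's standard ipaddress module; the wrapper name ptr_query_name is hypothetical. Only IPv4 is shown, because the standard library produces ip6.arpa names for IPv6 rather than the ip6.int form named in this document.

```python
import ipaddress


def ptr_query_name(address: str) -> str:
    """Return the in-addr.arpa name that a PTR query for an IPv4 address uses."""
    return ipaddress.IPv4Address(address).reverse_pointer


print(ptr_query_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

The reply to such a query is a hostname, which is why requirement text on IDN has to say what happens when that hostname is internationalized.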
305 New services are being defined, either as entirely new services (IPv6 to 306 hostname mapping using binary labels) or as embellishments to other 307 services (DNSSEC returning information about whether a given DNS service 308 is performed securely or not). 310 These services exist, conceptually, at the Application/Resolver 311 interface, NOT at the DNS-service interface. This document attempts to 312 set requirements for an equivalent of the "used services" given above, 313 where "hostname" is replaced by "Internationalized Domain Name". This 314 doesn't preclude the fact that IDN should work with any kind of DNS 315 queries. IDN is a new service. Since existing protocols like SMTP or 316 HTTP use the old service, it is a matter of great concern how the new 317 and old services work together, and how other protocols can take 318 advantage of the new service. 320 2. General Requirements 322 These requirements address two concerns: The service offered to the 323 users (the application service), and the protocol extensions, if needed, 324 added to support this service. 326 In the requirements, we attempt to use the term "service" whenever a 327 requirement concerns the service, and "protocol" whenever a requirement 328 is believed to constrain the possible implementation. 330 2.1 Compatibility and Interoperability 332 [1] The DNS is essential to the entire Internet. Therefore, the service 333 MUST NOT damage present DNS protocol interoperability. It MUST make the 334 minimum number of changes to existing protocols on all layers of the 335 stack. It MUST continue to allow any system anywhere to resolve any 336 internationalized domain name. 338 [2] The service MUST preserve the basic concept and facilities of domain 339 names as described in [RFC1034]. It MUST maintain a single, global, 340 universal, and consistent hierarchical namespace. 342 [2.5] The DNS protocol (the packet formats that go on the wire) MUST 343 NOT limit the codepoints that can be used. 
A service defined on top of 344 the DNS, for instance the IDN-to-address function, MAY limit the 345 codepoints that can be used. The service descriptions MUST describe 346 what limitations are imposed. 348 [2.6] The protocol MUST work for all features of DNS, IPv4, and 349 IPv6. The protocol MUST NOT allow an IDN to be returned to a requestor 350 that requests the IP-to-(old)-domain-name mapping service. 352 [3] The same name resolution request MUST generate the same response, 353 regardless of the location or localization settings in the resolver, in 354 the master server, and in any slave servers involved in the resolution 355 process. 357 [4] The protocol MUST NOT require that the current DNS cache 358 servers be modified to support IDN. If a cache server can have 359 additional functionality to support IDN better, this additional 360 functionality MUST NOT cause problems for resolving correctly 361 functioning current domain names. 363 [5] A caching server MUST NOT return data in response to a query that 364 would not have been returned if the same query had been presented to an 365 authoritative server. This applies fully for the cases when: 367 - The caching server does not know about IDN 368 - The caching server implements the whole specification 369 - The caching server implements a valid subset of the specification 371 [7] The service MAY modify the DNS protocol [RFC1035] and other related 372 work undertaken by the [DNSEXT] WG. However, these changes SHOULD be as 373 small as possible and any changes SHOULD be coordinated with the 374 [DNSEXT] WG. 376 [8] The protocol supporting the service SHOULD be as simple as possible 377 from the user's perspective. Ideally, users SHOULD NOT realize that IDN 378 was added on to the existing DNS. 380 [10] The best solution is one that maintains maximum feasible 381 compatibility with current DNS standards as long as it meets the other 382 requirements in this document. 
384 2.2 Internationalization 386 [11] Internationalized characters MUST be allowed to be represented and 387 used in DNS names and records. The protocol MUST specify what charset is 388 used when resolving domain names and how characters are encoded in DNS 389 records. 391 [12] Codepoints SHOULD be from the Universal Set as defined in 392 ISO-10646 or Unicode. The specifics of versions MUST be defined in the 393 proposed solution. If multiple charsets are allowed, each charset MUST 394 be tagged and conform to [RFC2277]. 396 [12.5] The protocol MUST NOT reject any non-IDN characters (to be 397 defined) in any queries or responses. 399 [14] The protocol SHOULD NOT invent a new CCS for the purpose of IDN 400 only and SHOULD use existing CES. The charset(s) chosen SHOULD also be 401 non-ambiguous. 403 [15] The protocol SHOULD NOT make any assumptions about the location 404 in a domain name where internationalization might appear. In other 405 words, it SHOULD NOT differentiate between any part of a domain name 406 because this MAY impose restrictions on future internationalization 407 efforts. For example, the TLDs can be internationalized. 409 [16] The protocol also SHOULD NOT impose any localized restrictions. 410 For example, an IDN implementation which only allows domain 411 names to use a single local script would immediately restrict 412 multinational organizations. 414 [17] While there is a wide range of devices that use the DNS and a wide 415 range of characteristics of international scripts and methods of 416 domain name input and display, IDN is only concerned with the 417 protocol. Therefore, there MUST be a single way of encoding an 418 internationalized domain name within the DNS. 420 2.4 Canonicalization 422 Matching is a complicated process for IDN. Canonicalization 423 of characters MUST follow precise and predictable rules to ensure 424 consistency. [CHARREQ] is RECOMMENDED as a guide on canonicalization.
426 The DNS has to match a host name in a request with a host name held 427 in one or more zones. It also needs to sort names into order. It is 428 expected that some sort of canonicalization algorithm will be used as 429 the first step of this process. This section discusses some of the 430 properties which will be REQUIRED of that algorithm. 432 [22] To achieve interoperability, canonicalization MUST be done at a 433 single well-defined place in the DNS resolution process. The protocol 434 MUST specify canonicalization; it MUST specify exactly where in the 435 DNS that canonicalization happens and does not happen; it MUST specify 436 how additions to ISO 10646 will affect the stability of the DNS and 437 the amount of work done on the root DNS servers. 439 [23] The canonicalization algorithm MAY specify operations for case, 440 ligature, and punctuation folding. 442 [24] In order to retain backwards compatibility with the current DNS, 443 the service MUST retain the case-insensitive comparison for [US-ASCII] 444 as specified in [RFC1035]. For example, Latin capital letter A (U+0041) 445 MUST match Latin small letter a (U+0061). [UTR21] describes some of 446 the issues with case mapping. Case-insensitivity for non [US-ASCII] 447 MUST be discussed in the protocol proposal. 449 [25] Case folding MUST be locale independent. For example, Latin 450 capital letter I (U+0049) case folded to lower case in the Turkish 451 context will become Latin small letter dotless i (U+0131). But in the 452 English context, it will become Latin small letter i (U+0069). 454 [26] If other canonicalization is done, it MUST be done before the 455 domain name is resolved. Further, the canonicalization MUST be easily 456 upgradable as new languages and writing systems are added. 
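Requirements [24] and [25] can be sketched together: ASCII letters compare case-insensitively, and the folding never consults a locale. The functions ascii_fold and names_match are hypothetical names for this illustration. Leaving non-ASCII characters untouched is an assumption of the sketch, since this document deliberately leaves non-ASCII case-insensitivity to the protocol proposal.

```python
def ascii_fold(name: str) -> str:
    """Lower-case only the letters A-Z; every other character is unchanged."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in name
    )


def names_match(left: str, right: str) -> bool:
    """Compare two names with ASCII-only, locale-independent case folding."""
    return ascii_fold(left) == ascii_fold(right)


print(names_match("EXAMPLE.com", "example.COM"))  # True
print(ascii_fold("I") == "\u0069")                # True: always small letter i
print(names_match("I", "\u0131"))                 # False: dotless i is distinct
```

Because the mapping is a fixed table over A-Z, the result is the same for a user in a Turkish context and a user in an English context.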
458 [27] Any conversion (case, ligature folding, punctuation folding, etc) 459 from what the user enters into a client to what the client asks for 460 resolution MUST be done identically on any request from any client. 462 [30] If the charset can be normalized, then it SHOULD be normalized 463 before it is used in IDN. Normalization SHOULD follow [UTR15]. 464 (conflict) 466 [31] The protocol SHOULD avoid inventing a new normalization form 467 provided a technically sufficient one is available. 469 2.5 Operational Issues 471 [32] Zone files SHOULD remain easily editable. 473 [33] An IDN-capable resolver or server SHALL NOT generate more traffic 474 than a non-IDN-capable resolver or server would when resolving an 475 ASCII-only domain name. The amount of traffic generated when resolving 476 an IDN SHALL be similar to that generated when resolving an ASCII-only 477 name. 479 [34] The service SHOULD NOT add new centralized administration for the 480 DNS. A domain administrator SHOULD be able to create internationalized 481 names as easily as adding current domain names. 483 [35] Within a single zone, the zone manager MUST be able to define 484 equivalence rules that suit the purpose of the zone, such as, but not 485 limited to, and not necessarily, non-ASCII case folding, Unicode 486 normalizations (if Unicode is chosen), Cyrillic/Greek/Latin folding, or 487 traditional/simplified Chinese equivalence. Such defined equivalences 488 MUST NOT remove equivalences that are assumed by (old or 489 local-rule-ignorant) caches. 491 [36] The protocol MUST work with DNSSEC. 493 3. Security Considerations 495 Any solution that meets the requirements in this document MUST NOT be 496 less secure than the current DNS. Specifically, the mapping of 497 internationalized host names to and from IP addresses MUST have the 498 same characteristics as the mapping of today's host names. 
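One reason requirement [30] matters for the mapping discussed here is that the same visible text can have more than one code point sequence. The sketch below uses Python's standard unicodedata module; choosing Normalization Form C is an assumption made only for illustration, since this document does not select a form.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # letter e followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings are different strings.
print(composed == decomposed)                                # False

# After normalizing both to the same form, they compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

If two such spellings were allowed to name different hosts, a user could not tell them apart on screen.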
500 Specifying requirements for internationalized domain names does not 501 itself raise any new security issues. However, any change to the DNS MAY 502 affect the security of any protocol that relies on the DNS or on 503 DNS names. A thorough evaluation of those protocols for security 504 concerns will be needed when they are developed. In particular, IDNs 505 MUST be compatible with DNSSEC and, if multiple charsets or 506 representation forms are permitted, the implications of this for name spoofing 507 MUST be thoroughly understood. 509 4. References 511 [CHARREQ] "Requirements for string identity matching and String 512 Indexing", http://www.w3.org/TR/WD-charreq, July 1998, 513 World Wide Web Consortium. 515 [DNSEXT] "IETF DNS Extensions Working Group", 516 namedroppers@internic.net, Olafur Gudmundsson, Randy Bush. 518 [RFC1034] "Domain Names - Concepts and Facilities", rfc1034.txt, 519 November 1987, P. Mockapetris. 521 [RFC1035] "Domain Names - Implementation and Specification", 522 rfc1035.txt, November 1987, P. Mockapetris. 524 [RFC1123] "Requirements for Internet Hosts -- Application and 525 Support", rfc1123.txt, October 1989, R. Braden. 527 [RFC1996] "A Mechanism for Prompt Notification of Zone Changes 528 (DNS NOTIFY)", rfc1996.txt, August 1996, P. Vixie. 530 [RFC2119] "Key words for use in RFCs to Indicate Requirement 531 Levels", rfc2119.txt, March 1997, S. Bradner. 533 [RFC2181] "Clarifications to the DNS Specification", rfc2181.txt, 534 July 1997, R. Elz, R. Bush. 536 [RFC2277] "IETF Policy on Character Sets and Languages", 537 rfc2277.txt, January 1998, H. Alvestrand. 539 [RFC2278] "IANA Charset Registration Procedures", rfc2278.txt, 540 January 1998, N. Freed and J. Postel. 542 [RFC2279] "UTF-8, a transformation format of ISO 10646", 543 rfc2279.txt, F. Yergeau, January 1998. 545 [RFC2535] "Domain Name System Security Extensions", rfc2535.txt, 546 March 1999, D. Eastlake.
548 [RFC2553] "Basic Socket Interface Extensions for IPv6", rfc2553.txt, 549 March 1999, R. Gilligan et al. 551 [RFC2825] "A Tangled Web: Issues of I18N, Domain Names, and the 552 Other Internet protocols", rfc2825.txt, May 2000, 553 L. Daigle et al. 555 [RFC2826] "IAB Technical Comment on the Unique DNS Root", 556 rfc2826.txt, May 2000, Internet Architecture Board. 558 [IDNCOMP] "Comparison of Internationalized Domain Name Proposals", 559 draft-ietf-idn-compare-00.txt, June 2000, P. Hoffman. 561 [ISO10646] ISO/IEC 10646-1:2000 (note that an amendment 1 is in 562 preparation), ISO/IEC 10646-2 (in preparation), plus 563 corrigenda and amendments to these standards. 565 [UNICODE] The Unicode Consortium, "The Unicode Standard". Described at 566 http://www.unicode.org/unicode/standard/versions/. 568 [UNICODE30] The Unicode Consortium, "The Unicode Standard -- Version 569 3.0", ISBN 0-201-61633-5. Same repertoire as ISO/IEC 570 10646-1:2000. Described at 572 http://www.unicode.org/unicode/standard/versions/Unicode3.0.html. 574 [US-ASCII] Coded Character Set -- 7-bit American Standard Code for 575 Information Interchange, ANSI X3.4-1986; also: ISO/IEC 576 646 (IRV). 578 [UAX15] "Unicode Normalization Forms", Unicode Standard Annex #15, 579 http://www.unicode.org/unicode/reports/tr15/, 2000-08-31, 580 M. Davis and M. Duerst, Unicode Consortium. 582 [UTR17] "Character Encoding Model", Unicode Technical Report #17, 583 http://www.unicode.org/unicode/reports/tr17/, 2000-08-31, 584 K. Whistler and M. Davis, Unicode Consortium. 586 [UTR21] "Case Mappings", Unicode Technical Report #21, 587 http://www.unicode.org/unicode/reports/tr21/, 2000-09-12, 588 M. Davis, Unicode Consortium. 590 5. Editors' Contact 592 Zita Wenzel, Ph.D. 
593 Information Sciences Institute 594 University of Southern California 595 4676 Admiralty Way 596 Marina del Rey, CA 597 90292 USA 598 Tel: +1 310 448 8462 599 Fax: +1 310 823 6714 600 zita@isi.edu 602 James Seng 603 8 Temasek Boulevard 604 #24-02 Suntec Tower 3 605 Singapore 038988 606 Tel: +65 248 6208 607 Fax: +65 248 6198 608 Email: jseng@pobox.org.sg 610 6. Acknowledgements 612 The editors gratefully acknowledge the contributions of: 614 Harald Tveit Alvestrand 615 Mark Andrews 616 RJ Atkinson 617 Alan Barret 618 Randy Bush 619 Andrew Draper 620 Martin Duerst 621 Patrik Faltstrom 622 Ned Freed 623 Olafur Gudmundsson 624 Paul Hoffman 625 Simon Josefsson 626 Kent Karlsson 627 John Klensin 628 Tan Juay Kwang 629 Dongman Lee 630 Bill Manning 631 Dan Oscarsson 632 J. William Semich 633 James Seng