IETF IDN Working Group                                        James Seng
Internet Draft                        draft-ietf-idn-requirements-01.txt
10th Mar 2000                                      Expires 10th Mar 2000


             Requirements of Internationalized Domain Names

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes the requirements for encoding international
   characters into DNS names and records.  It is intended as guidance
   for developing protocols for internationalized domain names.

1. Introduction

   At present, the encoding of Internet domain names is restricted to a
   subset of 7-bit ASCII (ISO/IEC 646).  HTML, XML, IMAP, FTP, and many
   other text-based protocols and formats on the Internet have already
   been internationalized.  It is important for domain names to be
   similarly internationalized.

   This document is being discussed on the "idn" mailing list.  To join
   the list, send a message to with the words "subscribe idn" in the
   body of the message.  Archives of the mailing list can also be found
   at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Definitions and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   "IDN" is used in this document as an abbreviation for
   "internationalized domain name".  This is defined as a domain name
   that contains one or more characters that are outside the set of
   characters specified as legal characters for domain names in
   [RFC1034] Section 3.5 and [RFC1123].

Expires 10th of March 2000                                      [Page 1]

   It is important to note the difference between domain names and host
   names.  The DNS itself places no restriction on which characters
   (8-bit octets) may appear in a domain name.
   The only restrictions are on the total length of a name (255 octets)
   and the length of each label (63 octets).  Host names, on the other
   hand, are restricted to case-insensitive letters, digits, and the
   hyphen ('-'), with "." allowed only between labels.

   A master server for a zone holds the main copy of that zone.  This
   copy is sometimes stored in a zone file.  A slave server for a zone
   holds a complete copy of the records for that zone.  A caching
   server holds temporary copies of DNS records and uses them to answer
   queries about domain names.  Further explanation of these terms can
   be found in [RFC1034] and [RFC1996].

   Characters mentioned in this document are identified by their
   position in the Unicode character set.  The notation U+12AB, for
   example, indicates the character at position 12AB (hexadecimal) in
   the Unicode character set.  Note that the use of this notation does
   not indicate a requirement to use Unicode.

   Examples quoted in this document are intended only to further
   explain the meanings and principles adopted by the document.  The
   protocol is not required to satisfy the examples.

   A character is a member of a set of elements used for organization,
   control, or representation of data.

   A coded character is a character together with its coded
   representation.

   A coded character set ("CCS") is a set of unambiguous rules that
   establishes a character set and the relationship between the
   characters of the set and their coded representation.

   A graphic character or glyph is a character, other than a control
   function, that has a visual representation normally handwritten,
   printed, or displayed.

   A character encoding scheme or "CES" is a mapping from one or more
   coded character sets to a set of octets.  Some CESs are associated
   with a single CCS; for example, UTF-8 applies only to ISO 10646.
   Other CESs, such as ISO 2022, are associated with many CCSs.
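
   The difference between a CCS and a CES can be seen by encoding the
   same characters under several CESs.  The Python sketch below is
   illustrative only: the codec names are those of Python's standard
   library, ISO-2022-JP is used as one ISO 2022-based scheme that
   switches between several CCSs with escape sequences, and the helper
   ascii_bytes_match is hypothetical.  It tests one property that
   differs between CESs: whether the octets in the ASCII range are
   exactly the ASCII characters of the text, in order.

```python
# Compare how several character encoding schemes (CESs) map the same
# characters to octets. Codec names are Python standard-library names.


def ascii_bytes_match(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if the octets below 0x80 in the encoded
    form are exactly the ASCII characters of `text`, in order."""
    encoded = text.encode(encoding)
    ascii_octets = bytes(b for b in encoded if b < 0x80)
    ascii_chars = "".join(ch for ch in text if ord(ch) < 0x80)
    return ascii_octets == ascii_chars.encode("ascii")


if __name__ == "__main__":
    sample = "\u65e5\u672c.jp"  # two CJK ideographs followed by ".jp"
    for enc in ("utf-8", "utf-16-be", "iso2022_jp"):
        octets = sample.encode(enc).hex(" ")
        print(f"{enc:11} {octets}  {ascii_bytes_match(sample, enc)}")
```

   Only UTF-8 passes: its multi-octet sequences use octets 0x80 and
   above, whereas UTF-16BE and ISO-2022-JP both produce ASCII-range
   octets that do not stand for ASCII characters.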

   A charset is a method of mapping a sequence of octets to a sequence
   of abstract characters.  A charset is, in effect, a combination of
   one or more CCSs with a CES.  Charset names are registered by the
   IANA according to procedures documented in RFC 2278.

   A language is a way that humans interact.  In written form, a
   language is expressed in characters.  The same set of characters can
   often be used in many languages, and many languages can be expressed
   using different scripts.  A particular charset may have different
   glyphs (shapes) depending on the language being used.

2. General Requirements

2.1 Compatibility and Interoperability

   The DNS is essential to the entire Internet.  Therefore, the
   protocol must not damage present DNS protocol interoperability.  It
   must make the minimum number of changes to existing protocols on all
   layers of the stack.  It must continue to allow any system anywhere
   to resolve any internationalized domain name on the Internet.

   The protocol must preserve the basic concept and facilities of
   domain names as described in [RFC1034].  It must maintain a single,
   global, universal, and consistent hierarchical namespace.

   The same name resolution request must generate the same response,
   regardless of the location or localization settings in the resolver,
   in the master server, and in any slave servers involved in the
   resolution process.

   If the protocol allows more than one charset, it should also allow
   the creation of caching servers that do not understand the charset
   in which a request or response is encoded.  Such caching servers
   should work as well for IDNs as they do for current domain names.  A
   caching server performs correctly if it gives essentially the same
   answer (without the authoritative bit) as the master server would
   have given if presented with the same request.
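
   One way a caching server can stay correct without understanding the
   charset is to treat names as opaque octet strings and apply only the
   ASCII case-insensitive comparison that the current DNS already uses
   [RFC1035].  The Python sketch below assumes, purely for
   illustration, that names arrive as UTF-8 octets; the function
   cache_key is hypothetical and not taken from any DNS implementation.

```python
# Hypothetical charset-agnostic cache key: fold only the ASCII letters
# A-Z (0x41-0x5A) to lower case and leave every other octet untouched,
# so the cache never needs to know which charset produced the octets.


def cache_key(name: bytes) -> bytes:
    """Return the lookup key a charset-unaware cache might use."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in name)


if __name__ == "__main__":
    cache: dict[bytes, str] = {}
    cache[cache_key(b"Example.COM")] = "192.0.2.1"
    # The ASCII-only fold makes this lookup hit, as in today's DNS.
    print(cache.get(cache_key(b"example.com")))
    # Non-ASCII octets (UTF-8 for U+00DC and U+00FC) are compared exactly,
    # so the cache does not guess at case rules for an unknown charset.
    upper = "\u00dcber.example".encode("utf-8")
    lower = "\u00fcber.example".encode("utf-8")
    print(cache_key(upper) == cache_key(lower))
```

   With UTF-8 the ASCII-only fold is safe because every octet of a
   multi-octet sequence is 0x80 or above.  Such a cache errs on the
   side of missing a match rather than inventing one.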

   A caching server must not return data in response to a query that
   would not have been returned if the same query had been presented to
   an authoritative server.  This applies fully in each of the
   following cases:

   - The caching server does not know about IDN.
   - The caching server implements the whole specification.
   - The caching server implements a legal subset of the specification.

   It should be possible to upgrade the protocol at any time with new
   features while retaining backward compatibility with the current
   specification.

   The protocol may modify the DNS protocol [RFC1035] and other related
   work undertaken by the DNSEXT WG.  However, these changes should be
   as small as possible, and any changes must be approved by the DNSEXT
   WG.

   The protocol should be as simple as possible from the user's
   perspective.  Ideally, users should not realize that IDN was added
   on to the existing DNS.

   A fall-back strategy or mechanism based upon ASCII may be needed
   during the transition period while IDN is being deployed and
   adopted.  Therefore, if an encoding is not mapped into ASCII, then
   there should be an ASCII-only representation compatible with the
   current DNS, and there should be a way for a program to find the
   ASCII-only representation of an IDN.

   The best solution is one that maintains maximum feasible
   compatibility with current DNS standards as long as it meets the
   other requirements in this document.

2.2 Internationalization

   Internationalized characters must be allowed to be represented and
   used in DNS names and records.  The protocol must specify what
   charset is used when resolving domain names and how characters are
   encoded in DNS records.

   This document does not recommend any charset for internationalization
   (I18N).  If more than one charset is used in the protocol, then the
   protocol must specify all the charsets being used and for what
   purpose.
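
   A design that keeps two forms of every name, each with a stated
   purpose, illustrates both this requirement and the ASCII-only
   fall-back representation described in Section 2.1.  The Python
   sketch below uses the standard library's "idna" codec, which
   implements IDNA (RFC 3490, published in 2003, after this document).
   IDNA is only one possible answer and is not a choice made by this
   document: here the Unicode form is for users, and an ASCII-only form
   ("xn--" followed by a Punycode label) is what the current DNS
   carries.  The helper names are hypothetical.

```python
# Round-trip between a Unicode domain name and an ASCII-only form using
# Python's standard "idna" codec (IDNA 2003, RFC 3490). Using IDNA here
# is an illustrative assumption, not something this document mandates.


def to_ascii_form(name: str) -> str:
    """Unicode name -> ASCII-only representation usable in today's DNS."""
    return name.encode("idna").decode("ascii")


def to_unicode_form(ascii_name: str) -> str:
    """ASCII-only representation -> Unicode name for display."""
    return ascii_name.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_ascii_form("b\u00fccher.example")
    print(wire)                          # xn--bcher-kva.example
    print(to_unicode_form(wire))         # bücher.example
    print(to_ascii_form("example.com"))  # example.com (unchanged)
```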

   Any CCS chosen must cover at least the range of characters currently
   defined (and being added) by ISO 10646/Unicode.

   Any CES chosen should not encode ASCII characters differently
   depending on the other characters in the string.  In other words,
   ASCII characters should remain as specified in [US-ASCII].

   The protocol must not invent a new CCS solely for the purpose of IDN
   and should use an existing CES.  The charset(s) chosen should also be
   unambiguous.

   The protocol should not make any assumptions about where in a domain
   name internationalization might appear.  In other words, it should
   not differentiate between any parts of a domain name, because this
   may impose a restriction on future internationalization efforts.

   The protocol should also not impose any localized restrictions.  For
   example, an IDN implementation that only allows domain names to use
   a single local script would immediately restrict multinational
   organizations.

   Because of the wide range of devices that use the DNS and the wide
   range of characteristics of international scripts, the protocol
   should allow more than one method of domain name input and display.
   However, there has to be a single way of encoding an
   internationalized domain name within the core of the DNS.

2.3 Localization

   The protocol must be able to handle the localized requirements of
   different languages.  For example, IDN must be able to handle
   bi-directional writing for scripts such as Arabic.

   Historically, "." has been the label separator in host names.  The
   protocol should not use different separators for different
   languages.

   Most localization can be handled by the user interface.  It should
   not matter how domain names are input or presented, whether in
   reverse order, bi-directionally, or with the introduction of a new
   separator.  However, the final wire format must be in canonical
   order.
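
   The split between user-interface localization and a single canonical
   wire format can be sketched as follows.  For illustration only, the
   sketch assumes that the user interface accepts U+3002 IDEOGRAPHIC
   FULL STOP as a label separator and that labels are carried as UTF-8;
   this document chooses neither.  The wire encoding itself (a length
   octet before each label, a zero octet at the end, labels of at most
   63 octets, names of at most 255 octets) is the one defined in
   [RFC1035].  The names LOCAL_SEPARATORS and to_wire are hypothetical.

```python
# Map a name as entered in a localized UI to one canonical wire form.
# ASSUMPTIONS (illustrative only): U+3002 IDEOGRAPHIC FULL STOP is accepted
# as an input separator, and labels are encoded as UTF-8 octets.

LOCAL_SEPARATORS = {"\u3002": "."}  # hypothetical UI-level separator map


def to_wire(name: str) -> bytes:
    """Encode `name` as RFC 1035 labels: a length octet, the label octets,
    and a final zero octet. The root name itself is not handled."""
    for local_sep, dot in LOCAL_SEPARATORS.items():
        name = name.replace(local_sep, dot)
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        octets = label.encode("utf-8")  # ASSUMPTION: UTF-8 label octets
        if not 1 <= len(octets) <= 63:
            raise ValueError(f"label must be 1-63 octets: {label!r}")
        wire.append(len(octets))  # length octet
        wire += octets            # label octets
    wire.append(0)                # zero-length root label ends the name
    if len(wire) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(wire)
```

   Whichever separator the user typed, to_wire("example\u3002com") and
   to_wire("example.com") yield the same octets.  Display-order issues
   such as right-to-left rendering stay in the user interface; the
   sketch only sees names in their stored (logical) order.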

2.4 Canonicalization

   Matching is a complicated process for IDN.  Canonicalization of
   characters must follow precise and predictable rules to ensure
   consistency.  [CHARREQ] is recommended as a guide on
   canonicalization.

   The DNS has to match a host name in a request with a host name held
   in one or more zones.  It also needs to sort names into order.  It
   is expected that some sort of canonicalization algorithm will be
   used as the first step of this process.  This section discusses some
   of the properties that will be required of that algorithm.

   The canonicalization algorithm might specify operations for case,
   ligature, and punctuation folding.

   In order to retain backward compatibility with the current DNS, the
   protocol must retain the case-insensitive comparison for US-ASCII as
   specified in [RFC1035].  For example, Latin capital letter A
   (U+0041) must match Latin small letter A (U+0061).  [UTR21]
   describes some of the issues with case mapping.

   Case folding must not be locale-dependent.  For example, Latin
   capital letter I (U+0049) case folded to lower case in the Turkish
   context will become Latin small letter dotless I (U+0131), but in
   the English context it will become Latin small letter I (U+0069).
   A locale-dependent fold would therefore make the same name match
   differently depending on the client's locale.

   If other canonicalization is done, then it must be done before the
   domain name is resolved.  Further, the canonicalization must be
   easily upgradable as new languages and writing systems are added.

   Any conversion (case folding, ligature folding, punctuation folding,
   ...) from what the user enters into a client to what the client asks
   for resolution must be done identically on all requests from any
   client.

   If the protocol specifies a canonicalization algorithm, a caching
   server should perform correctly regardless of how much (or how
   little) of that algorithm it has implemented.
   [1 request to remove]

   If the protocol requires a canonicalization algorithm, all requests
   sent to a caching server must already be in the canonical form.

   If the charset can be normalized, then it should be normalized
   before it is used in IDN.  (conflict)

   The protocol should avoid inventing a new normalization form
   provided a technically sufficient one is available (such as in an
   ISO standard).

2.5 Operational Issues

   Zone files should remain easily editable.

   An IDN-capable resolver or server should not generate more traffic
   than a non-IDN-capable resolver or server would when resolving an
   ASCII-only domain name.  The amount of traffic generated when
   resolving an IDN should be similar to that generated when resolving
   an ASCII-only name.

   The protocol should add no new centralized administration for the
   DNS.  A domain administrator should be able to create
   internationalized names as easily as adding current domain names.

   Within a single zone, the zone manager must be able to define
   equivalence rules that suit the purpose of the zone.  Examples
   include, but are not limited to, non-ASCII case folding, Unicode
   normalizations, Cyrillic/Latin folding, or traditional/simplified
   Chinese equivalence; no zone is required to define any of them.
   Such defined equivalences must not remove equivalences that are
   assumed by (old or local-rule-ignorant) caches.

   It should be possible for a signed zone file to use the same
   character set as the unsigned zone file.  The protocol must allow
   offline DNSSEC signing.  It should be possible to look at the signed
   file and see that it is the same as the unsigned one.

2.6 Others

   The protocol may provide the same DNS resources using
   internationalized text as it currently provides using ASCII text.

   To get full semantics for IDN, an upgrade of the DNS and related
   software may be needed.
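
   Much of the gap between a legacy server and the full semantics of
   IDN is canonicalization (Section 2.4).  A server that compares
   octets with only the ASCII case folding of [RFC1035] treats "cafe"
   with U+00E9 and "cafe" followed by U+0301 COMBINING ACUTE ACCENT as
   different names.  The Python sketch below shows one possible
   canonicalization, assumed here only for illustration: Normalization
   Form C [UTR15] followed by Python's str.casefold(), which applies
   Unicode default case folding independently of the user's locale.
   The function canonical_label is hypothetical; this document does
   not specify an algorithm, and a real specification would also have
   to pin down details such as the order of these steps.

```python
import unicodedata


def canonical_label(label: str) -> str:
    """Hypothetical canonicalization: Unicode NFC, then default case folding.

    str.casefold() uses Unicode default case folding and ignores the
    process locale, so U+0049 always folds to U+0069, never to U+0131.
    """
    return unicodedata.normalize("NFC", label).casefold()


if __name__ == "__main__":
    composed = "caf\u00e9"      # e-acute as one code point, U+00E9
    decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
    print(composed == decomposed)                                     # False
    print(canonical_label(composed) == canonical_label(decomposed))   # True
    print(canonical_label("CAF\u00c9") == canonical_label(composed))  # True
    print(canonical_label("I") == "i")                                # True
```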

   The protocol should take into account new DNS features such as
   DNSSEC and DNAME.  For example, DNAME might be useful to simplify
   canonicalization for IDN.

3. Technical Analysis

   There are many standard protocols and RFCs that depend on domain
   names and make various assumptions about the characters in them
   always conforming to [RFC1034].  We expect the protocols listed
   below to be affected:

   <...list the sets of RFCs for which we would like to have a
   summary...>
   RFC821, RFC822, ...

   All IDN protocol documents must fully detail the expected effects
   of the specified encoding leaking into protocols other than the DNS
   resolution protocol.  They must also contain a summary of the
   technical opinions of the IDN Working Group.

4. Security Considerations

   Any solution that meets the requirements in this document must not
   be less secure than the current DNS.  Specifically, the mapping of
   internationalized host names to and from IP addresses must have the
   same characteristics as the mapping of today's host names.

   Specifying requirements for internationalized domain names does not
   itself raise any new security issues.  However, any change to the
   DNS may affect the security of any protocol that relies on the DNS
   or on DNS names.  A thorough evaluation of those protocols for
   security concerns will be needed when they are developed.  In
   particular, IDNs must be compatible with DNSSEC.

5. References

   [CHARREQ]  "Requirements for String Identity Matching and String
              Indexing", http://www.w3.org/TR/WD-charreq, July 1998,
              World Wide Web Consortium.

   [DNSEXT]   "IETF DNS Extensions Working Group",
              namedroppers@internic.net, Olafur Gudmundsson, Randy Bush.

   [RFC1034]  "Domain Names - Concepts and Facilities", rfc1034.txt,
              November 1987, P. Mockapetris.

   [RFC1035]  "Domain Names - Implementation and Specification",
              rfc1035.txt, November 1987, P. Mockapetris.

   [RFC1123]  "Requirements for Internet Hosts -- Application and
              Support", rfc1123.txt, October 1989, R. Braden.

   [RFC1996]  "A Mechanism for Prompt Notification of Zone Changes
              (DNS NOTIFY)", rfc1996.txt, August 1996, P. Vixie.

   [RFC2119]  "Key words for use in RFCs to Indicate Requirement
              Levels", rfc2119.txt, March 1997, S. Bradner.

   [UNICODE]  "The Unicode Standard -- Version 3.0", ISBN
              0-201-61633-5, The Unicode Consortium.  Described at
              http://www.unicode.org/unicode/standard/versions/Unicode3.0.html

   [US-ASCII] "Coded Character Set -- 7-bit American Standard Code for
              Information Interchange", ANSI X3.4-1986.

   [UTR15]    "Unicode Normalization Forms", Unicode Technical Report
              #15, http://www.unicode.org/unicode/reports/tr15/,
              Nov 1999, M. Davis & M. Duerst, Unicode Consortium.

   [UTR21]    "Case Mappings", Unicode Technical Report #21,
              http://www.unicode.org/unicode/reports/tr21/, Dec 1999,
              M. Davis, Unicode Consortium.

Appendix A. Acknowledgements

   The editor gratefully acknowledges the contributions of:

   Harald Tveit Alvestrand
   Martin Duerst
   Patrik Faltstrom
   Andrew Draper
   Bill Manning
   Paul Hoffman
   James Seng
   Randy Bush
   Alan Barret
   Olafur Gudmundsson
   Karlsson Kent
   Dan Oscarsson
   J. William Semich
   RJ Atkinson
   Simon Josefsson
   Ned Freed
   Dongman Lee
   Mark Andrews