Internet Draft                                          Patrik Faltstrom
draft-ietf-idn-idna-03.txt                                         Cisco
July 20, 2001                                               Paul Hoffman
Expires in six months                                         IMC & VPNC

          Internationalizing Host Names In Applications (IDNA)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

The current DNS infrastructure does not provide a way to use
internationalized host names (IDN). This document describes a mechanism
that requires no changes to any DNS server or resolver that will allow
internationalized host names to be used by end users with changes only
to applications. It allows flexibility for user input and display, and
assures that host names that have non-ASCII characters are not sent to
DNS servers or resolvers.

1. Introduction

In the discussion of IDN solutions, a great deal of attention has
focused on transition issues and how IDN will work in a world where not
all of the components have been updated. Earlier proposed solutions
require that user applications, resolvers, and DNS servers be updated
in order for a user to use an internationalized host name. Instead of
this requirement for widespread updating of all components, the current
proposal is that only user applications be updated; no changes are
needed to the DNS protocol or any DNS servers or the resolvers on users'
computers.

This document is being discussed on the ietf-idna@mail.apps.ietf.org
mailing list. To subscribe, send a message to
ietf-idna-request@mail.apps.ietf.org with the single word "subscribe" in
the body of the message.

1.1 Design philosophy

Many proposals for IDN protocols have required that DNS servers be
updated to handle internationalized host names. Because of this, the
person who wanted to use an internationalized host name had to be sure
that their request went to a DNS server that was updated for IDN.
Further, that server could only send queries to other servers that had
been updated for IDN because the queries contain new protocol elements
to differentiate IDN name parts from current host parts.
In addition, these proposals require that resolvers be updated to use
the new protocols, and in most cases the applications would need to be
updated as well.

These proposals would require changes to the application protocols
that use host names as protocol elements. This is due to the
assumptions and requirements made in those protocols about the
characters that have always been used for host names, and the encoding
of those characters. Other proposals for IDN protocols do not require
changes to DNS servers but still require changes to most application
protocols to handle the new names.

Updating all (or even a significant percentage) of the existing servers
in the world will be difficult, to say the least. Updating applications,
application gateways, and clients to handle changes to the application
protocols is also daunting. Because of this, we have designed a protocol
that requires no updating of any name servers. IDNA still requires the
updating of applications, but only for input and display of names, not
for changes to the protocols. Once a user has updated these, she or he
could immediately start using internationalized host names. The cost of
implementing IDN may thus be much lower, and the speed of implementation
could be much higher.

1.2 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

2. Structural Overview

In IDNA, users' applications are updated to perform the processing
needed to input internationalized host names from users, display
internationalized host names that are returned from the DNS to users,
and process the inputs and outputs from the DNS.

2.1 Interfaces between DNS components in IDNA

The interfaces in IDNA can be represented pictorially as:

                  +------+
                  | User |
                  +------+
                     ^
                     |Input and display: local interface methods
                     |(pen, keyboard, glowing phosphorus, ...)
   +-----------------|------------------------------+
   |                 v                              |
   |        +--------------------------+            |
   |        |       Application        |            |
   |        +--------------------------+            |
   |                 ^        ^                     |
   |Call to resolver:|        |Application-specific |
   |  nameprepped ACE|        |protocol:            |
   |                 v        |predefined by the    | End system
   |         +----------+     |protocol or defaults |
   |         | Resolver |     |to nameprepped ACE   |
   |         +----------+     |                     |
   |               ^          |                     |
   +---------------|----------|---------------------+
      DNS protocol:|          |
    nameprepped ACE|          |
                   v          v
       +-------------+      +---------------------+
       | DNS servers |      | Application servers |
       +-------------+      +---------------------+

This document uses the generic term "ACE" for an ASCII-compatible
encoding. After the IDN Working Group has chosen a specific ACE, this
document will be updated to refer to just that single ACE. Until that
time, an implementor creating experimental software must choose an ACE
to use, such as RACE or LACE or DUDE.

2.1.1 Entry and display in applications

Applications can accept host names using any character set or sets
desired by the application developer, and can display host names in any
charset. That is, this protocol does not affect the interface between
users and applications.

An IDNA-aware application can accept and display internationalized host
names in two formats: the internationalized character set(s) supported
by the application, and in an ACE. Applications MAY allow ACE input and
output, but are not encouraged to do so except as an interface for
special purposes, possibly for debugging. ACE encoding is opaque and
ugly, and should thus only be exposed to users who absolutely need it.
The optional use, especially during a transition period, of ACE
encodings in the user interface is described in section 3. Because name
parts encoded with ACE can be rendered either as the encoded ASCII
characters or the proper decoded characters, the application MAY have an
option for the user to select the preferred method of display; if it
does, rendering the ACE SHOULD NOT be the default.

Host names are often stored and transported in many places. For example,
they are part of documents such as mail messages and web pages. They are
transported in the many parts of many protocols, such as both the
control commands and the RFC 2822 body parts of SMTP, and the headers
and the body content in HTTP.

In protocols and document formats that define how to handle
specification or negotiation of charsets, IDN host name parts can be
encoded in any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, IDN host name parts
must be given in that charset. In any place where a protocol or document
format allows transmission of the characters in IDN host name parts, IDN
host name parts SHOULD be transmitted using whatever character encoding
and escape mechanism the protocol or document format uses at that
place.

All protocols that have host names as protocol elements already have the
capacity for handling host names in the ASCII charset. Thus, IDN host
name parts can be specified in those protocols in the ACE charset, which
is a superset of the ASCII charset that uses the same set of octets.

2.1.2 Applications and resolvers

Applications communicate with resolver libraries through a programming
interface (API). Typically, the IETF does not standardize APIs, although
there are non-standard APIs specified for IPv6.
This protocol does not specify a specific API, but instead specifies
only the input and output formats of the host names to the resolver
library.

Before converting the name parts into ACE, the application MUST prepare
each name part as specified in [NAMEPREP]. The application MUST use ACE
for the name parts that are sent to the resolver, and will always get
name parts encoded in ACE from the resolver.

IDNA-aware applications MUST be able to work with both
non-internationalized host name parts (those that conform to [STD13] and
[STD3]) and internationalized host name parts. An IDNA-aware application
that is resolving a non-internationalized host name part MUST NOT do
any preparation or conversion to ACE on any non-internationalized name
part.

2.1.3 Resolvers and DNS servers

An operating system might have a set of libraries for converting host
names to nameprepped ACE. The input to such a library might be in one or
more charsets that are used in applications (UTF-8 and UTF-16 are likely
candidates for almost any operating system, and script-specific charsets
are likely for localized operating systems). The output would be either
the unchanged name part (if the input already conforms to [STD13] and
[STD3]), or the nameprepped, ACE-encoded name part.

DNS servers MUST use the ACE format for internationalized host name
parts.

If a signalling system which makes negotiation possible between old and
new DNS clients and servers is standardized in the future, the encoding
of the query in the DNS protocol itself can be changed from ACE to
something else, such as UTF-8. The question whether or not this should
be used is, however, a separate problem and is not discussed in this
memo.
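
The per-part conversion described in sections 2.1.2 and 2.1.3 might be
sketched as follows. This is only an illustration under stated
assumptions, not part of the protocol. This document does not select an
ACE, so the sketch borrows Python's built-in "punycode" codec with an
"xn--" prefix purely as a stand-in. The prepare() function is a
hypothetical placeholder for [NAMEPREP] (NFKC normalization plus
lowercasing) and does not perform the full preparation. All function
names are invented for this example.

```python
import re
import unicodedata

# Assumption: "xn--" + Punycode stands in for the ACE. This document does
# not choose one; it names RACE, LACE, and DUDE as candidates.
ACE_PREFIX = "xn--"

# A name part that already conforms to [STD13] and [STD3]: letters, digits,
# and hyphens, not beginning or ending with a hyphen, at most 63 octets.
_LDH = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def prepare(part: str) -> str:
    """Hypothetical placeholder for [NAMEPREP]: NFKC plus lowercasing.

    The real preparation also maps and prohibits specific characters.
    """
    return unicodedata.normalize("NFKC", part).lower()


def to_resolver_part(part: str) -> str:
    """Return the form of a single name part that is sent to the resolver."""
    # Non-internationalized parts (and the empty root label) pass through
    # with no preparation and no conversion to ACE.
    if not part or _LDH.fullmatch(part):
        return part
    prepared = prepare(part)
    # If preparation yields a plain host name part, it must not be
    # ACE-encoded: there should be only one encoding for a given name.
    if _LDH.fullmatch(prepared):
        return prepared
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")


def to_resolver_name(name: str) -> str:
    """Convert a dotted host name part by part."""
    return ".".join(to_resolver_part(p) for p in name.split("."))
```

With these stand-ins, to_resolver_name("www.example.com") returns its
input unchanged, while a name with a non-ASCII part has only that part
prepared and encoded. A real implementation would replace prepare() and
the encoding step with [NAMEPREP] and whichever ACE the IDN Working
Group chooses.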

2.1.4 Avoiding exposing users to the raw ACE encoding

All applications that might show the user a host name that was received
from a gethostbyaddr or other such lookup SHOULD update as soon as
possible in order to prevent users from seeing the ACE. However, this is
not considered a big problem because so few applications show this type
of resolution to users.

If an application decodes an ACE name but cannot show all of the
characters in the decoded name, such as if the name contains characters
that the output system cannot display, the application SHOULD show the
name in ACE format instead of displaying the name with the replacement
character (U+FFFD). This is to make it easier for the user to transfer
the name correctly to other programs. Programs that by default show the
ACE form when they cannot show all the characters in a name part SHOULD
also have a mechanism to show the name with as many characters as
possible and replacement characters in the positions where characters
cannot be displayed. In many cases, the application doesn't know exactly
what the underlying rendering engine can or cannot display.

In addition to the condition above, if an application decodes an ACE
name but finds that the decoded name was not properly prepared according
to [NAMEPREP] (for example, if it has illegal characters in it), the
application SHOULD show the name in ACE format and SHOULD NOT display
the name in its decoded form. This is to avoid security issues described
in [NAMEPREP].

2.1.5 Automatic detection of ACE

An application which receives a host name SHOULD verify whether or not
the host name is in ACE. This is possible by verifying the prefix in
each of the labels, and seeing whether or not the label is in ACE.
This MUST be done regardless of whether or not the communication channel
used (such as keyboard input, cut and paste, application protocol,
application payload, and so on) is encoded with ACE.

The reason for this requirement is that many applications are not
ACE-aware. Applications that are not ACE-aware will send host names in
ACE but mark the charset as being US-ASCII or some other charset which
has the characters that are valid in [STD13] as a subset.

2.1.6 Bidirectional text

In IDNA, text storage and display follow the rules in the Unicode
standard [Unicode3.1]. In particular, all Unicode text is stored in
logical order; the Unicode standard has an extensive discussion of how
to reorder glyphs for display when dealing with bidirectional text such
as Arabic or Hebrew. See [UAX9] for more information.

3. Name Server Considerations

It is imperative that there be only one encoding for a particular host
name. ACE is an encoding for host name parts that use characters outside
those allowed for host names [STD13]. Thus, a primary master name server
MUST NOT contain an ACE-encoded name that decodes to a host name that is
allowed in [STD13] and [STD3].

Name servers MUST NOT have any records with host names that contain
internationalized name parts unless those name parts have been prepared
according to [NAMEPREP]. If names that are not legal in [NAMEPREP] are
passed to an application, it will result in an error being passed to the
application with no error being reported to the name server. Further, no
application will ever ask for a name that is not legal in [NAMEPREP]
because requests always go through [NAMEPREP] before getting to the DNS.
Note that [NAMEPREP] describes how to handle versioning of unallocated
codepoints.

The host name data in zone files (as specified by section 5 of RFC 1035)
MUST be both nameprepped and ACE encoded.

4. Root Server Considerations

Because there are no changes to the DNS protocols, adopting this
protocol has no effect on the DNS root servers.

5. Security Considerations

Much of the security of the Internet relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of the
Internet.

This memo describes an algorithm which encodes characters that are not
valid according to STD3 and STD13 into octet values that are valid. No
security issues such as string length increases or new allowed values
are introduced by the encoding process or the use of these encoded
values, apart from those introduced by the ACE encoding itself.

When detecting an ACE-encoded host name, and decoding the ACE, care must
be taken that the resulting value(s) are valid characters which can be
handled by the application. This is described in more detail in section
2.1.4.

Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host name.

Because this document normatively refers to [NAMEPREP], it includes the
security considerations from that document as well.

6. References

[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[STD3] Bob Braden, "Requirements for Internet Hosts -- Communication
Layers" (RFC 1122) and "Requirements for Internet Hosts -- Application
and Support" (RFC 1123), STD 3, October 1989.

[STD13] Paul Mockapetris, "Domain names - concepts and facilities" (RFC
1034) and "Domain names - implementation and specification" (RFC 1035),
STD 13, November 1987.

[UAX9] Unicode Standard Annex #9, The Bidirectional Algorithm.
http://www.unicode.org/unicode/reports/tr9/

[Unicode3.1] The Unicode Standard, Version 3.1.0: The Unicode
Consortium. The Unicode Standard, Version 3.0. Reading, MA,
Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5, as amended
by: Unicode Standard Annex #27: Unicode 3.1.

B. Changes from the -02 draft

Editorial changes throughout

2.1.1: Major changes to the second paragraph. Added major text to fourth
paragraph.

2.1.4: Added to the end of the second paragraph. Added the third
paragraph.

2.1.6: Complete change.

6: Added [Unicode3.1] and [UAX9].

C. Authors' Addresses

Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden
paf@cisco.com

Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org