Network Working Group                                         J. Klensin
Internet-Draft                                         February 26, 2006
Expires: August 30, 2006


 Internationalization in Internet Applications: Issues, Tradeoffs, and
                            Email Addresses
                  draft-klensin-ima-constraints-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 30, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   The discussions of internationalized email addresses in the IETF have
   led to a number of stated requirements.  This document identifies
   some of those requirements in the context of general issues of
   internationalization of Internet name spaces, demonstrates that the
   combination of all of the requirements that appear reasonable on
   first glance adds up to a null solution space, and then suggests a
   different model for proceeding.

Table of Contents

   1.  Introduction
   2.  Environment for Internationalization and Fragmentation Risks
     2.1.  Climate for Internationalization: The DNS History
     2.2.  Technology
   3.  Consequences and Implications
     3.1.  Choosing and mixing scripts and languages
     3.2.  Confusable characters and communications accuracy
     3.3.  Communication across languages and cultures
     3.4.  The place of internationalization in a global Internet
   4.  Specific Impact of I18N Email Addressing
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   In general, internationalization has been approached in the IETF on
   the assumption that, if one can get the character sets and perhaps
   language tags right, other issues will take care of themselves.  An
   "internationalization considerations" section is strongly suggested
   for RFCs (see RFC 2277, Section 6 [RFC2277] and note that Section 3.1
   of that document requires UTF-8 support of all protocols, hence all
   protocol documents "deal with internationalization issues at all"),
   but there are no real guidelines about what should be in it and the
   requirement has not always been enforced.  There are also some
   additional requirements, e.g., for UTF-8 support [RFC2277].
   Particular protocols have gone beyond these guidelines.
   In particular, the standards for
   internationalized domain names, IDNA [RFC3490], use Unicode as a base
   but utilize their own encoding of Unicode, punycode [RFC3492].  Those
   standards carefully avoid identification of languages, since domain
   names inherently consist of more or less arbitrary strings, not
   "words" or other language elements.

   That body of work generally ignores an important observation and its
   consequences.  When user-chosen words, names, and non-ASCII scripts
   are used at the applications layer, users will often treat them as
   language elements having meaning, and often pronunciations, in those
   languages, not merely as strings of characters.  The assumptions of
   meaning or pronunciation, in turn, will often introduce age-old
   problems of cross-language reading and understanding into the design
   of applications, or applications protocols, that are intended to work
   globally: if one person cannot read or understand the language of
   another, the fact imposes limitations on communication that, in
   general, cannot be solved by protocol design.  In the most extreme
   cases, differences in the languages and character sets that people
   find normal and convenient impose practical limits on
   interoperability: choices must be made between compatibility and
   convenience within a linguistic and cultural community and global
   interoperability that will, inevitably, be less convenient for some
   groups and cultures than others.  In some cases, solutions are
   feasible that make things convenient within a cultural or linguistic
   group and provide a less-convenient mechanism for getting between
   groups; in others, even more difficult choices will need to be made.
   And, in some cases (fortunately a gradually declining number), the
   realities of character codings, presentation, and operating systems
   make obvious solutions to problems impractical.
   While these issues have appeared in the context of internationalized
   domain names and in other applications, recent work to permit
   non-ASCII local parts of electronic mail addresses without violating
   the constraints of the mail protocols themselves has brought several
   of the issues into better focus.  This document discusses some of the
   issues and problems -- both technical and in terms of user
   expectations -- in general form and then reviews some of the
   implications for email and other protocols that impose their own
   constraints on strings and their interpretation.

   While changes in lower-level Internet protocols and interfaces must
   almost always occur at the protocol level (i.e., be visible "on the
   wire" -- see below), there are at least three choices for
   internationalization at the applications layer.  Picking the right
   one requires some understanding of how the features will be used, the
   degree to which localization will be appropriately overlaid on the
   basic internationalization features, and some general wisdom about
   design.  The option that is obvious at first is not necessarily the
   best choice.  The options are:

   o  Protocol changes, i.e., features that appear "on the wire" in the
      interactions between client and server or between peer hosts.  The
      internationalization provisions for MIME body parts [RFC1341] are
      examples of protocol-level mechanisms, since they appear in the
      client-server interactions.
   o  Client-side changes, i.e., features that have characteristics
      similar to protocol ones, but that are implemented entirely on the
      client, without "on the wire" visibility.  Domain name
      internationalization using the IDNA specification [RFC3490] is an
      example of a strictly client-side mechanism since non-ASCII
      characters do not appear on the wire and the DNS server is not
      required to be aware that internationalized names are being used.
   o  Adding a new layer or new abstraction, i.e., accomplishing
      internationalization or localization not by somehow
      internationalizing an existing protocol or introducing a
      replacement protocol, but by adding new facilities that rest on
      top of an unmodified non-internationalized protocol.  Localization
      facilities might also be added as a new layer on top of an
      internationalized lower layer.  Various efforts to add "keywords"
      or other "above DNS search" mechanisms, the standardization of an
      internationalized version of the URI [RFC3986] as an IRI
      [RFC3987], and similar arrangements are "new layer" approaches.

2.  Environment for Internationalization and Fragmentation Risks

   In looking at the combination of efforts to internationalize the
   Internet, especially at the protocol level, we encounter two large
   groups of issues.  One has to do with the social, cultural, and
   political climate associated with the making of any decision about
   internationalization in recent years and the other is about the
   technology.  The subsections that follow address both since, in
   practice, it is impossible to deal with them separately.  In
   particular, as this document illustrates, if one examines the
   technical issues, the desire to avoid constraints on global
   end-to-end communications, and to minimize the risks of incorrect
   identification of destination hosts or users, the conclusion would be
   likely to be that almost any internationalization at the protocol
   level is a bad idea.  On the other hand, if the social and cultural
   context is examined, it becomes clear that avoiding any
   internationalization at the protocol level will lead to a different
   type of fragmentation and, if that context is examined alone, demands
   will arise for protocol changes that are not plausible in practice.

2.1.  Climate for Internationalization: The DNS History

   The biggest potential for network fragmentation due to introduction
   of mutually-incomprehensible scripts occurred with the development of
   domain names that are not intended to be presented as ASCII strings.
   There was considerable resistance in the technical community to that
   set of decisions based on the belief that domain names were
   ultimately protocol elements that should remain, at least for
   application purposes, in a restricted subset of ASCII (a subset that
   is compatible with ISO 646 BV [ISO.646.1991]).  At least part of that
   community also concluded that internationalization should occur in a
   protocol layer closer to the user, i.e., "above the DNS" [RFC3467].
   This layer might be thought of as the "presentation layer" of the
   classical OSI model although the analogy is not exact.  Those who
   resisted DNS changes suggested that it might make sense to
   distinguish what actions were taken in the DNS from a presentation
   layer in which some new name spaces or resource identifiers might
   occur.  In that context, URIs [RFC3986], with their potentially
   elaborate syntax, are no one's idea of "user friendly" even if one
   ignores the desire for non-ASCII scripts entirely.  The
   internationalized form, IRIs [RFC3987], solve part of the non-ASCII
   script problem, but are really no better: they permit
   internationalization of the strings that make up URIs, but do not
   address the complexity of the syntax or the ASCII syntax elements.
   Such a presentation layer could make more culturally-reasonable forms
   visible to the user while preserving clear layering over the
   fundamental URI types and domain names that would remain unchanged.
   That model would provide at least the potential for good localization
   while preserving a common script, syntax, and set of conventions for
   dealing with the actual elements of the network.
   Although the idea of layering internationalization on top of an ASCII
   protocol substrate seems to come back each time an application issue
   is examined carefully, it has not gained significant traction in
   practice other than as, e.g., DNS alternatives.  Hence, the argument
   has been lost, several times and in several different ways.  It
   became clear that, if the IETF had not provided some rational and
   standardized ways to represent internationalized (non-ASCII) domain
   names, we would have ended up with chaos -- different coded character
   sets in different zones with some of them probably treated as binary
   labels.  We would see some Shift-JIS form in Japan, GB forms in
   China, ISO 8859-1 in Western Europe and other ISO 8859 variations in
   some other areas, and unpredictable other variations in the rest of
   the world.  Worse, the only way to determine which particular coded
   character set (CCS) was being used would be out-of-band knowledge,
   since none of the people promoting those approaches came forward with
   any realistic plans for how to label "charsets" (essentially a
   combination of a script and a coding system for those who have not
   followed the MIME version of that discussion; see [RFC2978] and
   [RFC2277] for more precise definitions, further discussion, and
   references) in the DNS.  Indeed, in spite of the standard, we have
   already seen the beginnings of fragmenting developments in some
   domains along with special "improved, enhanced, and
   internationalized" (and not quite interoperable) DNS servers being
   offered by some companies.

   So, despite some misgivings, the IETF defined IDNs via IDNA [RFC3490]
   (including exclusive use of Unicode as the defining character set).
   From the standpoint of this discussion, the interesting thing about
   IDNA is that it doesn't change the DNS at all.
   It is a strictly
   client-side protocol, with Unicode strings being pushed through a
   canonicalization process and then transformed into an
   "ASCII-compatible" form (called "punycode") that, to the DNS and
   applications that have not been upgraded, looks like (and is)
   hostname-format names, i.e., ASCII letters, digits and hyphens.  It
   was done that way because of a belief that the coding system would
   lead to very rapid deployment without any negative impact on systems
   or applications that had not been upgraded.  Its most passionate
   advocates were convinced that, once there was wide deployment, no one
   would ever see the internal coding.

   From the standpoint of global interoperability, the good news is that
   they were wrong -- we have some other problems to cope with, but one
   of them is not "you can't get there because you can't read or type
   the string".  If the application permits you to get to it, you can
   always access and type the punycode string rather than whatever might
   show up in characters you can't read, can't type, and maybe can't
   even render.  Of course, this requires that all applications support
   entry of Roman characters, even if such entry is not convenient.

   The choice of Unicode was, however, very important, not because it is
   wonderful as a character set, but because it avoids the issues of
   identifying what CCS is being used and, the WG hoped, of picking
   which characters would be valid and which ones would not be.

   Avoiding determining which characters should be valid and which ones
   should not has also been less successful than one might have hoped;
   both the IAB (see [IDN-Nextsteps]) and the Unicode Consortium (see
   [UTR36] and [UTR39]) are struggling with approaches to that problem
   for which they did not foresee a need when IDNA was adopted.
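
   As a rough illustration of the client-side conversion that IDNA
   performs, the sketch below uses the "idna" codec in the Python
   standard library, which implements the original IDNA specification
   [RFC3490].  The helper names and the sample hostname are assumptions
   made for illustration; nothing here is part of any specification.

```python
# Illustrative sketch only: relies on Python's built-in "idna" codec,
# which follows the original IDNA specification (RFC 3490).

def to_ascii_hostname(name: str) -> str:
    """Return the ASCII-compatible form of a Unicode hostname.

    Each label is prepared and, if it contains non-ASCII characters,
    encoded with punycode and given the "xn--" prefix.
    """
    return name.encode("idna").decode("ascii")


def to_unicode_hostname(name: str) -> str:
    """Reverse the conversion for display to a user."""
    return name.encode("ascii").decode("idna")


# Hypothetical name with U+00F8 as the sixth character of the first label.
ace = to_ascii_hostname("torbj\u00f8rn.example.com")
print(ace)                       # prints: xn--torbjrn-u1a.example.com
print(to_unicode_hostname(ace))  # the original Unicode form
```

   The DNS only ever sees the "xn--" form, which is why servers and
   applications that have not been upgraded continue to work.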
   But, ultimately, it is important to remember as we talk about any of
   this that the choice was never between "figure out some way to
   internationalize the DNS" and "don't do it because it was a bad
   idea".  The choice was only between whether we did it in a global,
   standard way that was fairly safe as far as DNS operations were
   concerned or whether we ended up with a collection of different
   mechanisms that would not interoperate cleanly and unambiguously
   within a single domain name system.

2.2.  Technology

   As the result of these factors and tensions, IDNA became a completely
   client-side IDN protocol.  Several of the worst fears of the
   pessimists have come true: we have confusion over look-alike
   characters, we have the potential to receive and see characters we
   can't read or type, the Unicode Consortium's beliefs about how widely
   Unicode is available and about smooth conversions between codings
   are, at best, very controversial, some implementers have "improved"
   on the standard tables, and so on.  Email MIME textual body parts
   should be safe against character set problems due to the presence of
   the "charset" parameter.  However, in practice, problems in which one
   character is mapped into an entirely different one are fairly
   routine, most notably as the result of forwarding or otherwise
   including all or part of one message in a body part that is
   constructed locally according to different character set conventions.
   Copying of text that was developed in one character coding context
   and pasting it into another is not completely reliable for related
   reasons.  These problems are symptomatic of those we will certainly
   encounter in the future as the Internet becomes increasingly
   international and multilingual.  Probably the worst is yet to come.
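
   The silent substitution of one character for another is easy to
   reproduce.  The sketch below assumes Python and its standard
   "iso-8859-1" and "iso-8859-5" codecs; the function name is
   hypothetical.  It writes a character under one 8-bit charset and
   reads the same octet back under another, which is what happens when
   text is included in a body part that is labeled differently.

```python
# Illustrative sketch only: shows how one octet becomes a different
# character when the wrong 8-bit charset is assumed by the reader.

def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset and decode the same octets in another."""
    return text.encode(written_as).decode(read_as)


original = "\u00f8"  # LATIN SMALL LETTER O WITH STROKE, octet 0xF8 in ISO 8859-1
garbled = reinterpret(original, "iso-8859-1", "iso-8859-5")

# In ISO 8859-5 the octet 0xF8 is CYRILLIC SMALL LETTER JE, which looks like "j".
print(f"U+{ord(original):04X} -> U+{ord(garbled):04X}")  # prints: U+00F8 -> U+0458
```

   Characters in the ASCII range survive the round trip unchanged, so
   the damage tends to go unnoticed until a non-ASCII character appears.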
   As was the case with the pre-MIME internationalized mail body
   approaches and with the development of IDNA, the local solutions
   --the ones that are not interoperable globally-- will work, and work
   well, within the relevant cultural and linguistic communities.
   Realistically, the IETF cannot ignore the issues and problems and
   either hope they will go away or decide to do nothing because the
   problems will cause disruption.  To do so is to guarantee that local
   solutions will be developed and that people who use them will be
   unable to communicate internationally (at least with the same tools
   they use locally) and that people outside their communities will be
   unable to communicate with them.

   The key question is what the difficulties with the global solutions
   or the development of local solutions actually do to
   interoperability.  The Internet community is probably in for a bad
   time as reality catches up with many fantasies and delusions about
   how systems and people work, but there is some reason for optimism
   about the long term.  To take one (admittedly-extreme) reality as an
   example, suppose one user's primary language were written only in Old
   Futhark Runic and that user does not read or speak any other
   languages or write any other script.  Assume further, stretching the
   imagination a bit, that the only keyboards available to that user
   have only runes on them.  That user would have some serious problems
   in communications.  In particular, she would have been dead for
   centuries: as far as is known, no living person really knows how
   those languages and scripts worked (although there is a lot of
   speculation) and it is unclear whether some of the Unicode decisions
   in coding the runes are actually correct, much less optimal.
   She is also not on the Internet in any significant way: the
   hypothetical keyboard does not exist, there is no way to type a URL
   or email address on it, etc.  So, for that user, the net effect of
   permitting IDNs in Runic, which IDNA now permits, is going to be just
   about zero except maybe in terms of helping with her cultural pride.
   More important, if she can find a few other living exclusive users
   of the relevant scripts and languages, her ability to use those
   scripts and languages in either content or domain names _might_
   enhance their ability to communicate with each other, but they
   certainly are not going to increase or decrease anyone else's ability
   to communicate with any of them.

   On the other hand, suppose a different user can speak, read, and
   write Russian as well as Old Viking Runic, but nothing else.  If he
   wants to communicate on the Internet, he can send notes (and use
   domain names, etc.) that some reasonably large number of people will
   be able to read easily, and a larger number will be able to get
   through with a struggle, but, for anyone who does not read Russian or
   recognize Cyrillic characters, he might as well have used Runic --
   the symbols are useless either way.  This problem is, of course,
   centuries old.  IDNs don't make it any worse although they don't help
   either.

   While Runic is a far-fetched example, some of the African languages
   and scripts are not.  And, unlike Runic, some of those African
   scripts have not even been coded into Unicode yet.

3.  Consequences and Implications

   The Internet community is probably in for a nasty learning curve, but
   things should work out as people accept reality.
   Within a language and cultural community, IDNs --and, even more
   important, email addresses with non-ASCII characters in the local
   parts-- are almost certain to be very important, especially among
   groups of people who are not comfortable with Roman-based characters.
   They are going to prove helpful just as the ability to use
   native/local characters in content has proven helpful.  That
   helpfulness is going to be important to spreading accessibility to
   the Internet into some population groups (although, until there is a
   great deal of content in their languages, probably not as much as
   some of the IDN advocates around WSIS and ICANN have believed).  But,
   for communication between different language and cultural groups, we
   are going to find that we need to do what people have done through
   history, even before computer networking entered the equation: we
   will have to figure out, probably out of band, what languages and
   scripts we share with particular correspondents and then pick a
   member of that set.

3.1.  Choosing and mixing scripts and languages

   The choice of a common and shared script or language is going to be
   far more complicated for many cases than any of our existing
   content-negotiation ideas anticipate.  We will need to remember that
   some people may be able to understand a spoken language but not read
   it in some or all of the scripts in which it is normally written and
   that, especially for alphabetic scripts, the ability to read the
   script (and even to crudely pronounce the sounds it implies) does not
   imply the ability to understand any of the languages normally written
   in it.  These differences may relate to the ability to recognize
   characters in a table, use a keyboard, recognize characters that
   might appear in an IRI or email address, and so on.
   Ugly and nasty as punycode may be, we will need to pass domain names
   around in it unless we know in advance that our readers will know the
   relevant scripts well and be able to type them, cut and paste them
   accurately, and so on.  If we choose to use non-ASCII email local
   parts, we will discover that we need to keep ASCII alternative
   aliases around for communicating more broadly and that those ASCII
   alternatives will not, in the general case, be derivable
   algorithmically.  Once we get the email internationalization
   situation under control, nothing should prevent a speaker of
   Norwegian, say Torbjorn Torbjornson (with slashes across the second
   "o" in each name), from having an email address of
   torbjorn@example.com (U+00F8 as the sixth character, i.e., with a
   slash across the "o") but, if he and a Russian-speaker want to
   communicate with each other, he would be well-advised to retain the
   ability to receive mail at torbjorn@example.com (or some other
   address), especially if the software of the Russian reader is going
   to magically transform the U+00F8 character into "j", which would be
   predicted by getting ISO 8859-1 and ISO 8859-5 confused.  And, if his
   alternative is not torbjorn@example.com but
   torbjorn@torbjorn.example.com (with a slash over the sixth character
   in the domain name), then the Russian users or their software must be
   able to generate and use torbjorn@xn--torbjrn-u1a.example.com
   instead.

   It may be useful to note that "have an alternate address available
   and let people know" bears a strong resemblance to the traditional
   two-sided Asian business cards.  The Chinese, Korean, or Japanese
   characters on the front may be the correct ones but, if the owner of
   the card wants to have communications with illiterate westerners, the
   Roman characters on the back will rapidly become very important.
   Of course, many people in those populations make exactly that
   choice: their business cards do not have Roman characters on them.
   Consequently, they have no expectations of communication with people
   who do not read and speak the relevant languages.

3.2.  Confusable characters and communications accuracy

   The common example of similarity between the printed form of a
   Cyrillic "A" and a Roman one raises issues similar to the Norwegian
   example above.  If one sees the character in a domain name in context
   with other Cyrillic (or Roman) characters, it will probably lead to
   the right guess unless someone is being deliberately deceptive or
   cute.  If the context is not available, a good guess might still be
   possible based on whether the character appears on a sign in a rural
   community in Russia or the US (in Moscow or New York, one would
   probably need to know about specific neighborhoods and the guess
   would be less reliable).  Reducing the odds of a deception based on
   confusion between the characters that some would consider similar in
   appearance is a topic of active discussion, mostly about what DNS
   registries should be permitted to register.  But, if the person
   writing that message out is really concerned about accuracy, then
   either some explicit hints or, for domain names, the punycode string
   had best appear on the business card or sign; if they do not, the
   negative reinforcement from confused and irritated users will
   gradually get the message across that they should.

3.3.  Communication across languages and cultures

   All of this implies that those who communicate across language and
   cultural groups will be required to learn, if they do not understand
   already, to be quite self-aware about the use of internationalized
   identifiers, as well as other examples of characters or languages,
   across those boundaries.
   There will be a lower level of demands on
   those who communicate only in a single language and within a single
   culture.  This is, of course, not an issue that originated with the
   introduction of the Internet: it has been this way since languages
   and scripts started to differentiate from each other and since
   different cultures came into contact.  As we internationalize the
   network, a user of a given language that cannot be fully expressed in
   ASCII will always be faced with a choice between insisting on the
   purism of an email address local part and domain name in the script
   associated with the local language and maximizing the number of
   people who can communicate with her conveniently.  In some cases, the
   right answer will be "local language"; in others, it will be "ASCII";
   and in still others it will be "maintain two addresses".  We are not
   required, and should not try, to make that choice for users: the
   users should make the best choices for their own needs, preferably
   after understanding the consequences of the choices.  As a community,
   we will need to be very clever about user interfaces.  As an example
   much more general than email, if someone with no ability to read
   Chinese characters sees a domain name written in those characters and
   decides she wants to copy and paste it somewhere, the copy mechanism
   is probably going to need to provide for both "copy the Chinese" and
   "convert quietly to punycode and copy that".  Either choice, by
   itself, will be wrong sometimes.  Users who both want to use
   Chinese-script domain names and communicate outside that language or
   script or culture are going to either learn to understand the
   difference and relationship, or develop some good rituals that work,
   or the network will keep slapping them in the head with failed
   lookups or bounced mail until they do learn.
   Of course, substantially any language or
   script could be substituted for "Chinese" in that example.

3.4. The place of internationalization in a global Internet

   Does that make internationalized domain names a bad idea and
   internationalized email addresses an even worse idea? Globally,
   maybe... perhaps even probably if our exclusive focus is on global
   uses of the Internet. But that is where we get back to examples
   similar to the Runic one. If we have a population in an
   Arabic-speaking country that only reads and writes in Arabic and only
   wants to communicate with each other, internationalization extensions
   let them get themselves onto the Internet and communicate with each
   other and to do so without causing any harm to the rest of the
   Internet. It appears that is A Good Thing or at least not harmful in
   any significant way. Will it help them communicate with someone who
   cannot read Arabic or help that person communicate with them? Not a
   bit, at least in the absence of a translator who is competent in
   Arabic and has the right computer tools. The alternative, stated in
   its most extreme form, is "everyone who really wants to be an
   effective user of the global Internet had better be able to function
   in English". At one level, that is probably true,
   politically-incorrect though it may be. But, at another, it is a very
   different statement than requiring that everyone who wants to
   communicate in Amharic, with other Amharic-speakers, be forced to
   translate to and from English (or at least to and from a subset of
   ASCII characters) to manage that communication rather than being able
   to use their own language and (Ethiopic) script.

   We need to be very careful to not make interoperability (or
   reliability of references and the like) worse among those who can now
   communicate.
   It does not appear that either IDNs or i18n email
   addresses will necessarily make things worse, but we should remain
   vigilant to be sure that doesn't change. Until everyone learns good
   habits we may rediscover an important part of the X.400
   model-in-practice: sooner or later, a non-speaker of Chinese will get
   a message from a Chinese colleague with a return address that is
   all-Chinese. The recipient will have no hope of using it in a reply
   unless cut and paste works, and will not be able to reliably verify
   whether or not it worked. That user (message recipient) will have to
   deal with the message and replying to it by selecting an out-of-band
   communications path --a different address or the telephone are the
   most likely-- to get in touch with that person and either deliver the
   reply over that path or use it to say "I just got something from you,
   if in fact it was you, and I have no possible way to reply to it as
   written. So what other address or path would you like me to use?"

   Clearly, that would not be ideal. But there is no ideal solution as
   long as people persist in speaking different languages and writing in
   different scripts. It does not appear that the use of different
   languages and scripts is likely to stop any time soon and, in
   general, it is not desirable that it do so.

4. Specific Impact of I18N Email Addressing

   As discussed in [I18Nemail-Framework], the requirement that nothing
   inspect or alter an email local-part other than the final delivery
   server (see [RFC2821]) imposes strong constraints on automatic
   transformations of internationalized email addresses to ASCII form.
   If we insist on reliable cutting and pasting, regardless of the
   operational character coding of mail user agents, we are probably
   constrained to avoid non-ASCII forms entirely: only putting the
   internationalized string in encoded words and leaving the address
   exclusively in ASCII will work in a large number of cases, but even
   that can fail occasionally. So, if we try to impose a rule in which
   the only email addresses that are permitted are those that will
   always be usable globally, the consequence will be a conclusion that
   non-ASCII local parts are impossible.

   Unfortunately, that conclusion is a recipe for local,
   non-interoperable, solutions -- probably ones based on "just use our
   local characters and character coding" -- and the consequent de facto
   network fragmentation that would follow from it, as discussed above.
   A better approach is to adopt a more realistic set of goals, starting
   from the realization that people who have no need or desire to
   communicate outside their language or cultural group are not going to
   do so and then focusing on (i) permitting them to communicate as they
   wish without creating risks for other Internet users and (ii)
   providing reasonable facilities for those who do wish to communicate
   across language groups to do so.

5. Security Considerations

   This document discusses a series of internationalization issues that
   bear on interoperability and might indirectly bear on security. As
   such, it may suggest some issues that should be considered in
   security evaluations of internationalized protocols. Its conclusions
   also reinforce the well-understood point that expanding the range of
   characters in which identifiers can be expressed will tend to
   complicate the design of security-related protocols, and user
   interfaces to them, that utilize such internationalized identifiers.
   However, it raises no new security issues in itself.

6. Acknowledgements

   The author would like to thank Alex Zinin and Dmitry Burkov for
   initiating a conversation about the relationship between Internet
   internationalization and fragmentation. That conversation ultimately
   led to this memo. ...More to be supplied...

7. References

7.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC2978]  Freed, N. and J. Postel, "IANA Charset Registration
              Procedures", BCP 19, RFC 2978, October 2000.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

7.2. Informative References

   [I18Nemail-Framework]
              Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email",
              draft-klensin-ima-framework-00.txt (work in progress),
              September 2005.

   [IDN-Nextsteps]
              Klensin, J. and P. Faltstrom, "Review and Recommendations
              for Internationalized Domain Names (IDN)",
              draft-iab-idn-nextsteps-03.txt (work in progress),
              February 2006.

   [ISO.646.1991]
              International Organization for Standardization,
              "Information technology - ISO 7-bit coded character set
              for information interchange", ISO Standard 646, 1991.

   [Klensin-emailaddr]
              Klensin, J., "Internationalization of Email Addresses",
              draft-klensin-emailaddr-i18n-03 (work in progress),
              July 2005.

   [RFC1341]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
              Mail Extensions): Mechanisms for Specifying and Describing
              the Format of Internet Message Bodies", RFC 1341,
              June 1992.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC3467]  Klensin, J., "Role of the Domain Name System (DNS)",
              RFC 3467, February 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [UTR36]    Davis, M. and M. Suignard, "Unicode Technical Report #36:
              Unicode Security Considerations", November 2005.

              Working Draft for Proposed Update

   [UTR39]    Davis, M. and M. Suignard, "Unicode Technical Standard #39
              (proposed): Unicode Security Considerations", July 2005.

              Working Draft for Proposed Draft

Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA 02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.