idnits 2.17.1 draft-klensin-emailaddr-i18n-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 485: '... EHLO response MUST NOT contain any ...' RFC 2119 keyword, line 487: '...is specification MUST treat the ESMTP ...' RFC 2119 keyword, line 521: '...ces this extension MUST be prepared to...' RFC 2119 keyword, line 527: '...arsing process, the local part MUST be...' RFC 2119 keyword, line 529: '...are to be looked up in the DNS MUST be...' (11 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'MUST not' in this paragraph: An SMTP Client that receives the I18N extension keyword MAY transmit a mailbox name as an internationalized string in UTF-8 form. It MAY transmit the domain part of that string in either punycode (derived from the IDNA process) or UTF-8 form but, if it sends the domain in UTF-8, it SHOULD first verify that the string is valid for a domain name according to IDNA rules. As required by RFC 2821, it MUST not attempt to parse, evaluate, or transform the local part in any way. If the I18N SMTP extension is not offered by the Server, the SMTP Client MUST not transmit an internationalized address. Instead, it MUST either return the message to the user as undeliverable or replace it, using some process outside the scope of this specification such as a directory lookup, with a local-part that conforms to the syntax rules of RFC 2821. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 3, 2003) is 7509 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Ldh-str' is mentioned on line 565, but not defined == Unused Reference: 'RFC3491' is defined on line 771, but no explicit reference was found in the text == Unused Reference: 'RFC3492' is defined on line 775, but no explicit reference was found in the text == Unused Reference: 'RFC2056' is defined on line 794, but no explicit reference was found in the text == Unused Reference: 'RFC2556' is defined on line 813, but no explicit reference was found in the text ** Obsolete normative reference: RFC 821 (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629) ** Obsolete normative reference: RFC 2821 (Obsoleted by RFC 5321) ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (Obsoleted by RFC 5891) == Outdated reference: A later version (-03) exists of draft-hoffman-imaa-02 -- Obsolete informational reference (is this intentional?): RFC 2476 (Obsoleted by RFC 4409) -- Obsolete informational reference (is this intentional?): RFC 2554 (Obsoleted by RFC 4954) -- Obsolete informational reference (is this intentional?): RFC 2822 (Obsoleted by RFC 5322) -- Obsolete informational reference (is this intentional?): RFC 3454 (Obsoleted by RFC 7564) Summary: 8 errors (**), 0 flaws (~~), 9 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. 
Klensin 3 Internet-Draft October 3, 2003 4 Expires: April 2, 2004 6 Internationalization of Email Addresses 7 draft-klensin-emailaddr-i18n-00.txt 9 Status of this Memo 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that other 16 groups may also distribute working documents as Internet-Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. It is inappropriate to use Internet-Drafts as reference 21 material or to cite them other than as "work in progress." 23 The list of current Internet-Drafts can be accessed at http:// 24 www.ietf.org/ietf/1id-abstracts.txt. 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 This Internet-Draft will expire on April 2, 2004. 31 Copyright Notice 33 Copyright (C) The Internet Society (2003). All Rights Reserved. 35 Abstract 37 Internationalization of electronic mail addresses is, if anything, 38 more important than the already-completed effort for domain names. 39 In most of the contexts in which they are used, domain names can be 40 hidden within or as part of various types of references. Email 41 addresses, by contrast, are crucial: use of names of people or 42 organizations as, or as part of, the email local part is, for obvious 43 reasons, a well-established tradition on the network. Preventing 44 people from spelling their names correctly is, in the long term, 45 inexcusable. At the same time, email addresses pose a number of 46 special problems -- they are more difficult than simple domain names 47 in some respects, but actually easier in others. 
This document 48 discusses the issues with internationalization of email addresses, 49 explains why some obvious approaches are incompatible with the 50 definitions and use of Internet mail, and proposes a solution that is 51 likely to serve users and the network well for the long term. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 56 2. History, Context, and Design Constraints . . . . . . . . . . 4 57 2.1 MUAs, MTAs, addresses, and learning from MIME and ESMTP . . 4 58 2.2 An MUA-based Solution is Not Necessary . . . . . . . . . . . 6 59 2.2.1 Obtaining an Internationalized Email Address . . . . . . . . 7 60 2.2.2 Relay environment . . . . . . . . . . . . . . . . . . . . . 7 61 2.2.3 Internationalizing the Sender . . . . . . . . . . . . . . . 7 62 2.3 An MUA-based Solution is Unworkable . . . . . . . . . . . . 8 63 2.3.1 MX diversion . . . . . . . . . . . . . . . . . . . . . . . . 8 64 2.3.2 Embedded commands . . . . . . . . . . . . . . . . . . . . . 8 65 2.4 Encoding the Whole Address String . . . . . . . . . . . . . 9 66 2.5 Looking back and looking forward . . . . . . . . . . . . . . 10 67 2.6 Summary of Design Issues . . . . . . . . . . . . . . . . . . 10 68 3. A Mail Transport-level Protocol . . . . . . . . . . . . . . 10 69 3.1 General Principles and Objectives . . . . . . . . . . . . . 10 70 3.2 Framework for the Internationalization Extension . . . . . . 11 71 3.3 The Address Internationalization Service Extension . . . . . 11 72 3.4 Extended Mailbox Address Syntax . . . . . . . . . . . . . . 12 73 3.5 Additional ESMTP Changes and Clarifications . . . . . . . . 13 74 3.5.1 The Initial SMTP Exchange . . . . . . . . . . . . . . . . . 13 75 3.5.2 Trace Fields . . . . . . . . . . . . . . . . . . . . . . . . 13 76 3.6 Protocol Loose Ends . . . . . . . . . . . . . . . . . . . . 13 77 3.6.1 Punycode in Domain Names? . . . . . . . . . . . . . . . . . 14 78 3.6.2 Local Character Codes in Local Parts? . . . . . . . . . . . 
14 79 3.6.3 Restrictions on Characters in Local Part? . . . . . . . . . 14 80 3.6.4 Requirement for 8BITMIME? . . . . . . . . . . . . . . . . . 14 81 3.6.5 Message Header and Body Issues with MTA Approach? . . . . . 15 82 3.6.6 Variant Addresses (Aliases) in a Command Verb . . . . . . . 15 83 3.6.7 The Received field 'for' clause . . . . . . . . . . . . . . 15 84 4. Advice to Designers and Operators of Mail-receiving 85 Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 15 86 5. Security considerations . . . . . . . . . . . . . . . . . . 16 87 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16 88 Normative References . . . . . . . . . . . . . . . . . . . . 16 89 Informative References . . . . . . . . . . . . . . . . . . . 17 90 Author's Address . . . . . . . . . . . . . . . . . . . . . . 18 91 Intellectual Property and Copyright Statements . . . . . . . 19 93 1. Introduction 95 Internationalization of electronic mail addresses is, if anything, 96 more important than the already-completed effort for domain names. 97 In most of the contexts in which they are used, domain names can be 98 hidden within, or as part of, various types of references or the 99 references themselves may be hidden. It also remains controversial 100 whether internationalization of domain names is actually necessary, 101 no matter how attractive and important it may appear at first glance. 102 Email addresses, by contrast, are crucial: use of names of people or 103 organizations as, or as part of, the email local part is, for obvious 104 reasons, a well-established tradition on the network. Preventing 105 people from spelling their names correctly is, in the long term, 106 inexcusable. However, while it is tempting to ignore them, email 107 addresses pose a number of special problems. 
Unlike domain names 108 --and, consequently, the domain part of an email address (after the 109 last "@")-- the local part (or mailbox name) is essentially 110 unconstrained with regard to syntax or the characters used. There 111 are no special delimiters comparable to the period used to separate 112 domain name labels, there is no standardized structure comparable to 113 the domain name system's hierarchy, and it has always been a firm 114 protocol requirement that no host other than the one to which final 115 delivery is made is permitted to parse or interpret the address (see 116 section 2.3.10 of [RFC2821]). In some respects, this makes things 117 much more difficult: it is far more difficult to know what behavior 118 will cause existing systems to cease working properly. In others, it 119 actually makes them easier, since the originating system is not 120 required to, and indeed must not, understand how the receiving one will 121 interpret an address. 123 The balance of this document explores these issues in more detail. 125 While much of the description here depends on the abstractions of 126 "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"), it is 127 important to understand that those terms and the underlying concepts 128 postdate the design of the Internet's email architecture and the 129 "protocols on the wire" principle. These two concepts have prevented 130 any strong and standardized distinctions about how MTAs and MUAs 131 interact on a given origin or destination host (or even whether they 132 are separate). 134 This document assumes a reasonable understanding of the protocols and 135 terminology of the most recent core email standards documented in RFC 136 2821 [RFC2821] and RFC 2822 [RFC2822]. 138 In its present internet-draft form, the document contains a great 139 deal of explanatory material and rationale for the approach chosen.
141 The actual protocol material appears almost entirely in Section 3, 142 especially Section 3.2 through Section 3.4. If it appears to be a 143 candidate for standards-track publication, the explanatory material, 144 rationale, and most of the other background materials should be 145 removed to a separate document. Those who wish to skip the 146 reasoning and comparison to other alternatives in this document and 147 examine the protocol proposal should skip to those sections. 149 2. History, Context, and Design Constraints 151 Several key issues in how email works and is handled impose 152 significant constraints on the solution space. Email is often used 153 as a transport mechanism for information that will be acted on by 154 computers, not merely read by people. While the approach is not 155 common, some of the systems that use it that way encode routing, 156 processing, or validation information into the envelope address 157 fields. More commonly, recipient systems use special address formats 158 to encode local routing or priority information. In recent years, 159 some of these addressing techniques have become important anti-spam 160 tools for some users and communities. These techniques have a long 161 history. Most or all of them conform to email standards and 162 practices that, in turn, go back to the first uses of email on the 163 ARPANet. Backward-compatibility --not damaging the interoperability 164 of standards-conforming programs that are now deployed and working 165 correctly-- makes it inappropriate to make decisions by conducting 166 user surveys and concluding that "not too many" people will be hurt. 167 Any new system must preserve existing practices and flexibilities 168 unless there are overwhelming reasons -- e.g., an absence of 169 plausible alternatives -- to not do so. 
171 2.1 MUAs, MTAs, addresses, and learning from MIME and ESMTP 173 The development and deployment of MIME [RFC2045] provided a number of 174 important lessons for the community about how to design extensions 175 and enhanced features without harm to the installed and conforming 176 email system. Perhaps the most important of these was that it is 177 easier, and often more expedient, to make changes that have impact 178 only on mail user agents. If it is possible to make changes that way 179 --generally changes that involve only message headers and the message 180 body or body parts-- users who need particular features can switch to 181 user agents that support them or press for those features in the user 182 agents they have already selected. Even in the worst case in which 183 support for features the user considers critical is not readily 184 available, it is possible, with proper user agent design, to save the 185 entire message to a file and then use stand-alone software to 186 interpret the information and perform the desired functions. 188 Providing these functions in the message headers and body permits 189 them to be moved opaquely through the mail transport system, thus 190 avoiding any requirement to modify originating or delivery MTAs or 191 intermediate relays. In practice, the user may have little control 192 over those systems. Since changes to them typically impact large 193 numbers of users, those who are responsible for them are often 194 reluctant to make changes in response to the needs of a few users. 196 It is hence reasonable to conclude that, if it is feasible to support 197 address internationalization strictly at the MUA level, keeping the 198 internationalized addresses opaque to the transport system, that is a 199 more desirable approach than requiring MTA changes. The MUA approach 200 has been carefully examined by others [I-D.hoffman-imaa]. This 201 document argues that 203 1. addressing is a fundamental MTA-level function, 205 2.
some of the complexities encountered when trying to encode 206 addresses so as to avoid MTA interactions are symptoms that 207 attempting to "hide" the MTA function so that it can be handled 208 by MUAs is not an architecturally desirable approach, 210 3. the restrictions on email uses and syntax required to provide 211 internationalization at MUA level are unnecessarily risky, and 212 almost certainly damaging, to deployed email infrastructure, and 214 4. MTA-level solutions are feasible, architecturally more elegant, 215 and perhaps not as difficult to deploy in relevant communities as 216 the strongest advocates of the MUA approach appear to imagine. 218 The decision as to what to do in message bodies and formats (e.g., 219 [RFC2822] and MIME [RFC2045]) and what to handle in message 220 transport (i.e., [E]SMTP) is critical because, as discussed below, 221 the level at which something is handled is both determined by, and 222 determines, how information is appropriately encoded. This decision 223 ultimately depends on the application of two principles: 225 1. If body content is opaque, anything still visible to transport 226 requires transport negotiation. 228 2. Anything an MTA -- origin, relay, MX, gateway, delivery -- needs 229 to understand or process must be handled as part of mail 230 transport. The discussion below might be titled "why the MTA 231 must get involved". 233 Whether mail addresses meet these criteria, and hence must be 234 comprehensible in transport, depends on how much the sending MUA 235 needs to know to construct, and the delivery MTA needs to know to 236 deliver, a message. Traditionally, we have kept the former knowledge 237 level at zero: if a sender produces "!a!b!c@example.com" in response 238 to information that it is a valid address, it still does not know 239 whether this is a "bang path" or a slightly-perverse name for a 240 single mailbox. 
Is "xyz%def@example.com" a specification for routing 241 to mailbox "xyz" on host "def" or a mailbox on the example.com host 242 named "xyz%def"? Are "foo+bar@..." or "foo-baz@..." subaddresses 243 "bar" and "baz" for the mailbox "foo", or are they simple addresses? 244 Is "jjoneschem@labs.example.com" a local mailbox on that host or an 245 instruction to route mail to "jjones" in the chemistry department? 247 Under the rules established in [RFC0821] and [RFC1123], as summarized 248 and updated in [RFC2821], all of those decisions are up to 249 "example.com", its MX alternatives, or hosts in that domain, and they 250 may make very local decisions about them. For example, "xyz%def" 251 might be a mailbox while "xyz%ghi" might be a route; "foo-baz" might 252 be a subaddress while "foo-blog" might be a mailbox. 254 The sender cannot, in the general case, know. 256 Worse, while non-alphanumeric characters like "+", "-", and "%" have 257 been used in these examples, delimiters for subaddresses, implicit 258 routing, embedded commands, and so on are, again, up to the 259 destination MTA and its interpretations. "X" might be as good a 260 delimiter as "+". It might even be a better one in some 261 applications. And, since local-parts are defined as case-sensitive, 262 "x" might be a normal address character in the same address in which 263 "X" was an important delimiter. Of course, in a completely non-ASCII 264 environment, it would make sense to substitute characters from the 265 local script for "+", "-", "%", and so on. 267 It is not even necessary to use a delimiter to support some forms of 268 subaddressing or local routing. Suppose an organization adopted the 269 convention that externally-visible email address local parts were 270 structured as, e.g., a three-letter department code, followed by a 271 five-letter code representing the individual, optionally followed by 272 a code representing a project.
Many organizations use just such 273 systems and there is no way (and no need) for an email sender to 274 understand the system or whether it is actually used for mail routing 275 internally. 277 Consequently, the idea of a sender breaking an address up into its 278 component parts and encoding those parts separately is an 279 impossibility without major, incompatible, and retroactive changes in 280 how mail addressing is defined. 282 2.2 An MUA-based Solution is Not Necessary 283 2.2.1 Obtaining an Internationalized Email Address 285 One of the classic arguments for an MUA-based approach (to 286 international addresses or anything else) is that users will be able 287 to install and use solutions on their own, even if the administrators 288 of their systems are unenthused about the particular function or 289 extension and delay, or decline, to install it. That argument was 290 certainly true for MIME, especially in the presence of the capability 291 to store messages as files and apply post-MUA tools. But it does not 292 seem to apply for email addresses. In general, users cannot create 293 email accounts, or aliases controlling delivery of messages from 294 external systems. Those accounts and aliases must be created by 295 system administrators responsible for the mail servers. If they are 296 not sympathetic to internationalized mailbox names, such names will 297 not exist on the receiving system. Having apparatus to send those 298 names through the protocols will be essentially useless: a message 299 that bounces because the relevant account or mailbox does not exist 300 will bounce equally well whether the target address is in ASCII or in 301 some other script and whether or not the receiving MTA is required to 302 explicitly agree to access internationalized addresses. 
Conversely, 303 if the administrators of the mail system host are sympathetic to 304 internationalization, it is reasonable to expect that appropriate 305 software can and will be installed at the MTA level. 307 2.2.2 Relay environment 309 As in many other areas with email, the difficulties with an MTA-based 310 model for internationalization of addresses arise, not when the 311 originating MTA communicates directly with the delivery MTA, but when 312 relay MTAs are involved. If both the sending and receiving 313 systems support internationalized addresses, it is still possible 314 that an intermediate relay will not do so, forcing mail to bounce 315 that could be delivered if there were a direct connection between 316 sender and receiver. But, as with the installation of email 317 addresses on a system, relays do not get inserted in the mail path by 318 accident. If internationalized addresses are important to the 319 destination host, its administrators will choose lower-preference MX 320 hosts or other relays that can support internationalized addresses. 322 2.2.3 Internationalizing the Sender 324 If we assume a destination host that can accept, and properly handle, 325 an internationalized address, and we assume that any MX-designated 326 intermediaries for that host will be chosen to be similarly capable, 327 one situation is left in which it would be advantageous to have an 328 MUA-based solution. If an originating/sending system is not capable 329 of generating or sending an internationalized address, but the 330 prospective receiving system is, it would be good to enable the 331 originating user to generate and somehow send to the relevant 332 address. 334 This is a real issue, and deserves some serious consideration. But it 335 seems better to find a good temporary, transitional, mechanism for it 336 than to permanently burden the email system with an uncomfortable 337 mechanism just to accommodate this case.
One example of a 338 transitional mechanism might be to use ESMTP tunneling over MIME 339 [RFC2442] to route the address and message to a friendly gateway host 340 that would unpack the message and transmit it using this 341 specification. Other examples, less attractive at first glance but 342 still plausible, would include defining and using small variations on 343 the message encapsulation mechanisms that are integral to MIME 344 [RFC2046], or the more complex encapsulation designed for HTML 345 [RFC2557], to accomplish the same purpose. 347 So, a user with an MUA that has the capability to handle an 348 internationalized address, but who does not have access to an 349 originating MTA with the capabilities defined here, may be given 350 access to a reasonable transition strategy until the needed 351 capabilities are available. Note that this does not require an open 352 relay, since all of the user authentication capabilities of ESMTP 353 [RFC2554] and SUBMIT [RFC2476] would be available. One can even 354 imagine a service with a per-message charging system, which would 355 presumably encourage rapid upgrading. 357 2.3 An MUA-based Solution is Unworkable 359 The examples given above are, perhaps obviously, not the only ones. 360 Other issues arise with intermediate MX relay and gateway hosts, 361 commands embedded in local parts, and special formats used in 362 gateways to other environments, among other cases. 364 2.3.1 MX diversion 366 If the domain part of an email address is associated with several MX 367 records and the mail is delivered to one of them that is not the best 368 preference host, the receiving host is not required to use SMTP. If, 369 instead, it performs some gateway function, it may need to inspect or 370 alter the local part to determine how to route and deliver the 371 message. 
If the local part were encoded in some fashion that 372 prevented that inspection process, and the MTA was not aware that it 373 needed to apply special techniques, mail delivery might well fail. 375 2.3.2 Embedded commands 377 In addition to the address forms with special syntax or semantics 378 described elsewhere, systems have been developed that embed commands 379 in address local parts. These might, of course, use entirely 380 different syntax parts and formats than are typical in conventional 381 addresses and, in an internationalized environment, might reasonably 382 use character coding conventions that are neither ASCII nor 383 Unicode-based. 385 A number of specialized applications of email do require, or 386 recommend, specific syntax in the local part. These are identified, 387 not to indicate that they are the only cases (they are not) but to 388 reinforce the point that one must be quite cautious in doing anything 389 that makes global assumptions about local part syntax and significant 390 characters. These applications include local part explicit routing 391 with the "percent hack" [RFC1123], gateways to and from X.400 392 environments [RFC2156], and gateways to fax systems [RFC3192]. 394 2.4 Encoding the Whole Address String 396 Much of the above demonstrates why selective encoding of parts of the 397 local-part string is not practical. Why, then, not encode the entire 398 string and insist that the delivery MTA recognize the presence of an 399 encoded form and do whatever decoding is needed before it does other 400 processing? There are three major reasons not to approach the problem 401 this way: 403 1. Any change in address syntax interpretation is likely to be a 404 major, incompatible, change, since we do not now impose any 405 restrictions on how an MTA is organized or even on how, or 406 whether, the MTA and MUA functions are actually divided up on a 407 given host.
Converting user agents to handle international forms 408 of addresses in a way that does not produce user astonishment is 409 likely to be a major undertaking, regardless of what is done to 410 the protocols and at what level. 412 2. Imposing a requirement that MTAs "understand" local-parts so that 413 they can be partially decoded as part of mail routing would seem 414 to defeat the main goal of encoding internationalized strings 415 into a compact ASCII-compatible form, i.e., to keep MTAs from 416 needing to understand the extended naming system. 418 3. We potentially have three different encodings of an 419 internationalized string: the one used by the MTA, the one used 420 by the MUA, and the one seen by the user through applications 421 software or the operating system's display interface. Having all 422 three of these identical or closely compatible is desirable from 423 the standpoint of user understanding and debugging. Having them 424 different can cause many "interesting" problems, e.g., having to 425 return an error message that uses different coding, and hence 426 might represent an entirely different string, than the string the 427 user put into the process. 429 Instead, it would seem sensible to move from a straightforward 430 encoding of mail addresses in ASCII to a straightforward encoding in 431 Unicode via UTF-8 [RFC2277], imposing only those restrictions on the 432 characters in the local part that are implied by Unicode itself. 434 2.5 Looking back and looking forward 436 Another principle is implied by some of the discussion above. 437 Internationalization measures for the Internet will be with us for as 438 long as there are multiple languages and scripts in the world, i.e., 439 probably forever. If a satisfactory long-term solution can be found, 440 and a reasonable transition strategy can be defined for it, it is 441 much better to optimize for the long term.
The alternative of making 442 things more difficult or less functional forever in order to save 443 some small effort in transition, or even to make the transition a few 444 months faster, represents a very poor tradeoff. 446 2.6 Summary of Design Issues 448 Each of the above subsections describes a strong case for continuing 449 to treat addressing as an MTA function, opaque except at the end 450 systems. The main alternative is to rely on the sending system being 451 able to understand the addressing system of the target host, and any 452 relays accessed through MX relays, potentially needing to be able to 453 remove IDN encoding ("punycode" or otherwise) in order to determine 454 how to process or route the message. That alternative violates a 455 long-standing and important design principle of Internet email, 456 complicates a number of other cases, and does not offer sufficient 457 transition advantages to be worth any of those difficulties. 459 3. A Mail Transport-level Protocol 461 3.1 General Principles and Objectives 463 1. Whatever encoding is used should apply to the whole address and be 464 directly compatible with software used at the user interface. 466 2. An SMTP relay must either recognize the format explicitly, 467 agreeing to do so via an ESMTP option, or bounce the message so 468 that the sender can make another plan. 470 3. If any charset other than UTF-8 or punycode is permitted and used 471 for the local part, its interpretation at the "what does this 472 mean" level is the responsibility of the receiving MTA. 474 3.2 Framework for the Internationalization Extension 476 The following service extension is defined: 478 1. the name of the SMTP service extension is "Internationalized 479 Addresses"; 481 2. the EHLO keyword value associated with this extension is "I18N"; 483 3. No parameter values are defined for this EHLO keyword value.
In 484 order to permit future (although unanticipated) extensions, the 485 EHLO response MUST NOT contain any parameters. If a parameter 486 appears, the SMTP client that is conformant to this version of 487 this specification MUST treat the ESMTP response as if the I18N 488 keyword did not appear. 490 4. no parameters are added to any SMTP command. 492 [[Note in draft: A variation on this is probably excess 493 complexity, rather than a good tradeoff, but should be considered 494 in terms of whether it would be a good transitional aid. It would 495 be possible to permit an optional parameter on the MAIL and RCPT 496 commands that would specify an all-ASCII address to be used if an 497 MTA (SMTP Sender) encounters an SMTP Receiver that does not 498 support this extension. Such a parameter might be called 499 "AddressVariant" or even just "alias". It would be especially 500 useful in error handling if used on the MAIL command. ]] 502 5. no additional SMTP verbs are defined by this extension. 504 The remainder of this memo specifies how support for the extension 505 affects the behavior of an SMTP client and server. 507 3.3 The Address Internationalization Service Extension 509 In the absence of this extension, SMTP clients and servers are 510 constrained to using only those addresses permitted by RFC 2821. The 511 local parts of those addresses may be made up of any ASCII 512 characters, although certain of them must be quoted as specified 513 there. It is notable in an internationalization context that there 514 is a long history on some systems of using overstruck ASCII 515 characters (a character, a backspace, and another character) within a 516 quoted string to approximate non-ASCII characters. This form of 517 internationalization should probably be phased out as this extension 518 becomes widely deployed but backward-compatibility considerations 519 require that it continue to be supported.
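
As an illustration only, the client-side handling of the EHLO keyword described in item 3 of Section 3.2 might be sketched in Python as follows. The function name i18n_offered and the sample host name are hypothetical and are not defined by this specification. The sketch assumes that the complete multi-line 250 reply is available as one string and that keywords are matched case-insensitively, as RFC 2821 requires for EHLO keywords.

```python
def i18n_offered(ehlo_reply: str) -> bool:
    """Return True only if the server offered I18N with no parameters."""
    lines = ehlo_reply.splitlines()
    # The first reply line carries the server's domain and greeting,
    # not an extension keyword, so it is skipped.
    for line in lines[1:]:
        if not line.startswith("250"):
            continue
        # Drop the reply code and the "-" or " " separator after it.
        words = line[4:].split()
        if words and words[0].upper() == "I18N":
            # A parameter after the keyword means the client acts as
            # if the I18N keyword had not appeared at all.
            return len(words) == 1
    return False


# Hypothetical server reply used only to exercise the function.
reply = "250-mx.example.org greets client\r\n250-8BITMIME\r\n250 I18N\r\n"
print(i18n_offered(reply))
```

Returning False for a parameterized keyword, rather than raising an error, mirrors the rule that such a response is treated as if the extension were not offered.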

   An SMTP Server that announces this extension MUST be prepared to
   accept a UTF-8 string [RFC2279] in any position in which RFC 2821
   specifies that a "mailbox" may appear.  That string must be parsed
   only as specified in RFC 2821, i.e., by separating the mailbox into
   source route, local part and domain part, using only the characters
   colon (U+003A), comma (U+002C), and at-sign (U+0040) as specified
   there.  Once isolated by this parsing process, the local part MUST be
   treated as opaque unless the SMTP Server is the final delivery MTA.
   Any domain names that are to be looked up in the DNS MUST be
   processed into punycode form as specified in IDNA [RFC3490] unless
   they are already in that form.  Any domain names that are to be
   compared to local strings SHOULD be checked for validity and then
   MUST be compared as specified in IDNA.

   An SMTP Client that receives the I18N extension keyword MAY transmit
   a mailbox name as an internationalized string in UTF-8 form.  It MAY
   transmit the domain part of that string in either punycode (derived
   from the IDNA process) or UTF-8 form but, if it sends the domain in
   UTF-8, it SHOULD first verify that the string is valid for a domain
   name according to IDNA rules.  As required by RFC 2821, it MUST NOT
   attempt to parse, evaluate, or transform the local part in any way.
   If the I18N SMTP extension is not offered by the Server, the SMTP
   Client MUST NOT transmit an internationalized address.  Instead, it
   MUST either return the message to the user as undeliverable or
   replace the internationalized address, using some process outside the
   scope of this specification such as a directory lookup, with one
   whose local-part conforms to the syntax rules of RFC 2821.
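
   The handling described in this section can be sketched, again
   non-normatively, in Python.  The function names below are
   hypothetical and chosen for this example.  The sketch relies on
   Python's built-in "idna" codec, which implements the RFC 3490
   ToASCII operation, and it simplifies source-route handling to a
   single split at the first colon.

```python
from typing import Tuple


def split_mailbox(mailbox: str) -> Tuple[str, str, str]:
    """Split a mailbox into (source_route, local_part, domain).

    Only colon and at-sign are used as delimiters here; the local part
    is returned untouched because relays must treat it as opaque.
    """
    route = ""
    if mailbox.startswith("@") and ":" in mailbox:
        # A leading "@" marks a source route that ends at the colon.
        route, _, mailbox = mailbox.partition(":")
    local, at, domain = mailbox.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("not a mailbox: %r" % mailbox)
    return route, local, domain


def domain_for_dns(domain: str) -> str:
    """Convert a domain to punycode form before a DNS lookup.

    Labels that are already ASCII (including existing "xn--" labels)
    pass through unchanged.
    """
    return domain.encode("idna").decode("ascii")


def may_transmit(mailbox: str, i18n_offered: bool) -> bool:
    """A client may send a non-ASCII mailbox only if I18N was offered."""
    return i18n_offered or mailbox.isascii()
```

   For example, under these assumptions the domain "bücher.example" is
   looked up as "xn--bcher-kva.example", and a mailbox with a non-ASCII
   local part is held back when the server did not offer I18N.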

3.4 Extended Mailbox Address Syntax

   RFC 2821, section 4.1.2, defines the syntax of a mailbox as

      Mailbox = Local-part "@" Domain

      Local-part = Dot-string / Quoted-string
            ; MAY be case-sensitive

      Dot-string = Atom *("." Atom)

      Atom = 1*atext

      Quoted-string = DQUOTE *qcontent DQUOTE

      Domain = (sub-domain 1*("." sub-domain)) / address-literal
      sub-domain = Let-dig [Ldh-str]

   (see that document for productions and definitions not provided here
   -- their details are not important to understanding this
   specification).  The key changes made by this specification are,
   informally, to

   o  Change the definition of "sub-domain" to permit either the
      definition above or a UTF-8 (or other, see Section 3.6.1) string
      representing a label that is conformant with IDNA [RFC3490].  That
      sub-domain string MUST NOT contain the characters "@" or ".".

   o  Change the definition of "Atom" to permit either the definition
      above or a UTF-8 (or other, see Section 3.6.3) string.  That
      string MUST NOT contain any of the ASCII characters (either
      graphics or controls) that are not permitted in "atext"; it is
      otherwise unrestricted.

3.5 Additional ESMTP Changes and Clarifications

   The mail transport process involves addresses ("mailboxes") and
   domain names in contexts in addition to the MAIL and RCPT commands
   and extended alternatives to them.  In general, the rule is that,
   when RFC 2821 specifies a mailbox, UTF-8 is used for the entire
   string; when it specifies a domain name, the name should be in
   punycode form if its raw form is non-ASCII.

   The following subsections list and discuss all of the relevant cases.
   [[Note in draft: I hope]]

3.5.1 The Initial SMTP Exchange

   When an SMTP or ESMTP connection is opened, the server sends a
   "banner" response consisting of the 220 reply code and some
   information.  The client then sends the EHLO command.
   Since the client cannot know whether the server supports
   internationalized addresses until after it receives the response from
   EHLO, any domain names that appear in this dialogue, or in responses
   to EHLO, must be in hostname form, i.e., internationalized ones must
   be in punycode form.

3.5.2 Trace Fields

   Internationalized domain names in Received fields should be
   transmitted in Unicode form.  Addresses in "for" clauses need
   further examination and might be treated differently depending on
   whether 8BITMIME is a requirement for internationalized addresses.

3.6 Protocol Loose Ends

   These issues should be resolved, and this section eliminated, before
   the document is considered complete.

3.6.1 Punycode in Domain Names?

   It is not clear whether the flexibility of being able to pass domain
   names in punycode, as well as UTF-8, form is needed.  If it is not,
   it should be eliminated as excess complexity.

3.6.2 Local Character Codes in Local Parts?

   There are some reasons for permitting local-parts to be written in
   locally-used character codes, i.e., in other than the UTF-8 encoding
   of Unicode.  It clearly increases flexibility, and the mailbox part
   can be defined as a simple octet string (as it essentially is in the
   sections above).  We can reasonably expect that some systems,
   operating in local environments, will use local character codes no
   matter what we specify.  On the other hand, having an application
   presented with an octet (or bit) string and not knowing what charset
   is involved would wreak havoc on any attempt to intelligently display
   local parts: if one cannot know the character coding being used, then
   it is not possible to accurately decode the characters and display
   appropriate character glyphs.
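
   The display problem can be made concrete with a small, non-normative
   Python sketch.  The sample local part and the two candidate charsets
   (UTF-8 and ISO 8859-1) are assumptions chosen for illustration; the
   point is only that the same octets decode to different characters
   when the charset is not known.

```python
# The local part "jos\u00e9" (j, o, s, e-acute) as sent by a UTF-8 system.
octets = "jos\u00e9".encode("utf-8")

# A receiver that knows the charset recovers the intended characters.
as_utf8 = octets.decode("utf-8")

# A receiver that guesses ISO 8859-1 decodes the same octets without
# any error, but ends up with a different five-character string.
as_latin1 = octets.decode("iso-8859-1")

mismatch = as_utf8 != as_latin1
```

   Neither decoding step fails, so the receiving application has no
   signal that its guess was wrong.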

   Use of local coding also implies an encoding for the local part
   different from that for the domain part -- any MTA in the path must
   be able to resolve the domain part into something that can be looked
   up in the DNS and resolved and that, in turn, requires a
   globally-known encoding.

3.6.3 Restrictions on Characters in Local Part?

   The specification is extremely liberal about what can be included in
   a UTF-8 string that represents a local-part.  In return, it
   effectively prohibits the use of quoted strings, or quoted
   characters, in non-ASCII local parts.  Those have, in general, been
   nothing but trouble and there appears to be no reason to carry that
   trouble forward into an internationalized world (and the much greater
   complexity that quoting in that environment might imply).  There may
   be a strong case for applying restrictions, e.g., by use of a
   stringprep [RFC3454] profile that would eliminate particularly
   problematic characters while not forcing, e.g., even an approximation
   to case-mapping (remember that ASCII local-parts are inherently case
   sensitive, even though local systems are encouraged to not take
   advantage of that feature).

3.6.4 Requirement for 8BITMIME?

   This extension is carefully defined to be independent of "8BITMIME".
   However, given the length of time 8BITMIME has been around, the
   amount of deployment of it that exists, and the rather low likelihood
   that any MTA implementer in his or her right mind will go to the
   trouble of implementing this without also implementing 8BITMIME, it
   may be sensible to permit this extension only if 8BITMIME also
   appears.

3.6.5 Message Header and Body Issues with MTA Approach?

   By viewing i18n addresses as an MTA problem, this document does not
   address a number of interesting 2822/MIME issues.
   In particular, if both this extension and 8BITMIME are in use, is it
   sensible to drop the requirement for RFC 2047/2231 encoding of
   personal name fields?

3.6.6 Variant Addresses (Aliases) in a Command Verb

   A determination should be made as to whether a parameter to the MAIL
   and RCPT commands that would specify an alternate, ASCII-only,
   address is desirable and the text in Section 3.2, item 4, corrected
   accordingly.

3.6.7 The Received field 'for' clause

   Decide what to do about the value of the "for" clause in Received
   fields.  See Section 3.5.2.

4. Advice to Designers and Operators of Mail-receiving Systems

   As discussed above, in the historical Internet email context, the
   interpretation and permitted syntax for an email local-part is
   entirely the responsibility of the receiving system.  Systems can get
   themselves into trouble and, more particularly, can seriously
   restrict the number and type of users who can send mail to their
   users, by poor choices of format and syntax.  For example, general
   advice to system designers has long included "treat addresses in a
   case-independent fashion" and "do not use addresses that require
   quoting" in order to increase the odds that remote users will be able
   to properly compose and transmit intended addresses.  In a way, that
   advice is an extreme generalization of the "receiver" side of the
   robustness principle: being generous in what one accepts implies
   accepting as many plausible variations of an address local-part
   string as possible and designing the strict forms of those strings to
   facilitate differentiation when it is appropriate.

   As one moves toward internationalization of local parts, an expanded
   version of these principles is useful and may be even more
   appropriate, even though it is neither necessary nor desirable to
   turn those principles into protocol requirements.
   For example, a receiving host should normally consider any string
   that would match under nameprep rules -- or perhaps any string that
   would match under an expanded stringprep profile -- as matching for
   local-part purposes.  An even more "liberal" receiving host might use
   some sort of variant tables for its script(s) of interest to further
   expand the matching rules.

   But, whatever extended matching rules the local host adopts, those
   rules are a property of that host.  Senders should continue to be
   conservative about what they send, and relays should continue to
   avoid presumptions about their understanding of the content of
   local-parts.  Receiving systems that have reason to adopt more
   restricted syntax rules, or interpretations of matching, should
   continue to be able to do so.

5. Security Considerations

   Any expansion of permitted characters and encoding forms in email
   addresses raises the risk, however slight, of misdirected or
   undeliverable mail.  The problem is worsened if address information
   is carried in local character sets and must be converted to some
   standard form.  Any conversion of character sets may also be
   problematic for digitally-signed information.  Modulo those concerns,
   the ideas proposed here do not introduce new security issues.

6. Acknowledgements

   The author acknowledges the contributions and comments of Dave
   Crocker in a personal conversation, and the efforts of a private
   discussion group, led by Paul Hoffman and Adam Costello, to develop
   an MUA-only solution to this problem.  The author had hoped that
   effort would succeed, since the idea of requiring transport changes
   to support internationalization (or any other new function) is
   unattractive and should be avoided when possible.
   Difficulties that group has encountered in properly defining a number
   of boundary conditions, including appropriate delimiters for
   permitting internal parsing of the local part and problems with
   right-to-left characters and substrings, have led to the conclusion
   that it is time to get a specific, transport-based, approach on the
   table.  While their ideas have inspired several of the properties of
   this proposal they are, of course, not responsible for the result and
   will probably disagree with it.

Normative References

   [RFC0821]  Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
              821, August 1982.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2279]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", RFC 2279, January 1998.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3490]  Faltstrom, P., Hoffman, P. and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)", RFC
              3491, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

Informative References

   [I-D.hoffman-imaa]
              Hoffman, P. and A. Costello, "Internationalizing Mail
              Addresses in Applications (IMAA)", draft-hoffman-imaa-02
              (work in progress), August 2003.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2056]  Denenberg, R., Kunze, J. and D.
              Lynch, "Uniform Resource Locators for Z39.50", RFC 2056,
              November 1996.

   [RFC2156]  Kille, S., "MIXER (Mime Internet X.400 Enhanced Relay):
              Mapping between X.400 and RFC 822/MIME", RFC 2156, January
              1998.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2442]  Freed, N., Newman, D. and M. Hoy, "The Batch SMTP Media
              Type", RFC 2442, November 1998.

   [RFC2476]  Gellens, R. and J. Klensin, "Message Submission", RFC
              2476, December 1998.

   [RFC2554]  Myers, J., "SMTP Service Extension for Authentication",
              RFC 2554, March 1999.

   [RFC2556]  Bradner, S., "OSI connectionless transport services on top
              of UDP Applicability Statement for Historic Status", RFC
              2556, March 1999.

   [RFC2557]  Palme, F., Hopmann, A., Shelness, N. and E. Stefferud,
              "MIME Encapsulation of Aggregate Documents, such as HTML
              (MHTML)", RFC 2557, March 1999.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April
              2001.

   [RFC3192]  Allocchio, C., "Minimal FAX address format in Internet
              Mail", RFC 3192, October 2001.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   EMail: john-ietf@jck.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.
   Information on the IETF's procedures with respect to rights in
   standards-track and standards-related documentation can be found in
   BCP-11.  Copies of claims of rights made available for publication
   and any assurances of licenses to be made available, or the result of
   an attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementors or users of this
   specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.