Email Address Internationalization                            J. Klensin
(EAI)
Internet-Draft                                                     Y. Ko
Obsoletes: 4952                                                      ICU
(if approved)                                              June 25, 2010
Intended status: Informational
Expires: December 27, 2010


           Overview and Framework for Internationalized Email
                    draft-ietf-eai-frmwrk-4952bis-00

Abstract

   Full use of electronic mail throughout the world requires that, subject to other constraints, people be able to use close variations on their own names, written correctly in their own languages and scripts, as mailbox names in email addresses. This document introduces a series of specifications that define mechanisms and protocol extensions needed to fully support internationalized email addresses. These changes include an SMTP extension and an extension of email header syntax to accommodate UTF-8 data. The document set also includes discussion of key assumptions and issues in deploying fully internationalized email. This document is an update of RFC 4952 that reflects additional issues identified since that document was published.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 27, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1.  Introduction
   2.  Role of This Specification
   3.  Problem Statement
   4.  Terminology
     4.1.  Mail User and Mail Transfer Agents
     4.2.  Address Character Sets
     4.3.  User Types
     4.4.  Messages
     4.5.  Mailing Lists
     4.6.  Conventional Message and Internationalized Message
     4.7.  Undeliverable Messages and Notification
   5.  Overview of the Approach
   6.  Document Plan
   7.  Overview of Protocol Extensions and Changes
     7.1.  SMTP Extension for Internationalized Email Address
     7.2.  Transmission of Email Header Fields in UTF-8 Encoding
   8.  Downgrading before and after SMTP Transactions
     8.1.  Downgrading before or during Message Submission
     8.2.  Downgrading or Other Processing After Final SMTP Delivery
   9.  Downgrading in Transit
   10. User Interface and Configuration Issues
     10.1. Choices of Mailbox Names and Unicode Normalization
   11. Additional Issues
     11.1. Impact on URIs and IRIs
     11.2. Interaction with Delivery Notifications
     11.3. Use of Email Addresses as Identifiers
     11.4. Encoded Words, Signed Messages, and Downgrading
     11.5. LMTP
     11.6. SMTP Service Extension for DSNs
     11.7. Other Uses of Local Parts
     11.8. Non-Standard Encapsulation Formats
   12. Experimental Targets
   13. IANA Considerations
   14. Security Considerations
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References

1.  Introduction

   [[anchor1: Note to EAI WG: this initial draft is intended to initiate discussion on what should, and should not, be in the Framework document and how we want those topics covered. As such, it is more of an intermediate draft between RFC 4952 and the first draft of 4952bis that could be a Last Call candidate. If we are going to keep the rather aggressive schedule we agreed to in the charter, we need to have enough discussion on critical-path points that a revision suitable (at least) for final review prior to Last Call can be posted before the 12 July I-D cutoff. For that to happen, we should have enough discussion to start determining consensus within the next ten days. So, focused comments and soon, please.]]

   In order to use internationalized email addresses, we need to internationalize both the domain part and the local part of email addresses. The domain part of email addresses is already internationalized [RFC5890], while the local part is not. Without the extensions specified in this document, the mailbox name is restricted to a subset of 7-bit ASCII [RFC5321]. Though MIME [RFC2045] enables the transport of non-ASCII data, it does not provide a mechanism for internationalized email addresses. In RFC 2047 [RFC2047], MIME defines an encoding mechanism for some specific message header fields to accommodate non-ASCII data. However, it does not permit the use of email addresses that include non-ASCII characters. Without the extensions defined here, or some equivalent set, the only way to incorporate non-ASCII characters in any part of email addresses is to use RFC 2047 coding to embed them in what RFC 5322 [RFC5322] calls the "display name" (known as a "name phrase" or by other terms elsewhere) of the relevant headers.
   Information coded into the display name is invisible in the message envelope and, for many purposes, is not part of the address at all.

   This document is an update of RFC 4952 [RFC4952] that reflects additional issues, shared terminology, and some architectural changes identified since that document was published.

   The pronouns "he" and "she" are used interchangeably to indicate a human of indeterminate gender.

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Role of This Specification

   This document presents the overview and framework for an approach to the next stage of email internationalization. This new stage requires not only internationalization of addresses and headers, but also associated transport and delivery models. A prior version of this specification, RFC 4952 [RFC4952], also provided an introduction to a series of experimental protocols [RFC5335] [RFC5336] [RFC5337] [RFC5504] [RFC5721] [RFC5738] [RFC5825].
   [[anchor2: Note in Draft: Is 5825 still relevant, or is it a victim of the "no in-transit downgrade" decision??]]
   This revised form provides overview and conceptual information for the standards-track successors of those protocols. Details of the documents and the relationships among them appear in Section 6.

   Taken together, these specifications provide the details for a way to implement and support internationalized email. The document itself describes how the various elements of email internationalization fit together and the relationships among the [[anchor3: ??? provides a roadmap for navigating the]] various documents involved.

3.  Problem Statement

   Internationalizing Domain Names in Applications (IDNA) [RFC5890] permits internationalized domain names, but deployment has not yet reached most users.
   One of the reasons for this is that we do not yet have fully internationalized naming schemes. Domain names are just one of the various names and identifiers that are required to be internationalized. In many contexts, until more of those identifiers are internationalized, internationalized domain names alone have little value.

   Email addresses are prime examples of why it is not good enough to just internationalize the domain name. As most of us have learned from experience, users strongly prefer email addresses that resemble names or initials to those involving seemingly meaningless strings of letters or numbers. Unless the entire email address can use familiar characters and formats, users will perceive email as being culturally unfriendly. If the names and initials used in email addresses can be expressed in the native languages and writing systems of the users, the Internet will be perceived as more natural, especially by those whose native language is not written in a subset of a Roman-derived script.

   Internationalization of email addresses is not merely a matter of changing the SMTP envelope; or of modifying the From, To, and Cc headers; or of permitting upgraded Mail User Agents (MUAs) to decode a special coding and respond by displaying local characters. To be perceived as usable, the addresses must be internationalized and handled consistently in all of the contexts in which they occur. This requirement has far-reaching implications: collections of patches and workarounds are not adequate. Even if they were adequate, a workaround-based approach may result in an assortment of implementations to which different sets of patches and workarounds have been applied, with consequent user confusion about what is actually usable and supported.
   Instead, we need to build a fully internationalized email environment, focusing on permitting efficient communication among those who share a language or other community. That, in turn, implies changes to the mail header environment to permit the full range of Unicode characters where that makes sense, an SMTP Extension to permit UTF-8 [RFC3629] mail addressing and delivery of those extended headers, and (finally) a requirement for support of the 8BITMIME SMTP Extension [RFC1652] so that all of these can be transported through the mail system without having to overcome the limitation that headers do not have content-transfer-encodings.

4.  Terminology

   This document assumes a reasonable understanding of the protocols and terminology of the core email standards as documented in [RFC5321] and [RFC5322].

4.1.  Mail User and Mail Transfer Agents

   Much of the description in this document depends on the abstractions of "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"). However, it is important to understand that those terms and the underlying concepts postdate the design of the Internet's email architecture and the application of the "protocols on the wire" principle to it. That email architecture, as it has evolved, and that "wire" principle have prevented any strong and standardized distinctions about how MTAs and MUAs interact on a given origin or destination host (or even whether they are separate).

   However, the term "final delivery MTA" is used in this document in a fashion equivalent to the term "delivery system" or "final delivery system" of RFC 5321. This is the SMTP server that controls the format of the local parts of addresses and is permitted to inspect and interpret them.
   It receives messages from the network for delivery to mailboxes or for other local processing, including any forwarding or aliasing that changes envelope addresses, rather than relaying. From the perspective of the network, any local delivery arrangements such as saving to a message store, handoff to specific message delivery programs or agents, and mechanisms for retrieving messages are all "behind" the final delivery MTA and hence are not part of the SMTP transport or delivery process.

4.2.  Address Character Sets

   In this document, an address is "all-ASCII", or just an "ASCII address", if every character in the address is in the ASCII character repertoire [ASCII]; an address is "non-ASCII", or an "i18n-address", if any character is not in the ASCII character repertoire. Such addresses may be restricted in other ways, but those restrictions are not relevant to this definition. The term "all-ASCII" is also applied to other protocol elements when the distinction is important, with "non-ASCII" or "internationalized" as its opposite.

   The umbrella term to describe the email address internationalization specified by this document and its companion documents is "UTF8SMTPbis".
   [[anchor7: Note in Draft: Keyword to be changed before publication.]]
   For example, an address permitted by this specification is referred to as a "UTF8SMTPbis (compliant) address".

   Please note that, according to the definitions given here, the set of all "all-ASCII" addresses and the set of all "non-ASCII" addresses are mutually exclusive. The set of all addresses permitted when UTF8SMTPbis appears is the union of these two sets.

4.3.  User Types

   An "ASCII user" (i) exclusively uses email addresses that contain ASCII characters only, and (ii) cannot generate recipient addresses that contain non-ASCII characters.

   An "i18mail user" has one or more non-ASCII email addresses.
   Such a user may have ASCII addresses too; if the user has more than one email account and a corresponding address, or more than one alias for the same address, he or she has some method to choose which address to use on outgoing email. Note that under this definition, it is not possible to tell from an ASCII address if the owner of that address is an i18mail user or not. (A non-ASCII address implies a belief that the owner of that address is an i18mail user.) There is no such thing as an "i18mail message"; the term applies only to users and their agents and capabilities.

4.4.  Messages

   A "message" is sent from one user (sender) using a particular email address to one or more other recipient email addresses (often referred to just as "users" or "recipient users").

4.5.  Mailing Lists

   A "mailing list" is a mechanism whereby a message may be distributed to multiple recipients by sending it to one recipient address. An agent (typically not a human being) at that single address then causes the message to be redistributed to the target recipients. This agent sets the envelope return address of the redistributed message to a different address from that of the original single recipient message. Using a different envelope return address (reverse-path) causes error (and other automatically generated) messages to go to an error handling address.

   Special provisions for managing mailing lists that might contain non-ASCII addresses are discussed in a document that is specific to that topic [EAI-Mailinglist].

4.6.  Conventional Message and Internationalized Message

   o  A conventional message is one that does not use any extension defined in the SMTP extension document [RFC5336] or in the UTF8header specification [RFC5335], and is strictly conformant to RFC 5322 [RFC5322].

   o  An internationalized message is a message utilizing one or more of the extensions defined in this specification or in the UTF8header specification [RFC5335], so that it is no longer conformant to the RFC 5322 specification of a message.

4.7.  Undeliverable Messages and Notification

   As specified in RFC 5321, a message that is undeliverable for some reason is expected to result in notification to the sender. This can occur in either of two ways. One, typically called "Rejection", occurs when an SMTP server returns a reply code indicating a fatal error (a "5yz" code) or persistently returns a temporary failure error (a "4yz" code). The other involves accepting the message during SMTP processing and then generating a message to the sender, typically known as a "Non-delivery Notification" or "NDN". Current practice often favors rejection over NDNs because of the reduced likelihood that the generation of NDNs will be used as a spamming technique. The latter, NDN, case is unavoidable if an intermediate MTA accepts a message that is then rejected by the next-hop server.
   [[anchor13: ??? The term "bounce" is used informally below to cover both the rejection and NDN cases.]]

5.  Overview of the Approach

   This set of specifications changes both SMTP and the format of email headers to permit non-ASCII characters to be represented directly. Each important component of the work is described in a separate document. The document set, whose members are described in the next section, also contains informational documents whose purpose is to provide implementation suggestions and guidance for the protocols.

6.  Document Plan

   In addition to this document, the following documents make up this specification and provide advice and context for it.

   [[anchor15: ... Note to WG: if we actually include a list here, the result will be that this document can be approved, but not published, until those documents on the list are complete. I'm inclined to list the SMTP extension and headers documents only and hand-wave about the rest, but we need to discuss. Version -00 simply refers to the current Experimental documents --Editor.]]

   o  SMTP extensions. This document [RFC5336] provides an SMTP extension (as provided for in RFC 5321) for internationalized addresses.

   o  Email headers in UTF-8. This document [RFC5335] essentially updates RFC 5322 to permit some information in email headers to be expressed directly by Unicode characters encoded in UTF-8 when the SMTP extension described above is used. This document, possibly with one or more supplemental ones, will also need to address the interactions with MIME, including relationships between UTF8SMTPbis and internal MIME headers and content types.

   o  Extensions to the IMAP protocol to support internationalized headers [RFC5738].

   o  Parallel extensions to the POP protocol [RFC5721].

   o  Description of internationalization changes for delivery notifications (DSNs) [EAI-DSN].

7.  Overview of Protocol Extensions and Changes

7.1.  SMTP Extension for Internationalized Email Address

   An SMTP extension, "UTF8SMTPbis", is specified as follows:

   o  Permits the use of UTF-8 strings in email addresses, both local parts and domain names.

   o  Permits the selective use of UTF-8 strings in email headers (see Section 7.2).

   o  Requires that the server advertise the 8BITMIME extension [RFC1652] and that the client support 8-bit transmission so that header information can be transmitted without using a special content-transfer-encoding.

   Some general principles affect the development decisions underlying this work.

   1.  Email addresses enter subsystems (such as a user interface) that may perform charset conversions or other encoding changes. When the left-hand side of the address includes characters outside the US-ASCII character repertoire, use of punycode on the right-hand side is discouraged to promote consistent processing of characters throughout the address.

   2.  An SMTP relay must

       *  either recognize the format explicitly, agreeing to do so via an ESMTP option, or

       *  reject the message or, if necessary, return a non-delivery notification message, so that the sender can make another plan.

   3.  If the message cannot be forwarded because the next-hop system cannot accept the extension, it MUST be rejected or a non-delivery message generated and sent.

   4.  In the interest of interoperability, charsets other than UTF-8 are prohibited in mail addresses and headers being transmitted over the Internet. There is no practical way to identify multiple charsets properly with an extension similar to this without introducing great complexity.

   Conformance to the group of standards specified here for email transport and delivery requires implementation of the SMTP Extension specification, including recognition of the keywords associated with alternate addresses, and of the UTF-8 Header specification. If the system implements IMAP or POP, it MUST conform to the i18n IMAP or POP specifications, respectively.

7.2.  Transmission of Email Header Fields in UTF-8 Encoding

   There are many places in MUAs or in a user presentation in which email addresses or domain names appear. Examples include the conventional From, To, or Cc header fields; Message-ID and In-Reply-To header fields that normally contain domain names (but that may be a special case); and message bodies. Each of these must be examined from an internationalization perspective.
   The user will expect to see mailbox and domain names in local characters, and to see them consistently. If non-obvious encodings, such as protocol-specific ASCII-Compatible Encoding (ACE) variants, are used, the user will inevitably, if only occasionally, see them rather than "native" characters and will find that discomfiting or astonishing. Similarly, if different codings are used for mail transport and message bodies, the user is particularly likely to be surprised, if only as a consequence of the long-established "things leak" principle. The only practical way to avoid these sources of discomfort, in both the medium and the longer term, is to have the encodings used in transport be as similar to the encodings used in message headers and message bodies as possible.

   When email local parts are internationalized, it seems clear that they should be accompanied by arrangements for the email headers to be in the fully internationalized form. That form should presumably use UTF-8 rather than ASCII as the base character set for the contents of header fields (protocol elements such as the header field names themselves will remain entirely in ASCII). For transition purposes and compatibility with legacy systems, this can be done by extending the encoding models of [RFC2045] and [RFC2231]. However, the target is fully internationalized headers, as discussed in [RFC5335], and not an extended and painful transition.

8.  Downgrading before and after SMTP Transactions

   An important issue with these extensions is how to handle interactions between systems that support non-ASCII addresses and legacy systems that expect ASCII. There is, of course, no problem with ASCII-only systems sending to those that can handle internationalized forms because the ASCII forms are just a proper subset.
   But when systems that support these extensions send mail, they may include non-ASCII addresses for senders, receivers, or both and might also provide non-ASCII header information other than addresses. If the extension is not supported by the first-hop system (the SMTP server accessed by the Submission server acting as an SMTP client), message originating systems should be prepared either to send conventional envelopes and headers or to return the message to the originating user so the message may be manually downgraded to the traditional form, possibly using encoded words [RFC2047] in the headers. Of course, such transformations imply that the originating user or system must have ASCII-only addresses available for all senders and recipients. Mechanisms by which such addresses may be found or identified are outside the scope of these specifications, as are decisions about the design of originating systems such as whether any required transformations are made by the user, the originating MUA, or the Submission server.

   A somewhat more complex situation arises when the first-hop system supports these extensions but some subsequent server in the SMTP transmission chain does not. It is important to note that most cases of that situation will be the result of configuration errors: especially if it hosts non-ASCII addresses, a final delivery server that accepts these extensions should not be configured with lower-preference MX hosts that do not. While the experiments that preceded these specifications included a mechanism for passing backup ASCII addresses to intermediate relay systems and having those systems alter the headers and substitute the addresses, the requirements and long-term implications of that system proved too complex to be satisfactory.
   Consequently, if an intermediate SMTP relay that is transmitting a message that requires these extensions discovers that the next system in the chain does not support them, it will have little choice other than to reject or return the message.

   As discussed above, downgrading to an ASCII-only form may occur before or during the initial message submission. It might also occur after the delivery to the final delivery MTA in order to accommodate message stores or IMAP or POP servers or clients that have different capabilities than the delivery MTA. These two cases are discussed in the subsections below.

8.1.  Downgrading before or during Message Submission

   Perhaps obviously, the most convenient time to find an ASCII address corresponding to an internationalized address is at the originating MUA. This can occur either before the message is sent or after the internationalized form of the message is rejected. It is also the most convenient time to convert a message from the internationalized form into conventional ASCII form or to generate a non-delivery message to the sender if either is necessary. At that point, the user has a full range of choices available, including contacting the intended recipient out of band for an alternate address, consulting appropriate directories, arranging for translation of both addresses and message content into a different language, and so on. While it is natural to think of message downgrading as optimally being a fully-automated process, we should not underestimate the capabilities of a user of at least moderate intelligence who wishes to communicate with another such user.
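
   As an illustration only, the following Python sketch models the choice an originating MUA or Submission server faces once the first-hop server's EHLO response is known: send the message as-is, send a conventional message using known ASCII alternates, or return it to the user. It assumes the placeholder keyword "UTF8SMTPbis" from this draft (expected to change before publication) together with 8BITMIME, and it treats `ascii_alternates` as a hypothetical lookup standing in for whatever directory, address book, or user interaction actually supplies ASCII-only addresses. All names are invented for this example; only addresses are examined, and header conversion (for example, to encoded words) is not attempted.

```python
from enum import Enum


class Action(Enum):
    SEND_AS_IS = "send the message unchanged"
    SEND_DOWNGRADED = "send a conventional message using ASCII alternates"
    RETURN_TO_USER = "return the message to the originating user"


# Placeholder keyword used in this draft; expected to change before publication.
EXTENSION_KEYWORD = "UTF8SMTPBIS"


def is_all_ascii(address: str) -> bool:
    """An address is all-ASCII if every character is in the ASCII repertoire."""
    return all(ord(ch) < 128 for ch in address)


def ehlo_keywords(ehlo_lines: list[str]) -> set[str]:
    """Collect upper-cased EHLO keywords (keywords are case-insensitive).

    Assumes reply codes and separators are already stripped and that the
    first line is the server greeting, e.g. ["mx.example.net", "8BITMIME"].
    """
    return {line.split()[0].upper() for line in ehlo_lines[1:] if line.strip()}


def choose_action(addresses: list[str], ehlo_lines: list[str],
                  ascii_alternates: dict[str, str]) -> Action:
    """Hypothetical pre-submission check over one message's addresses."""
    if all(is_all_ascii(a) for a in addresses):
        # All-ASCII addresses: nothing here needs the extension
        # (headers are not checked in this sketch).
        return Action.SEND_AS_IS
    keywords = ehlo_keywords(ehlo_lines)
    if EXTENSION_KEYWORD in keywords and "8BITMIME" in keywords:
        return Action.SEND_AS_IS
    # ascii_alternates is a hypothetical directory lookup mapping a
    # non-ASCII address to an ASCII-only address for the same mailbox.
    if all(is_all_ascii(a) or a in ascii_alternates for a in addresses):
        # Headers would also need converting before sending.
        return Action.SEND_DOWNGRADED
    return Action.RETURN_TO_USER
```

   Under these assumptions, an EHLO response listing only 8BITMIME and SIZE causes a message to a non-ASCII recipient with no known ASCII alternate to be returned to the user, which leaves the final decision with the sender rather than automating it.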

   In this context, one can easily imagine modifications to message submission servers (as described in [RFC4409]) so that they would perform downgrading, or perhaps even upgrading, operations: receiving messages with one or more of the internationalization extensions discussed here and adapting the outgoing message, as needed, to the delivery or next-hop environment they encounter.

8.2.  Downgrading or Other Processing After Final SMTP Delivery

   When an email message is received by a final delivery SMTP server, it is usually stored in some form. It is then retrieved either by software that reads the stored form directly or by client software via an email retrieval mechanism such as POP or IMAP.

   The SMTP extension described in Section 7.1 provides protection only in transport. It does not prevent MUAs and email retrieval mechanisms that have not been upgraded to understand internationalized addresses and UTF-8 headers from accessing stored internationalized emails.

   Since the final delivery SMTP server (or, more specifically, its corresponding mail storage agent) cannot safely assume that agents accessing email storage will always be capable of handling the extensions proposed here, it MAY downgrade internationalized emails, specially identify messages that use these extensions, or both. If this is done, the final delivery SMTP server SHOULD include a mechanism to preserve or recover the original internationalized forms without information loss, to support access by UTF8SMTPbis-aware agents.

9.  Downgrading in Transit

   [[anchor19: Note in Draft and Question for the WG: We could discuss the various issues with in-transit downgrading, including the complexities of carrying backup addresses, the problems that motivated the "don't mess with addresses in transit" (paraphrased, obviously) rule in RFC 5321 and friends, and so on. Or we could omit it (and this section). Pragmatically, I think it would take us some time to reach consensus on what, exactly, should be said, and that might delay progress. But input is clearly needed.]]

10.  User Interface and Configuration Issues

   Internationalization of addresses and headers, especially in combination with variations on character coding that are inherent to Unicode, may make careful choices of addresses and careful configuration of servers and DNS records even more important than they are for traditional Internet email. It is likely that, as experience develops with the use of these protocols, it will be desirable to produce one or more additional documents that offer guidance for configuration and interfaces. A document that discusses issues with mail user agents (MUAs), especially with regard to downgrading, is expected to be developed in the EAI Working Group. The subsections below address some other issues.

10.1.  Choices of Mailbox Names and Unicode Normalization

   It has long been the case that email syntax permits choices about mailbox names that are unwise in practice if one actually intends the mailboxes to be accessible to a broad range of senders. The most often cited examples involve the use of case sensitivity and tricky quoting of embedded characters in mailbox local parts. While these are permitted by the protocols, servers are expected to support them, and there are special cases where they can provide value, taking advantage of those features is almost always bad practice.

   In the absence of this extension, SMTP clients and servers are constrained to using only those addresses permitted by RFC 5321. The local parts of those addresses MAY be made up of any ASCII characters except the control characters that RFC 5321 prohibits, although some of them MUST be quoted as specified there.
   It is notable in an internationalization context that there is a long history on some systems of using overstruck ASCII characters (a character, a backspace, and another character) within a quoted string to approximate non-ASCII characters. This form of internationalization was permitted by RFC 821 but is prohibited by RFC 5321 because it requires a backspace character (a prohibited C0 control). The practice SHOULD be phased out as this extension becomes widely deployed, but backward-compatibility considerations may require that it continue to be recognized.

   For the particular case of EAI mailbox names, special attention must be paid to Unicode normalization, in part because Unicode strings may be normalized by other processes independent of what a mail protocol specifies (this is exactly analogous to what may happen with quoting and dequoting in traditional addresses). Consequently, the following principles are offered as advice to those who are selecting names for mailboxes:

   o  In general, it is wise to support addresses in normalized form, using at least Normalization Form NFC and, except in unusual circumstances, NFKC.

   o  It may be wise to support other forms of the same local-part string, either as aliases or by normalization of strings reaching the delivery server, in the event that the sender does not send the strings in normalized form.

   o  Stated differently and in more specific terms, the rules of the protocol for local-part strings essentially provide that:

      *  Unnormalized strings are valid, but they are sufficiently bad practice that they may not work reliably on a global basis.

      *  C0 (and presumably C1) controls (see The Unicode Standard) are prohibited, the former by RFC 5321 and the latter by an obvious extension from it.

      *  Other kinds of punctuation, spaces, etc., are risky practice. Perhaps they will work, and SMTP receiver code is required to handle them, but creating dependencies on them in the mailbox names that are chosen is usually bad practice and may lead to interoperability problems.

11.  Additional Issues

   This section identifies issues that are not covered, or not covered comprehensively, as part of this set of specifications, but that will require ongoing review as part of deployment of email address and header internationalization.

11.1.  Impact on URIs and IRIs

   The mailto: scheme defined in [RFC2368] and discussed in the Internationalized Resource Identifier (IRI) specification [RFC3987] may need to be modified when this work is completed and standardized. In particular, providing an alternate address as part of a mailto: URI may require some fairly careful work on the syntax of that URI.

11.2.  Interaction with Delivery Notifications

   The advent of UTF8SMTPbis will make it necessary to consider the interaction with delivery notification mechanisms, including the SMTP extension for requesting delivery notifications [RFC3461] and the format of delivery notifications [RFC3464]. These issues are discussed in a forthcoming document that will update those RFCs as needed [EAI-DSN].

   [[anchor25: Note in draft: we could just eliminate this section and add the DSN document to the "Document Plan" in Section 6. Opinions?]]

11.3.  Use of Email Addresses as Identifiers

   There are a number of places in contemporary Internet usage in which email addresses are used as identifiers for individuals, including as identifiers to Web servers supporting some electronic commerce sites.
   These documents do not address those uses, but it is reasonable to expect that some difficulties will be encountered when internationalized addresses are first used in those contexts, many of which cannot even handle the full range of addresses permitted today.

11.4.  Encoded Words, Signed Messages, and Downgrading

   One particular characteristic of the email format is its persistence: MUAs are expected to handle messages that were originally sent decades ago, not just those delivered seconds ago. As such, MUAs and mail filtering software, such as that specified in Sieve [RFC5228], will need to continue to accept and decode header fields that use the "encoded word" mechanism [RFC2047] to accommodate non-ASCII characters in some header fields. While extensions to both POP3 and IMAP have been proposed to enable automatic EAI-upgrade (including RFC 2047 decoding) of messages by the POP3 or IMAP server, there are message structures and MIME content-types for which that cannot be done or where the change would have unacceptable side effects.

   For example, message parts that are cryptographically signed using, e.g., S/MIME [RFC3851] or Pretty Good Privacy (PGP) [RFC3156] cannot be upgraded from the RFC 2047 form to normal UTF-8 characters without breaking the signature. Similarly, message parts that are encrypted may contain, when decrypted, header fields that use the RFC 2047 encoding; such messages cannot be 'fully' upgraded without access to cryptographic keys.

   Similar issues may arise if signed messages are downgraded in transit and an attempt is then made to upgrade them to the original form and verify the signatures. Even the very subtle changes that may result from algorithms that downgrade and then upgrade again may be sufficient to invalidate the signatures if they affect either the primary or the MIME body-part headers.
   When signatures are present, downgrading must be performed with extreme care, if at all.

11.5.  LMTP

   LMTP [RFC2033] may be used as the final delivery agent. In such cases, LMTP may be arranged to deliver the mail to the mail store. The mail store may not have UTF8SMTPbis capability. LMTP needs to be updated to deal with these situations.

11.6.  SMTP Service Extension for DSNs

   The existing Draft Standard Delivery Status Notifications (DSNs) [RFC3461] specification is limited to ASCII text in the machine-readable portions of the protocol. "International Delivery and Disposition Notifications" [EAI-DSN] adds a new address type for international email addresses so that an original recipient address with non-ASCII characters can be correctly preserved even after downgrading. If an SMTP server advertises both the UTF8SMTPbis and the DSN extensions, that server MUST implement internationalized DSNs [EAI-DSN], including support for the ORCPT parameter.

11.7.  Other Uses of Local Parts

   Local parts are sometimes used to construct domain labels; e.g., the local part "user" in the address user@domain.example could be converted into a vanity host user.domain.example with its own Web space and the catch-all addresses any.thing.goes@user.domain.example.

   Such schemes are obviously limited by, among other things, the SMTP rules for domain names, and will not work without further restrictions for other local parts, such as the UTF-8 local parts specified in [RFC5335]. Whether this issue is relevant to these specifications is an open question. It may simply be another case of the considerable flexibility accorded to delivery MTAs in determining the mailbox names they will accept and how they are interpreted.

11.8.  Non-Standard Encapsulation Formats

   Some applications use formats similar to the application/mbox format defined in [RFC4155], instead of the message/digest form (RFC 2046, Section 5.1.5 [RFC2046]), to transfer multiple messages as single units. Insofar as such applications assume that all stored messages use the message/rfc822 format (RFC 2046, Section 5.2.1 [RFC2046]) with US-ASCII headers, they are not ready for the extensions specified in this series of documents, and special measures may be needed to properly detect and process them.

12.  Experimental Targets

   [[anchor31: Note in draft: this section is left in this draft for convenience in review. It will be removed with -01.]]

   In addition to the simple question of whether the model outlined here can be made to work in a satisfactory way for upgraded systems and provide adequate protection for un-upgraded ones, we expect that actually working with the systems will provide answers to two additional questions: what restrictions, such as character lists or normalization, should be placed, if any, on the characters that are permitted in address local-parts; and how useful downgrading will turn out to be in practice, given whatever restrictions and constraints must be placed upon it.

13.  IANA Considerations

   This overview description and framework document does not contemplate any IANA registrations or other actions. Some of the documents in the group have their own IANA considerations sections and requirements.

14.  Security Considerations

   Any expansion of permitted characters and encoding forms in email addresses raises some risks. There have been discussions of so-called "IDN spoofing" or "IDN homograph attacks". These attacks allow an attacker (or "phisher") to spoof the domain or URLs of businesses.
   The same kind of attack is also possible on the local part of internationalized email addresses. It should be noted that the proposed fix involving forcing all displayed elements into normalized lower case works for domain names in URLs, but not for email local parts, since those are case sensitive.

   Since email addresses are often transcribed from business cards and notes on paper, they are subject to problems arising from confusable characters (see [RFC4690]). These problems are somewhat reduced if the domain associated with the mailbox is unambiguous and supports a relatively small number of mailboxes whose names follow local system conventions. They are increased with very large mail systems in which users can freely select their own addresses.

   The internationalization of email addresses and headers must not leave the Internet less secure than it is without the required extensions. The requirements and mechanisms documented in this set of specifications do not, in general, raise any new security issues.

   They do require a review of issues associated with confusable characters -- a topic that is being explored thoroughly elsewhere (see, e.g., [RFC4690]) -- and, potentially, some issues with UTF-8 normalization, discussed in [RFC3629], and other transformations. Normalization and other issues associated with transformations and standard forms are also part of the subject of ongoing work discussed in [RFC5198], in [RFC5893], and elsewhere.

   Some issues specifically related to internationalized addresses and headers are discussed in more detail in the other documents in this set. However, in particular, caution should be taken that any "downgrading" mechanism, or use of downgraded addresses, does not inappropriately assume authenticated bindings between the internationalized and ASCII addresses.
   Expecting that most or all such transformations prior to final delivery will be done by systems that are presumed to be under the administrative control of the sending user somewhat ameliorates the potential problem, compared to what it would be if the relationships were changed in transit.

   The new UTF-8 header and message formats might also raise, or aggravate, another known issue. If the model creates new forms of an 'invalid' or 'malformed' message, then a new email attack is created: in an effort to be robust, some or most agents will accept such messages and interpret them as if they were well-formed. If a filter interprets such a message differently than the final MUA, then it may be possible to create a message that appears acceptable under the filter's interpretation but should be rejected under the interpretation given to it by the final MUA. Such attacks already exist for existing messages and encoding layers, e.g., invalid MIME syntax, invalid HTML markup, and invalid coding of particular image types.

   In addition, email addresses are used in many contexts other than sending mail, such as for identifiers under various circumstances (see Section 11.3). Each of those contexts will need to be evaluated, in turn, to determine whether the use of non-ASCII forms is appropriate and what particular issues they raise.

   This work will clearly affect any systems or mechanisms that are dependent on digital signatures or similar integrity protection for mail headers (see also the discussion in Section 11.4). Many conventional uses of PGP and S/MIME are not affected, since they are used to sign body parts but not headers.
   On the other hand, the developing work on DomainKeys Identified Mail (DKIM) [RFC5863] will eventually need to consider this work and vice versa: while this specification does not address or solve the issues raised by DKIM and other signed-header mechanisms, the issues will have to be coordinated and resolved eventually if the two sets of protocols are to coexist. In addition, to the degree that email addresses appear in PKI (Public Key Infrastructure) certificates, standards addressing such certificates will need to be upgraded to address these internationalized addresses. Those upgrades will need to address questions of spoofing by look-alikes of the addresses themselves.

15.  Acknowledgements

   [[anchor34: To be upgraded in -01 to point back to 4952]]

   This document, and the related ones, were originally derived from documents by John Klensin and the JET group [Klensin-emailaddr], [JET-IMA]. The work drew inspiration from discussions on the "IMAA" mailing list, sponsored by the Internet Mail Consortium, and especially from an early document by Paul Hoffman and Adam Costello [Hoffman-IMAA] that attempted to define an MUA-only solution to the address internationalization problem.

   More recent documents have benefited from considerable discussion within the IETF EAI Working Group, especially from suggestions and text provided by Martin Duerst, Frank Ellermann, Philip Guenther, Kari Hurtta, and Alexey Melnikov, and from extended discussions among the editors and authors of the core documents cited in Section 6: Harald Alvestrand, Kazunori Fujiwara, Chris Newman, Pete Resnick, Jiankang Yao, Jeff Yeh, and Yoshiro Yoneya.

   Additional comments received during IETF Last Call, including those from Paul Hoffman and Robert Sparks, were helpful in making the document clearer and more comprehensive.

16.  References

16.1.  Normative References

   [ASCII]    American National Standards Institute (formerly United States of America Standards Institute), "USA Code for Information Interchange", ANSI X3.4-1968, 1968.

              ANSI X3.4-1968 has been replaced by newer versions with slight modifications, but the 1968 version remains definitive for the Internet.

   [RFC1652]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and D. Crocker, "SMTP Service Extension for 8bit-MIMEtransport", RFC 1652, July 1994.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, June 2010.

16.2.  Informative References

   [EAI-DSN]  Newman, C., "UTF-8 Delivery and Disposition Notification", Work in Progress, January 2007.

   [EAI-Mailinglist]  Gellens, R., "Mailing Lists and Internationalized Email Addresses", March 2010, .

   [Hoffman-IMAA]  Hoffman, P. and A. Costello, "Internationalizing Mail Addresses in Applications (IMAA)", Work in Progress, October 2003.

   [JET-IMA]  Yao, J. and J. Yeh, "Internationalized eMail Address (IMA)", Work in Progress, June 2005.

   [Klensin-emailaddr]  Klensin, J., "Internationalization of Email Addresses", Work in Progress, July 2005.

   [RFC2033]  Myers, J., "Local Mail Transfer Protocol", RFC 2033, October 1996.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.

   [RFC2368]  Hoffman, P., Masinter, L., and J. Zawinski, "The mailto URL scheme", RFC 2368, July 1998.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler, "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC3461]  Moore, K., "Simple Mail Transfer Protocol (SMTP) Service Extension for Delivery Status Notifications (DSNs)", RFC 3461, January 2003.

   [RFC3464]  Moore, K. and G. Vaudreuil, "An Extensible Message Format for Delivery Status Notifications", RFC 3464, January 2003.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.1 Message Specification", RFC 3851, July 2004.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4155]  Hall, E., "The application/mbox Media Type", RFC 4155, September 2005.

   [RFC4409]  Gellens, R. and J. Klensin, "Message Submission for Mail", RFC 4409, April 2006.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, September 2006.

   [RFC4952]  Klensin, J. and Y. Ko, "Overview and Framework for Internationalized Email", RFC 4952, July 2007.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network Interchange", RFC 5198, March 2008.

   [RFC5228]  Guenther, P. and T. Showalter, "Sieve: An Email Filtering Language", RFC 5228, January 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008.

   [RFC5335]  Abel, Y., "Internationalized Email Headers", RFC 5335, September 2008.

   [RFC5336]  Yao, J. and W. Mao, "SMTP Extension for Internationalized Email Addresses", RFC 5336, September 2008.

   [RFC5337]  Newman, C. and A. Melnikov, "Internationalized Delivery Status and Disposition Notifications", RFC 5337, September 2008.

   [RFC5504]  Fujiwara, K. and Y. Yoneya, "Downgrading Mechanism for Email Address Internationalization", RFC 5504, March 2009.

   [RFC5721]  Gellens, R. and C. Newman, "POP3 Support for UTF-8", RFC 5721, February 2010.

   [RFC5738]  Resnick, P. and C. Newman, "IMAP Support for UTF-8", RFC 5738, March 2010.

   [RFC5825]  Fujiwara, K. and B. Leiba, "Displaying Downgraded Messages for Email Address Internationalization", RFC 5825, April 2010.

   [RFC5863]  Hansen, T., Siegel, E., Hallam-Baker, P., and D. Crocker, "DomainKeys Identified Mail (DKIM) Development, Deployment, and Operations", RFC 5863, May 2010.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, June 2010.

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA 02140
   USA

   Phone: +1 617 491 5735
   EMail: john-ietf@jck.com

   YangWoo Ko
   ICU
   119 Munjiro
   Yuseong-gu, Daejeon 305-732
   Republic of Korea

   EMail: yw@mrko.pe.kr