idnits 2.17.1 draft-ietf-eai-frmwrk-4952bis-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 3, 2010) is 5039 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'Hoffman-IMAA' is defined on line 906, but no explicit reference was found in the text == Unused Reference: 'JET-IMA' is defined on line 910, but no explicit reference was found in the text == Unused Reference: 'Klensin-emailaddr' is defined on line 913, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1652 (Obsoleted by RFC 6152) -- Obsolete informational reference (is this intentional?): RFC 2368 (Obsoleted by RFC 6068) -- Obsolete informational reference (is this intentional?): RFC 3851 (Obsoleted by RFC 5751) -- Obsolete informational reference (is this intentional?): RFC 4409 (Obsoleted by RFC 6409) -- Obsolete informational reference (is this intentional?): RFC 4952 (Obsoleted by RFC 6530) -- Obsolete informational reference (is this intentional?): RFC 5335 (Obsoleted by RFC 6532) -- Obsolete informational reference (is this intentional?): RFC 5336 (Obsoleted by RFC 6531) -- Obsolete informational reference (is this intentional?): RFC 5337 (Obsoleted by RFC 6533) -- Obsolete informational reference (is this intentional?): RFC 5504 (Obsoleted by RFC 6530) -- Obsolete informational reference (is this intentional?): RFC 5721 (Obsoleted by RFC 6856) -- Obsolete informational reference (is this intentional?): RFC 5738 (Obsoleted by RFC 6855) -- Obsolete informational reference (is this intentional?): RFC 5825 (Obsoleted by RFC 6530) Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Email Address Internationalization J. Klensin 3 (EAI) 4 Internet-Draft Y. 
Ko 5 Obsoletes: RFC4952 ICU 6 (if approved) July 3, 2010 7 Intended status: Informational 8 Expires: January 4, 2011 10 Overview and Framework for Internationalized Email 11 draft-ietf-eai-frmwrk-4952bis-01 13 Abstract 15 Full use of electronic mail throughout the world requires that, 16 subject to other constraints, people be able to use close variations 17 on their own names, written correctly in their own languages and 18 scripts, as mailbox names in email addresses. This document 19 introduces a series of specifications that define mechanisms and 20 protocol extensions needed to fully support internationalized email 21 addresses. These changes include an SMTP extension and extension of 22 email header syntax to accommodate UTF-8 data. The document set also 23 includes discussion of key assumptions and issues in deploying fully 24 internationalized email. This document is an update of RFC 4952 that 25 reflects additional issues identified since that document was 26 published. 28 Status of This Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at http://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on January 4, 2011. 45 Copyright Notice 47 Copyright (c) 2010 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 
50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 This document may contain material from IETF Documents or IETF 61 Contributions published or made publicly available before November 62 10, 2008. The person(s) controlling the copyright in some of this 63 material may not have granted the IETF Trust the right to allow 64 modifications of such material outside the IETF Standards Process. 65 Without obtaining an adequate license from the person(s) controlling 66 the copyright in such materials, this document may not be modified 67 outside the IETF Standards Process, and derivative works of it may 68 not be created outside the IETF Standards Process, except to format 69 it for publication as an RFC or to translate it into languages other 70 than English.
72 Table of Contents
74 1. Introduction
75 2. Role of This Specification
76 3. Problem Statement
77 4. Terminology
78 4.1. Mail User and Mail Transfer Agents
79 4.2. Address Character Sets
80 4.3. User Types
81 4.4. Messages
82 4.5. Mailing Lists
83 4.6. Undeliverable Messages and Notification
84 5. Overview of the Approach
85 6. Document Plan
86 7. Overview of Protocol Extensions and Changes
87 7.1. SMTP Extension for Internationalized Email Address
88 7.2. Transmission of Email Header Fields in UTF-8 Encoding
89 8. Downgrading before and after SMTP Transactions
90 8.1. Downgrading before or during Message Submission
91 8.2. Downgrading or Other Processing After Final SMTP Delivery
93 9. Downgrading in Transit
94 10. User Interface and Configuration Issues
95 10.1. Choices of Mailbox Names and Unicode Normalization
96 11. Additional Issues
97 11.1. Impact on URIs and IRIs
98 11.2. Interaction with Delivery Notifications
99 11.3. Use of Email Addresses as Identifiers
100 11.4. Encoded Words, Signed Messages, and Downgrading
101 11.5. LMTP
102 11.6. Other Uses of Local Parts
103 11.7. Non-Standard Encapsulation Formats
104 12. Experimental Targets
105 13. IANA Considerations
106 14. Security Considerations
107 15. Acknowledgements
108 16. References
109 16.1. Normative References
110 16.2. Informative References
111 Appendix A. Change Log
112 A.1. Changes between -00 and -01
114 1. Introduction
116 [[anchor1: Note to EAI WG: these two initial drafts are intended to 117 initiate discussion on what should, and should not, be in the 118 Framework document and how we want those topics covered. As such, it 119 is more of an intermediate draft between RFC 4952 and the first draft 120 of 4952bis that could be a Last Call candidate. If we are going to 121 keep the rather aggressive schedule we agreed to in the charter, we 122 need to have enough discussion on critical-path points that a 123 revision suitable (at least) for final review prior to Last Call can 124 be posted before the 12 July I-D cutoff. For that to happen, we 125 should have enough discussion to start determining consensus within 126 the next ten days. So, focused comments and soon, please.]]
128 In order to use internationalized email addresses, we need to 129 internationalize both the domain part and the local part of email 130 addresses. The domain part of email addresses is already 131 internationalized [RFC5890], while the local part is not. Without 132 the extensions specified in this document, the mailbox name is 133 restricted to a subset of 7-bit ASCII [RFC5321]. Though MIME 134 [RFC2045] enables the transport of non-ASCII data, it does not 135 provide a mechanism for internationalized email addresses. In RFC 136 2047 [RFC2047], MIME defines an encoding mechanism for some specific 137 message header fields to accommodate non-ASCII data. However, it 138 does not permit the use of email addresses that include non-ASCII 139 characters. Without the extensions defined here, or some equivalent 140 set, the only way to incorporate non-ASCII characters in any part of 141 email addresses is to use RFC 2047 coding to embed them in what RFC 142 5322 [RFC5322] calls the "display name" (known as a "name phrase" or 143 by other terms elsewhere) of the relevant header fields.
Information 144 coded into the display name is invisible in the message envelope and, 145 for many purposes, is not part of the address at all. 147 This document is an update of RFC 4952 [RFC4952] that reflects 148 additional issues, shared terminology, and some architectural changes 149 identified since that document was published. 151 The pronouns "he" and "she" are used interchangeably to indicate a 152 human of indeterminate gender. 154 The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", 155 and "MAY" in this document are to be interpreted as described in RFC 156 2119 [RFC2119]. 158 2. Role of This Specification 160 This document presents the overview and framework for an approach to 161 the next stage of email internationalization. This new stage 162 requires not only internationalization of addresses and header 163 fields, but also associated transport and delivery models. A prior 164 version of this specification, RFC 4952 [RFC4952], also provided an 165 introduction to a series of experimental protocols [RFC5335] 166 [RFC5336] [RFC5337] [RFC5504] [RFC5721] [RFC5738] [RFC5825]. 167 [[anchor2: Note in Draft: Is 5825 still relevant, or is it a victim of 168 the "no in-transit downgrade" decision?]] 169 This revised document provides overview and conceptual information 170 for the standards-track successors of those protocols. Details of 171 the documents and the relationships among them appear in Section 6. 173 Taken together, these specifications provide the details for a way to 174 implement and support internationalized email. The document itself 175 describes how the various elements of email internationalization fit 176 together and the relationships among the [[anchor3: ??? provides a 177 roadmap for navigating the]] various documents involved. 179 3. Problem Statement 181 Internationalizing Domain Names in Applications (IDNA) [RFC5890] 182 permits internationalized domain names, but deployment has not yet 183 reached most users. 
One of the reasons for this is that we do not 184 yet have fully internationalized naming schemes. Domain names are 185 just one of the various names and identifiers that are required to be 186 internationalized. In many contexts, until more of those identifiers 187 are internationalized, internationalized domain names alone have 188 little value. 190 Email addresses are prime examples of why it is not good enough to 191 just internationalize the domain name. As most of us have learned 192 from experience, users strongly prefer email addresses that resemble 193 names or initials to those involving seemingly meaningless strings of 194 letters or numbers. Unless the entire email address can use familiar 195 characters and formats, users will perceive email as being culturally 196 unfriendly. If the names and initials used in email addresses can be 197 expressed in the native languages and writing systems of the users, 198 the Internet will be perceived as more natural, especially by those 199 whose native language is not written in a subset of a Roman-derived 200 script. 202 Internationalization of email addresses is not merely a matter of 203 changing the SMTP envelope; or of modifying the From, To, and Cc 204 header fields; or of permitting upgraded Mail User Agents (MUAs) to 205 decode a special coding and respond by displaying local characters. 206 To be perceived as usable, the addresses must be internationalized 207 and handled consistently in all of the contexts in which they occur. 208 This requirement has far-reaching implications: collections of 209 patches and workarounds are not adequate. Even if they were 210 adequate, a workaround-based approach may result in an assortment of 211 implementations with different sets of patches and workarounds having 212 been applied with consequent user confusion about what is actually 213 usable and supported. 
Instead, we need to build a fully 214 internationalized email environment, focusing on permitting efficient 215 communication among those who share a language or other community. 216 That, in turn, implies changes to the mail header environment to 217 permit the full range of Unicode characters where that makes sense, 218 an SMTP Extension to permit UTF-8 [RFC3629] mail addressing and 219 delivery of those extended header fields, and (finally) a requirement 220 for support of the 8BITMIME SMTP Extension [RFC1652] so that all of 221 these can be transported through the mail system without having to 222 overcome the limitation that header fields do not have content- 223 transfer-encodings. 225 4. Terminology 227 This document assumes a reasonable understanding of the protocols and 228 terminology of the core email standards as documented in [RFC5321] 229 and [RFC5322]. 231 4.1. Mail User and Mail Transfer Agents 233 Much of the description in this document depends on the abstractions 234 of "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"). 235 However, it is important to understand that those terms and the 236 underlying concepts postdate the design of the Internet's email 237 architecture and the application of the "protocols on the wire" 238 principle to it. That email architecture, as it has evolved, and 239 that "wire" principle have prevented any strong and standardized 240 distinctions about how MTAs and MUAs interact on a given origin or 241 destination host (or even whether they are separate). 243 However, the term "final delivery MTA" is used in this document in a 244 fashion equivalent to the term "delivery system" or "final delivery 245 system" of RFC 5321. This is the SMTP server that controls the 246 format of the local parts of addresses and is permitted to inspect 247 and interpret them. 
It receives messages from the network for 248 delivery to mailboxes or for other local processing, including any 249 forwarding or aliasing that changes envelope addresses, rather than 250 relaying. From the perspective of the network, any local delivery 251 arrangements such as saving to a message store, handoff to specific 252 message delivery programs or agents, and mechanisms for retrieving 253 messages are all "behind" the final delivery MTA and hence are not 254 part of the SMTP transport or delivery process. 256 4.2. Address Character Sets 258 In this document, an address is "all-ASCII", or just an "ASCII 259 address", if every character in the address is in the ASCII character 260 repertoire [ASCII]; an address is "non-ASCII", or an "i18n-address", 261 if any character is not in the ASCII character repertoire. Such 262 addresses may be restricted in other ways, but those restrictions are 263 not relevant to this definition. The term "all-ASCII" is also 264 applied to other protocol elements when the distinction is important, 265 with "non-ASCII" or "internationalized" as its opposite. 267 The umbrella term to describe the email address internationalization 268 specified by this document and its companion documents is 269 "UTF8SMTPbis". 270 [[anchor7: Note in Draft: Keyword to be changed before publication.]] 271 For example, an address permitted by this specification is referred 272 to as a "UTF8SMTPbis (compliant) address". 274 Please note that, according to the definitions given here, the set of 275 all "all-ASCII" addresses and the set of all "non-ASCII" addresses 276 are mutually exclusive. The set of all addresses permitted when 277 UTF8SMTPbis appears is the union of these two sets. 279 4.3. User Types 281 An "ASCII user" (i) exclusively uses email addresses that contain 282 ASCII characters only, and (ii) cannot generate recipient addresses 283 that contain non-ASCII characters. 285 An "i18mail user" has one or more non-ASCII email addresses. 
Such a 286 user may have ASCII addresses too; if the user has more than one 287 email account and a corresponding address, or more than one alias for 288 the same address, he or she has some method to choose which address 289 to use on outgoing email. Note that under this definition, it is not 290 possible to tell from an ASCII address if the owner of that address 291 is an i18mail user or not. (A non-ASCII address implies a belief 292 that the owner of that address is an i18mail user.) There is no such 293 thing as an "i18mail message"; the term applies only to users and 294 their agents and capabilities. 296 4.4. Messages 298 A "message" is sent from one user (sender) using a particular email 299 address to one or more other recipient email addresses (often 300 referred to just as "users" or "recipient users"). 302 A conventional message is one that does not use any extension defined 303 in the SMTP extension document [RFC5336] or in the UTF8header 304 specification [RFC5335], and is strictly conformant to RFC 5322 305 [RFC5322]. 307 An internationalized message is a message utilizing one or more of 308 the extensions defined in this specification or in the UTF8header 309 specification [RFC5335], so that it is no longer conformant to the 310 RFC 5322 specification of a message. 312 4.5. Mailing Lists 314 A "mailing list" is a mechanism whereby a message may be distributed 315 to multiple recipients by sending it to one recipient address. An 316 agent (typically not a human being) at that single address then 317 causes the message to be redistributed to the target recipients. 318 This agent sets the envelope return address of the redistributed 319 message to a different address from that of the original single 320 recipient message. Using a different envelope return address 321 (reverse-path) causes error (and other automatically generated) 322 messages to go to an error handling address. 
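As an illustration only, the address definitions of Section 4.2 and the mailing-list behavior described above can be sketched in Python. The names Envelope, is_all_ascii, redistribute, and needs_extension are hypothetical and appear in none of the specifications; the sample addresses are invented. The sketch assumes that a redistributed message needs the extension whenever any envelope address is non-ASCII.

```python
from dataclasses import dataclass
from typing import List


def is_all_ascii(address: str) -> bool:
    """True if every character is in the ASCII repertoire (Section 4.2)."""
    return address.isascii()


@dataclass(frozen=True)
class Envelope:
    reverse_path: str      # envelope return address
    recipients: List[str]  # envelope recipient addresses


def redistribute(incoming: Envelope, members: List[str],
                 error_address: str) -> Envelope:
    """Build the envelope for the redistributed copy (Section 4.5).

    The original reverse-path is deliberately not reused: errors and other
    automatically generated messages go to the list's error-handling address.
    """
    return Envelope(reverse_path=error_address, recipients=list(members))


def needs_extension(envelope: Envelope) -> bool:
    """Assumption of this sketch: any non-ASCII envelope address means the
    message cannot be carried as a conventional message."""
    addresses = [envelope.reverse_path, *envelope.recipients]
    return not all(is_all_ascii(a) for a in addresses)


if __name__ == "__main__":
    original = Envelope("alice@example.com", ["list@example.org"])
    copy = redistribute(original,
                        ["bob@example.net", "用户@example.cn"],
                        "list-bounces@example.org")
    print(copy.reverse_path, needs_extension(copy))
```

Because the sets of all-ASCII and non-ASCII addresses are mutually exclusive, a single predicate is enough to classify an address.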
324 Special provisions for managing mailing lists that might contain non- 325 ASCII addresses are discussed in a document that is specific to that 326 topic [EAI-Mailinglist]. 328 4.6. Undeliverable Messages and Notification 330 As specified in RFC 5321, a message that is undeliverable for some 331 reason is expected to result in notification to the sender. This can 332 occur in either of two ways. One, typically called "Rejection", 333 occurs when an SMTP server returns a reply code indicating a fatal 334 error (a "5yz" code) or persistently returns a temporary failure 335 error (a "4yz" code). The other involves accepting the message 336 during SMTP processing and then generating a message to the sender, 337 typically known as a "Non-delivery Notification" or "NDN". Current 338 practice often favors rejection over NDNs because of the reduced 339 likelihood that the generation of NDNs will be used as a spamming 340 technique. The latter, NDN, case is unavoidable if an intermediate 341 MTA accepts a message that is then rejected by the next-hop server. 343 5. Overview of the Approach 345 This set of specifications changes both SMTP and the character 346 encoding of email message headers to permit non-ASCII characters to 347 be represented directly. Each important component of the work is 348 described in a separate document. The document set, whose members 349 are described in the next section, also contains informational 350 documents whose purpose is to provide implementation suggestions and 351 guidance for the protocols. 353 6. Document Plan 355 In addition to this document, the following documents make up this 356 specification and provide advice and context for it. 358 [[anchor12: ... Note to WG: if we actually include a list here, the 359 result will be that this document can be approved, but not published, 360 until those documents on the list are complete. 
I'm inclined to list 361 the SMTP extension and headers documents only and hand-wave about the 362 rest, but we need to discuss. Versions -00 and -01 simply refer to 363 the current Experimental documents --Editor.]] 365 o SMTP extensions. This document [RFC5336] provides an SMTP 366 extension (as provided for in RFC 5321) for internationalized 367 addresses. 369 o Email message headers in UTF-8. This document [RFC5335] 370 essentially updates RFC 5322 to permit some information in email 371 message headers to be expressed directly by Unicode characters 372 encoded in UTF-8 when the SMTP extension described above is used. 373 This document, possibly with one or more supplemental ones, will 374 also need to address the interactions with MIME, including 375 relationships between UTF8SMTPbis and internal MIME headers and 376 content types. 378 o Extensions to the IMAP protocol to support internationalized 379 message headers [RFC5738]. 381 o Parallel extensions to the POP protocol [RFC5721]. 383 o Description of internationalization changes for delivery 384 notifications (DSNs) [RFC5337]. 386 7. Overview of Protocol Extensions and Changes 388 7.1. SMTP Extension for Internationalized Email Address 390 An SMTP extension, "UTF8SMTPbis", is specified as follows: 392 o Permits the use of UTF-8 strings in email addresses, both local 393 parts and domain names. 395 o Permits the selective use of UTF-8 strings in email message 396 headers (see Section 7.2). 398 o Requires that the server advertise the 8BITMIME extension 399 [RFC1652] and that the client support 8-bit transmission so that 400 header information can be transmitted without using a special 401 content-transfer-encoding. 403 Some general principles affect the development decisions underlying 404 this work. 406 1. Email addresses enter subsystems (such as a user interface) that 407 may perform charset conversions or other encoding changes. 
When 408 the left-hand side of the address includes characters outside the 409 US-ASCII character repertoire, use of Punycode on the right-hand 410 side is discouraged to promote consistent processing of 411 characters throughout the address. 413 2. An SMTP relay must 415 * Either recognize the format explicitly, agreeing to do so via 416 an ESMTP option, or 418 * Reject the message or, if necessary, return a non-delivery 419 notification message, so that the sender can make another 420 plan. 422 3. If the message cannot be forwarded because the next-hop system 423 cannot accept the extension, it MUST be rejected or a non-delivery 424 message generated and sent. 426 4. In the interest of interoperability, charsets other than UTF-8 427 are prohibited in mail addresses and message headers being 428 transmitted over the Internet. There is no practical way to 429 identify multiple charsets properly with an extension similar to 430 this without introducing great complexity. 432 Conformance to the group of standards specified here for email 433 transport and delivery requires implementation of the SMTP Extension 434 specification, including recognition of the keywords associated with 435 alternate addresses, and the UTF-8 Header specification. If the 436 system implements IMAP or POP, it MUST conform to the i18n IMAP or 437 POP specifications, respectively. 439 7.2. Transmission of Email Header Fields in UTF-8 Encoding 441 There are many places in MUAs or in a user presentation in which 442 email addresses or domain names appear. Examples include the 443 conventional From, To, or Cc header fields; Message-ID and 444 In-Reply-To header fields that normally contain domain names (but 445 that may be a special case); and message bodies. Each of these 446 must be examined from an internationalization perspective. The user 447 will expect to see mailbox and domain names in local characters, and 448 to see them consistently. 
If non-obvious encodings, such as 449 protocol-specific ASCII-Compatible Encoding (ACE) variants, are used, 450 the user will inevitably, if only occasionally, see them rather than 451 "native" characters and will find that discomfiting or astonishing. 452 Similarly, if different codings are used for mail transport and 453 message bodies, the user is particularly likely to be surprised, if 454 only as a consequence of the long-established "things leak" 455 principle. The only practical way to avoid these sources of 456 discomfort, in both the medium and the longer term, is to have the 457 encodings used in transport be as similar to the encodings used in 458 message headers and message bodies as possible. 460 When email local parts are internationalized, it seems clear that 461 they should be accompanied by arrangements for the message headers to 462 be in the fully internationalized form. That form should use UTF-8 463 rather than ASCII as the base character set for the contents of 464 header fields (protocol elements such as the header field names 465 themselves will remain entirely in ASCII). For transition purposes 466 and compatibility with legacy systems, this can be done by extending the 467 encoding models of [RFC2045] and [RFC2231]. However, the target is 468 fully internationalized message headers, as discussed in [RFC5335], 469 and not an extended and painful transition. 471 8. Downgrading before and after SMTP Transactions 473 An important issue with these extensions is how to handle 474 interactions between systems that support non-ASCII addresses and 475 legacy systems that expect ASCII. There is, of course, no problem 476 with ASCII-only systems sending to those that can handle 477 internationalized forms because the ASCII forms are just a proper 478 subset. 
But, when systems that support these extensions send mail, 479 they may include non-ASCII addresses for senders, receivers, or both 480 and might also provide non-ASCII header information other than 481 addresses. If the extension is not supported by the first-hop system 482 (SMTP server accessed by the Submission server acting as an SMTP 483 client), message originating systems should be prepared to either 484 send conventional envelopes and message headers or to return the 485 message to the originating user so the message may be manually 486 downgraded to the traditional form, possibly using encoded words 487 [RFC2047] in the message headers. Of course, such transformations 488 imply that the originating user or system must have ASCII-only 489 addresses available for all senders and recipients. Mechanisms by 490 which such addresses may be found or identified are outside the scope 491 of these specifications as are decisions about the design of 492 originating systems such as whether any required transformations are 493 made by the user, the originating MUA, or the Submission server. 495 A somewhat more complex situation arises when the first-hop system 496 supports these extensions but some subsequent server in the SMTP 497 transmission chain does not. It is important to note that most cases 498 of that situation will be the result of configuration errors: 499 especially if it hosts non-ASCII addresses, a final delivery server 500 that accepts these extensions should not be configured with lower- 501 preference MX hosts that do not. While the experiments that preceded 502 these specifications included a mechanism for passing backup ASCII 503 addresses to intermediate relay systems and having those systems 504 alter the relevant message header fields and substitute the 505 addresses, the requirements and long-term implications of that system 506 proved too complex to be satisfactory. 
Consequently, if an 507 intermediate SMTP relay is transmitting a message that requires 508 these extensions and discovers that the next system in the chain does 509 not support them, it will have little choice other than to reject or 510 return the message. 512 As discussed above, downgrading to an ASCII-only form may occur 513 before or during the initial message submission. It might also occur 514 after the delivery to the final delivery MTA in order to accommodate 515 message stores or IMAP or POP servers or clients that have different 516 capabilities than the delivery MTA. These two cases are discussed in 517 the subsections below. 519 8.1. Downgrading before or during Message Submission 521 Perhaps obviously, the most convenient time to find an ASCII address 522 corresponding to an internationalized address is at the originating 523 MUA. This can occur either before the message is sent or after the 524 internationalized form of the message is rejected. It is also the 525 most convenient time to convert a message from the internationalized 526 form into conventional ASCII form or to generate a non-delivery 527 message to the sender if either is necessary. At that point, the 528 user has a full range of choices available, including contacting the 529 intended recipient out of band for an alternate address, consulting 530 appropriate directories, arranging for translation of both addresses 531 and message content into a different language, and so on. While it 532 is natural to think of message downgrading as optimally being a 533 fully automated process, we should not underestimate the capabilities 534 of a user of at least moderate intelligence who wishes to communicate 535 with another such user. 
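The handling described in Section 8 and above can be sketched, under stated assumptions, as a small decision function in Python. The names Action, requires_extension, and choose_action are hypothetical. The sketch assumes the sender already holds the next hop's EHLO keywords as a list of strings, that "UTF8SMTPbis" (the placeholder keyword used in this draft) is the keyword advertised, and that keywords are uppercased only for comparison. It does not model how an ASCII-only alternative address is found, which these specifications leave out of scope.

```python
from enum import Enum
from typing import Iterable

# Placeholder keyword from this draft; expected to change before publication.
EXTENSION_KEYWORD = "UTF8SMTPbis"


class Action(Enum):
    SEND = "send as is"
    DOWNGRADE_OR_RETURN_TO_USER = "downgrade, or hand back to the originating user"
    REJECT_OR_NDN = "reject, or generate a non-delivery notification"


def requires_extension(envelope_addresses: Iterable[str],
                       header_values: Iterable[str]) -> bool:
    """A message needs the extension if any address or header value is non-ASCII."""
    return any(not value.isascii()
               for value in [*envelope_addresses, *header_values])


def choose_action(ehlo_keywords: Iterable[str], needs_ext: bool,
                  at_origin: bool) -> Action:
    """Pick a course of action for one next-hop server.

    at_origin is True for the originating MUA or Submission server, which
    may still downgrade; an intermediate relay may only reject or return.
    """
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    # Section 7.1: the extension also requires 8BITMIME to be advertised.
    supported = (EXTENSION_KEYWORD.upper() in advertised
                 and "8BITMIME" in advertised)
    if not needs_ext or supported:
        return Action.SEND
    if at_origin:
        return Action.DOWNGRADE_OR_RETURN_TO_USER
    return Action.REJECT_OR_NDN


if __name__ == "__main__":
    needs = requires_extension(["用户@example.cn"], ["Subject text"])
    print(choose_action(["8BITMIME", "PIPELINING"], needs, at_origin=False))
```

Separating the originating case from the relay case mirrors the text: only the originating side has the context, and the user, needed to produce an ASCII-only form.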
537 In this context, one can easily imagine modifications to message 538 submission servers (as described in [RFC4409]) so that they would 539 perform downgrading, or perhaps even upgrading, operations, receiving 540 messages with one or more of the internationalization extensions 541 discussed here and adapting the outgoing message, as needed, to 542 respond to the delivery or next-hop environment it encounters. 544 8.2. Downgrading or Other Processing After Final SMTP Delivery 546 When an email message is received by a final delivery SMTP server, it 547 is usually stored in some form. Then it is retrieved either by 548 software that reads the stored form directly or by client software 549 via some email retrieval mechanisms such as POP or IMAP. 551 The SMTP extension described in Section 7.1 provides protection only 552 in transport. It does not prevent MUAs and email retrieval 553 mechanisms that have not been upgraded to understand 554 internationalized addresses and UTF-8 message headers from accessing 555 stored internationalized emails. 557 Since the final delivery SMTP server (or, to be more specific, its 558 corresponding mail storage agent) cannot safely assume that agents 559 accessing email storage will always be capable of handling the 560 extensions proposed here, it MAY either downgrade internationalized 561 emails or specially identify messages that utilize these extensions, 562 or both. If this is done, the final delivery SMTP server SHOULD 563 include a mechanism to preserve or recover the original 564 internationalized forms without information loss to support access by 565 UTF8SMTPbis-aware agents. 567 9. 
Downgrading in Transit 569 [[anchor16: Note in Draft and Question for the WG: We could discuss 570 the various issues with in-transit downgrading including the 571 complexities of carrying backup addresses, the problems that 572 motivated the "don't mess with addresses in transit" (paraphrased, 573 obviously) rule in RFC 5321 and friends, and so on. Or we could omit 574 it (and this section). Pragmatically, I think it would take us some 575 time to reach consensus on what, exactly, should be said and that 576 might delay progress. But input is clearly needed -- if it is not 577 received before we prepare -02, this section will simply be 578 dropped.]] 580 10. User Interface and Configuration Issues 582 Internationalization of addresses and message headers, especially in 583 combination with variations on character coding that are inherent to 584 Unicode, may make careful choices of addresses and careful 585 configuration of servers and DNS records even more important than 586 they are for traditional Internet email. It is likely that, as 587 experience develops with the use of these protocols, it will be 588 desirable to produce one or more additional documents that offer 589 guidance for configuration and interfaces. A document that discusses 590 issues with mail user agents (MUAs), especially with regard to 591 downgrading, is expected to be developed in the EAI Working Group. 593 The subsections below address some other issues. 595 10.1. Choices of Mailbox Names and Unicode Normalization 597 It has long been the case that the email syntax permits choices about 598 mailbox names that are unwise in practice if one actually 599 intends the mailboxes to be accessible to a broad range of senders. 600 The most-often-cited examples involve the use of case-sensitivity and 601 tricky quoting of embedded characters in mailbox local parts.
While 602 these are permitted by the protocols and servers are expected to 603 support them and there are special cases where they can provide 604 value, taking advantage of those features is almost always bad 605 practice. 607 In the absence of this extension, SMTP clients and servers are 608 constrained to using only those addresses permitted by RFC 5321. The 609 local parts of those addresses MAY be made up of any ASCII characters 610 except the control characters that 5321 prohibits, although some of 611 them MUST be quoted as specified there. It is notable in an 612 internationalization context that there is a long history on some 613 systems of using overstruck ASCII characters (a character, a 614 backspace, and another character) within a quoted string to 615 approximate non-ASCII characters. This form of internationalization 616 was permitted by RFC 821 but is prohibited by RFC 5321 because it 617 requires a backspace character (a prohibited C0 control). The 618 practice SHOULD be phased out as this extension becomes widely 619 deployed but backward-compatibility considerations may require that 620 it continue to be recognized. 622 For the particular case of EAI mailbox names, special attention must 623 be paid to Unicode normalization, in part because Unicode strings may 624 be normalized by other processes independent of what a mail protocol 625 specifies (this is exactly analogous to what may happen with quoting 626 and dequoting in traditional addresses). Consequently, the following 627 principles are offered as advice to those who are selecting names for 628 mailboxes: 630 o In general, it is wise for servers to provide addresses only in 631 Normalized form and to normalize strings on receipt, using either 632 Normalization Form NFC and, except in unusual circumstances, NFKC. 633 [[anchor19: Note in Draft: "Normalize on receipt" is consistent 634 with the recommendations in draft-iab-i18n-encoding. 
The issue 635 with NFKC is that some of the characters mapped out may be 636 significant, especially in personal names. Anyone with objections 637 should speak up. Soon.]] 639 o It may be wise to support other forms of the same local-part 640 string, either as aliases or by normalization of strings reaching 641 the delivery server, in the event that the sender does not send 642 the strings in normalized form. 644 o Stated differently and in more specific terms, the rules of the 645 protocol for local-part strings essentially provide that: 647 * Unnormalized strings are valid, but sufficiently bad practice 648 that they may not work reliably on a global basis. 650 * C0 (and presumably C1) controls (see The Unicode Standard) are 651 prohibited, the first in RFC 5321 and the second by an obvious 652 extension from it. 654 * Other kinds of punctuation, spaces, etc., are risky practice. 655 Perhaps they will work, and SMTP receiver code is required to 656 handle them, but creating dependencies on them in mailbox names 657 that are chosen is usually a bad practice and may lead to 658 interoperability problems. 660 11. Additional Issues 662 This section identifies issues that are not covered, or not covered 663 comprehensively, as part of this set of specifications, but that will 664 require ongoing review as part of deployment of email address and 665 header internationalization. 667 11.1. Impact on URIs and IRIs 669 The mailto: schema defined in [RFC2368] and discussed in the 670 Internationalized Resource Identifier (IRI) specification [RFC3987] 671 may need to be modified when this work is completed and standardized. 672 In particular, providing an alternate address as part of a mailto: 673 URI may require some fairly careful work on the syntax of that URI. 675 11.2. 
Interaction with Delivery Notifications 677 The advent of UTF8SMTPbis will make necessary consideration of the 678 interaction with delivery notification mechanisms, including the 679 ASCII-only SMTP extension for requesting delivery notifications 680 (DSNs) [RFC3461], and the format of delivery notifications [RFC3464]. 681 A new document, "International Delivery and Disposition 682 Notifications" [RFC5337] adds a new address type for international 683 email addresses so an original recipient address with non-ASCII 684 characters can be correctly preserved even after downgrading. If an 685 SMTP server advertises both the UTF8SMTPbis and the DSN extension, 686 that server MUST implement internationalized DSNs, including support 687 for the ORCPT parameter. 689 11.3. Use of Email Addresses as Identifiers 691 There are a number of places in contemporary Internet usage in which 692 email addresses are used as identifiers for individuals, including as 693 identifiers to Web servers supporting some electronic commerce sites. 694 These documents do not address those uses, but it is reasonable to 695 expect that some difficulties will be encountered when 696 internationalized addresses are first used in those contexts, many of 697 which cannot even handle the full range of addresses permitted today. 699 11.4. Encoded Words, Signed Messages, and Downgrading 701 One particular characteristic of the email format is its persistency: 702 MUAs are expected to handle messages that were originally sent 703 decades ago and not just those delivered seconds ago. As such, MUAs 704 and mail filtering software, such as that specified in Sieve 705 [RFC5228], will need to continue to accept and decode header fields 706 that use the "encoded word" mechanism [RFC2047] to accommodate non- 707 ASCII characters in some header fields. 
While extensions to both 708 POP3 and IMAP have been proposed to enable automatic EAI-upgrade -- 709 including RFC 2047 decoding -- of messages by the POP3 or IMAP 710 server, there are message structures and MIME content-types for which 711 that cannot be done or where the change would have unacceptable side 712 effects. 714 For example, message parts that are cryptographically signed, using, 715 e.g., S/MIME [RFC3851] or Pretty Good Privacy (PGP) [RFC3156], cannot 716 be upgraded from the RFC 2047 form to normal UTF-8 characters without 717 breaking the signature. Similarly, message parts that are encrypted 718 may contain, when decrypted, header fields that use the RFC 2047 719 encoding; such messages cannot be 'fully' upgraded without access to 720 cryptographic keys. 722 11.5. LMTP 724 LMTP [RFC2033] may be used as the final delivery agent. In such 725 cases, LMTP may be arranged to deliver the mail to the mail store. 726 The mail store may not have UTF8SMTPbis capability. LMTP needs to be 727 updated to deal with these situations. 729 11.6. Other Uses of Local Parts 731 Local parts are sometimes used to construct domain labels, e.g., the 732 local part "user" in the address user@domain.example could be 733 converted into a vanity host user.domain.example with its Web space 734 at and the catchall addresses 735 any.thing.goes@user.domain.example. 737 Such schemes are obviously limited by, among other things, the SMTP 738 rules for domain names, and will not work without further 739 restrictions for other local parts such as the 740 specified in [RFC5335]. Whether this issue is relevant to these 741 specifications is an open question. It may be simply another case of 742 the considerable flexibility accorded to delivery MTAs in determining 743 the mailbox names they will accept and how they are interpreted. 745 11.7.
Non-Standard Encapsulation Formats 747 Some applications use formats similar to the application/mbox format 748 defined in [RFC4155] instead of the message/digest RFC 2046, Section 749 5.1.5 [RFC2046] form to transfer multiple messages as single units. 750 Insofar as such applications assume that all stored messages use the 751 message/rfc822 RFC 2046, Section 5.2.1 [RFC2046] format with US-ASCII 752 message headers, they are not ready for the extensions specified in 753 this series of documents and special measures may be needed to 754 properly detect and process them. 756 12. Experimental Targets 758 [[anchor26: Note in draft: this section is left in this draft for 759 convenience in review. It will be removed with -02.]] 761 In addition to the simple question of whether the model outlined here 762 can be made to work in a satisfactory way for upgraded systems and 763 provide adequate protection for un-upgraded ones, we expect that 764 actually working with the systems will provide answers to two 765 additional questions: what restrictions such as character lists or 766 normalization should be placed, if any, on the characters that are 767 permitted to be used in address local-parts and how useful, in 768 practice, will downgrading turn out to be given whatever restrictions 769 and constraints that must be placed upon it. 771 13. IANA Considerations 773 This overview description and framework document does not contemplate 774 any IANA registrations or other actions. Some of the documents in 775 the group have their own IANA considerations sections and 776 requirements. 778 14. Security Considerations 780 Any expansion of permitted characters and encoding forms in email 781 addresses raises some risks. There have been discussions on so 782 called "IDN-spoofing" or "IDN homograph attacks". These attacks 783 allow an attacker (or "phisher") to spoof the domain or URLs of 784 businesses. 
The same kind of attack is also possible on the local 785 part of internationalized email addresses. It should be noted that 786 the proposed fix involving forcing all displayed elements into 787 normalized lower-case works for domain names in URLs, but not email 788 local parts since those are case sensitive. 790 Since email addresses are often transcribed from business cards and 791 notes on paper, they are subject to problems arising from confusable 792 characters (see [RFC4690]). These problems are somewhat reduced if 793 the domain associated with the mailbox is unambiguous and supports a 794 relatively small number of mailboxes whose names follow local system 795 conventions. They are increased with very large mail systems in 796 which users can freely select their own addresses. 798 The internationalization of email addresses and message headers must 799 not leave the Internet less secure than it is without the required 800 extensions. The requirements and mechanisms documented in this set 801 of specifications do not, in general, raise any new security issues. 803 They do require a review of issues associated with confusable 804 characters -- a topic that is being explored thoroughly elsewhere 805 (see, e.g., [RFC4690]) -- and, potentially, some issues with UTF-8 806 normalization, discussed in [RFC3629], and other transformations. 807 Normalization and other issues associated with transformations and 808 standard forms are also part of the subject of ongoing work discussed 809 in [RFC5198], in [RFC5893] and elsewhere. 811 Some issues specifically related to internationalized addresses and 812 message headers are discussed in more detail in the other documents 813 in this set. However, in particular, caution should be taken that 814 any "downgrading" mechanism, or use of downgraded addresses, does not 815 inappropriately assume authenticated bindings between the 816 internationalized and ASCII addresses. 
Expecting that most or all 817 such transformations prior to final delivery be done by systems that 818 are presumed to be under the administrative control of the sending 819 user ameliorates the potential problem somewhat as compared to what 820 it would be if the relationships were changed in transit. 822 The new UTF-8 header and message formats might also raise, or 823 aggravate, another known issue. If the model creates new forms of an 824 'invalid' or 'malformed' message, then a new email attack is created: 825 in an effort to be robust, some or most agents will accept such 826 messages and interpret them as if they were well-formed. If a filter 827 interprets such a message differently than the final MUA, then it may 828 be possible to create a message that appears acceptable under the 829 filter's interpretation but should be rejected under the 830 interpretation given to it by the final MUA. Such attacks already 831 exist for existing messages and encoding layers, e.g., invalid MIME 832 syntax, invalid HTML markup, and invalid coding of particular image 833 types. 835 In addition, email addresses are used in many contexts other than 836 sending mail, such as for identifiers under various circumstances 837 (see Section 11.3). Each of those contexts will need to be 838 evaluated, in turn, to determine whether the use of non-ASCII forms 839 is appropriate and what particular issues they raise. 841 This work will clearly affect any systems or mechanisms that are 842 dependent on digital signatures or similar integrity protection for 843 email message headers (see also the discussion in Section 11.4). 844 Many conventional uses of PGP and S/MIME are not affected since they 845 are used to sign body parts but not message headers.
On the other 846 hand, the developing work on DomainKeys Identified Mail (DKIM 847 [RFC5863]) will eventually need to consider this work and vice versa: 848 while this specification does not address or solve the issues raised 849 by DKIM and other signed header mechanisms, the issues will have to 850 be coordinated and resolved eventually if the two sets of protocols 851 are to co-exist. In addition, to the degree to which email addresses 852 appear in PKI (Public Key Infrastructure) certificates, standards 853 addressing such certificates will need to be upgraded to address 854 these internationalized addresses. Those upgrades will need to 855 address questions of spoofing by look-alikes of the addresses 856 themselves. 858 15. Acknowledgements 860 This document is an update to, and derived from, RFC 4952. This 861 document would have been impossible without the work and 862 contributions acknowledged in it. The present document benefited 863 significantly from discussions in the EAI WG and elsewhere after RFC 864 4952 was published, especially discussions about the experimental 865 versions of other documents in the internationalized email 866 collection, and from RFC errata on RFC 4952 itself. 868 16. References 870 16.1. Normative References 872 [ASCII] American National Standards Institute (formerly 873 United States of America Standards Institute), 874 "USA Code for Information Interchange", 875 ANSI X3.4-1968, 1968. 877 ANSI X3.4-1968 has been replaced by newer 878 versions with slight modifications, but the 1968 879 version remains definitive for the Internet. 881 [RFC1652] Klensin, J., Freed, N., Rose, M., Stefferud, E., 882 and D. Crocker, "SMTP Service Extension for 883 8bit-MIMEtransport", RFC 1652, July 1994. 885 [RFC2119] Bradner, S., "Key words for use in RFCs to 886 Indicate Requirement Levels", RFC 2119, BCP 14, 887 March 1997. 889 [RFC3629] Yergeau, F., "UTF-8, a transformation format of 890 ISO 10646", STD 63, RFC 3629, November 2003.
892 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", 893 RFC 5321, October 2008. 895 [RFC5890] Klensin, J., "Internationalized Domain Names for 896 Applications (IDNA): Definitions and Document 897 Framework", RFC 5890, June 2010. 899 16.2. Informative References 901 [EAI-Mailinglist] Gellens, R., "Mailing Lists and 902 Internationalized Email Addresses", June 2010, < 903 https://datatracker.ietf.org/doc/ 904 draft-ietf-eai-mailinglist/>. 906 [Hoffman-IMAA] Hoffman, P. and A. Costello, "Internationalizing 907 Mail Addresses in Applications (IMAA)", Work 908 in Progress, October 2003. 910 [JET-IMA] Yao, J. and J. Yeh, "Internationalized eMail 911 Address (IMA)", Work in Progress, June 2005. 913 [Klensin-emailaddr] Klensin, J., "Internationalization of Email 914 Addresses", Work in Progress, July 2005. 916 [RFC2033] Myers, J., "Local Mail Transfer Protocol", 917 RFC 2033, October 1996. 919 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose 920 Internet Mail Extensions (MIME) Part One: Format 921 of Internet Message Bodies", RFC 2045, 922 November 1996. 924 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose 925 Internet Mail Extensions (MIME) Part Two: Media 926 Types", RFC 2046, November 1996. 928 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail 929 Extensions) Part Three: Message Header 930 Extensions for Non-ASCII Text", RFC 2047, 931 November 1996. 933 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value 934 and Encoded Word Extensions: 935 Character Sets, Languages, and Continuations", 936 RFC 2231, November 1997. 938 [RFC2368] Hoffman, P., Masinter, L., and J. Zawinski, "The 939 mailto URL scheme", RFC 2368, July 1998. 941 [RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. 942 Roessler, "MIME Security with OpenPGP", 943 RFC 3156, August 2001. 945 [RFC3461] Moore, K., "Simple Mail Transfer Protocol (SMTP) 946 Service Extension for Delivery Status 947 Notifications (DSNs)", RFC 3461, January 2003. 949 [RFC3464] Moore, K. and G. 
Vaudreuil, "An Extensible 950 Message Format for Delivery Status 951 Notifications", RFC 3464, January 2003. 953 [RFC3851] Ramsdell, B., "Secure/Multipurpose Internet Mail 954 Extensions (S/MIME) Version 3.1 Message 955 Specification", RFC 3851, July 2004. 957 [RFC3987] Duerst, M. and M. Suignard, "Internationalized 958 Resource Identifiers (IRIs)", RFC 3987, 959 January 2005. 961 [RFC4155] Hall, E., "The application/mbox Media Type", 962 RFC 4155, September 2005. 964 [RFC4409] Gellens, R. and J. Klensin, "Message Submission 965 for Mail", RFC 4409, April 2006. 967 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, 968 "Review and Recommendations for 969 Internationalized Domain Names (IDNs)", 970 RFC 4690, September 2006. 972 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework 973 for Internationalized Email", RFC 4952, 974 July 2007. 976 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format 977 for Network Interchange", RFC 5198, March 2008. 979 [RFC5228] Guenther, P. and T. Showalter, "Sieve: An Email 980 Filtering Language", RFC 5228, January 2008. 982 [RFC5322] Resnick, P., Ed., "Internet Message Format", 983 RFC 5322, October 2008. 985 [RFC5335] Abel, Y., "Internationalized Email Headers", 986 RFC 5335, September 2008. 988 [RFC5336] Yao, J. and W. Mao, "SMTP Extension for 989 Internationalized Email Addresses", RFC 5336, 990 September 2008. 992 [RFC5337] Newman, C. and A. Melnikov, "Internationalized 993 Delivery Status and Disposition Notifications", 994 RFC 5337, September 2008. 996 [RFC5504] Fujiwara, K. and Y. Yoneya, "Downgrading 997 Mechanism for Email Address 998 Internationalization", RFC 5504, March 2009. 1000 [RFC5721] Gellens, R. and C. Newman, "POP3 Support for 1001 UTF-8", RFC 5721, February 2010. 1003 [RFC5738] Resnick, P. and C. Newman, "IMAP Support for 1004 UTF-8", RFC 5738, March 2010. 1006 [RFC5825] Fujiwara, K. and B. 
Leiba, "Displaying 1007 Downgraded Messages for Email Address 1008 Internationalization", RFC 5825, April 2010. 1010 [RFC5863] Hansen, T., Siegel, E., Hallam-Baker, P., and D. 1011 Crocker, "DomainKeys Identified Mail (DKIM) 1012 Development, Deployment, and Operations", 1013 RFC 5863, May 2010. 1015 [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left 1016 Scripts for Internationalized Domain Names for 1017 Applications (IDNA)", RFC 5893, June 2010. 1019 Appendix A. Change Log 1021 [[RFC Editor: Please remove this section prior to publication.]] 1023 A.1. Changes between -00 and -01 1025 o Because there has been no feedback on the mailing list, updated 1026 the various questions to refer to this version as well. 1028 o Reflected RFC Editor erratum #1507 by correcting terminology for 1029 headers and header fields and distinguishing between "message 1030 headers" and different sorts of headers (e.g., the MIME ones). 1032 o Merged earlier sections 4.4 and 4.6 into an expanded Section 4.4. 1034 o Merged earlier Section 11.6 into Section 11.2 and eliminated the 1035 note in draft. 1037 o Eliminated former last paragraph of Section 11.4 as an artifact of 1038 in-transit downgrading. 1040 o Updated a few references. 1042 Authors' Addresses 1044 John C Klensin 1045 1770 Massachusetts Ave, #322 1046 Cambridge, MA 02140 1047 USA 1049 Phone: +1 617 491 5735 1050 EMail: john-ietf@jck.com 1052 YangWoo Ko 1053 ICU 1054 119 Munjiro 1055 Yuseong-gu, Daejeon 305-732 1056 Republic of Korea 1058 EMail: yw@mrko.pe.kr