idnits 2.17.1 draft-ietf-eai-framework-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 1016. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1027. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1034. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1040. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 5, 2007) is 6284 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 1652 (Obsoleted by RFC 6152) ** Obsolete normative reference: RFC 2821 (Obsoleted by RFC 5321) ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891) == Outdated reference: A later version (-06) exists of draft-ietf-eai-dsn-00 == Outdated reference: A later version (-13) exists of draft-ietf-eai-smtpext-01 == Outdated reference: A later version (-12) exists of draft-ietf-eai-utf8headers-01 == Outdated reference: A later version (-12) exists of draft-ietf-eai-downgrade-02 == Outdated reference: A later version (-09) exists of draft-ietf-eai-imap-utf8-00 == Outdated reference: A later version (-03) exists of draft-ietf-eai-scenarios-01 -- Obsolete informational reference (is this intentional?): RFC 2368 (Obsoleted by RFC 6068) -- Obsolete informational reference (is this intentional?): RFC 2822 (Obsoleted by RFC 5322) -- Obsolete informational reference (is this intentional?): RFC 3028 (Obsoleted by RFC 5228, RFC 5429) -- Obsolete informational reference (is this intentional?): RFC 3851 (Obsoleted by RFC 5751) -- Obsolete informational reference (is this intentional?): RFC 4409 (Obsoleted by RFC 6409) Summary: 4 errors (**), 0 flaws (~~), 8 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Email Address Internationalization J. Klensin 3 (EAI) 4 Internet-Draft Y. 
Ko 5 Intended status: Informational ICU 6 Expires: August 9, 2007 February 5, 2007 8 Overview and Framework for Internationalized Email 9 draft-ietf-eai-framework-05.txt 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on August 9, 2007. 36 Copyright Notice 38 Copyright (C) The IETF Trust (2007). 40 Abstract 42 Full use of electronic mail throughout the world requires that people 43 be able to use their own names, written correctly in their own 44 languages and scripts, as mailbox names in email addresses. This 45 document introduces a series of specifications that define mechanisms 46 and protocol extensions needed to fully support internationalized 47 email addresses. These changes include an SMTP extension and 48 extension of email header syntax to accommodate UTF-8 data. The 49 document set also includes discussion of key assumptions and issues 50 in deploying fully internationalized email. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 1.1. 
Role of This Specification . . . . . . . . . . . . . . . . 3 56 1.2. Problem statement . . . . . . . . . . . . . . . . . . . . 3 57 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 58 2. Overview of the Approach . . . . . . . . . . . . . . . . . . . 6 59 3. Document Plan . . . . . . . . . . . . . . . . . . . . . . . . 6 60 4. Overview of Protocol Extensions and Changes . . . . . . . . . 7 61 4.1. SMTP Extension for Internationalized Email Address . . . . 7 62 4.2. Transmission of Email Header Fields in UTF-8 Encoding . . 8 63 4.3. Downgrading Mechanism for Backward Compatibility . . . . . 9 64 5. Downgrading Before and After SMTP Transactions . . . . . . . . 10 65 5.1. Downgrading Before or During Message Submission . . . . . 10 66 5.2. Downgrading or Other Processing After Final SMTP 67 Delivery . . . . . . . . . . . . . . . . . . . . . . . . . 11 68 6. Additional Issues . . . . . . . . . . . . . . . . . . . . . . 11 69 6.1. Impact on URIs and IRIs . . . . . . . . . . . . . . . . . 11 70 6.2. Interaction with delivery notifications . . . . . . . . . 12 71 6.3. Use of email addresses as identifiers . . . . . . . . . . 12 72 6.4. Encoded words, signed messages and downgrading . . . . . . 12 73 6.5. Other Uses of Local Parts . . . . . . . . . . . . . . . . 13 74 6.6. Non-standard Encapsulation Formats . . . . . . . . . . . . 13 75 7. Experimental Targets . . . . . . . . . . . . . . . . . . . . . 13 76 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 77 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14 78 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15 79 11. Change History . . . . . . . . . . . . . . . . . . . . . . . . 16 80 11.1. draft-klensin-ima-framework: Version 00 . . . . . . . . . 16 81 11.2. draft-klensin-ima-framework: Version 01 . . . . . . . . . 16 82 11.3. draft-ietf-eai-framework: Version 00 . . . . . . . . . . . 16 83 11.4. draft-ietf-eai-framework: Version 01 . . . . . . . . . 
. . 17 84 11.5. draft-ietf-eai-framework: Version 02 . . . . . . . . . . . 17 85 11.6. draft-ietf-eai-framework: Version 03 . . . . . . . . . . . 18 86 11.7. draft-ietf-eai-framework: Version 04 . . . . . . . . . . . 18 87 11.8. draft-ietf-eai-framework: Version 05 . . . . . . . . . . . 18 88 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 89 12.1. Normative References . . . . . . . . . . . . . . . . . . . 18 90 12.2. Informative References . . . . . . . . . . . . . . . . . . 19 91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21 92 Intellectual Property and Copyright Statements . . . . . . . . . . 23 94 1. Introduction 96 In order to use internationalized email addresses, we need to 97 internationalize both the domain part and the local part of email 98 addresses. The domain part of email addresses is already 99 internationalized [RFC3490], while the local part is not. Without 100 the extensions specified in this document, the mailbox name is 101 restricted to a subset of 7-bit ASCII [RFC2821]. Though MIME 102 [RFC2045] enables the transport of non-ASCII data, it does not 103 provide a mechanism for internationalized email addresses. In RFC 104 2047 [RFC2047], MIME defines an encoding mechanism for some specific 105 message header fields to accommodate non-ASCII data. However, it 106 does not permit the use of email addresses that include non-ASCII 107 characters. Without the extensions defined here, or some equivalent 108 set, the only way to incorporate non-ASCII characters in any part of 109 email addresses is to use RFC2047 coding to embed them in what RFC 110 2822 [RFC2822] calls the "display name" (known as a "name phrase" or 111 by other terms elsewhere) of the relevant headers. Information coded 112 into the display name is invisible in the message envelope and, for 113 many purposes, is not part of the address at all. 115 1.1. 
Role of This Specification 117 This document presents the overview and framework for an approach to 118 the next stage of email internationalization. This new stage 119 requires not only internationalization of addresses and headers, but 120 also associated transport and delivery models. 122 This document provides the framework for a series of experimental 123 specifications that, together, provide the details for a way to 124 implement and support internationalized email. The document itself 125 describes how the various elements of email internationalization fit 126 together and the relationships among the various documents involved. 128 1.2. Problem statement 130 IDNA [RFC3490] permits internationalized domain names, but deployment 131 has not yet reached most users. One of the reasons for this is that 132 we do not yet have fully internationalized naming schemes. Domain 133 names are just one of the various names and identifiers that are 134 required to be internationalized. In many contexts, until more of 135 those identifiers are internationalized, internationalized domain 136 names alone have little value. 138 Email addresses are prime examples of why it is not good enough to 139 just internationalize the domain name. As most of us have learned 140 from experience, users strongly prefer email addresses that resemble 141 names or initials to those involving seemingly meaningless strings of 142 letters or numbers. Unless the entire email address can use familiar 143 characters and formats, users will perceive email as being culturally 144 unfriendly. If the names and initials used in email addresses can be 145 expressed in the native languages and writing systems of the users, 146 the Internet will be perceived as more natural, especially by those 147 whose native language is not written in a subset of a Roman-derived 148 script. 
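The asymmetry described above, in which the domain part already has an ASCII-compatible form while the local part does not, can be illustrated with a short sketch. It assumes Python's built-in "idna" codec, which implements the [RFC3490] rules; the helper names and the sample address are hypothetical and chosen only for illustration.

```python
def split_address(address):
    """Split an address at the last '@' into (local_part, domain)."""
    local_part, _, domain = address.rpartition("@")
    return local_part, domain


def ascii_form_of_domain(domain):
    """Return the ASCII-compatible (ACE) form of a domain via the idna codec."""
    return domain.encode("idna").decode("ascii")


def is_all_ascii(text):
    """True if every character is in the ASCII repertoire."""
    return all(ord(ch) < 128 for ch in text)


# Hypothetical sample address with non-ASCII characters on both sides.
local_part, domain = split_address("jürgen@bücher.example")
ace_domain = ascii_form_of_domain(domain)

print(ace_domain)                 # the domain has an ASCII-compatible form
print(is_all_ascii(ace_domain))   # the ACE form is all-ASCII
print(is_all_ascii(local_part))   # the local part has no such form
```

The domain can be carried through ASCII-only systems in its encoded form, but nothing comparable is defined for the local part, which is the gap this document set addresses.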
150 Internationalization of email addresses is not merely a matter of 151 changing the SMTP envelope; or of modifying the From, To, and Cc 152 headers; or of permitting upgraded mail user agents (MUAs) to decode 153 a special coding and respond by displaying local characters. To be 154 perceived as usable, the addresses must be internationalized and 155 handled consistently in all of the contexts in which they occur. 156 This requirement has far-reaching implications: collections of 157 patches and workarounds are not adequate. Even if they were 158 adequate, a workaround-based approach may result in an assortment of 159 implementations with different sets of patches and workarounds having 160 been applied with consequent user confusion about what is actually 161 usable and supported. Instead, we need to build a fully 162 internationalized email environment, focusing on permitting efficient 163 communication among those who share a language or other community. 164 That, in turn, implies changes to the mail header environment to 165 permit the full range of Unicode characters where that makes sense, 166 an SMTP extension to permit UTF-8 [RFC3629] mail addressing and 167 delivery of those extended headers, and (finally) a requirement for 168 support of the 8BITMIME SMTP Extension [RFC1652] so that all of these 169 can be transported through the mail system without having to overcome 170 the limitation that headers do not have content-transfer-encodings. 172 1.3. Terminology 174 This document assumes a reasonable understanding of the protocols and 175 terminology of the core email standards as documented in [RFC2821] 176 and [RFC2822]. 178 Much of the description in this document depends on the abstractions 179 of "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"). 
180 However, it is important to understand that those terms and the 181 underlying concepts postdate the design of the Internet's email 182 architecture and the application of the "protocols on the wire" 183 principle to it. That email architecture, as it has evolved, and the 184 "wire" principle have prevented any strong and standardized 185 distinctions about how MTAs and MUAs interact on a given origin or 186 destination host (or even whether they are separate). 188 However, the term "final delivery MTA" is used in this document in a 189 fashion equivalent to the term "delivery system" or "final delivery 190 system" of RFC 2821. This is the SMTP server that controls the 191 format of local parts of addresses and is permitted to inspect and 192 interpret them. It receives messages from the network for delivery 193 to mailboxes or other local processing, including any forwarding or 194 aliasing that changes envelope addresses, rather than relaying. From 195 the perspective of the network, any local delivery arrangements such 196 as saving to a message store, handoff to specific message delivery 197 programs or agents, and mechanisms for retrieving messages are all 198 "behind" the final delivery MTA and hence not part of the SMTP 199 transport or delivery process. 201 In this document, an address is "all-ASCII", or just an "ASCII 202 address", if every character in the address is in the ASCII character 203 repertoire [ASCII]; an address is "non-ASCII", or an "i18n-address", 204 if any character is not in the ASCII character repertoire. Such 205 addresses may be restricted in other ways, but those restrictions are 206 not relevant to this definition. The term "all-ASCII" is also 207 applied to other protocol elements when the distinction is important, 208 with "non-ASCII" or "internationalized" as its opposite. 210 The umbrella term to describe the email address internationalization 211 specified by this document and its companion documents is "UTF8SMTP". 
212 For example, an address permitted by this specification is referred 213 to as a "UTF8SMTP (compliant) address". 215 Please note that according to the definitions given here the set of 216 all "all-ASCII" addresses and the set of all "non-ASCII" addresses 217 are mutually exclusive. The set of all UTF8SMTP addresses is the 218 union of these two sets. 220 An "ASCII user" (i) exclusively uses email addresses that contain 221 ASCII characters only, and (ii) cannot generate recipient addresses 222 that contain non-ASCII characters. 224 An "i18mail user" has one or more non-ASCII email addresses. Such a 225 user may have ASCII addresses too; if the user has more than one 226 email account and corresponding address, or more than one alias for 227 the same address, he or she has some method to choose which address 228 to use on outgoing email. Note that under this definition, it is not 229 possible to tell from an ASCII address if the owner of that address 230 is an i18mail user or not. (A non-ASCII address implies a belief 231 that the owner of that address is an i18mail user.) There is no such 232 thing as an "i18mail message"; the term applies only to users and 233 their agents and capabilities. 235 A "message" is sent from one user (sender) using a particular email 236 address to one or more other recipient email addresses (often 237 referred to just as "users" or "recipient users"). 239 A "mailing list" is a mechanism whereby a message may be distributed 240 to multiple recipients by sending to one recipient address. An agent 241 (typically not a human being) at that single address then causes the 242 message to be redistributed to the target recipients. This agent 243 sets the envelope return address of the redistributed message to a 244 different address from that of the original single recipient message. 
245 Using a different envelope return address (reverse-path) causes error 246 (and other automatically generated) messages to go to an error 247 handling address. 249 As specified in RFC 2821, a message that is undeliverable for some 250 reason is expected to result in notification to the sender. This can 251 occur in either of two ways. One, typically called "Rejection", 252 occurs when an SMTP server returns a reply code indicating a fatal 253 error (a "5yz" code) or persistently returns a temporary failure 254 error (a "4yz" code). The other involves accepting the message 255 during SMTP processing and then generating a message to the sender, 256 typically known as a "Non-delivery notification" or "NDN". Current 257 practice often favors rejection over NDNs because of the reduced 258 likelihood that the generation of NDNs will be used as a spamming 259 technique. The latter, NDN, case is unavoidable if an intermediate 260 MTA accepts a message that is then rejected by the next-hop server. 262 The pronouns "he" and "she" are used interchangeably to indicate a 263 human of indeterminate gender. 265 The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", 266 and "MAY" in this document are to be interpreted as described in RFC 267 2119 [RFC2119]. 269 2. Overview of the Approach 271 This set of specifications changes both SMTP and the format of email 272 headers to permit non-ASCII characters to be represented directly. 273 Each important component of the work is described in a separate 274 document. The document set, whose members are described in the next 275 section, also contains informational documents whose purpose is to 276 provide implementation suggestions and guidance for the protocols. 278 3. Document Plan 280 In addition to this document, the following documents make up this 281 specification and provide advice and context for it. 283 o SMTP extensions. 
This document [EAI-SMTPext] provides an SMTP 284 extension for internationalized addresses, as provided for in RFC 285 2821. 287 o Email headers in UTF-8. This document [EAI-UTF8] essentially 288 updates RFC 2822 to permit some information in email headers to be 289 expressed directly by Unicode characters encoded in UTF-8 when the 290 SMTP extension described above is used. This document, possibly 291 with one or more supplemental ones, will also need to address the 292 interactions with MIME, including relationships between UTF8SMTP 293 and internal MIME headers and content types. 295 o In-transit downgrading from internationalized addressing with the 296 SMTP extension and UTF-8 headers to traditional email formats and 297 characters [EAI-downgrade]. Downgrading at the point of 298 message origination, or after the mail has been successfully 299 received by a final delivery SMTP server, involves different 300 constraints and possibilities; see Section 4.3 and Section 5, 301 below. Processing that occurs after such final delivery, 302 primarily that involved with the delivery to a mailbox or message 303 store, is sometimes called "Message Delivery" processing. 305 o Extensions to the IMAP protocol to support internationalized 306 headers [EAI-imap]. 308 o Parallel extensions to the POP protocol [EAI-pop]. 310 o Description of internationalization changes for delivery 311 notifications (DSNs) [EAI-DSN]. 313 o Scenarios for the use of these protocols [EAI-scenarios]. 315 4. Overview of Protocol Extensions and Changes 317 4.1. SMTP Extension for Internationalized Email Address 319 An SMTP extension, "UTF8SMTP", is specified that 321 o Permits the use of UTF-8 strings in email addresses, both local 322 parts and domain names. 324 o Permits the selective use of UTF-8 strings in email headers (see 325 Section 4.2).
327 o Requires that the server advertise the 8BITMIME extension 328 [RFC1652] and that the client support 8-bit transmission so that 329 header information can be transmitted without using a special 330 content-transfer-encoding. 332 o Provides information to support downgrading mechanisms. 334 Some general principles affect the development decisions underlying 335 this work. 337 1. Email addresses enter subsystems (such as a user interface) that 338 may perform charset conversions or other encoding changes. When 339 the left hand side of the address includes characters outside the 340 US-ASCII character repertoire, use of punycode on the right hand 341 side is discouraged to promote consistent processing of 342 characters throughout the address. 344 2. An SMTP relay must 346 * Either recognize the format explicitly, agreeing to do so via 347 an ESMTP option, 349 * Select and use an ASCII-only address, downgrading other 350 information as needed (see Section 4.3), or 352 * Reject the message or, if necessary, return a non-delivery 353 notification message, so that the sender can make another 354 plan. 356 If the message cannot be forwarded because the next-hop system 357 cannot accept the extension and insufficient information is 358 available to reliably downgrade it, it MUST be rejected or a non- 359 delivery message generated and sent. 361 3. In the interest of interoperability, charsets other than UTF-8 362 are prohibited in mail addresses and headers. There is no 363 practical way to identify them properly with an extension similar 364 to this without introducing great complexity. 366 Conformance to the group of standards specified here for email 367 transport and delivery requires implementation of the SMTP Extension 368 specification, including recognition of the keywords associated with 369 alternate addresses, and the UTF-8 Header specification. Support for 370 downgrading is not required, but, if implemented, MUST be implemented 371 as specified. 
Similarly, if the system implements IMAP or POP, it 372 MUST conform to the i18n IMAP or POP specifications respectively. 374 4.2. Transmission of Email Header Fields in UTF-8 Encoding 376 There are many places in MUAs or in user presentation in which email 377 addresses or domain names appear. Examples include the conventional 378 From, To, or Cc header fields; Message-ID and In-Reply-To header 379 fields that normally contain domain names (but that may be a special 380 case); and message bodies. Each of these must be examined from an 381 internationalization perspective. The user will expect to see 382 mailbox and domain names in local characters, and to see them 383 consistently. If non-obvious encodings, such as protocol-specific 384 ASCII-Compatible Encoding (ACE) variants, are used, the user will 385 inevitably, if only occasionally, see them rather than "native" 386 characters and will find that discomfiting or astonishing. 387 Similarly, if different codings are used for mail transport and 388 message bodies, the user is particularly likely to be surprised, if 389 only as a consequence of the long-established "things leak" 390 principle. The only practical way to avoid these sources of 391 discomfort, in both the medium and the longer term, is to have the 392 encodings used in transport be as nearly as possible the same as the 393 encodings used in message headers and message bodies. 395 When email local parts are internationalized, it seems clear that 396 they should be accompanied by arrangements for email headers in fully 397 internationalized form. That form should presumably use UTF-8 rather 398 than ASCII as the base character set for the contents of header 399 fields (protocol elements such as the header field names themselves 400 will remain entirely in ASCII). For transition purposes and 401 compatibility with legacy systems, this can be done by extending the 402 encoding models of [RFC2045] and [RFC2231].
However, our target 403 should be fully internationalized headers, as discussed in 404 [EAI-UTF8]. 406 4.3. Downgrading Mechanism for Backward Compatibility 408 As with any use of the SMTP extension mechanism, there is always the 409 possibility of a client that requires the feature encountering a 410 server that does not support the required feature. In the case of 411 email address and header internationalization, the risk should be 412 minimized by the fact that the selection of submission servers is 413 presumably under the control of the sender's client and the selection 414 of potential intermediate relays is under the control of the 415 administration of the final delivery server. 417 For situations in which a client that needs to use UTF8SMTP 418 encounters a server that does not support the UTF8SMTP extension, 419 there are two possibilities: 421 o Reject the message or generate and send a non-delivery message, 422 requiring the sender to resubmit it with traditional-format 423 addresses and headers. 425 o Downgrade the envelope or message body in 426 transit. Especially when internationalized addresses are 427 involved, downgrading will require that all-ASCII addresses be 428 obtained from some source. An optional extension parameter is 429 provided as a way of transmitting an alternate address. Downgrade 430 issues and a specification are discussed in [EAI-downgrade]. 432 (The client can also try an alternate next-hop host or requeue the 433 message and try later, on the assumption that the lack of UTF8SMTP is 434 a transient failure; since this ultimately resolves to success or 435 failure, it doesn't change the discussion here.) 437 The first of these two options, that of rejecting or returning the 438 message to the sender, MAY always be chosen.
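The choice between the two possibilities above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the procedure of [EAI-downgrade]: the names Action, server_offers_utf8smtp, and plan_delivery are hypothetical, the EHLO keywords are assumed to be available as a list of strings, and the alternate addresses are assumed to be supplied as a simple mapping. It also treats a message that is entirely ASCII as sendable to any server, as this section states in the paragraph that follows.

```python
from enum import Enum


class Action(Enum):
    SEND = "send"            # transmit unchanged
    DOWNGRADE = "downgrade"  # rewrite to all-ASCII form, then transmit
    REJECT = "reject"        # reject or generate a non-delivery message


def is_all_ascii(text):
    """True if every character is in the ASCII repertoire."""
    return all(ord(ch) < 128 for ch in text)


def server_offers_utf8smtp(ehlo_keywords):
    """A UTF8SMTP server is also required to advertise 8BITMIME."""
    keywords = {keyword.upper() for keyword in ehlo_keywords}
    return "UTF8SMTP" in keywords and "8BITMIME" in keywords


def plan_delivery(ehlo_keywords, addresses, headers_all_ascii, alt_addresses):
    """Pick an action for one message and one next-hop server.

    alt_addresses maps a non-ASCII envelope address to an all-ASCII
    alternate (hypothetical representation of the optional parameter).
    """
    needs_extension = (not headers_all_ascii
                       or not all(is_all_ascii(a) for a in addresses))
    if not needs_extension or server_offers_utf8smtp(ehlo_keywords):
        return Action.SEND
    # Downgrading needs an all-ASCII alternate for every non-ASCII address.
    for address in addresses:
        if is_all_ascii(address):
            continue
        alternate = alt_addresses.get(address)
        if alternate is None or not is_all_ascii(alternate):
            return Action.REJECT
    # Rejection MAY always be chosen instead; this sketch prefers downgrading.
    return Action.DOWNGRADE


print(plan_delivery(["8BITMIME"], ["用户@example.com"], True,
                    {"用户@example.com": "user@example.com"}))
```

The sketch deliberately omits header rewriting and the retry or alternate next-hop strategies, since those resolve to one of the same three outcomes.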
440 If a UTF8SMTP-capable client is sending a message that does not 441 require the extended capabilities, that is, a message in which both 442 the addresses in the envelope and the entire set of headers are 443 entirely in ASCII (perhaps including encoded words in the headers), 444 it SHOULD send the message whether or not the server announces 445 support for the extension specified here. 449 5. Downgrading Before and After SMTP Transactions 451 In addition to the in-transit downgrades discussed above, downgrading 452 may also occur before or during initial message submission or after 453 delivery to the final delivery MTA. Because the information 454 available in these cases differs from that available in transit, the 455 constraints and opportunities may differ as well. These 456 two cases are discussed in the subsections below. 458 5.1. Downgrading Before or During Message Submission 460 Perhaps obviously, the most convenient time to find an ASCII address 461 corresponding to an internationalized address is at the originating 462 MUA. This can occur either before the message is sent or after the 463 internationalized form of the message is rejected. It is also the 464 most convenient time to convert a message from the internationalized 465 form into conventional ASCII form or to generate a non-delivery 466 message to the sender if either is necessary. At that point, the 467 user has a full range of choices available, including contacting the 468 intended recipient out of band for an alternate address, consulting 469 appropriate directories, arranging for translation of both addresses 470 and message content into a different language, and so on.
While it 471 is natural to think of message downgrading as optimally being a 472 fully-automated process, we should not underestimate the capabilities 473 of a user of at least moderate intelligence who wishes to communicate 474 with another such user. 476 In this context, one can easily imagine modifications to message 477 submission servers (as described in [RFC4409]) so that they would 478 perform downgrading, or perhaps even upgrading, operations, receiving 479 messages with one or more of the internationalization extensions 480 discussed here and adapting the outgoing message, as needed, to 481 respond to the delivery or next-hop environment it encounters. 483 5.2. Downgrading or Other Processing After Final SMTP Delivery 485 When an email message is received by a final delivery SMTP server, it 486 is usually stored in some form. Then it is retrieved either by 487 software that reads the stored form directly or by client software 488 via some email retrieval mechanisms such as POP or IMAP. 490 The SMTP extension described in Section 4.1 provides protection only 491 in transport. It does not prevent MUAs and email retrieval 492 mechanisms that have not been upgraded to understand 493 internationalized addresses and UTF-8 headers from accessing stored 494 internationalized emails. 496 Since the final delivery SMTP server (or, to be more specific, its 497 corresponding mail storage agent) cannot safely assume that agents 498 accessing email storage will always be capable of handling the 499 extensions proposed here, it MAY either downgrade internationalized 500 emails or specially identify messages that utilize these extensions, 501 or both. If this is done, the final delivery SMTP server SHOULD 502 include a mechanism to preserve or recover the original 503 internationalized forms without information loss to support access by 504 UTF8SMTP-aware agents. 506 6. 
Additional Issues 508 This section identifies issues that are not covered as part of this 509 set of specifications, but that will need to be considered as part of 510 deployment of email address and header internationalization. 512 6.1. Impact on URIs and IRIs 514 The mailto: scheme defined in [RFC2368] and discussed in the IRI 515 specification [RFC3987] may need to be modified when this work is completed and 516 standardized. 518 6.2. Interaction with delivery notifications 520 The advent of UTF8SMTP will require consideration of its 521 interaction with delivery notification mechanisms, including the SMTP 522 extension for requesting delivery notifications [RFC3461] and the 523 format of delivery notifications [RFC3464]. These issues are 524 discussed in a forthcoming document that will update those RFCs as 525 needed [EAI-DSN]. 527 6.3. Use of email addresses as identifiers 529 There are a number of places in contemporary Internet usage in which 530 email addresses are used as identifiers for individuals, including as 531 identifiers to web servers supporting some electronic commerce sites. 532 These documents do not address those uses, but it is reasonable to 533 expect that some difficulties will be encountered when 534 internationalized addresses are first used in those contexts, many of 535 which cannot even handle the full range of addresses permitted today. 537 6.4. Encoded words, signed messages and downgrading 539 One particular characteristic of the email format is its persistence: 540 MUAs are expected to handle messages that were originally sent 541 decades ago and not just those delivered seconds ago. Consequently, MUAs 542 and mail filtering software, such as that specified in SIEVE 543 [RFC3028], will need to continue to accept and decode header fields 544 that use the "encoded word" mechanism [RFC2047] to accommodate 545 non-ASCII characters in some header fields.
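A minimal sketch of such decoding follows, assuming Python's standard email package. The function name decode_display_name and the sample header value are hypothetical. It shows that the "encoded word" carries non-ASCII characters only in the display name, while the address itself stays all-ASCII.

```python
from email.header import decode_header
from email.utils import parseaddr


def decode_display_name(header_value):
    """Return (decoded display name, address) for one address header value."""
    # parseaddr separates the phrase from the address without decoding it.
    raw_name, addr_spec = parseaddr(header_value)
    parts = []
    for chunk, charset in decode_header(raw_name):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain, unencoded text comes back as str.
            parts.append(chunk)
    return "".join(parts), addr_spec


name, address = decode_display_name(
    "=?utf-8?q?J=C3=BCrgen?= <juergen@example.org>")
print(name)
print(address)
```

Software that stores or filters old messages would need to keep a path like this available even after it learns to read UTF-8 headers directly.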
While extensions to both
546     POP3 and IMAP have been proposed to enable automatic EAI-upgrade --
547     including RFC 2047 decoding -- of messages by the POP3 or IMAP
548     server, there are message structures and MIME content-types for which
549     that cannot be done or where the change would have unacceptable side-
550     effects.

552     For example, message parts that are cryptographically signed using,
553     e.g., S/MIME [RFC3851] or PGP [RFC3156], cannot be upgraded from RFC
554     2047 form to normal UTF-8 characters without breaking the signature.
555     Similarly, message parts that are encrypted may contain, when
556     decrypted, header fields that use the RFC 2047 encoding; such
557     messages cannot be 'fully' upgraded without access to cryptographic
558     keys.

560     Similar issues may arise if signed messages are downgraded in transit
561     [EAI-downgrade] and then an attempt is made to upgrade them to the
562     original form and then verify the signatures. Even the very subtle
563     changes that may result from algorithms to downgrade and then upgrade
564     again may be sufficient to invalidate the signatures if they impact
565     either the primary or MIME bodypart headers. When signatures are
566     present, downgrading must be performed with extreme care if at all.

568  6.5.  Other Uses of Local Parts

570     Local parts are sometimes used to construct domain labels, e.g. the
571     local part "user" in the address user@domain.example could be
572     converted into a vanity host user.domain.example with Web space at
573     and catchall addresses
574     any.thing.goes@user.domain.example.

576     Such schemes are obviously limited by, among other things, the SMTP
577     rules for domain names, and will not work without further restrictions
578     for other local parts such as the specified in
579     [EAI-UTF8]. Whether this issue is relevant to these specifications
580     is an open question. It may be simply another case of the
581     considerable flexibility accorded to delivery MTAs in determining the
582     mailbox names they will accept and how they are interpreted.

584  6.6.  Non-standard Encapsulation Formats

586     Some applications use formats similar to the application/mbox format
587     defined in [RFC4155] instead of the message/digest (RFC 2046, Section
588     5.1.5 [RFC2046]) form to transfer multiple messages as single units.
589     Insofar as such applications assume that all stored messages use the
590     message/rfc822 (RFC 2046, Section 5.2.1 [RFC2046]) format with US-ASCII
591     headers, they are not ready for the extensions specified in this
592     series of documents and special measures may be needed to properly
593     detect and process them.

595  7.  Experimental Targets

597     In addition to the simple question of whether the model outlined here
598     can be made to work in a satisfactory way for upgraded systems and
599     provide adequate protection for un-upgraded ones, we expect that
600     actually working with the systems will provide answers to two
601     additional questions: what restrictions such as character lists or
602     normalization should be placed, if any, on the characters that are
603     permitted to be used in address local-parts and how useful, in
604     practice, will downgrading turn out to be given whatever restrictions
605     and constraints must be placed upon it.

607  8.  IANA Considerations

609     This overview description and framework document does not contemplate
610     any IANA registrations or other actions. Some of the documents in
611     the group have their own IANA considerations sections and
612     requirements.

614  9.  Security Considerations

616     Any expansion of permitted characters and encoding forms in email
617     addresses raises some risks. There have been discussions on
618     so-called "IDN-spoofing" or "IDN homograph attacks". These attacks
619     allow an attacker (or "phisher") to spoof the domain or URLs of
620     businesses. The same kind of attack is also possible on the local
621     part of internationalized email addresses. It should be noted that
622     the proposed fix involving forcing all displayed elements into
623     normalized lower-case works for domain names in URLs, but not email
624     local parts since those are case sensitive.

626     Since email addresses are often transcribed from business cards and
627     notes on paper, they are subject to problems arising from confusable
628     characters (see [RFC4690]). These problems are somewhat reduced if
629     the domain associated with the mailbox is unambiguous and supports a
630     relatively small number of mailboxes whose names follow local system
631     conventions; they are increased with very large mail systems in which
632     users can freely select their own addresses.

634     The internationalization of email addresses and headers must not
635     leave the Internet less secure than it is without the required
636     extensions. The requirements and mechanisms documented in this set
637     of specifications do not, in general, raise any new security issues.
638     They do require a review of issues associated with confusable
639     characters -- a topic that is being explored thoroughly elsewhere
640     (see, e.g., [RFC4690]) -- and, potentially, some issues with UTF-8
641     normalization, discussed in [RFC3629], and other transformations.
642     Normalization and other issues associated with transformations and
643     standard forms are also part of the subject of ongoing work discussed
644     in [Net-Unicode], in [IDNAbis-BIDI] and elsewhere. Some issues
645     specifically related to internationalized addresses and headers are
646     discussed in more detail in the other documents in this set.
647     However, in particular, caution should be taken that any
648     "downgrading" mechanism, or use of downgraded addresses, does not
649     inappropriately assume authenticated bindings between the
650     internationalized and ASCII addresses.
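
The confusable-character and normalization concerns above can be made concrete with a short Python sketch based on the standard unicodedata module. It is an illustration only, not a mechanism defined by these documents. The helper names canonical_form and scripts_in_local_part are hypothetical, and taking the first word of a character's Unicode name as its "script" is a crude assumption made for the example, not an established confusable-detection method.

```python
import unicodedata


def canonical_form(address: str) -> str:
    """NFC-normalize an address, lower-casing only the domain.

    The local part keeps its case because local parts are case sensitive.
    (Hypothetical helper for illustration.)
    """
    local, _, domain = address.rpartition("@")
    local = unicodedata.normalize("NFC", local)
    domain = unicodedata.normalize("NFC", domain).lower()
    return local + "@" + domain


def scripts_in_local_part(address: str) -> set:
    """Rough script hint: first word of each letter's Unicode name.

    This is a deliberately simple heuristic, assumed for this sketch.
    """
    local = address.rpartition("@")[0]
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in local
        if ch.isalpha()
    }


if __name__ == "__main__":
    genuine = "paypal@example.com"
    spoofed = "p\u0430ypal@example.com"  # U+0430 CYRILLIC SMALL LETTER A

    # Normalization and domain case folding do not merge the look-alikes.
    print(canonical_form(genuine) == canonical_form(spoofed))

    # A mixed-script local part is one possible warning sign.
    print(sorted(scripts_in_local_part(spoofed)))

    # NFC does merge a decomposed and a precomposed accented letter.
    print(canonical_form("Jose\u0301@Example.COM") == "Jos\u00e9@example.com")
```

The sketch shows why forcing a normalized lower-case form is not a fix for local parts: the two addresses remain distinct after normalization yet look alike on display, so any check has to examine the characters themselves.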

652     The new UTF-8 header and message formats might also raise, or
653     aggravate, another known issue. If the model creates new forms of
654     'invalid' or 'malformed' message, then a new email attack is created:
655     in an effort to be robust, some or most agents will accept such
656     messages and interpret them as if they were well-formed. If a filter
657     interprets such a message differently than the final MUA, then it
658     may be possible to create a message which appears acceptable under
659     the filter's interpretation but which should be rejected under the
660     interpretation given it by the final MUA. Such attacks already exist
661     for existing messages and encoding layers, e.g., invalid MIME syntax,
662     invalid HTML markup, and invalid coding of particular image types.

664     Models for "downgrading" of messages or addresses from UTF-8 form to
665     some ASCII form, including those described in [EAI-downgrade], pose
666     another special problem and risk: any system that transforms one
667     address or set of mail header fields into another becomes a point at
668     which spoofing attacks can occur and those who wish to spoof messages
669     might be able to do so by imitating a message downgraded from one
670     with a legitimate original address.

672     In addition, email addresses are used in many contexts other than
673     sending mail, such as for identifiers under various circumstances
674     (see Section 6.3). Each of those contexts will need to be evaluated,
675     in turn, to determine whether the use of non-ASCII forms is
676     appropriate and what particular issues they raise.

678     This work will clearly impact any systems or mechanisms that are
679     dependent on digital signatures or similar integrity protection for
680     mail headers (see also the discussion in Section 6.4). Many
681     conventional uses of PGP and S/MIME are not affected since they are
682     used to sign body parts but not headers. On the other hand, the
683     developing work on domain keys identified mail (DKIM [DKIM-Charter])
684     will eventually need to consider this work and vice versa: while this
685     experiment does not propose to address or solve the issues raised by
686     DKIM and other signed header mechanisms, the issues will have to be
687     coordinated and resolved eventually if the two sets of protocols are
688     to co-exist. In addition, to the degree to which email addresses
689     appear in PKI certificates, standards addressing such certificates
690     will need to be upgraded to address these internationalized
691     addresses. Those upgrades will need to address questions of spoofing
692     by look-alikes of the addresses themselves.

694  10.  Acknowledgements

696     This document, and the related ones, were originally derived from
697     drafts by John Klensin and the JET group [Klensin-emailaddr],
698     [JET-IMA]. The work drew inspiration from discussions on the "IMAA"
699     mailing list, sponsored by the Internet Mail Consortium and
700     especially from an early draft by Paul Hoffman and Adam Costello
701     [Hoffman-IMAA] that attempted to define an MUA-only solution to the
702     address internationalization problem.

704     More recent drafts have benefited from considerable discussion within
705     the IETF EAI Working Group and especially from suggestions and text
706     provided by Martin Duerst, Frank Ellermann, Philip Guenther, Kari
707     Hurtta, and Alexey Melnikov, and from extended discussions among the
708     editors and authors of the core documents cited in Section 3: Harald
709     Alvestrand, Kazunori Fujiwara, Chris Newman, Pete Resnick, Jiankang
710     Yao, Jeff Yeh, and Yoshiro Yoneya.

712     Additional comments received during IETF Last Call, including those
713     from Paul Hoffman and Robert Sparks, were helpful in making the
714     document more clear and comprehensive.

716  11.  Change History

718     This document has evolved through several titles as well as the usual
719     version numbers. The list below tries to trace that thread as well
720     as changes within the substance of the document. The first document
721     of the series was posted as draft-klensin-emailaddr-i18n-00.txt in
722     October 2003.

724  11.1.  draft-klensin-ima-framework: Version 00

726     This version supersedes draft-lee-jet-ima-00 and
727     draft-klensin-emailaddr-i18n-03. It represents a major rewrite and
728     change of architecture from the former and incorporates many ideas
729     and some text from the latter.

731  11.2.  draft-klensin-ima-framework: Version 01

733     o  Some clarifications of terminology (more to follow) and general
734        editorial improvements.

736     o  Upgrades to reflect discussions during IETF 64.

738     o  Improved treatment of downgrading before and after message
739        transport.

741  11.3.  draft-ietf-eai-framework: Version 00

743     This version supersedes draft-klensin-ima-framework-01; its file name
744     should represent the form to be used until the IETF email address and
745     header internationalization ("EAI") work concludes.

747     o  Changed "display name" terminology to be consistent with RFC 2822.
748        Also clarified some other terminology issues.

750     o  Added a comment about the possible role of Message Submission
751        servers in downgrading.

753     o  Removed the "IMA" terminology, converting it to either "EAI" or
754        prose.

756     o  Per meeting and mailing list discussion, added conformance
757        statements about bouncing if neither forwarding nor downgrading
758        were possible and about implementation requirements.

760     o  Updated several references. Some documents are still tentative.

762     o  Fixed many typographical errors.

764  11.4.  draft-ietf-eai-framework: Version 01

766     o  Added comments about PGP, S/MIME, and DKIM to Security
767        Considerations

769     o  Rationalized terminology and included terminology from scenarios
770        document.

772  11.5.  draft-ietf-eai-framework: Version 02

774     o  Clarified comment about IRIs and MAILTO.

776     o  Identified issue with S/MIME and PGP for encapsulated content.

778     o  Added note about the definitive "UTF8SMTP" terminology.

780     o  Removed mail exploder related discussions and reference.

782     o  Adjusted some requirement levels.

784     o  Removed computed ASCII address (aka ATOMIC) related discussion.

786     o  Added a section about delivery notifications and created a pointer
787        to a new document about them.

789     o  Added a new section noting the use of email addresses as
790        identifiers.

792     o  Added a new section discussing implications of downgrading to
793        digital signatures on messages.

795     o  Many editorial revisions, corrections to references, etc.,
796        including moving the references to the other documents in the
797        series to "informative" -- this document does not depend on them
798        for a specification and is, itself, intended to be Informational.

800  11.6.  draft-ietf-eai-framework: Version 03

802     o  Revised the material in the "document plan" that introduces the
803        "MDA" terminology.

805     o  Added definitions for "reject", and "non-delivery message" ("NDN")
806        and removed the term "bounce" from the document.

808     o  Removed the "Internationalization Considerations" section as
809        pointless and silly.

811     o  Several references corrected and other small text clarifications
812        inserted in response to WG Last Call comments.

814     o  Modified the references to EAI WG drafts to use "EAI-" rather than
815        "I18Nemail-" to reduce the chances for confusion.

817     o  Added placeholders for unresolved WG Last Call issues and notes on
818        significant changes made during WG Last Call (marked "WGLC" with
819        issues entered into the tracker identified by issue number)

821     o  Incorporated extensive editorial clarifications from Randy Gellens
822        into Section 1.

824  11.7.  draft-ietf-eai-framework: Version 04

826     o  Corrected the description of header fields that must be examined.

828     o  Added a note to "Security Considerations" about spoofing risks
829        associated with downgrading and extended the treatment of digital
830        signatures to include PKI certificate issues.

832     o  Several typographic, editorial, and small definitional
833        corrections.

835  11.8.  draft-ietf-eai-framework: Version 05

837     o  Small textual adjustments for consistency with WGLC writeup.

839  12.  References

841  12.1.  Normative References

843     [ASCII]    American National Standards Institute (formerly United
844                States of America Standards Institute), "USA Code for
845                Information Interchange", ANSI X3.4-1968, 1968.

847                ANSI X3.4-1968 has been replaced by newer versions with
848                slight modifications, but the 1968 version remains
849                definitive for the Internet.

851     [RFC1652]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and D.
852                Crocker, "SMTP Service Extension for 8bit-MIMEtransport",
853                RFC 1652, July 1994.

855     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
856                Requirement Levels", RFC 2119, March 1997.

858     [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
859                April 2001.

861     [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
862                "Internationalizing Domain Names in Applications (IDNA)",
863                RFC 3490, March 2003.

865     [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
866                10646", STD 63, RFC 3629, November 2003.

868  12.2.  Informative References

870     [DKIM-Charter]
871                IETF, "Domain Keys Identified Mail (dkim)", October 2006,
872                .

874     [EAI-DSN]  Newman, C., "UTF-8 Delivery and Disposition Notification",
875                draft-ietf-eai-dsn-00 (work in progress), January 2007.

877                This document is under development by the WG. The date
878                given is an estimate for a version ready for posting.

880     [EAI-SMTPext]
881                Yao, J., Ed. and W. Mao, Ed., "SMTP extension for
882                internationalized email address",
883                draft-ietf-eai-smtpext-01 (work in progress), July 2006.

885     [EAI-UTF8]
886                Yeh, J., "Internationalized Email Headers",
887                draft-ietf-eai-utf8headers-01.txt (work in progress),
888                August 2006.

890     [EAI-downgrade]
891                YONEYA, Y., Ed. and K. Fujiwara, Ed., "Downgrading
892                mechanism for Internationalized eMail Address (IMA)",
893                draft-ietf-eai-downgrade-02 (work in progress),
894                August 2005.

896     [EAI-imap]
897                Resnick, P. and C. Newman, "IMAP Support for UTF-8",
898                draft-ietf-eai-imap-utf8-00 (work in progress), May 2006.

900     [EAI-pop]  Newman, C., "POP3 Support for UTF-8", June 2006, .

903     [EAI-scenarios]
904                Alvestrand, H., "UTF-8 Mail: Scenarios",
905                draft-ietf-eai-scenarios-01 (work in progress), June 2006.

907     [Hoffman-IMAA]
908                Hoffman, P. and A. Costello, "Internationalizing Mail
909                Addresses in Applications (IMAA)", draft-hoffman-imaa-03
910                (work in progress), October 2003.

912     [IDNAbis-BIDI]
913                Alvestrand, H. and C. Karp, "An IDNA problem in right-to-
914                left scripts", October 2006, .

917     [JET-IMA]  Yao, J. and J. Yeh, "Internationalized eMail Address
918                (IMA)", draft-lee-jet-ima-00 (work in progress),
919                June 2005.

921     [Klensin-emailaddr]
922                Klensin, J., "Internationalization of Email Addresses",
923                draft-klensin-emailaddr-i18n-03 (work in progress),
924                July 2005.

926     [Net-Unicode]
927                Klensin, J. and M. Padlipsky, "Unicode Format for Network
928                Interchange", April 2006, .

931     [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
932                Extensions (MIME) Part One: Format of Internet Message
933                Bodies", RFC 2045, November 1996.

935     [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
936                Extensions (MIME) Part Two: Media Types", RFC 2046,
937                November 1996.

939     [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
940                Part Three: Message Header Extensions for Non-ASCII Text",
941                RFC 2047, November 1996.

943     [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
944                Word Extensions: Character Sets, Languages, and
945                Continuations", RFC 2231, November 1997.

947     [RFC2368]  Hoffman, P., Masinter, L., and J. Zawinski, "The mailto
948                URL scheme", RFC 2368, July 1998.

950     [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822,
951                April 2001.

953     [RFC3028]  Showalter, T., "Sieve: A Mail Filtering Language",
954                RFC 3028, January 2001.

956     [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
957                "MIME Security with OpenPGP", RFC 3156, August 2001.

959     [RFC3461]  Moore, K., "Simple Mail Transfer Protocol (SMTP) Service
960                Extension for Delivery Status Notifications (DSNs)",
961                RFC 3461, January 2003.

963     [RFC3464]  Moore, K. and G. Vaudreuil, "An Extensible Message Format
964                for Delivery Status Notifications", RFC 3464,
965                January 2003.

967     [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
968                Extensions (S/MIME) Version 3.1 Message Specification",
969                RFC 3851, July 2004.

971     [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
972                Identifiers (IRIs)", RFC 3987, January 2005.

974     [RFC4155]  Hall, E., "The application/mbox Media Type", RFC 4155,
975                September 2005.

977     [RFC4409]  Gellens, R. and J. Klensin, "Message Submission for Mail",
978                RFC 4409, April 2006.

980     [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
981                Recommendations for Internationalized Domain Names
982                (IDNs)", RFC 4690, September 2006.

984  Authors' Addresses

986     John C Klensin
987     1770 Massachusetts Ave, #322
988     Cambridge, MA 02140
989     USA

991     Phone: +1 617 491 5735
992     Email: john-ietf@jck.com

994     YangWoo Ko
995     ICU
996     119 Munjiro
997     Yuseong-gu, Daejeon 305-732
998     Republic of Korea

1000     Email: yw@mrko.pe.kr

1002  Full Copyright Statement

1004     Copyright (C) The IETF Trust (2007).

1006     This document is subject to the rights, licenses and restrictions
1007     contained in BCP 78, and except as set forth therein, the authors
1008     retain all their rights.

1010     This document and the information contained herein are provided on an
1011     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1012     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
1013     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
1014     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1015     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1016     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1018  Intellectual Property

1020     The IETF takes no position regarding the validity or scope of any
1021     Intellectual Property Rights or other rights that might be claimed to
1022     pertain to the implementation or use of the technology described in
1023     this document or the extent to which any license under such rights
1024     might or might not be available; nor does it represent that it has
1025     made any independent effort to identify any such rights. Information
1026     on the procedures with respect to rights in RFC documents can be
1027     found in BCP 78 and BCP 79.

1029     Copies of IPR disclosures made to the IETF Secretariat and any
1030     assurances of licenses to be made available, or the result of an
1031     attempt made to obtain a general license or permission for the use of
1032     such proprietary rights by implementers or users of this
1033     specification can be obtained from the IETF on-line IPR repository at
1034     http://www.ietf.org/ipr.

1036     The IETF invites any interested party to bring to its attention any
1037     copyrights, patents or patent applications, or other proprietary
1038     rights that may cover technology that may be required to implement
1039     this standard. Please address the information to the IETF at
1040     ietf-ipr@ietf.org.

1042  Acknowledgment

1044     Funding for the RFC Editor function is provided by the IETF
1045     Administrative Support Activity (IASA).