Internet Draft                                              Maynard Kang
draft-ietf-idn-mua-00.txt                                    i-EMAIL.net
February 5, 2001
Expires on August 5, 2001

          Internationalizing Domain Names in Mail User Agents

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes a way in which domain names used in Internet
e-mail can be internationalized by making changes only to end-user Mail
User Agents, thereby avoiding damage to other applications which handle
Internet e-mail, such as Message Transfer Agents and Delivery Agents.

1. Introduction

One of the proposed solutions for internationalized domain names (IDN)
involves updating only the user applications, with no changes required
to the DNS protocol, servers and resolvers [IDNA]. Other solutions
require changes to be made to the protocol, servers, resolvers and
applications.

The underlying principle of [IDNA] may be similarly applied to the
Internet e-mail system today, by effecting changes to only the Mail
User Agent (MUA) component of the e-mail system. Thus, existing
Message Transfer Agents, Delivery Agents and other applications which
handle e-mail do not have to be changed at all.

1.1 Definitions and Conventions

Terms related to the character encoding model are used as defined in
Unicode Technical Report 17 [UTR17].

The terms "international character", "non-ASCII character" and
"multilingual character", which are used interchangeably, are taken
to mean any abstract character which is not included in the range
specified by [US-ASCII].

1.2 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in RFC
2119 [RFC2119].

1.3 Design Philosophy

As the Internet e-mail system is a diverse, distributed and
heterogeneous system with many vendors deploying a vast number of
applications, it is of utmost importance that interoperability amongst
these various components is maintained. Thus, the ideal solution would
be one which does not compromise or damage the operation of any of
these existing components once internationalized domain names are
encountered.

Also, solutions which call for changes to be made to many or even all
components of the Internet e-mail system would require far too much
time and effort to deploy, given that Internet e-mail has such a huge
installed base.

This solution adheres to both of the above principles, in that
interoperability is preserved, the cost of implementation is low and
deployment is fast. All that the user has to do to use IDNs in e-mail
is update his or her MUA.

1.4 IDN Summary

This solution specifies an IDN architecture of arch-3 (just send ACE)
and a transition strategy of trans-1 (always do current plus new
architecture) as described in [IDNCOMP]. The choice of ACE format is
not defined in this document, but MUST be the same as that specified in
[IDNA] in order to maintain uniqueness and consistency.

1.5 E-mail Internationalization Summary

As many Internet e-mail standards such as the SMTP protocol [RFC821]
and the e-mail message format [RFC822] only specify usage of the 7-bit
ASCII character set [US-ASCII], international characters which use
octet-based character encoding schemes (CES) cannot be used in e-mail
transmission, headers and bodies.

Although this issue has been addressed in [RFC2045] for message bodies
and [RFC2047] for message headers through the use of a Transfer
Encoding Syntax (TES) such as Quoted-Printable or Base64, there is no
similar solution which extends the functionality of [RFC821] to include
usage of international characters, except for [RFC1652], which allows
transmission of 8-bit data passed by the DATA command in an SMTP
session.

[RFC1652], however, does not fully address the problem of using IDNs in
an SMTP session: an IDN may be used in areas within the SMTP session
other than the DATA command, such as the MAIL FROM and RCPT TO
commands, where an IDN may be part of the e-mail address(es) specified
there.

Hence, this would be a major stumbling block to deploying
"just-send-8bit" IDNs for use in Internet e-mail, as these IDNs could
not be used in SMTP e-mail transmissions due to [RFC821] restrictions.

2. Architectural Overview

The end-user MUA may encounter IDNs in the scenarios below:

(i)   When specifying the transmission server (i.e. SMTP server)
(ii)  When specifying the retrieval server (i.e. POP3/IMAP4/any other
      retrieval mechanism)
(iii) When specifying e-mail addresses during composition of a message
(iv)  When reading messages with e-mail addresses in them

As with [IDNA], the MUA is updated to process IDNs which are input by
users and IDNs which are displayed to users, in all of the scenarios
above.

For (i) and (ii), the IDN MUST be handled in the same manner as
specified in [IDNA]. The method of handling an IDN for (iii) and (iv)
is described below in 2.1.

2.1 Interfaces between E-mail components when composing/reading a mail

The interfaces between e-mail components can be pictorially represented
as shown below.

The example assumes the setup of a POP3/IMAP4 retrieval client and
server, but the exact nature of end-to-end e-mail transmission may vary
(e.g. elm or pine would read directly from the mail store). However,
these variations do not materially affect the description of this
solution, as no changes are required at these levels.

   +------+                                             +------+
   | User |                                             | User |
   +------+                                             +---^--+
      | User Input:             User Display: Characters/   |
      | Keyboard/Pen/etc        Glyphs on CRT or other      |
+-----v---------------+         Representation (e.g. sound) |
| Input Method Editor |                        +------------|-----+
+---------------------+                        | Rendering Engine |
     | Input: Any localized/                   +---------^--------+
     | internationalized          Output: Any localized/ |
     | charset                    internationalized      |
+----v-----------------+          charset                |
| +------------------+ |                      +----------|-------------+
| | Mail Composition | |                      | +--------------+       |
| |    Interface     | |  Sender's            | | Mail Reading |       |
| +------------------+ |    MUA               | |  Interface   |       |
|    |                 |                      | +--------^-----+       |
|    | Nameprepped ACE |          Receiver's  |          | Nameprepped |
|    v                 |             MUA      |          | ACE         |
| +-------------+      |                      | +-------------------+  |
| | SMTP Client |      |                      | | POP3/IMAP4 Client |  |
| +-------------+      |                      | +-------------------+  |
+----|-----------------+                      +----------^-------------+
     | Nameprepped                                       | Nameprepped
     v ACE     Nameprepped              Nameprepped      | ACE
+-------------+    ACE    +------------+    ACE    +-------------------+
| SMTP Server |  ----->   | Mail Store |  ----->   | POP3/IMAP4 Server |
+-------------+           +------------+           +-------------------+

2.1.1 Interface between User and Input Method Editor

For ASCII characters, input is straightforward: the user types on the
keyboard and whichever character is pressed is sent to the application.

However, for international characters, the end-user has to use a
script-specific Input Method Editor (IME), which may or may not be
built into the OS, to interpret what the user communicates to the
system and thereafter send the respective international characters to
the application.

For example, for input of Chinese characters, some users use IMEs
which support the "Pinyin" input method. When a user types "zhongguo"
(in ASCII characters) on the keyboard and selects the characters which
represent "China" (in Chinese) from a list, the IME sends the
international characters to the application in a user-determined
charset (e.g. GB2312).

2.1.2 Interface between Input Method Editor and MUA Composition
      Interface

The MUA mail composition interface (i.e. the "Compose Message"
function of the MUA) SHOULD be able to accept IDNs using 8-bit
character encoding schemes, including those represented in any
localized (e.g. GB2312) or internationalized (e.g. UTF-8) charsets.

This input typically takes place where e-mail addresses are entered,
such as the "From", "To", "Cc" and "Bcc" fields, amongst others, as
IDNs may be used on the right-hand side of the "@" sign in an e-mail
address (domain-parts).

The mail composition interface MAY allow ACE input for the same
reasons as specified in [IDNA], but this is not recommended, as ACE is
opaque and ugly.

2.1.3 Interface between MUA Composition Interface and SMTP Client

The MUA composition interface communicates with the SMTP client in the
MUA typically through internal function calls within the software
itself or through an API. It is at this level that ACE conversion of
any IDN encountered by the MUA composition interface takes place.

Before converting the name parts of the IDN into ACE, the MUA MUST
prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA
MUST convert the name parts into ACE before passing any data to the
SMTP client.

The SMTP client then prepares the e-mail for transmission using the
SMTP protocol [RFC821], and thereafter establishes an SMTP connection
with the user-specified SMTP server to transmit the e-mail.

It is important to note that an IDN specified in the parameters of any
SMTP command MUST be represented in nameprepped ACE at this point.
This includes SMTP commands which require domain parameters (such as
the HELO and EHLO commands) and commands where e-mail addresses are
specified (such as the MAIL FROM, RCPT TO, DATA, VRFY, EXPN, SEND, SOML
and SAML commands).

As for data passed by the DATA command, ACE conversion MUST be
performed when the "domain" portion of an "addr-spec", or a "domain"
itself, within the context of [RFC822], is encountered. This is
necessary as an updated MUA may originate a message which is read by a
non-updated MUA. If this happens, the non-updated MUA may face
operational problems dealing with IDNs that appear in the "addr-spec"
which are not in ACE.

Any transfer encoding syntax to be applied to the mail headers as
specified in [RFC2047] SHOULD be performed before nameprepped ACE
conversion. This is to reduce confusion between IDNs within "addr-spec"
and "domain" portions, in the context of [RFC822], and IDNs which
appear as arbitrary data in mail headers and bodies.

2.1.4 Interface between POP3/IMAP4 client (or local mail store) and
      Mail Reading Interface

The MUA mail reading interface (i.e. the "Read mail" function of an
MUA) typically displays e-mail data retrieved from either a POP3/IMAP4
client or from a local mail store through internal function calls
within the MUA software or through an API.

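As a rough illustration of the conversion described in 2.1.3, the
Python sketch below encodes only the domain of an addr-spec into ACE
and decodes it again for display. It is not part of this memo's
specification: the helper names are hypothetical, and Python's built-in
"idna" codec (nameprep followed by a Punycode-based ACE with an "xn--"
prefix) is used purely as a stand-in for the ACE format, which this
memo leaves undefined.

```python
def _split_addr_spec(addr_spec):
    """Split local-part@domain at the last "@" (hypothetical helper)."""
    local_part, sep, domain = addr_spec.rpartition("@")
    if not sep:
        raise ValueError("not an addr-spec: %r" % (addr_spec,))
    return local_part, domain


def to_ace_address(addr_spec):
    """Composition side: nameprep and ACE-encode the domain only."""
    local_part, domain = _split_addr_spec(addr_spec)
    # ASSUMPTION: the "idna" codec stands in for the ACE chosen by [IDNA].
    # It applies nameprep to each non-ASCII label before encoding it.
    ace_domain = domain.encode("idna").decode("ascii")
    return local_part + "@" + ace_domain


def to_display_address(addr_spec):
    """Reading side: decode an ACE domain back into characters."""
    local_part, domain = _split_addr_spec(addr_spec)
    return local_part + "@" + domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # "b\u00fccher" is "bucher" with u-umlaut, escaped to keep this ASCII.
    print(to_ace_address("user@b\u00fccher.example"))
```

The local-part is passed through unchanged, in keeping with the scope
of this memo, and an all-ASCII domain is returned as it was given.
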
When e-mail containing an ACE-represented IDN is to be displayed, the
MUA SHOULD convert the ACE-represented IDN contained within the
"addr-spec" or "domain" portion specified in [RFC822] back into any
localized or internationalized charset of the user's choice, whenever
possible. In the event that it is impossible to achieve conversion back
into the selected localized charset (for example, conversion of
RACE-represented Hangeul characters into ISO-8859-1 is impossible), the
MUA should prompt the user with an error message.

It may be possible to save and retrieve information about the original
charset of the ACE-converted IDN through the use of additional
[RFC822] mail headers, but that is not (yet) addressed by this memo.

Although it is possible to render ACE into properly decoded glyphs and
display the actual abstract characters without any conversion to other
charsets, the MUA SHOULD NOT do this as it is not the primary function
of an MUA to render characters. This should be left to a rendering
engine which is separate from the MUA and typically embedded into the
OS. It is sufficient for the MUA to pass the appropriate charset to the
rendering engine for proper display.

3. ACE Length Considerations

As [RFC821] in Section 4.5.3 restricts the maximum total length of a
domain name to 64 characters, representation of IDNs using ACE may
pose a potential problem. Most ACEs typically require 3-4 ASCII
characters to represent one international character (especially in the
case of CJK characters, where compression is less effective).

That would leave only about 16-24 characters for the whole IDN,
including all name parts and dots. This is highly undesirable, as names
in some languages such as Arabic cannot be abbreviated and the domain
names may require a greater length than that which is allowed by
[RFC821].

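To make the length concern concrete, the sketch below compares the
length of a domain name in characters with the length of its ACE form
and checks the result against the 64-character limit cited above. It is
an illustration under stated assumptions only: the function name is
hypothetical, and Python's built-in "idna" codec (a Punycode-based ACE)
is used as a stand-in, so the expansion it shows will differ from the
3-4 characters per international character estimated above for other
ACEs.

```python
MAX_DOMAIN_LENGTH = 64  # limit cited above from [RFC821], Section 4.5.3


def ace_length_report(domain):
    """Return (length in characters, ACE length, fits within limit)."""
    # ASSUMPTION: the "idna" codec stands in for the ACE chosen by [IDNA].
    # Note that this codec itself raises UnicodeError for any single
    # label whose ACE form is 64 characters or longer.
    ace = domain.encode("idna")
    return len(domain), len(ace), len(ace) <= MAX_DOMAIN_LENGTH


if __name__ == "__main__":
    # "b\u00fccher" is "bucher" with u-umlaut, escaped to keep this ASCII.
    print(ace_length_report("b\u00fccher.example"))
```

A name made up of several short international labels can exceed the
limit in ACE form even though it is well under 64 characters as typed.
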
To further complicate matters, several mailing list software packages,
such as ezmlm, embed domain names into the local-part of an e-mail
address during management of subscriptions, together with
randomly-generated subscription information. This would leave an even
smaller maximum ACE length, if interoperability with these packages
were to be maintained, given that there is also a 64-character
restriction on local-parts.

4. Security Considerations

As this memo is based on [IDNA], its security considerations are
similar to those of [IDNA]. This includes the security considerations
from [NAMEPREP] as well.

5. Other Considerations

Although this document addresses end-user MUAs (e.g. elm, mutt, pine,
Eudora, Outlook Express, etc.) to a large extent, the definition of an
MUA could be extended to include web-based e-mail server software and
automated programs such as mailing list management software.

End-user MUAs may also include additional functionality where IDNs may
be encountered, such as calendaring/scheduling, directory services and
digital certificate storage. This is not (yet) addressed in this memo.

6. Future Extensions

It is possible to achieve internationalization of the entire e-mail
address by representing international characters in the local-part of
an "addr-spec" using nameprepped ACE conversion, in a similar fashion
to that described in this memo.

However, this is a different problem altogether and is currently beyond
the scope of this memo.

7. References

[IDNA]     Paul Hoffman & Patrik Faltstrom, "Internationalizing Host
           Names in Applications (IDNA)", draft-ietf-idn-idna.

[UTR17]    K. Whistler & M. Davis, Unicode Consortium, "Character
           Encoding Model", Unicode Technical Report #17,
           http://www.unicode.org/unicode/reports/tr17/

[US-ASCII] United States of America Standards Institute, "USA Code for
           Information Interchange", X3.4, 1968.

[RFC2119]  Scott Bradner, "Key words for use in RFCs to Indicate
           Requirement Levels", March 1997, RFC 2119.

[IDNCOMP]  Paul Hoffman, "Comparison of Internationalized Domain Name
           Proposals", draft-ietf-idn-compare.

[RFC821]   Jonathan B. Postel, "Simple Mail Transfer Protocol", August
           1982, RFC 821.

[RFC822]   David H. Crocker, "Standard for the Format of ARPA Internet
           Text Messages", August 1982, RFC 822.

[RFC2045]  N. Freed & N. Borenstein, "Multipurpose Internet Mail
           Extensions (MIME) Part One: Format of Internet Message
           Bodies", November 1996, RFC 2045.

[RFC2047]  K. Moore, "MIME (Multipurpose Internet Mail Extensions)
           Part Three: Message Header Extensions for Non-ASCII Text",
           November 1996, RFC 2047.

[RFC1652]  J. Klensin et al., "SMTP Service Extension for
           8bit-MIMEtransport", July 1994, RFC 1652.

[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
           Internationalized Host Names", draft-ietf-idn-nameprep.

A. Author's Address

Maynard Kang
i-EMAIL.net Pte Ltd
1 Kim Seng Promenade #12-07
Great World City West Tower
Singapore 237994
E-mail: maynard@i-email.net