Internet Draft Maynard Kang draft-ietf-idn-mua-00.txt i-EMAIL.net February 5, 2001 Expires on August 5, 2001 Internationalizing Domain Names in Mail User Agents Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This document describes a way where domain names used in Internet e-mail can be internationalized by making changes only to end-user Mail User Agents and, by doing so, avoid damaging other applications which handle Internet e-mail, such as Message Transfer Agents and Delivery Agents. 1. Introduction One of the proposed solutions for internationalized domain names (IDN) involves only updating the user applications with no changes required to the DNS protocol, servers and resolvers [IDNA] compared to other solutions which require changes to be made to protocol, servers, resolvers and applications. The underlying principle of [IDNA] may be similarly applied to the Internet e-mail system today - by effecting changes to only the Mail User Agent (MUA) component of the e-mail system. Thus, existing Message Transfer Agents, Delivery Agents and other applications which handle e-mail do not have to be changed at all. 1.1 Definitions and Conventions Usage of terms related to the character encoding model are in reference to Unicode Technical Report 17 [UTR17]. The terms "international character", "non-ASCII character" and "multilingual character", which are used interchangeably, are taken to mean any abstract character which is not included in the range specified by [US-ASCII]. 1.2 Terminology The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1.3. Design Philosophy As the Internet e-mail system is a diverse, distributed and heterogeneous system with many vendors deploying a vast number of applications, it is of utmost importance that interoperability amongst these various components is maintained. Thus, the ideal solution would be one which does not compromise or damage the operation of any of these existing components once internationalized domain names are encountered. Also, solutions which call for changes to be made to many or even all components of the Internet e-mail system would require far too much time and effort to deploy, given that Internet e-mail has such a huge installed base. This solution adheres to both of the above principles, in that interoperability is preserved and that the cost and speed of implementation is low. All that the user has to do to use IDNs in e-mail is update his or her MUA. 1.4. IDN Summary This solution specifies an IDN architecture of arch-3 (just send ACE) and a transition strategy of trans-1 (always do current plus new architecture) as described in [IDNCOMP]. The choice of ACE format is not defined in this document, but MUST be the same as that specified in [IDNA] in order to maintain uniqueness and consistency. 1.5. E-mail Internationalization Summary As many Internet e-mail standards such as the SMTP protocol [RFC821] and the e-mail message format [RFC822] only specify usage of the 7-bit ASCII character set [US-ASCII], international characters which use octet- based character encoding schemes (CES) cannot be used in e-mail transmission, headers and bodies. Although this issue has been addressed in [RFC2045] for message bodies and [RFC2047] for message headers through the use of a Transfer Encoding Syntax (TES) such as Quoted-Printable or Base64, there is no similar solution which extends the functionality of [RFC821] to include usage of international characters, except for [RFC1652] which allows transmission of 8-bit data passed by the DATA command in an SMTP session. [RFC1652] however, does not fully address the problem of using IDNs in an SMTP session - the IDN may be used in areas within the SMTP session other than the DATA command, such as the MAIL FROM and RCPT TO commands, where an IDN may be part of the e-mail address(es) specified there. Hence, this would be a major stumbling block to deploying "just-send- 8bit" IDNs for use in Internet e-mail, as these IDNs would not be able to be used in SMTP e-mail transmissions due to [RFC821] restrictions. 2. Architectural Overview The end-user MUA may encounter IDNs in the scenarios below: (i) When specifying the transmission server (i.e. SMTP server) (ii) When specifying the retrieval server (i.e. POP3/IMAP4/any other retrieval mechanism) (iii) When specifying e-mail addresses during composition of a message (iv) When reading messages with e-mail addresses in it As with [IDNA], the MUA is updated in a similar fashion to process IDNs which are input by users and process IDNs which are displayed to users, in all of the scenarios above. For (i) and (ii), the IDN MUST be handled in the same manner as specified in [IDNA]. The method of handling an IDN For (iii) and (iv) is described below in 2.1. 2.1 Interfaces between E-mail components when composing/reading a mail The interfaces between e-mail components can be pictorially represented as shown below. The example assumes the setup of a POP3/IMAP4 retrieval client and server, but the exact nature of end-to-end e-mail transmission may vary accordingly (e.g. elm or pine would read directly from the mail store). However, these variations do not impact an accurate description of this solution to a large extent as no changes are required at these levels. +------+ +------+ | User | | User | +------+ +---^--| | User Input: User Display: Characters/ | | Keyboard/Pen/etc Glyphs on CRT or other | +-----v---------------+ Representation (e.g. sound) | | Input Method Editor | +------------|-----+ +---------------------+ | Rendering Engine | | Input: Any localized/ +---------^--------+ | internationalized Output: Any localized/ | | charset internationalized | +----v-----------------+ charset | | +------------------+ | +----------|-------------+ | | Mail Composition | | | +--------------+ | | | Interface | | Sender's | | Mail Reading | | | +------------------+ | MUA | | Interface | | | | | | +--------^-----+ | | | Nameprepped ACE | Receiver's | | Nameprepped | | v | MUA | | ACE | | +-------------+ | | +-------------------+ | | | SMTP Client | | | | POP3/IMAP4 Client | | | +-------------+ | | +-------------------+ | +----|-----------------+ +----------^-------------+ | Nameprepped | Nameprepped v ACE Nameprepped Nameprepped | ACE +-------------+ ACE +------------+ ACE +-------------------+ | SMTP Server | -----> | Mail Store | -----> | POP3/IMAP4 Server | +-------------+ +------------+ +-------------------+ 2.1.1 Interface between User and Input Method Editor For ASCII characters, input is straightforward: the user types on the keyboard and whichever character that is pressed is sent to the application. However, for international characters, the end-user has to use a script- specific Input Method Editor (IME), which may or may not be built-into the OS, to interpret what the user communicates to the system and thereafter send the respective international characters to the application. For example, for input of Chinese characters, some users use IMEs which support the "Pinyin" input method. When a user types "zhongguo" (in ASCII characters) on the keyboard and selects the characters which represent "China" (in Chinese) from a list, the IME sends the international characters to the application in a user-determined charset (e.g. GB2312). 2.1.2 Interface between Input Method Editor and MUA Composition Interface The MUA mail composition interface (i.e. the "Compose Message" function of the MUA) SHOULD be able to accept IDNs using 8-bit character encoding schemes, including those represented in any localized (e.g. GB2312) or internationalized (e.g. UTF-8) charsets. This input typically takes place where e-mail addresses are entered such as the "From", "To", "Cc", "Bcc" fields, amongst others, as IDNs may be used at the right-hand-side of the "@" sign in an e-mail address (domain-parts). The mail composition interface MAY allow ACE input for the same reasons as specified in [IDNA], but is not recommended as ACE is opaque and ugly. 2.1.3 Interface between MUA Composition Interface and SMTP Client The MUA composition interface communicates with the SMTP client in the MUA typically through internal function calls within the software itself or through an API. It is at this level where ACE conversion of any IDN encountered by the MUA composition interface takes place. Before converting the name parts of the IDN into ACE, the MUA MUST prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA MUST convert the name parts into ACE before passing any data to the SMTP client. The SMTP client then prepares the e-mail for transmission using the SMTP protocol [RFC821], and thereafter establishes an SMTP connection with the user-specified SMTP server to transmit the e-mail. It is important to note that an IDN specified in the parameters of any SMTP command MUST be represented in nameprepped ACE at this point in time. This includes SMTP commands which require domain parameters (such as the HELO and EHLO commands) and commands where e-mail addresses are specified (such as the MAIL FROM, RCPT TO, DATA, VRFY, EXPN, SEND, SOML and SAML commands). As for data passed by the DATA command, ACE conversion MUST be performed when the "domain" portion of an "addr-spec" or when a "domain" itself, within the context of [RFC822], is encountered. This is necessary as an updated MUA may originate a message which is read by a non-updated MUA. If this happens, the non-updated MUA may face operational problems dealing with IDNs that appear in the "addr-spec" which are not in ACE. Any transfer encoding syntax to be applied to the mail headers as specified in [RFC2047] SHOULD be performed before nameprepped ACE conversion. This is to reduce confusion between IDNs within "addr-spec" and "domain" portions, in the context of [RFC822], and IDNs which appear as arbitrary data in mail headers and bodies. 2.1.4. Interface between POP3/IMAP4 client (or local mail store) and Mail Reading Interface The MUA mail reading interface (i.e. "Read mail" function of an MUA) typically displays e-mail data retrieved from either a POP3/IMAP4 client or from a local mail store through internal function calls within the MUA software or through an API. When e-mail containing an ACE-represented IDN is to be displayed, the MUA SHOULD convert the ACE-represented IDN contained within the "addr-spec" or "domain" portion specified in [RFC822] back into any localized or internationalized charset of the user's choice, whenever possible. In the event that it is impossible to achieve conversion back into the selected localized charset (for example, conversion of RACE- represented Hangeul characters into ISO-8859-1 is impossible), the MUA should prompt the user with an error message. It may be possible to save and retrieve information about the original charset of the ACE-converted IDN through the use of additional [RFC822] mail headers, but that is not (yet) addressed by this memo. Although it is possible to render ACE into properly decoded glyphs and display the actual abstract characters without any conversion to other charsets, the MUA SHOULD NOT do this as it is not the primary function of an MUA to render characters. This should be left to a rendering engine which is separate from the MUA and typically embedded into the OS. It is sufficient for the MUA to pass the appropriate charset to the rendering engine for proper display. 3. ACE Length Considerations As [RFC821] in Section 4.5.3 restricts the maximum total length of a domain name to 64 characters, representation of IDNs using ACE may pose a potential problem. Most ACEs typically require 3-4 ASCII characters to represent one international character (especially in the case of CJK characters, where compression is less effective). That would leave only about 16-24 characters for the whole IDN, including all name parts and dots. This is highly undesirable as some languages such as Arabic are unable to be abbreviated and the domain names may require a larger length than that which is allowed by [RFC821]. To further complicate matters, several mailing list software such as ezmlm embed domain names into the local-parts portion of an e-mail address during management of subscriptions, together with randomly- generated subscription information. This would leave an even smaller maximum ACE length, if interoperability with these mailing list software were to be maintained, given that there is also a 64 character restriction on local parts. 4. Security Considerations As this memo is based on [IDNA], security considerations are similar to that faced by [IDNA]. This includes security considerations from [NAMEPREP] as well. 5. Other Considerations Although this document addresses end-user MUAs (e.g. elm, mutt, pine, Eudora, Outlook Express, etc) to a large extent, the definition of an MUA could be extended to include web-based e-mail server software and automated programs such as mailing list management software. End-user MUAs may also include additional functionality where IDNs may be encountered, such as calendaring/scheduling, directory services and digital certificate storage. This is not (yet) addressed in this memo. 6. Future Extensions It is possible to achieve internationalization of the entire e-mail address by representation of international characters in the local-parts of an "addr-spec" using nameprepped ACE conversion in a similar fashion as described in this memo. However, this is a different problem altogether and is currently beyond the scope of this memo. 7. References [IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names in Applications (IDNA)", draft-ietf-idn-idna. [UTR17] K. Whistler & M. Davis, Unicode Consortium, "Character Encoding Model", Unicode Technical Report #17, http://www.unicode.org/unicode/reports/tr17/ [US-ASCII] United States of America Standards Institute, "USA Code for Information Interchange", X3.4, 1968. [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate Requirement Levels", March 1997, RFC 2119. [IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name Proposals", draft-ietf-idn-compare. [RFC821] Jonathan B. Postel, "Simple Mail Transfer Protocol", August 1982, RFC 821. [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet Text Messages", August 1982, RFC 822. [RFC2045] N. Freed & N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", November 1996, RFC 2045. [RFC2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", November 1996, RFC 2047. [RFC1652] J. Klensin et al., "SMTP Service Extension for 8bit- MIMEtransport", July 1994, RFC 1652. [NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of Internationalized Host Names", draft-ietf-idn-nameprep. A. Author's Address Maynard Kang i-EMAIL.net Pte Ltd 1 Kim Seng Promenade #12-07 Great World City West Tower Singapore 237994 E-mail: maynard@i-email.net