idnits 2.17.1

draft-hoffman-utf8headers-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 401 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 5 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 15, 2003) is 7410 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 3490 (ref. 'IDNA') (Obsoleted by RFC
     5890, RFC 5891)

  ** Obsolete normative reference: RFC 2822 (ref. 'MSGFMT') (Obsoleted by RFC
     5322)

  ** Obsolete normative reference: RFC 2821 (ref. 'SMTP') (Obsoleted by RFC
     5321)

     Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Draft                                              Paul Hoffman
draft-hoffman-utf8headers-00.txt                Internet Mail Consortium
December 15, 2003
Expires in six months

SMTP Service Extensions for Transmission of Headers
in UTF-8 Encoding

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

Mailbox names often represent the names of human users.
Many of these users throughout the world have names that cannot
normally be written with just the ASCII repertoire of characters, and
would therefore like to use their real names in their mailbox names.
These users are also likely to use non-ASCII text in their common
names and in the subjects of email messages, both in what they send
and in what they receive. This protocol specifies how to represent all
headers of email messages in UTF-8.

1. Introduction

The format of email messages [MSGFMT] only allows ASCII characters in
the headers of messages. This prevents users from having email
addresses that contain non-ASCII characters. It further forces
non-ASCII text in common names, comments, and in free text (such as in
the Subject: field) to be in quoted-printable format [MIME3]. This
specification describes a change to the email message format, and to
SMTP message transport, that allows non-ASCII characters throughout
email headers. These changes affect SMTP clients, SMTP servers, and
mail user agents (MUAs).

In this specification, the SMTP protocol [SMTP] is used to prevent the
transmission of messages with UTF-8 [UTF8] headers to systems that
cannot handle such messages. The new SMTP extension has the name
"UTF-8-HEADERS".

Using this new SMTP extension keeps such messages out of message
stores that might misrepresent or mangle them. It should be noted that
using an ESMTP extension does not prevent transferring email messages
with UTF-8 headers to other systems that use the email format for
messages, such as in the POP and IMAP protocols. Those protocols will
need to be changed in order to handle messages in message stores that
have UTF-8 headers.

The dual motivations of this protocol are to allow UTF-8 everywhere in
the headers and to not bounce any messages just because they originated
with UTF-8 headers.
Using this protocol, messages that originated with
UTF-8 headers will only be bounced if an enabled SMTP client is speaking
to an unenabled SMTP server and some of the UTF-8 headers cannot be
downgraded to all-ASCII headers. This protocol describes how to
downgrade all headers from UTF-8 to all-ASCII, but does not guarantee
that such downgrading will always be successful.

Further, this protocol allows current users who have all-ASCII mailbox
names to step up to UTF-8 headers easily. This means that users of this
protocol should normally be able to communicate with other users of this
protocol and with users who have not yet updated.

This protocol does not require the sender or recipient of mail to have
mailbox names that include non-ASCII characters. For example, the
protocol might still be used if just the subject header has non-ASCII
characters, and the protocol must be used if other headers (particularly
Received headers) contain non-ASCII characters.

1.1 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[KEYWORDS].

Unless otherwise noted, all terms used here are defined in RFC 2821 and
RFC 2822.

In this document, an address is "all-ASCII" if every character in the
address is in the ASCII character repertoire [ASCII]; an address is
"non-ASCII" if any character is not in the ASCII character repertoire.
Similarly, a header body is "all-ASCII" if every character in the body
of the header is in the ASCII character repertoire; a header body is
"non-ASCII" if any character is not in the ASCII character repertoire.

This document is being discussed on the ietf-imaa mailing list. See
for information about subscribing and
the list's archive.

2. Changes to MUAs and to the user's mail environment

For this protocol to work well (that is, for it not to bounce mail
excessively when an enabled system encounters a non-enabled system), any
mail sender who has non-ASCII characters in the addr-spec of their
mailbox name SHOULD have a second mailbox whose addr-spec contains only
ASCII characters. This second mailbox is used when a recipient of a
message is not using this protocol; this is the "fallback address" for
the sender.

Having two mailboxes is not an absolute requirement because some mail
systems will not allow a user to receive mail at two addresses (the
non-ASCII and all-ASCII addresses). If a user does have two mailboxes,
they SHOULD both be on the same mail server (that is, they should both
have the same host name in the user's address).

Having two mailboxes can lead to confusion for users if the MUA does not
handle them well. MUAs that follow this specification SHOULD have
options that make the two mailboxes appear as one. For example, if a
user says "read my mail", the MUA SHOULD read from both the mailbox
with the non-ASCII name and the mailbox with the all-ASCII name. Note
that this feature might not be necessary: a terminating SMTP server
might have combined all incoming mail for both addresses into a single
mailbox. However, MUAs SHOULD NOT assume that combining by the SMTP
server will always be the case.

2.1 Changes to MUA administrative interfaces

The administrative interface for MUAs that use this protocol MUST have
a method for a user to specify the name of their mailbox that contains
non-ASCII characters, and MUST have a method for the user to specify the
name of their mailbox that contains only ASCII characters.

The MUA user interface SHOULD also allow users to specify the common
name associated with the non-ASCII mailbox using non-ASCII
characters; this common name MUST be encoded as UTF-8. The common name
associated with the all-ASCII mailbox MUST only contain ASCII
characters, although it can use a quoted-printable format to represent a
different encoding; this encoding SHOULD be UTF-8.

MUAs are encouraged to cache address mappings that are specified
in incoming mail. Given that mappings might change over time,
these MUAs might overwrite existing mappings with new ones,
and might give the user a choice for the time-to-live for the
cached mapping.

2.2 Address-map headers

For every address in a message with a non-ASCII local-part, the mail
initiator SHOULD create a mapping in a new header, called
"Address-map:". A message SHOULD have one Address-map: header for every
non-ASCII address for which the sender knows a map. The header is only
for addresses that have a non-ASCII local-part in their addr-spec. It
MUST NOT be used for addresses that have all-ASCII addr-specs, even if
those addresses have UTF-8 domain names, and it MUST NOT be used if the
local-part of the addr-spec is all-ASCII but the display-name or the
comment is non-ASCII.

If the sender has an all-ASCII local-part associated with its non-ASCII
mailbox, the sender's MUA MUST create an Address-map header for that
association. If the sender knows (such as through caching incoming
address maps or from an address book) the mapping for any recipient that
has a non-ASCII mailbox name, the sending MUA SHOULD create an
Address-map header for it.

Both addresses in the Address-map header are full addr-specs. The body
of the Address-map header only contains addr-specs, never display-names
or comments.
The format of the Address-map header is:

Address-map: <address-with-non-ASCII-LHS>,<downgrade-address>

The encoding for address-with-non-ASCII-LHS MUST be UTF-8; the encoding
for downgrade-address MUST be ASCII. If the domain name is an
internationalized domain name [IDNA], then it MUST be encoded in UTF-8
in the address-with-non-ASCII-LHS and MUST be encoded using IDNA in the
downgrade-address.

Examples:

Address-map: José@example.com,jose@example.com

Address-map: björn@räksmörgås.se,
   bjorn-ascii@rksmrgs-5wao1o.se

Note that when receiving mail, the Address-map headers may be all in
ASCII. This would be due to an intervening SMTP server or other agent
downgrading the map. All-ASCII Address-map headers MUST be accepted.

2.3 Changes to MUA sending

Sending MUAs that follow this protocol MUST create all headers encoded
in UTF-8. No other direct encodings are allowed. MUAs MAY continue to
use quoted-printable text to specify some text in other encodings;
however, this is not recommended because it is likely that this will not
interoperate well with MUAs that follow this specification.

3. Changes to SMTP

This protocol defines a new SMTP extension, UTF-8-HEADERS. (The formal
definition is in the IANA Considerations section.)

3.1 UTF-8-HEADERS extension

If an SMTP server advertises the UTF-8-HEADERS extension, an
SMTP client that supports this protocol SHOULD send message headers
as described in this document.

The terminal SMTP server is responsible for knowing whether or not the
message store can handle UTF-8 headers. A terminal SMTP server MUST NOT
advertise the UTF-8-HEADERS extension if the message store for which it
is responsible cannot handle UTF-8 headers.

If an SMTP client does not see the UTF-8-HEADERS extension advertised
by an SMTP server, the SMTP client MUST downgrade the
non-ASCII contents of all header bodies before continuing to send
the message.
The SMTP client SHOULD send the message with the downgraded
header bodies as a normal message.
If any header body cannot be downgraded, the SMTP client
MUST bounce the message with an error code of 558.

UTF-8 header bodies can generally be downgraded to all-ASCII.
However, a header body that contains a non-ASCII mailbox name might
not be downgradable if no Address-map header gives a mapping for that
mailbox name.

3.2 Downgrading header bodies

This section defines how to downgrade header bodies. Note that
downgrading MUST only be done if necessary. That is, downgrading
MUST NOT be done on fields or bodies that are already all-ASCII.

3.2.1 Mailboxes

Mailboxes appear in many standard headers, such as To:, From:, Sender:,
Reply-to:, Cc:, Bcc:, Received:, and some of the Resent-: headers.
Downgrading mailboxes is done as follows:

1) If necessary, convert the domain using IDNA.

2) If necessary, convert the local-parts using values from an
Address-map: header in the message.

3) If necessary, convert any display-name or comment using
quoted-printable with UTF-8 encoding.

3.2.2 Message-ids

Downgrading message-ids is done as follows:

1) If necessary, convert the id-left using Base64.

2) If necessary, convert the id-right using Base64.

3.2.3 Informational headers

If necessary, downgrading the bodies of informational headers (Subject:,
Comments:, and Keywords:) is done using quoted-printable with UTF-8
encoding.

3.2.4 Address-map headers

If necessary, the Address-map: header is downgraded using Base64 for
local-parts, and IDNA for domain names.

For example:

Address-map: José@example.com,jose@example.com

would be downgraded to:

Address-map: Sm9zw6k=@example.com,jose@example.com

As another example:

Address-map: björn@räksmörgås.se,
   bjorn-ascii@rksmrgs-5wao1o.se

would be downgraded to:

Address-map: YmrDtnJu@rksmrgs-5wao1o.se,
   bjorn-ascii@rksmrgs-5wao1o.se

3.3 Things not changed from RFC 2822

Note that this protocol does not change the definition of header field
names. That is, only the bodies of headers are allowed to have non-ASCII
characters; the rules in RFC 2822 for header names are not changed.

Similarly, this protocol does not change the date and time specification
in RFC 2822.

3.4 Additional processing rules

In order to make mail retrieval easier, terminal SMTP servers SHOULD
write messages addressed to either the UTF-8 address or the all-ASCII
address into the same mailbox. However, given that this is quite
different from common practice today, the ramifications of doing this
should be studied carefully before this is implemented.

Intermediate SMTP servers MAY change the values in the Address-map:
header (such as to add one that is missing or to correct a mapping), but
SHOULD only do so for domains local to the intermediate SMTP server.

Terminal SMTP servers MAY look into the headers of a message to
determine whether they should upgrade a downgraded set of headers to
UTF-8. This is easy to determine: if the Address-map: header contains
only ASCII, it was downgraded earlier in the chain of SMTP servers.
Upgrading is particularly useful on bounce messages caused by bad
mappings.

4. Security considerations

If a user has a non-ASCII mailbox address and a mapped all-ASCII mailbox
address, a digital certificate that identifies that user SHOULD have
both addresses in the identity.
Having multiple email addresses as
identities in a single certificate is already supported in PKIX and
OpenPGP.

Internationalized local parts will cause mail addresses to become
longer, and possibly make it harder to keep lines in a header under 78
characters. Lines that are longer than 78 characters (which is a SHOULD
specification, not a MUST specification, in RFC 2822) could possibly
cause mail user agents to fail in ways that affect security.

5. IANA considerations

IANA will assign the UTF-8-HEADERS extension for ESMTP.

The UTF-8 headers extension is defined as follows:

(1) The name of the SMTP service extension is "UTF-8 headers".

(2) The EHLO keyword value associated with the extension is
UTF-8-HEADERS.

(3) No parameter is used with the UTF-8-HEADERS EHLO keyword.

(4) No additional parameters are added to either the MAIL FROM or RCPT
TO commands.

(5) No additional SMTP verbs are defined by this extension.

(6) This document specifies how support for the extension affects the
behavior of an SMTP server and an SMTP client.

6. References

6.1 Normative references

[ASCII] Cerf, V., "ASCII format for Network Interchange", RFC 20,
October 1969.

[IDNA] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing
Domain Names in Applications (IDNA)", RFC 3490, March 2003.

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[MIME3] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November
1996.

[MSGFMT] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[SMTP] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April
2001.

[UTF8] Yergeau, F., "UTF-8, a Transformation Format of ISO 10646", RFC
3629, November 2003.

7. Author's address

Paul Hoffman
Internet Mail Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org

A. Open issues

- POP and IMAP might be updated to allow one request to bring in two or
more mailboxes; otherwise, users will have to make two separate requests.

- It might be good to have a protocol for determining mappings, but it
is not defined here.
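
Non-normative illustration: the downgrading rule for Address-map
headers (Base64 for non-ASCII local-parts, IDNA for non-ASCII domain
names) can be sketched in Python as shown here. This is a minimal
sketch under stated assumptions, not a reference implementation. The
function names downgrade_addr_spec and downgrade_address_map are
hypothetical and are not defined by this protocol. The sketch assumes
that each addr-spec is a simple, unquoted "local-part@domain" string,
and it relies on Python's built-in "idna" codec, which implements
RFC 3490 and emits the ACE form of an internationalized domain name
with the "xn--" prefix.

```python
import base64


def downgrade_addr_spec(addr_spec: str) -> str:
    """Return an all-ASCII form of one addr-spec (hypothetical helper)."""
    # Split on the last "@"; assumes a simple, unquoted addr-spec.
    local_part, _, domain = addr_spec.rpartition("@")
    # Downgrade only when necessary: all-ASCII parts are left untouched.
    if not local_part.isascii():
        local_part = base64.b64encode(local_part.encode("utf-8")).decode("ascii")
    if not domain.isascii():
        # Python's "idna" codec applies RFC 3490 ToASCII to each label.
        domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{domain}"


def downgrade_address_map(non_ascii_addr: str, downgrade_addr: str) -> str:
    """Build the downgraded Address-map header line for one mapping."""
    return (
        "Address-map: "
        f"{downgrade_addr_spec(non_ascii_addr)},{downgrade_addr_spec(downgrade_addr)}"
    )


# prints: Address-map: Sm9zw6k=@example.com,jose@example.com
print(downgrade_address_map("Jos\u00e9@example.com", "jose@example.com"))
```

Because each part is checked separately, an address that is already
all-ASCII passes through unchanged, which matches the rule that
downgrading is only done when necessary.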