idnits 2.17.1
draft-hoffman-utf8headers-00.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
** Looks like you're using RFC 2026 boilerplate. This must be updated to
follow RFC 3978/3979, as updated by RFC 4748.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
** Missing expiration date. The document expiration date should appear on
the first and last page.
== No 'Intended status' indicated for this document; assuming Proposed
Standard
== The page length should not exceed 58 lines per page, but there was 1
longer page, the longest (page 1) being 401 lines
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** There are 5 instances of too long lines in the document, the longest one
being 6 characters in excess of 72.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The document seems to lack the recommended RFC 2119 boilerplate, even if
it appears to use RFC 2119 keywords.
(The document does seem to have the reference to RFC 2119 which the
ID-Checklist requires).
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
have content which was first submitted before 10 November 2008. If you
have contacted all the original authors and they are all willing to grant
the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
this comment. If not, you may need to add the pre-RFC5378 disclaimer.
(See the Legal Provisions document at
https://trustee.ietf.org/license-info for more information.)
-- The document date (December 15, 2003) is 7410 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
** Obsolete normative reference: RFC 3490 (ref. 'IDNA') (Obsoleted by RFC
5890, RFC 5891)
** Obsolete normative reference: RFC 2822 (ref. 'MSGFMT') (Obsoleted by RFC
5322)
** Obsolete normative reference: RFC 2821 (ref. 'SMTP') (Obsoleted by RFC
5321)
Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
Internet Draft                                           Paul Hoffman
draft-hoffman-utf8headers-00.txt                 Internet Mail Consortium
December 15, 2003
Expires in six months

          SMTP Service Extensions for Transmission of Headers
                           in UTF-8 Encoding

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

Mailbox names often represent the names of human users. Many of these
users throughout the world have names that are not normally written
using only the ASCII character repertoire, and would therefore like to
use their real names in their mailbox names. These users are also
likely to use non-ASCII text in the common names and subjects of the
email messages they send and receive. This protocol specifies how to
represent all headers of email messages in UTF-8.

1. Introduction

The format of email messages [MSGFMT] allows only ASCII characters in
message headers. This prevents users from having email addresses that
contain non-ASCII characters. It further forces non-ASCII text in common
names, comments, and free text (such as in the Subject: field) to be
encoded in quoted-printable format [MIME3]. This specification describes
a change to the email message format, and to SMTP message transport,
that allows non-ASCII characters throughout email headers. These changes
affect SMTP clients, SMTP servers, and mail user agents (MUAs).

In this specification, an extension to the SMTP protocol [SMTP] is used
to prevent the transmission of messages with UTF-8 [UTF8] headers to
systems that cannot handle such messages. The new SMTP extension has the
name "UTF-8-HEADERS".

Using this new SMTP extension prevents such messages from being
introduced into message stores that might misrepresent or mangle them.
Note, however, that an ESMTP extension does not prevent messages with
UTF-8 headers from being transferred to other systems that use the
email message format, such as through the POP and IMAP protocols. Those
protocols will need to be changed in order to handle messages in
message stores that have UTF-8 headers.

The dual motivations of this protocol are to allow UTF-8 everywhere in
the headers and to avoid bouncing messages just because they originated
with UTF-8 headers. Using this protocol, a message that originated with
UTF-8 headers is bounced only if an enabled SMTP client is speaking to
an unenabled SMTP server and some of the UTF-8 headers cannot be
downgraded to all-ASCII headers. This protocol describes how to
downgrade all headers from UTF-8 to all-ASCII, but does not guarantee
that such downgrading will always succeed.

Further, this protocol allows current users who have all-ASCII mailbox
names to step up to UTF-8 headers easily. This means that users of this
protocol should normally be able to communicate both with other users
of this protocol and with users who have not yet updated.

This protocol does not require the sender or recipient of mail to have
mailbox names that include non-ASCII characters. For example, the
protocol might still be used if only the Subject: header has non-ASCII
characters, and the protocol must be used if other headers (particularly
Received: headers) contain non-ASCII characters.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [KEYWORDS].

Unless otherwise noted, all terms used here are defined in RFC 2821
[SMTP] and RFC 2822 [MSGFMT].

In this document, an address is "all-ASCII" if every character in the
address is in the ASCII character repertoire [ASCII]; an address is
"non-ASCII" if any character is not in the ASCII character repertoire.
Similarly, a header body is "all-ASCII" if every character in the body
of the header is in the ASCII character repertoire; a header body is
"non-ASCII" if any character is not in the ASCII character repertoire.
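
The all-ASCII/non-ASCII test is the same for addresses and for header
bodies. A minimal Python sketch, assuming values have already been
decoded to Unicode strings; the function names are illustrative only:

```python
def is_all_ascii(text: str) -> bool:
    """True if every character is in the ASCII repertoire (U+0000..U+007F)."""
    return all(ord(ch) < 0x80 for ch in text)


def is_non_ascii(text: str) -> bool:
    """True if at least one character is outside the ASCII repertoire."""
    return not is_all_ascii(text)


if __name__ == "__main__":
    # The same test applies to addresses and to header bodies.
    for value in ["jose@example.com", "José@example.com", "Re: lunch"]:
        print(value, "->", "all-ASCII" if is_all_ascii(value) else "non-ASCII")
```

Python 3.7 and later also provide str.isascii(), which has the same
meaning.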

This document is being discussed on the ietf-imaa mailing list. See
for information about subscribing and the list's archive.

2. Changes to MUAs and to the user's mail environment

For this protocol to work well (that is, for it not to bounce mail
excessively when an enabled system encounters a non-enabled system), any
mail sender who has non-ASCII characters in the addr-spec of their
mailbox name SHOULD have a second mailbox whose addr-spec contains only
ASCII characters. This second mailbox is used when a recipient of a
message is not using this protocol; it is the "fallback address" for the
sender.

Having two mailboxes is not an absolute requirement, because some mail
systems do not allow a user to receive mail at two addresses (the
non-ASCII and all-ASCII addresses). If a user does have two mailboxes,
they SHOULD both be on the same mail server (that is, they should both
have the same host name in the user's address).

Having two mailboxes can confuse users if the MUA does not handle them
well. MUAs that follow this specification SHOULD have options that make
the two mailboxes appear to be one. For example, if a user says "read my
mail", the MUA SHOULD read from both the mailbox with the non-ASCII name
and the mailbox with the all-ASCII name. Note that this feature might
not be necessary: a terminating SMTP server might have combined all
incoming mail for both addresses into a single mailbox. However, MUAs
SHOULD NOT assume that the SMTP server will always combine the mail.

2.1 Changes to MUA administrative interfaces

The administrative interface for MUAs that use this protocol MUST have a
method for a user to specify the name of their mailbox that contains
non-ASCII characters, and MUST have a method for the user to specify the
name of their mailbox that contains only ASCII characters.

The MUA user interface SHOULD also allow users to specify, using
non-ASCII characters, the common name associated with the non-ASCII
mailbox; this common name MUST be encoded as UTF-8. The common name
associated with the all-ASCII mailbox MUST contain only ASCII
characters, although it can use a quoted-printable format to represent
text in a different encoding; that encoding SHOULD be UTF-8.

MUAs are encouraged to cache the address mappings that are specified in
incoming mail. Because mappings might change over time, these MUAs might
overwrite existing mappings with new ones, and might let the user choose
the time-to-live for each cached mapping.
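
One way an MUA might keep such a cache is sketched below in Python. The
class name, the 30-day default time-to-live, and the injectable clock
are assumptions made for illustration; this specification does not
define a cache format.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AddressMapCache:
    """Hypothetical MUA-side cache of Address-map mappings with a TTL.

    Keys are non-ASCII addr-specs; values are the all-ASCII
    downgrade-addresses learned from incoming mail.
    """

    def __init__(self, ttl_seconds: float = 30 * 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds  # user-selectable time-to-live
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def learn(self, utf8_addr: str, ascii_addr: str) -> None:
        # A newer mapping overwrites any older one and restarts its TTL.
        self._entries[utf8_addr] = (ascii_addr, self._clock() + self.ttl_seconds)

    def lookup(self, utf8_addr: str) -> Optional[str]:
        entry = self._entries.get(utf8_addr)
        if entry is None:
            return None
        ascii_addr, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[utf8_addr]  # expired: forget the mapping
            return None
        return ascii_addr
```

Injecting the clock keeps expiry testable; a real MUA would more likely
persist the entries alongside its address book.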

2.2 Address-map headers

For every address in a message with a non-ASCII local-part, the mail
initiator SHOULD create a mapping in a new header, called
"Address-map:". A message SHOULD have one Address-map: header for every
non-ASCII address for which the sender knows a mapping. The header is
only for addresses that have a non-ASCII local-part in their addr-spec.
It MUST NOT be used for addresses whose local-parts are all-ASCII, even
if those addresses have UTF-8 domain names, and it MUST NOT be used if
the local-part of the addr-spec is all-ASCII but the display-name or
the comment is non-ASCII.

If the sender has an all-ASCII local-part associated with its non-ASCII
mailbox, the sender's MUA MUST create an Address-map header for that
association. If the sender knows the mapping for any recipient that has
a non-ASCII mailbox name (such as through caching incoming address maps
or from an address book), the sending MUA SHOULD create an Address-map
header for it.

Both addresses in the Address-map header are full addr-specs. The body
of the Address-map header contains only addr-specs, never display-names
or comments. The format of the Address-map header is:

   Address-map: <address-with-non-ASCII-LHS>,<downgrade-address>

The encoding for address-with-non-ASCII-LHS MUST be UTF-8; the encoding
for downgrade-address MUST be ASCII. If the domain name is an
internationalized domain name [IDNA], then it MUST be encoded in UTF-8
in the address-with-non-ASCII-LHS and MUST be encoded using IDNA in the
downgrade-address.

Examples:

   Address-map: José@example.com,jose@example.com

   Address-map: björn@räksmörgås.se,
      bjorn-ascii@xn--rksmrgs-5wao1o.se

Note that when receiving mail, the Address-map headers may be entirely
in ASCII. This would be due to an intervening SMTP server or other agent
having downgraded the map. All-ASCII Address-map headers MUST be
accepted.
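
The rules above can be sketched as a small Python helper that refuses
to build the header for an all-ASCII local-part and puts the domain of
the downgrade-address into IDNA form. The helper name is hypothetical;
it relies on Python's built-in "idna" codec, which implements RFC 3490,
and it does not fold long lines as in the second example.

```python
def make_address_map(utf8_addr: str, ascii_addr: str) -> str:
    """Build one Address-map: header line (hypothetical helper).

    utf8_addr:  full addr-spec whose local-part has non-ASCII characters
    ascii_addr: full addr-spec with an all-ASCII local-part (the fallback)
    """
    local, at, _domain = utf8_addr.rpartition("@")
    if not at:
        raise ValueError(f"not an addr-spec: {utf8_addr!r}")
    if local.isascii():
        # MUST NOT be used when the local-part is all-ASCII, even if the
        # domain is an internationalized domain name.
        raise ValueError(f"local-part of {utf8_addr!r} is all-ASCII")

    d_local, d_at, d_domain = ascii_addr.rpartition("@")
    if not d_at or not d_local.isascii():
        raise ValueError(f"fallback local-part must be all-ASCII: {ascii_addr!r}")
    # The downgrade-address MUST carry an IDN in its IDNA (ACE) form;
    # an already-ASCII domain passes through unchanged.
    d_domain = d_domain.encode("idna").decode("ascii")

    return f"Address-map: {utf8_addr},{d_local}@{d_domain}"
```

For example, make_address_map("björn@räksmörgås.se",
"bjorn-ascii@räksmörgås.se") yields the second example above in
unfolded form.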

2.3 Changes to MUA sending

Sending MUAs that follow this protocol MUST create all headers encoded
in UTF-8. No other direct encodings are allowed. MUAs MAY continue to
use quoted-printable text to specify some text in other encodings;
however, this is not recommended, because it is unlikely to interoperate
well with MUAs that follow this specification.

3. Changes to SMTP

This protocol defines a new SMTP extension, UTF-8-HEADERS. (The formal
definition is in the IANA Considerations section.)

3.1 UTF-8-HEADERS extension

If an SMTP server advertises the UTF-8-HEADERS extension, an SMTP client
that supports this protocol SHOULD send message headers as described in
this document.

The terminal SMTP server is responsible for knowing whether or not the
message store can handle UTF-8 headers. A terminal SMTP server MUST NOT
advertise the UTF-8-HEADERS extension if the message store for which it
is responsible cannot handle UTF-8 headers.

If an SMTP client does not see the UTF-8-HEADERS extension advertised by
an SMTP server, the SMTP client MUST downgrade the non-ASCII contents of
all header bodies before continuing to send the message. The SMTP client
SHOULD send the message with the downgraded header bodies as a normal
message. If any header body cannot be downgraded, the SMTP client MUST
bounce the message with an error code of 558.

Almost all UTF-8 header bodies can be downgraded to all-ASCII. The
exception is a header body that contains a non-ASCII mailbox name: it
cannot be downgraded if there is no Address-map header that gives a
mapping for that mailbox.
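
The client-side decision in this section can be sketched in Python. The
EHLO reply is assumed to be available as a list of reply lines, and the
downgrade callable is a hypothetical stand-in for the procedures in
section 3.2 that raises DowngradeError when a header body cannot be made
all-ASCII.

```python
from typing import Callable, List, Optional, Set, Tuple

Header = Tuple[str, str]  # (field name, field body)


class DowngradeError(Exception):
    """A header body cannot be made all-ASCII (e.g. no Address-map entry)."""


def ehlo_keywords(ehlo_reply: List[str]) -> Set[str]:
    """Extension keywords from the lines of a successful EHLO reply."""
    keywords: Set[str] = set()
    for line in ehlo_reply[1:]:        # the first line is the server greeting
        words = line[4:].split()       # drop the "250-" or "250 " prefix
        if words:
            keywords.add(words[0].upper())  # keywords are case-insensitive
    return keywords


def prepare_headers(headers: List[Header], ehlo_reply: List[str],
                    downgrade: Callable[[str, str], str]
                    ) -> Tuple[Optional[int], List[Header]]:
    """Return (None, headers) to send, or (558, []) to bounce."""
    if "UTF-8-HEADERS" in ehlo_keywords(ehlo_reply):
        return None, list(headers)     # enabled server: send unchanged
    result: List[Header] = []
    for name, body in headers:
        if body.isascii():
            result.append((name, body))  # never downgrade all-ASCII bodies
            continue
        try:
            result.append((name, downgrade(name, body)))
        except DowngradeError:
            return 558, []             # bounce, as section 3.1 requires
    return None, result
```

In a client built on Python's smtplib, the keyword test corresponds to
calling SMTP.has_extn("UTF-8-HEADERS") after ehlo().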

3.2 Downgrading header bodies

This section defines how to downgrade header bodies. Note that
downgrading MUST only be done if necessary. That is, downgrading MUST
NOT be done on header fields or bodies that are all-ASCII.

3.2.1 Mailboxes

Mailboxes appear in many standard headers, such as To:, From:, Sender:,
Reply-To:, Cc:, Bcc:, Received:, and some of the Resent-* headers.
Downgrading mailboxes is done as follows:

1) If necessary, convert the domain using IDNA.

2) If necessary, convert the local-parts using values from an
   Address-map: header in the message.

3) If necessary, convert any display-name or comment using
   quoted-printable with UTF-8 encoding.
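
A minimal Python sketch of the three steps, assuming the message's
Address-map: headers have already been parsed into a dictionary from
non-ASCII addr-spec to downgrade-address. Because Address-map values are
full addr-specs, this sketch takes both the local-part and the domain
from the mapping; that choice is an assumption. IDNA comes from
Python's built-in "idna" codec (RFC 3490), and the display-name uses the
standard library's RFC 2047 encoder forced to the Q (quoted-printable)
encoding.

```python
from email.charset import QP, Charset
from email.header import Header
from typing import Dict, Optional


def qp_utf8(text: str) -> str:
    """RFC 2047 encoded-word(s) for text, using UTF-8 and the Q encoding."""
    charset = Charset("utf-8")
    charset.header_encoding = QP  # the library default for UTF-8 is "shortest"
    return Header(text, charset=charset).encode()


def downgrade_mailbox(display_name: Optional[str], addr_spec: str,
                      address_map: Dict[str, str]) -> str:
    """Downgrade one mailbox (hypothetical helper).

    address_map maps each non-ASCII addr-spec to its downgrade-address,
    as read from the message's Address-map: headers.
    """
    local, _, domain = addr_spec.rpartition("@")
    if not local.isascii():
        # Step 2: replace the non-ASCII local-part using the Address-map.
        mapped = address_map.get(addr_spec)
        if mapped is None:
            raise LookupError(f"no Address-map entry for {addr_spec!r}")
        local, _, domain = mapped.rpartition("@")
    if not domain.isascii():
        # Step 1: IDNA ToASCII for an internationalized domain name.
        domain = domain.encode("idna").decode("ascii")
    mailbox = f"{local}@{domain}"

    if display_name is None:
        return mailbox
    if not display_name.isascii():
        display_name = qp_utf8(display_name)  # Step 3
    return f"{display_name} <{mailbox}>"
```

A missing mapping surfaces as LookupError, which is the case in which
section 3.1 requires the SMTP client to bounce the message.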

3.2.2 Message-ids

Downgrading message-ids is done as follows:

1) If necessary, convert the id-left using Base64.

2) If necessary, convert the id-right using Base64.
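
A minimal Python sketch, assuming the msg-id arrives as one string of
the form <id-left@id-right> with no comments or folding whitespace.
Base64 is applied to the UTF-8 octets of each non-ASCII part; its output
alphabet (letters, digits, "+", "/", "=") stays within the atext
characters that RFC 2822 allows in both parts. The function names are
hypothetical.

```python
import base64


def _b64_if_needed(part: str) -> str:
    """Base64-encode the UTF-8 octets of part, but only if it is non-ASCII."""
    if part.isascii():
        return part  # all-ASCII parts MUST NOT be downgraded
    return base64.b64encode(part.encode("utf-8")).decode("ascii")


def downgrade_msg_id(msg_id: str) -> str:
    """Downgrade a msg-id of the form <id-left@id-right> (hypothetical helper)."""
    inner = msg_id.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError(f"not a msg-id: {msg_id!r}")
    left, at, right = inner[1:-1].rpartition("@")
    if not at:
        raise ValueError(f"msg-id lacks '@': {msg_id!r}")
    return f"<{_b64_if_needed(left)}@{_b64_if_needed(right)}>"
```

For example, downgrade_msg_id("<José@example.com>") returns
"<Sm9zw6k=@example.com>".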

3.2.3 Informational headers

If necessary, the bodies of informational headers (Subject:, Comments:,
and Keywords:) are downgraded using quoted-printable with UTF-8
encoding.
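
Python's standard email.header module can produce this encoding. The
sketch below forces the Q (quoted-printable) form with UTF-8, since the
library's default for UTF-8 picks whichever of Q or Base64 is shorter.
The function names are hypothetical, and for Keywords: a fuller
implementation would encode each comma-separated phrase separately
rather than treating the whole body as one unit.

```python
from email.charset import QP, Charset
from email.header import Header, decode_header, make_header


def downgrade_informational(body: str) -> str:
    """Downgrade a Subject:, Comments:, or Keywords: body (hypothetical helper)."""
    if body.isascii():
        return body  # all-ASCII bodies MUST NOT be downgraded
    charset = Charset("utf-8")
    charset.header_encoding = QP  # force the Q (quoted-printable) encoding
    return Header(body, charset=charset).encode()


def upgrade_informational(body: str) -> str:
    """Decode RFC 2047 encoded-words back to a Unicode string."""
    return str(make_header(decode_header(body)))


if __name__ == "__main__":
    encoded = downgrade_informational("Grüße aus Köln")
    print(encoded)
    print(upgrade_informational(encoded))
```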

3.2.4 Address-map headers

If necessary, the Address-map: header is downgraded using Base64 for
local-parts and IDNA for domain names.

For example:

   Address-map: José@example.com,jose@example.com

would be downgraded to:

   Address-map: Sm9zw6k=@example.com,jose@example.com

As another example:

   Address-map: björn@räksmörgås.se,
      bjorn-ascii@xn--rksmrgs-5wao1o.se

would be downgraded to:

   Address-map: YmrDtnJu@xn--rksmrgs-5wao1o.se,
      bjorn-ascii@xn--rksmrgs-5wao1o.se
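
The downgrade of an Address-map: body can be sketched in Python as
follows, assuming the body has no quoted-string local-parts, so all
whitespace can be removed when unfolding. Applied to the two examples
above, it yields the downgraded forms shown. The function names are
hypothetical; IDNA comes from Python's built-in "idna" codec (RFC 3490).

```python
import base64


def _downgrade_addr(addr_spec: str) -> str:
    local, _, domain = addr_spec.rpartition("@")
    if not local.isascii():
        # Base64 over the UTF-8 octets of the local-part.
        local = base64.b64encode(local.encode("utf-8")).decode("ascii")
    if not domain.isascii():
        # IDNA ToASCII for an internationalized domain name.
        domain = domain.encode("idna").decode("ascii")
    return f"{local}@{domain}"


def downgrade_address_map(body: str) -> str:
    """Downgrade the body of an Address-map: header (hypothetical helper)."""
    unfolded = "".join(body.split())  # drop folding whitespace (CRLF, spaces)
    lhs, comma, rhs = unfolded.partition(",")
    if not comma:
        raise ValueError(f"malformed Address-map body: {body!r}")
    if unfolded.isascii():
        return unfolded  # already all-ASCII: MUST NOT be downgraded again
    return f"{_downgrade_addr(lhs)},{_downgrade_addr(rhs)}"
```

The result is returned unfolded; a sender would fold it again if the
line grows too long.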

3.3 Things not changed from RFC 2822

Note that this protocol does not change the definition of header field
names. That is, only the bodies of headers are allowed to have non-ASCII
characters; the rules in RFC 2822 for header names are not changed.

Similarly, this protocol does not change the date and time specification
in RFC 2822.

3.4 Additional processing rules

In order to make mail retrieval easier, terminal SMTP servers SHOULD
write messages addressed to either the UTF-8 address or the all-ASCII
address into the same mailbox. However, given that this is quite
different from common practice today, the ramifications of doing this
should be studied carefully before it is implemented.

Intermediate SMTP servers MAY change the values in the Address-map:
header (such as to add one that is missing or to correct a mapping), but
SHOULD only do so for domains local to the intermediate SMTP server.

Terminal SMTP servers MAY look into the headers of a message to
determine whether they should upgrade a downgraded set of headers to
UTF-8. This is easy to determine: if the Address-map: header contains
only ASCII, it was downgraded earlier in the chain of SMTP servers.
Upgrading is particularly useful on bounce messages caused by bad
mappings.
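
A terminal server's check and the corresponding upgrade can be sketched
in Python. The helpers are hypothetical; they reverse the Base64 and
IDNA steps of section 3.2.4 for the left-hand address only, since the
downgrade-address is all-ASCII by definition, and they return None when
the left-hand side does not decode cleanly.

```python
import base64
import binascii
from typing import List, Optional


def was_downgraded(address_map_bodies: List[str]) -> bool:
    """True if there are Address-map: headers and all of them are all-ASCII.

    An original Address-map always has a non-ASCII local-part on its left,
    so an all-ASCII one must have been downgraded somewhere upstream.
    """
    return bool(address_map_bodies) and all(b.isascii() for b in address_map_bodies)


def upgrade_address_map(body: str) -> Optional[str]:
    """Reverse the downgrade of one Address-map body, or return None."""
    unfolded = "".join(body.split())
    lhs, comma, rhs = unfolded.partition(",")
    local, at, domain = lhs.rpartition("@")
    if not (comma and at):
        return None
    try:
        local = base64.b64decode(local, validate=True).decode("utf-8")
        domain = domain.encode("ascii").decode("idna")  # IDNA ToUnicode
    except (binascii.Error, UnicodeError):
        return None
    return f"{local}@{domain},{rhs}"
```

Because many short ASCII strings are also valid Base64, the UTF-8 decode
check is only a heuristic; this is one reason upgrading is a MAY rather
than a requirement.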

4. Security considerations

If a user has a non-ASCII mailbox address and a mapped all-ASCII mailbox
address, a digital certificate that identifies that user SHOULD have
both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX and
OpenPGP.

Internationalized local-parts will make mail addresses longer, and
possibly make it harder to keep lines in a header under 78 characters.
Lines that are longer than 78 characters (which is a SHOULD
specification, not a MUST specification, in RFC 2822) could cause mail
user agents to fail in ways that affect security.
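
Because UTF-8 uses more than one octet for every non-ASCII character, a
header line can stay within 78 characters yet exceed 78 octets. A small
Python check that reports both counts, offered as a hypothetical aid for
testing MUAs or gateways:

```python
from typing import List, Tuple


def overlong_header_lines(header_block: str,
                          limit: int = 78) -> List[Tuple[int, int, int]]:
    """Report header lines longer than `limit` characters or octets.

    Returns (line_number, characters, utf8_octets) for each offending line.
    The RFC 2822 limit excludes the CRLF, which splitlines() removes.
    """
    report = []
    for number, line in enumerate(header_block.splitlines(), start=1):
        chars = len(line)
        octets = len(line.encode("utf-8"))
        if chars > limit or octets > limit:
            report.append((number, chars, octets))
    return report
```

For instance, "To: " followed by 40 copies of "ü" is 44 characters but
84 octets, so it is reported.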

5. IANA considerations

IANA will register the UTF-8-HEADERS extension for ESMTP.

The UTF-8 headers extension is defined as follows:

(1) The name of the SMTP service extension is "UTF-8 headers".

(2) The EHLO keyword value associated with the extension is
    UTF-8-HEADERS.

(3) No parameter is used with the UTF-8-HEADERS EHLO keyword.

(4) No additional parameters are added to either the MAIL FROM or RCPT
    TO commands.

(5) No additional SMTP verbs are defined by this extension.

(6) This document specifies how support for the extension affects the
    behavior of a server and client SMTP.

6. References

6.1 Normative references

[ASCII]    Cerf, V., "ASCII format for Network Interchange", RFC 20,
           October 1969.

[IDNA]     Faltstrom, P., Hoffman, P. and A. Costello,
           "Internationalizing Domain Names in Applications (IDNA)",
           RFC 3490, March 2003.

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[MIME3]    Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
           Three: Message Header Extensions for Non-ASCII Text",
           RFC 2047, November 1996.

[MSGFMT]   Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[SMTP]     Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
           April 2001.

[UTF8]     Yergeau, F., "UTF-8, a Transformation Format of ISO 10646",
           RFC 3629, November 2003.

7. Author's address

Paul Hoffman
Internet Mail Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org

A. Open issues

- POP and IMAP might be updated to allow one request to bring in two or
  more mailboxes; otherwise, users will have to make two separate
  requests.

- It might be good to have a protocol for determining mappings, but one
  is not defined here.