Email Address Internationalization                               A. Yang
(EAI)                                                              TWNIC
Internet-Draft                                                 S. Steele
Obsoletes: 5335 (if approved)                                  Microsoft
Updates: 2045,5322 (if approved)                                N. Freed
Intended status: Standards Track                                  Oracle
Expires: January 11, 2012                                  July 10, 2011


                    Internationalized Email Headers
                      draft-ietf-eai-rfc5335bis-11

Abstract

   Internet mail was originally limited to 7-bit ASCII. MIME added
   support for the use of 8-bit character sets in body parts, and also
   defined an encoded-word construct so other character sets could be
   used in certain header field values.
   But full internationalization
   of electronic mail requires additional enhancements to allow the use
   of Unicode, including characters outside the ASCII repertoire, in
   mail addresses as well as direct use of Unicode in header fields like
   From:, To:, and Subject:, without requiring the use of complex
   encoded-word constructs. This document specifies an enhancement to
   the Internet Message Format that allows use of Unicode in mail
   addresses and most header field content.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 11, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology Used In This Specification
   3. Changes to Message Header Fields
     3.1. UTF-8 Syntax and Normalization
     3.2. Syntax Extensions to RFC 5322
     3.3. Changes to MIME Message Type Encoding Restrictions
     3.4. The Message/global Media Type
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgements
   7. Edit history
     7.1. draft-ietf-eai-rfc5335bis-00
     7.2. draft-ietf-eai-rfc5335bis-01
     7.3. draft-ietf-eai-rfc5335bis-02
     7.4. draft-ietf-eai-rfc5335bis-03
     7.5. draft-ietf-eai-rfc5335bis-04
     7.6. draft-ietf-eai-rfc5335bis-05
     7.7. draft-ietf-eai-rfc5335bis-06
     7.8. draft-ietf-eai-rfc5335bis-07
     7.9. draft-ietf-eai-rfc5335bis-09
     7.10. draft-ietf-eai-rfc5335bis-10
     7.11. draft-ietf-eai-rfc5335bis-11
   8. References
     8.1. Normative References
     8.2. Informative References

1. Introduction

   Internet mail distinguishes a message from its transport and further
   divides a message between a header and a body [RFC5598]. Internet
   mail header field values contain a variety of strings that are
   intended to be user-visible.
   The range of supported characters for
   these strings was originally limited to 7-bit [ASCII]. MIME
   [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
   character sets, but this support is limited to body part data and to
   special encoded-word constructs that were only allowed in a limited
   number of places in header field values.

   Globalization of the Internet requires support of the much larger set
   of characters provided by Unicode [RFC5198] in both mail addresses
   and most header field values. Additionally, complex encoding schemes
   like encoded-words introduce inefficiencies as well as significant
   opportunities for processing errors. And finally, native support for
   the UTF-8 charset is now available on most systems. Hence it is
   strongly desirable for Internet mail to support UTF-8 [RFC3629]
   directly.

   This document specifies an enhancement to the Internet Message Format
   [RFC5322] and to MIME that permits the direct use of UTF-8, rather
   than only ASCII, in header field values, including mail addresses. A
   new media type, message/global, is defined for messages that use this
   extended format. This specification also lifts the MIME restriction
   on having non-identity content-transfer-encodings on any subtype of
   the message top-level type so that message/global parts can be safely
   transmitted across existing mail infrastructure.

   This specification is based on a model of native, end-to-end support
   for UTF-8, which depends on having an "8-bit clean" environment
   assured by the transport system. Support for carriage across legacy,
   7-bit infrastructure and for processing by 7-bit receivers requires
   additional mechanisms that are not provided by these specifications.

2. Terminology Used In This Specification

   A plain ASCII string is fully compatible with [RFC5321] and
   [RFC5322].
   In this document, non-ASCII strings are UTF-8 strings if
   they are in header field values which contain at least one
   <UTF8-non-ascii> (see Section 3.1).

   Unless otherwise noted, all terms used here are defined in [RFC5321],
   [RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
   [I-D.ietf-eai-rfc5336bis].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The term "8-bit" means octets are present in the data with values
   above 0x7F.

3. Changes to Message Header Fields

   To permit Unicode characters in field values, the header definition
   in [RFC5322] is extended to support the new format. The following
   sections specify the necessary changes to RFC 5322's ABNF.

   The syntax rules not mentioned below remain defined as in [RFC5322].

   Note that this protocol does not change RFC 5322 rules for defining
   header field names. The bodies of header fields are allowed to
   contain Unicode characters, but the header field names themselves
   must contain only ASCII characters.

   Also note that messages in this format require the use of the
   UTF8SMTPbis extension [I-D.ietf-eai-rfc5336bis] to be transferred
   via SMTP.

3.1. UTF-8 Syntax and Normalization

   UTF-8 characters can be defined in terms of octets using the
   following ABNF [RFC5234], taken from [RFC3629]:

   UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2          =   <Defined in Section 4 of RFC3629>

   UTF8-3          =   <Defined in Section 4 of RFC3629>

   UTF8-4          =   <Defined in Section 4 of RFC3629>

   See [RFC5198] for a discussion of Unicode normalization;
   normalization form [NFC] SHOULD be used. Actually, if one is going
   to do internationalization properly, one of the most often-cited
   goals is to permit people to spell their names correctly. Since many
   mailbox local parts reflect personal names, that principle applies to
   mailboxes as well.
   The NFKC normalization form SHOULD NOT be used
   because it may lose information that is needed to correctly spell
   some names in some unusual circumstances.

3.2. Syntax Extensions to RFC 5322

   The following rules extend the ABNF syntax defined in [RFC5322] and
   [RFC5234] in order to allow UTF-8 content.

   VCHAR   =/  UTF8-non-ascii

   ctext   =/  UTF8-non-ascii

   atext   =/  UTF8-non-ascii

   qtext   =/  UTF8-non-ascii

   text    =/  UTF8-non-ascii
               ; note that this upgrades the body to UTF-8

   dtext   =/  UTF8-non-ascii

   A consequence of the change to the dtext rule is that UTF-8 would
   then be allowed in the domain parts of message-ids as well as
   addresses. This is unnecessary and undesirable, so three additional
   RFC 5322 rules are redefined and a new itext rule is added:

   id-left     = dot-id-text

   id-right    = dot-id-text / no-fold-literal

   dot-id-text = 1*itext *("." 1*itext)

   itext       = ALPHA / DIGIT /  ; Printable US-ASCII
                 "!" / "#" /      ; characters not including
                 "$" / "%" /      ; specials. Used for msg-ids.
                 "&" / "'" /
                 "*" / "+" /
                 "-" / "/" /
                 "=" / "?" /
                 "^" / "_" /
                 "`" / "{" /
                 "|" / "}" /
                 "~"

   This change also specifically disallows obsolete forms of message-ids
   that RFC 5322 allows.

   The preceding changes mean that the following constructs now allow
   UTF-8:

   1. Unstructured text, used in header fields like Subject: or
      Content-description:.

   2. Any construct that uses atoms, including but not limited to the
      local parts of addresses. This includes addresses in the "for"
      clauses of Received: header fields.

   3. Quoted strings.

   4. Domains. (But not in message-ids.)

   Note that header field names are not on this list; these are still
   restricted to ASCII.

3.3. Changes to MIME Message Type Encoding Restrictions

   This specification updates Section 6.4 of [RFC2045].
   [RFC2045] prohibits applying a content-transfer-encoding to any
   subtypes of "message/". This specification relaxes that rule -- it
   allows newly defined MIME types to permit content-transfer-encoding,
   and it allows content-transfer-encoding for message/global (see
   Section 3.4).

   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.

   But in the case where a message containing a message/global is
   downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
   might have to be applied to the message; if the message travels
   multiple times between a 7-bit environment and an environment
   implementing these extensions, multiple levels of encoding may occur.
   This is expected to be rarely seen in practice, and the potential
   complexity of other ways of dealing with the issue is thought to be
   larger than the complexity of allowing nested encodings where
   necessary.

3.4. The Message/global Media Type

   Internationalized messages in this format MUST only be transmitted as
   authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP
   environment that supports these messages. A message is a
   "message/global message" if:

   o  it contains 8-bit UTF-8 header values as specified in this
      document, or

   o  it contains 8-bit UTF-8 values in the header fields of body parts.

   The content of a message/global part is otherwise identical to that
   of a message/rfc822 part.

   If this type is sent to a 7-bit-only system, it has to have an
   appropriate content-transfer-encoding applied. (Note that a system
   compliant with MIME that doesn't recognize message/global is supposed
   to treat it as "application/octet-stream" as described in Section
   5.2.4 of [RFC2046].)
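
   As a non-normative illustration of the definition above, the
   following Python sketch labels an embedded message as message/global
   when its header block contains 8-bit octets that decode as UTF-8, and
   as message/rfc822 otherwise. The function names
   embedded_message_type and pick_transfer_encoding are hypothetical.
   The sketch inspects only the top-level header block, not the header
   fields of nested body parts, and the choice of base64 for 7-bit-only
   paths is an assumption; this document permits any
   content-transfer-encoding.

```python
def embedded_message_type(raw: bytes) -> str:
    """Pick a media type label for an embedded message (illustrative only)."""
    # The header block ends at the first empty line (CRLF CRLF).
    header_block, _, _body = raw.partition(b"\r\n\r\n")
    # No octet above 0x7F: the header is plain ASCII.
    if all(octet <= 0x7F for octet in header_block):
        return "message/rfc822"
    # 8-bit octets are present, so they must form well-formed UTF-8.
    # bytes.decode raises UnicodeDecodeError (a ValueError) otherwise.
    header_block.decode("utf-8")
    return "message/global"


def pick_transfer_encoding(eight_bit_clean: bool) -> str:
    """Choose a content-transfer-encoding for a message/global part."""
    # Assumption: base64 on 7-bit-only paths; quoted-printable would also do.
    return "8bit" if eight_bit_clean else "base64"


if __name__ == "__main__":
    sample = "From: \u7528\u6237@example.com\r\nSubject: hi\r\n\r\nbody\r\n"
    raw = sample.encode("utf-8")
    print(embedded_message_type(raw), pick_transfer_encoding(False))
```

   A fuller implementation would presumably also walk MIME body parts,
   since 8-bit UTF-8 values in their header fields make the enclosing
   message a message/global message as well.
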

   Type name: message

   Subtype name: global

   Required parameters: none

   Optional parameters: none

   Encoding considerations: Any content-transfer-encoding is permitted.
      The 8-bit or binary content-transfer-encodings are recommended
      where permitted.

   Security considerations: See Section 4.

   Interoperability considerations: This media type provides
      functionality similar to the message/rfc822 content type for email
      messages with international email headers. When there is a need
      to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or down-convert the content to message/rfc822. Both of
      these choices will interoperate with the installed base, but with
      different properties. Systems unaware of internationalized
      headers will typically treat a message/global body part as an
      unknown attachment, while they will understand the structure of a
      message/rfc822. However, systems that understand message/global
      will provide functionality superior to the result of a
      down-conversion to message/rfc822. The most interoperable choice
      depends on the deployed software.

   Published specification: RFC XXXX

   Applications that use this media type: SMTP servers and email
      clients that support multipart/report generation or parsing.
      Email clients that forward messages with international headers as
      attachments.

   Additional information:

   Magic number(s): none

   File extension(s): The extension ".u8msg" is suggested.

   Macintosh file type code(s): A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested. This conforms to
      "public.message" and "public.composite-content", but does not
      necessarily conform to "public.utf8-plain-text".

   Person & email address to contact for further information: See the
      Author's Address section of this document.

   Intended usage: COMMON

   Restrictions on usage: This is a structured media type that embeds
      other MIME media types. The 8-bit or binary
      content-transfer-encoding SHOULD be used unless this media type is
      sent over a 7-bit-only transport.

   Author: See the Author's Address section of this document.

   Change controller: IETF Standards Process

4. Security Considerations

   Because UTF-8 often requires several octets to encode a single
   character, internationalization may cause header field values in
   general and mail addresses in particular to become longer. As
   specified in [RFC5322], each line of characters MUST be no more than
   998 octets, excluding the CRLF. On the other hand, MDA (Mail
   Delivery Agent) processes that parse, store, or handle email
   addresses or local parts must take extra care not to overflow
   buffers, truncate addresses, or exceed storage allotments. Also,
   they must take care, when comparing, to use the entire lengths of the
   addresses.

   There are lots of ways of using UTF-8 to represent something
   equivalent or similar to a particular displayed character or group of
   characters. This may allow filtering systems to be bypassed by using
   a slightly different character to avoid detection while still
   reaching the end user with largely the same intended deleterious
   effect. The normalization process described in Section 3.1 is
   recommended to minimize this problem.

   The security impact of UTF-8 headers on email signature systems such
   as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
   discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.

   If a user has a non-ASCII mailbox address and an ASCII mailbox
   address, a digital certificate that identifies that user might have
   both addresses in the identity.
   Having multiple email addresses as
   identities in a single certificate is already supported in PKIX
   (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
   OpenPGP [RFC3156], but there may be user interface issues associated
   with the introduction of UTF-8 into addresses in this context.

5. IANA Considerations

   IANA is requested to update the registration of the message/global
   MIME type using the registration form contained in Section 3.4.

6. Acknowledgements

   This document incorporates many ideas first described in Internet-
   Draft form by Paul Hoffman, although many details have changed from
   that earlier work.

   The author especially thanks Jeff Yeh for his efforts and
   contributions on editing previous versions.

   Most of the content of this document was provided by John C Klensin
   and Dave Crocker. Significant comments and suggestions were received
   from Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov,
   Chris Newman, Kristin Hubner, Yangwoo Ko, Yoshiro Yoneya, and other
   members of the JET team (Joint Engineering Team) and were
   incorporated into the document. The editors wish to sincerely thank
   them all for their contributions.

7. Edit history

   [[RFC Editor: please remove this section before publishing.]]

7.1. draft-ietf-eai-rfc5335bis-00

   1. Applied Errata suggested by Alfred Hoenes.

   2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].

   3. Abrogate in ABNF of .

   4. Revoke [RFC5504] from this document.

   5. Upgrade some references from I-Ds to RFC.

7.2. draft-ietf-eai-rfc5335bis-01

   1. Author name revised.

7.3. draft-ietf-eai-rfc5335bis-02

   1. ABNF revised.

7.4. draft-ietf-eai-rfc5335bis-03

   1. Fix typos

   2. ABNF revised

   3. Improve sentence

7.5. draft-ietf-eai-rfc5335bis-04

   1. improve sentences and ABNF revised based on AD and Co-chairs

7.6. draft-ietf-eai-rfc5335bis-05

   1. ABNF revised based on AD comments

7.7. draft-ietf-eai-rfc5335bis-06

   1. ABNF revised

   2. improve Section 5

7.8. draft-ietf-eai-rfc5335bis-07

   1. Minor ABNF revised in Section 3.2

   2. improve Section 5

7.9. draft-ietf-eai-rfc5335bis-09

   Version -08 was posted in error and withdrawn. Version 09 is
   identical to version 07 except for a date change, addition of this
   note, and some vertical spacing compression on this page.

7.10. draft-ietf-eai-rfc5335bis-10

   1. Add appendix and overview of changes

   2. Replace polls result in Abstract and Section 1

   3. Minor Sentence modification

7.11. draft-ietf-eai-rfc5335bis-11

   1. Major rewrite of entire document to incorporate Dave Crocker's
      simplified ABNF.

   2. The document has intentionally been refocused on implementors
      wishing to adapt their software to support EAI, so much of the
      explanatory and historical text has been removed. (Some of it
      may be reintroduced later as an appendix.)

8. References

8.1. Normative References

   [ASCII]                        "Coded Character Set -- 7-bit American
                                  Standard Code for Information
                                  Interchange", ANSI X3.4, 1986.

   [I-D.ietf-eai-frmwrk-4952bis]  Klensin, J. and Y. Ko, "Overview and
                                  Framework for Internationalized
                                  Email",
                                  draft-ietf-eai-frmwrk-4952bis-10 (work
                                  in progress), September 2010.

   [I-D.ietf-eai-rfc5336bis]      Yao, J. and W. MAO, "SMTP Extension
                                  for Internationalized Email Address",
                                  draft-ietf-eai-rfc5336bis-07 (work in
                                  progress), December 2010.

   [NFC]                          Davis, M. and K. Whistler, "Unicode
                                  Standard Annex #15: Unicode
                                  Normalization Forms", September 2010.

   [RFC2119]                      Bradner, S., "Key words for use in
                                  RFCs to Indicate Requirement Levels",
                                  BCP 14, RFC 2119, March 1997.

   [RFC3629]                      Yergeau, F., "UTF-8, a transformation
                                  format of ISO 10646", STD 63,
                                  RFC 3629, November 2003.

   [RFC5198]                      Klensin, J. and M. Padlipsky, "Unicode
                                  Format for Network Interchange",
                                  RFC 5198, March 2008.

   [RFC5234]                      Crocker, D. and P. Overell, "Augmented
                                  BNF for Syntax Specifications: ABNF",
                                  STD 68, RFC 5234, January 2008.

   [RFC5321]                      Klensin, J., "Simple Mail Transfer
                                  Protocol", RFC 5321, October 2008.

   [RFC5322]                      Resnick, P., Ed., "Internet Message
                                  Format", RFC 5322, October 2008.

   [RFC5598]                      Crocker, D., "Internet Mail
                                  Architecture", RFC 5598, July 2009.

8.2. Informative References

   [RFC2045]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part One: Format of Internet
                                  Message Bodies", RFC 2045,
                                  November 1996.

   [RFC2046]                      Freed, N. and N. Borenstein,
                                  "Multipurpose Internet Mail Extensions
                                  (MIME) Part Two: Media Types",
                                  RFC 2046, November 1996.

   [RFC2047]                      Moore, K., "MIME (Multipurpose
                                  Internet Mail Extensions) Part Three:
                                  Message Header Extensions for Non-
                                  ASCII Text", RFC 2047, November 1996.

   [RFC3156]                      Elkins, M., Del Torto, D., Levien, R.,
                                  and T. Roessler, "MIME Security with
                                  OpenPGP", RFC 3156, August 2001.

   [RFC5280]                      Cooper, D., Santesson, S., Farrell,
                                  S., Boeyen, S., Housley, R., and W.
                                  Polk, "Internet X.509 Public Key
                                  Infrastructure Certificate and
                                  Certificate Revocation List (CRL)
                                  Profile", RFC 5280, May 2008.

   [RFC6152]                      Klensin, J., Freed, N., Rose, M., and
                                  D. Crocker, "SMTP Service Extension
                                  for 8-bit MIME Transport", STD 71,
                                  RFC 6152, March 2011.

Authors' Addresses

   Abel Yang
   TWNIC
   4F-2, No. 9, Sec 2, Roosevelt Rd.
   Taipei, 100
   Taiwan

   Phone: +886 2 23411313 ext 505
   EMail: abelyang@twnic.net.tw

   Shawn Steele
   Microsoft

   EMail: Shawn.Steele@microsoft.com

   Ned Freed
   Oracle
   800 Royal Oaks
   Monrovia, CA 91016-6347
   USA

   EMail: ned+ietf@mrochek.com