idnits 2.17.1

draft-klensin-encoded-word-type-u-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC2047, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC2231, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet has text resembling
     RFC 2119 boilerplate text.

     (Using the creation date from RFC2047, updated by this document, for
     RFC5378 checks: 1996-06-03)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 24, 2011) is 4531 days in the past. Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Duplicate reference: RFC2231, mentioned in 'RFC2231-Err478', was also
     mentioned in 'RFC2231'.

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         J. Klensin
Internet-Draft                                         November 24, 2011
Updates: 2047, 2231 (if approved)
Expires: May 27, 2012

              The "U" Encoding for Encoded-Words in Email
                  draft-klensin-encoded-word-type-u-00

Abstract

   The "Encoded Word" conventions have been used extensively in email
   headers and elsewhere to permit the encoding of non-ASCII characters
   where only ASCII ones are normally permitted. The existing
   specification defines only two kinds of encoding, one of which cannot
   be understood easily by people and the other of which has been widely
   discredited. This document specifies a third encoding that is easily
   accessible by users and much more closely tied to contemporary
   practices.

   The current version of the proposal is intended for possible
   discussion in the EAI, IRI, and PRECIS WGs to see if it sheds light
   on other issues being discussed in those WGs. It is not, at this
   point, proposed for adoption.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 27, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Updated Specifications
     1.2.  Terminology
     1.3.  Scope and Discussion List
   2.  Specification
   3.  Security Considerations
   4.  IANA Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Author's Address

1.  Introduction

   The "Encoded Word" conventions [RFC2047] have been used extensively
   in email headers and elsewhere to permit the encoding of non-ASCII
   characters where only ASCII ones are normally permitted. That
   existing encoded-word specification defines only two kinds of
   encoding, one of which cannot be understood easily by people ("B",
   the MIME "Base64" encoding) and the other of which ("Q", so-called
   Quoted Printable) has been widely discredited. This document
   specifies a third encoding, based on the "\u'NNNN'" convention, that
   is easily accessible by users and much more closely tied to
   contemporary practices.

   Unlike the "B" and "Q" encodings, which were specified at a time when
   many coded character sets were in common use, it is now appropriate
   [RFC5198] to tie a new encoding specifically to Unicode [Unicode] and
   the corresponding ISO Standard [ISO10646], viewing conversion to
   local character sets, if necessary at all, to be a local matter.
   Consequently, this specification permits only the combination
   "=?iso-10646-UCS-4?u?".

   [[anchor2: Note in Draft: If we were really going to do this, it
   would make sense to define a charset that would actually reflect
   Unicode code points, not some encoding of them. Neither of the
   currently-registered "iso-10646-UCS-4" nor "UTF-32" and its
   variations are quite right for that purpose. Cf.
   http://www.iana.org/assignments/character-sets]]

   If adopted, it is intended not only as an alternative to "Q" and "B",
   but also as an alternative to the %-encoding of Section 2.1 of the
   URI Specification [RFC3986] of UTF-8 [RFC3629] (and other) strings.
   %-encoding was more than adequate for its original purpose of
   encoding eight-bit character sets, notably ISO 8859-1 [ISO8859-1],
   but is problematic for email (especially addresses and fields related
   to them) because "%" has an important historic (and still
   occasionally used) meaning in those contexts and because its use to
   encode already-encoded forms of multi-octet character sets, such as
   UTF-8 and Unicode, creates strings that are at least as difficult for
   end users to interpret as Base64.

1.1.  Updated Specifications

   This document, if approved, updates the Encoded-Word specification
   [RFC2047] and the specification for the use of encoded-words with
   language information [RFC2231] to permit use of an additional
   encoding type, type "U".

1.2.  Terminology

   Some reasonable understanding of Encoded-Words and the
   Quoted-Printable, Base64, and %-encoding conventions is required to
   understand this introductory material but not the proposal itself.

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in RFC 2119
   [RFC2119].

1.3.  Scope and Discussion List

   RFC Editor: In the unlikely event that you see this subsection, it
   should be removed before publication.

   The current version of the proposal is intended for possible
   discussion in the EAI, IRI, and PRECIS WGs to see if it sheds light
   on other issues being discussed in those WGs. If discussions are of
   interest, they should occur on the mailing lists associated with
   those groups.

   This Internet Draft is, at this point, intended only to promote
   discussion of a possibly-useful building block for other work. It is
   not proposed for adoption by the IETF for any purpose.

2.  Specification

   A new encoding form for encoded words is defined with code "u".
   The associated encoded-text string is consistent with the rules in
   Section 4 of RFC 2047, i.e., it consists of ASCII characters with
   space, tab, and "?" characters excluded. Non-ASCII characters are
   encoded using the \u'NNNN' form, where "NNNN" consists of four to six
   hexadecimal digits designating a Unicode (ISO 10646) code point.
   That encoding convention is defined in RFC 5137 [RFC5137] together
   with an explanation of why the quotes should be required.

   As an example, the German equivalent of the string "This is nuts"
   would appear in the extended form of RFC 2231 (updated by verified
   Erratum 478 [RFC2231-Err478]) as
      =?iso-10646-UCS-4+de?u?Das ist verr\u'00FC'ckt?=

3.  Security Considerations

   This specification does not raise any security issues that are not
   already present in RFC 2047 and its various updates. Because the
   coding is more transparent to the end user than any of Base64, Quoted
   Printable for non-ASCII text, or %-encoding of UTF-8, it may
   eliminate or reduce one possible attack vector that is present with
   those other approaches.

4.  IANA Considerations

   [[anchor9: RFC Editor: Please remove this section.]]
   Because there does not appear to be a registry for either
   encoded-word encodings or the content-transfer-encodings on which
   they are based, this document requires no actions by the IANA.

5.  References

5.1.  Normative References

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.
   [RFC2231-Err478]
              Stedfast, J., "MIME Parameter Value and Encoded Word
              Extensions: Character Sets, Languages, and Continuations,
              Erratum 478", November 2001.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              6.0.0", Mountain View, CA: The Unicode Consortium,
              ISBN 978-1-936213-01-6, 2011.

5.2.  Informative References

   [ISO10646]
              International Organization for Standardization,
              "Information Technology - Universal Multiple-octet coded
              Character Set (UCS)", ISO Standard 10646:2011, March 2011.

   [ISO8859-1]
              International Organization for Standardization,
              "Information technology - 8-bit single byte coded graphic
              character sets - Part 1: Latin alphabet No. 1",
              ISO Standard 8859-1:1998, 1998.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC5137]  Klensin, J., "ASCII Escaping of Unicode Characters",
              BCP 137, RFC 5137, February 2008.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA 02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com
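
   The escaping step of the type "U" encoding described in Section 2
   can be sketched as follows. This is a minimal illustration in
   Python and is not part of the draft. The function names escape_u
   and unescape_u are hypothetical, and the sketch assumes that only
   non-ASCII characters are escaped. It does not build the surrounding
   "=?...?=" encoded-word, does not deal with the space, tab, and "?"
   characters that Section 4 of RFC 2047 excludes from encoded-text,
   and (as a further assumption) leaves out-of-range or surrogate
   escapes untouched when decoding.

```python
import re

# Matches the \u'NNNN' form with four to six hexadecimal digits.
_ESCAPE = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'")


def escape_u(text: str) -> str:
    """Replace each non-ASCII character with its quoted hex escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        else:
            # At least four uppercase hex digits, more when needed.
            out.append("\\u'%04X'" % cp)
    return "".join(out)


def unescape_u(text: str) -> str:
    """Replace each quoted hex escape with the code point it names."""

    def repl(match: re.Match) -> str:
        cp = int(match.group(1), 16)
        # Assumption of this sketch: escapes that are not Unicode
        # scalar values are left as they were found.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)

    return _ESCAPE.sub(repl, text)
```

   Applied to the German example from Section 2, escape_u turns
   "Das ist verrückt" into Das ist verr\u'00FC'ckt, and unescape_u
   reverses the transformation.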