idnits 2.17.1 

draft-klensin-encoded-word-type-u-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC2047, but the
     abstract doesn't seem to mention this, which it should.

  -- The draft header indicates that this document updates RFC2231, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet has text resembling
     RFC 2119 boilerplate text.

     (Using the creation date from RFC2047, updated by this document, for
     RFC5378 checks: 1996-06-03)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 24, 2011) is 4531 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Duplicate reference: RFC2231, mentioned in 'RFC2231-Err478', was also
     mentioned in 'RFC2231'.

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                         J. Klensin
3	Internet-Draft                                         November 24, 2011
4	Updates: 2047, 2231 (if approved)
5	Expires: May 27, 2012

7	              The "U" Encoding for Encoded-Words in Email
8	                 draft-klensin-encoded-word-type-u-00

10	Abstract

12	   The "Encoded Word" conventions have been used extensively in email
13	   headers and elsewhere to permit the encoding of non-ASCII characters
14	   where only ASCII ones are normally permitted.  The existing
15	   specification defines only two kinds of encoding, one of which cannot
16	   be understood easily by people and the other of which has been widely
17	   discredited.  This document specifies a third encoding that is easily
18	   accessible by users and much more closely tied to contemporary
19	   practices.

21	   The current version of the proposal is intended for possible
22	   discussion in the EAI, IRI, and PRECIS WGs to see if it sheds light
23	   on other issues being discussed in those WGs.  It is not, at this
24	   point, proposed for adoption.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on May 27, 2012.

43	Copyright Notice

45	   Copyright (c) 2011 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Updated Specifications
62	     1.2.  Terminology
63	     1.3.  Scope and Discussion List
64	   2.  Specification
65	   3.  Security Considerations
66	   4.  IANA Considerations
67	   5.  References
68	     5.1.  Normative References
69	     5.2.  Informative References
70	   Author's Address

72	1.  Introduction

74	   The "Encoded Word" conventions [RFC2047] have been used extensively
75	   in email headers and elsewhere to permit the encoding of non-ASCII
76	   characters where only ASCII ones are normally permitted.  That
77	   existing encoded-word specification defines only two kinds of
78	   encoding, one of which cannot be understood easily by people ("B",
79	   the MIME "Base64" encoding) and the other of which ("Q", so-called
80	   Quoted Printable) has been widely discredited.  This document
81	   specifies a third encoding, based on the "\u'NNNN'" convention, that
82	   is easily accessible by users and much more closely tied to
83	   contemporary practices.

85	   Unlike the "B" and "Q" encodings, which were specified at a time when
86	   many coded character sets were in common use, it is now appropriate
87	   [RFC5198] to tie a new encoding specifically to Unicode [Unicode] and
88	   the corresponding ISO Standard [ISO10646], viewing conversion to
89	   local character sets, if necessary at all, to be a local matter.
90	   Consequently, this specification permits only the combination "=?iso-
91	   10646-UCS-4?u?".

93	   [[anchor2: Note in Draft: If we were really going to do this, it
94	   would make sense to define a charset that would actually reflect
95	   Unicode code points, not some encoding of them.  Neither of the
96	   currently-registered "iso-10646-UCS-4" nor "UTF-32" and its
97	   variations are quite right for that purpose.  Cf.
98	   http://www.iana.org/assignments/character-sets]]

100	   If adopted, it is intended not only as an alternative to "Q" and "B",
101	   but also as an alternative to the %-encoding of Section 2.1 of the
102	   URI Specification [RFC3986] of UTF-8 [RFC3629] (and other) strings.
103	   %-encoding was more than adequate for its original purpose of
104	   encoding eight-bit character sets, notably ISO 8859-1 [ISO8859-1],
105	   but is problematic for email (especially addresses and fields related
106	   to them) because "%" has an important historic (and still
107	   occasionally used) meaning in those contexts and because its use to
108	   encode already-encoded forms of multi-octet character sets, such as
109	   UTF-8 and Unicode, creates strings that are at least as difficult for
110	   end users to interpret as Base64.
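
   As an illustration of the readability argument above, the following
   Python sketch produces the %-encoded and Base64 forms of a short
   UTF-8 string and prints them next to the "\u'NNNN'" form.  The
   function names and the sample word are assumptions chosen for this
   illustration; only standard-library calls are used, and nothing here
   is part of the proposal itself.

```python
import base64
from urllib.parse import quote

# Sample word borrowed from the example in Section 2 (illustrative only).
WORD = "verrückt"


def percent_form(text: str) -> str:
    """Percent-encode the UTF-8 octets of text (RFC 3986, Section 2.1)."""
    return quote(text, safe="", encoding="utf-8")


def base64_form(text: str) -> str:
    """Base64-encode the UTF-8 octets of text, as the "B" encoding would."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# The "\u'NNNN'" form of the same word, written out by hand:
# U+00FC is LATIN SMALL LETTER U WITH DIAERESIS.
U_FORM = "verr\\u'00FC'ckt"

print("percent:", percent_form(WORD))
print("base64: ", base64_form(WORD))
print("u-form: ", U_FORM)
```

   In the %-encoded form the single character U+00FC becomes two
   escaped octets, so a reader has to decode UTF-8 mentally to identify
   it, while the "\u'NNNN'" form names the code point directly.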

112	1.1.  Updated Specifications

114	   This document, if approved, updates the Encoded-Word specification
115	   [RFC2047] and the specification for the use of encoded-words with
116	   language information [RFC2231] to permit use of an additional
117	   encoding type, type "U".

119	1.2.  Terminology

121	   Some reasonable understanding of Encoded-Words and the Quoted-
122	   Printable, Base64, and %-encoding conventions is required to
123	   understand this introductory material but not the proposal itself.

125	   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
126	   in this document are to be interpreted as defined in RFC 2119
127	   [RFC2119].

129	1.3.  Scope and Discussion List

131	   RFC Editor: In the unlikely event that you see this subsection, it
132	   should be removed before publication.

134	   The current version of the proposal is intended for possible
135	   discussion in the EAI, IRI, and PRECIS WGs to see if it sheds light
136	   on other issues being discussed in those WGs.  If discussions are of
137	   interest, they should occur on the mailing lists associated with
138	   those groups.

140	   This Internet Draft is, at this point, intended only to promote
141	   discussion of a possibly-useful building block for other work.  It is
142	   not proposed for adoption by the IETF for any purpose.

144	2.  Specification

146	   A new encoding form for encoded words is defined with code "u".  The
147	   associated encoded-text string is consistent with the rules in
148	   Section 4 of RFC 2047, i.e., it consists of ASCII characters with
149	   space, tab, and "?" characters excluded.  Non-ASCII characters are
150	   encoded using the \u'NNNN' form, where "NNNN" consists of four to six
151	   hexadecimal digits designating a Unicode (ISO 10646) code point.
152	   That encoding convention is defined in RFC 5137 [RFC5137] together
153	   with an explanation of why the quotes should be required.
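
   A minimal Python sketch of the escaping step described above might
   look as follows.  The function name encode_u_text is hypothetical,
   and the choice of upper-case hexadecimal digits is an assumption.
   The sketch only escapes non-ASCII characters; how space, tab, and
   "?" are to be represented is not stated in this document, so ASCII
   characters are passed through unchanged.

```python
def encode_u_text(text: str) -> str:
    """Replace each non-ASCII character with its \\u'NNNN' escape.

    Hypothetical helper: ASCII characters are copied unchanged, and each
    other character becomes \\u'NNNN' with four to six hexadecimal digits.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # ASCII: left as-is (assumption: no further escaping here).
            pieces.append(ch)
        else:
            # Zero-pad to at least four digits; code points above U+FFFF
            # naturally produce five or six digits.
            pieces.append("\\u'{:04X}'".format(code_point))
    return "".join(pieces)


print(encode_u_text("verrückt"))
```

   Python strings are sequences of Unicode code points, so ord() yields
   the value needed for the escape without any intermediate UTF-8 or
   UTF-16 step.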

155	   As an example, the German equivalent of the string "This is nuts"
156	   would appear in the extended form of RFC 2231 (updated by verified
157	   Erratum 478 [RFC2231-Err478]) as
158	   =?iso-10646-UCS-4*de?u?Das ist verr\u'00FC'ckt?=
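
   The reverse step, recovering the text from the encoded-text portion
   of such an encoded-word, can be sketched in Python as below.  The
   function name decode_u_text and the decision to reject surrogate and
   out-of-range values are assumptions made for this illustration; the
   sketch does not parse the surrounding "=?...?u?" and "?=" framing.

```python
import re

# Matches \u'NNNN' with four to six hexadecimal digits (either case).
_ESCAPE = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'")


def decode_u_text(encoded_text: str) -> str:
    """Expand every \\u'NNNN' escape in encoded_text to its character.

    Hypothetical helper: text outside the escapes is returned unchanged.
    """

    def expand(match: "re.Match[str]") -> str:
        code_point = int(match.group(1), 16)
        # Assumption: surrogates and values beyond U+10FFFF are rejected.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("not a Unicode scalar value: " + match.group(0))
        return chr(code_point)

    return _ESCAPE.sub(expand, encoded_text)


print(decode_u_text("Das ist verr\\u'00FC'ckt"))
```

   Because the escape is delimited by quotes, the decoder never has to
   guess where the hexadecimal digits end, which is the property that
   RFC 5137 cites for this form.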

160	3.  Security Considerations

162	   This specification does not raise any security issues that are not
163	   already present in RFC 2047 and its various updates.  Because the
164	   coding is more transparent to the end user than any of Base64, Quoted
165	   Printable for non-ASCII text, or %-encoding of UTF-8, it may
166	   eliminate or reduce one possible attack vector that is present with
167	   those other approaches.

169	4.  IANA Considerations

171	   [[anchor9: RFC Editor: Please remove this section.]]
172	   Because there does not appear to be a registry for either encoded-
173	   word encodings or the content-transfer-encodings on which they are
174	   based, this document requires no actions by the IANA.

176	5.  References

178	5.1.  Normative References

180	   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
181	              Part Three: Message Header Extensions for Non-ASCII Text",
182	              RFC 2047, November 1996.

184	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
185	              Requirement Levels", BCP 14, RFC 2119, March 1997.

187	   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
188	              Word Extensions:
189	              Character Sets, Languages, and Continuations", RFC 2231,
190	              November 1997.

192	   [RFC2231-Err478]
193	              Stedfast, J., "MIME Parameter Value and Encoded Word
194	              Extensions: Character Sets, Languages, and Continuations,
195	              Erratum 478", November 2001,
196	              <http://www.rfc-editor.org./errata_search.php?eid=478>.

198	   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
199	              6.0.0", Mountain View, CA: The Unicode Consortium,
200	              ISBN 978-1-936213-01-6, 2011,
201	              <http://www.unicode.org/versions/Unicode6.0.0/>.

204	5.2.  Informative References

206	   [ISO10646]
207	              International Organization for Standardization,
208	              "Information Technology - Universal Multiple-octet coded
209	              Character Set (UCS)", ISO Standard 10646:2011, March 2011.

211	   [ISO8859-1]
212	              International Organization for Standardization,
213	              "Information technology - 8-bit single-byte coded graphic
214	              character sets - Part 1: Latin alphabet No. 1",
215	              ISO Standard 8859-1:1998, 1998.

217	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
218	              10646", STD 63, RFC 3629, November 2003.

220	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
221	              Resource Identifier (URI): Generic Syntax", STD 66,
222	              RFC 3986, January 2005.

224	   [RFC5137]  Klensin, J., "ASCII Escaping of Unicode Characters",
225	              BCP 137, RFC 5137, February 2008.

227	   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
228	              Interchange", RFC 5198, March 2008.

230	Author's Address

232	   John C Klensin
233	   1770 Massachusetts Ave, #322
234	   Cambridge, MA  02140
235	   USA

237	   Phone: +1 617 491 5735
238	   Email: john-ietf@jck.com