idnits 2.17.1
draft-reschke-rfc5987bis-01.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** The document seems to lack an IANA Considerations section. (See Section
2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
when there are no actions for IANA.)
** The abstract seems to contain references ([2], [1]), which it shouldn't.
Please replace those with straight textual mentions of the documents in
question.
-- The draft header indicates that this document obsoletes RFC5987, but the
abstract doesn't seem to directly say this. It does mention RFC5987
though, so this could be OK.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (September 8, 2011) is 4611 days in the past. Is this
intentional?
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references
to lower-maturity documents in RFCs)
** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
RFC 7232, RFC 7233, RFC 7234, RFC 7235)
-- Possible downref: Non-RFC (?) normative reference: ref. 'USASCII'
-- Duplicate reference: RFC2978, mentioned in 'Err1912', was also mentioned
in 'RFC2978'.
-- Obsolete informational reference (is this intentional?): RFC 2388
(Obsoleted by RFC 7578)
-- Obsolete informational reference (is this intentional?): RFC 5987
(Obsoleted by RFC 8187)
Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group J. Reschke
3 Internet-Draft greenbytes
4 Obsoletes: 5987 (if approved) September 8, 2011
5 Intended status: Standards Track
6 Expires: March 11, 2012
8 Indicating Character Encoding and Language for HTTP Header Field
9 Parameters
10 draft-reschke-rfc5987bis-01
12 Abstract
14 By default, message header field parameters in Hypertext Transfer
15 Protocol (HTTP) messages cannot carry characters outside the ISO-
16 8859-1 character set. RFC 2231 defines an encoding mechanism for use
17 in Multipurpose Internet Mail Extensions (MIME) headers. This
18 document specifies an encoding suitable for use in HTTP header fields
19 that is compatible with a profile of the encoding defined in RFC
20 2231.
22 Editorial Note (To be removed by RFC Editor before publication)
24 Distribution of this document is unlimited. Although this is not a
25 work item of the HTTPbis Working Group, comments should be sent to
26 the Hypertext Transfer Protocol (HTTP) mailing list at
27 ietf-http-wg@w3.org [1], which may be joined by sending a message
28 with subject "subscribe" to ietf-http-wg-request@w3.org [2].
30 Discussions of the HTTPbis Working Group are archived at
31 .
33 XML versions, latest edits and the issues list for this document are
34 available from
35 . A
36 collection of test cases is available at
37 .
39 Status of This Memo
41 This Internet-Draft is submitted in full conformance with the
42 provisions of BCP 78 and BCP 79.
44 Internet-Drafts are working documents of the Internet Engineering
45 Task Force (IETF). Note that other groups may also distribute
46 working documents as Internet-Drafts. The list of current Internet-
47 Drafts is at http://datatracker.ietf.org/drafts/current/.
49 Internet-Drafts are draft documents valid for a maximum of six months
50 and may be updated, replaced, or obsoleted by other documents at any
51 time. It is inappropriate to use Internet-Drafts as reference
52 material or to cite them other than as "work in progress."
54 This Internet-Draft will expire on March 11, 2012.
56 Copyright Notice
58 Copyright (c) 2011 IETF Trust and the persons identified as the
59 document authors. All rights reserved.
61 This document is subject to BCP 78 and the IETF Trust's Legal
62 Provisions Relating to IETF Documents
63 (http://trustee.ietf.org/license-info) in effect on the date of
64 publication of this document. Please review these documents
65 carefully, as they describe your rights and restrictions with respect
66 to this document. Code Components extracted from this document must
67 include Simplified BSD License text as described in Section 4.e of
68 the Trust Legal Provisions and are provided without warranty as
69 described in the Simplified BSD License.
71 Table of Contents
73 1. Introduction
74 2. Notational Conventions
75 3. Comparison to RFC 2231 and Definition of the Encoding
76 3.1. Parameter Continuations
77 3.2. Parameter Value Character Set and Language Information
78 3.2.1. Definition
79 3.2.2. Examples
80 3.3. Language Specification in Encoded Words
81 4. Guidelines for Usage in HTTP Header Field Definitions
82 4.1. When to Use the Extension
83 4.2. Error Handling
84 5. Security Considerations
85 6. Acknowledgements
86 7. References
87 7.1. Normative References
88 7.2. Informative References
89 Appendix A. Changes from RFC 5987
90 Appendix B. Change Log (to be removed by RFC Editor before
91 publication)
92 B.1. Since RFC5987
93 B.2. Since draft-reschke-rfc5987bis-00
94 Appendix C. Resolved issues (to be removed by RFC Editor
95 before publication)
96 C.1. iso-8859-1
97 C.2. title
98 C.3. historic5987
99 Appendix D. Open issues (to be removed by RFC Editor prior to
100 publication)
101 D.1. edit
102 D.2. impls
104 1. Introduction
106 By default, message header field parameters in HTTP ([RFC2616])
107 messages cannot carry characters outside the ISO-8859-1 character set
108 ([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an encoding mechanism
109 for use in MIME headers. This document specifies an encoding
110 suitable for use in HTTP header fields that is compatible with a
111 profile of the encoding defined in RFC 2231.
113 This document obsoletes [RFC5987] and moves it to "historic" status;
114 the changes are summarized in Appendix A.
116 Note: in the remainder of this document, RFC 2231 is only
117 referenced for the purpose of explaining the choice of features
118 that were adopted; those references are therefore purely informative.
120 Note: this encoding does not apply to message payloads transmitted
121 over HTTP, such as when using the media type "multipart/form-data"
122 ([RFC2388]).
124 2. Notational Conventions
126 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
127 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
128 document are to be interpreted as described in [RFC2119].
130 This specification uses the ABNF (Augmented Backus-Naur Form)
131 notation defined in [RFC5234]. The following core rules are included
132 by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
133 DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
134 (linear whitespace).
136 Note that this specification uses the term "character set" for
137 consistency with other IETF specifications such as RFC 2277 (see
138 [RFC2277], Section 3). A more accurate term would be "character
139 encoding" (a mapping of code points to octet sequences).
141 3. Comparison to RFC 2231 and Definition of the Encoding
143 RFC 2231 defines several extensions to MIME. The sections below
144 discuss if and how they apply to HTTP header fields.
146 In short:
148 o Parameter Continuations aren't needed (Section 3.1),
150 o Character Set and Language Information are useful, therefore a
151 simple subset is specified (Section 3.2), and
153 o Language Specifications in Encoded Words aren't needed
154 (Section 3.3).
156 3.1. Parameter Continuations
158 Section 3 of [RFC2231] defines a mechanism that deals with the length
159 limitations that apply to MIME headers. These limitations do not
160 apply to HTTP ([RFC2616], Section 19.4.7).
162 Thus, parameter continuations are not part of the encoding defined by
163 this specification.
165 3.2. Parameter Value Character Set and Language Information
167 Section 4 of [RFC2231] specifies how to embed language information
168 into parameter values, and also how to encode non-ASCII characters,
169 dealing with restrictions both in MIME and HTTP header parameters.
171 However, RFC 2231 does not specify a mandatory-to-implement character
172 set, making it hard for senders to decide which character set to use.
173 Thus, recipients implementing this specification MUST support the
174 "UTF-8" character set [RFC3629].
176 Furthermore, RFC 2231 allows the character set information to be left
177 out. The encoding defined by this specification does not allow that.
179 3.2.1. Definition
181 The syntax for parameters is defined in Section 3.6 of [RFC2616]
182 (with RFC 2616 implied LWS translated to RFC 5234 LWSP):
184 parameter = attribute LWSP "=" LWSP value
186 attribute = token
187 value = token / quoted-string
189 quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
190 token = <token, defined in [RFC2616], Section 2.2>
192 In order to include character set and language information, this
193 specification modifies the RFC 2616 grammar to be:
195 parameter = reg-parameter / ext-parameter
197 reg-parameter = parmname LWSP "=" LWSP value
199 ext-parameter = parmname "*" LWSP "=" LWSP ext-value
201 parmname = 1*attr-char
203 ext-value = charset "'" [ language ] "'" value-chars
204 ; like RFC 2231's <extended-initial-value>
205 ; (see [RFC2231], Section 7)
207 charset = "UTF-8" / mime-charset
209 mime-charset = 1*mime-charsetc
210 mime-charsetc = ALPHA / DIGIT
211 / "!" / "#" / "$" / "%" / "&"
212 / "+" / "-" / "^" / "_" / "`"
213 / "{" / "}" / "~"
214 ; as in Section 2.3 of [RFC2978]
215 ; except that the single quote is not included
216 ; SHOULD be registered in the IANA charset registry
218 language = <Language-Tag, defined in [RFC5646], Section 2.1>
220 value-chars = *( pct-encoded / attr-char )
222 pct-encoded = "%" HEXDIG HEXDIG
223 ; see [RFC3986], Section 2.1
225 attr-char = ALPHA / DIGIT
226 / "!" / "#" / "$" / "&" / "+" / "-" / "."
227 / "^" / "_" / "`" / "|" / "~"
228 ; token except ( "*" / "'" / "%" )
230 Thus, a parameter is either a regular parameter (reg-parameter), as
231 previously defined in Section 3.6 of [RFC2616], or an extended
232 parameter (ext-parameter).
234 Extended parameters are those where the left-hand side of the
235 assignment ends with an asterisk character.
237 The value part of an extended parameter (ext-value) is a token that
238 consists of three parts: the REQUIRED character set name (charset),
239 the OPTIONAL language information (language), and a character
240 sequence representing the actual value (value-chars), separated by
241 single quote characters. Note that both character set names and
242 language tags are restricted to the US-ASCII character set, and are
243 matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646],
244 Section 2.1.1).
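The three-part structure of ext-value can be illustrated with a short Python sketch. It is not part of this specification: the function name split_ext_value is hypothetical, and the language part is matched only loosely (letters, digits, and hyphens) rather than against the full Language-Tag grammar of [RFC5646], which is an assumption made to keep the example small.

```python
import re

# attr-char from the ABNF above: ALPHA / DIGIT plus a fixed punctuation set.
ATTR_CHAR = r"A-Za-z0-9!#$&+\-.^_`|~"
# mime-charsetc from the ABNF above (the single quote is not included).
CHARSET_CHAR = r"A-Za-z0-9!#$%&+\-^_`{}~"

# Assumption: the language part is approximated as letters, digits and
# hyphens; a complete implementation would validate it per RFC 5646.
EXT_VALUE_RE = re.compile(
    r"(?P<charset>[" + CHARSET_CHAR + r"]+)"
    r"'(?P<language>[A-Za-z0-9-]*)'"
    r"(?P<value_chars>(?:%[0-9A-Fa-f]{2}|[" + ATTR_CHAR + r"])*)"
)


def split_ext_value(ext_value):
    """Split an ext-value into (charset, language, value_chars).

    Returns None if the input does not match the grammar. Charset and
    language are lowercased because both are matched case-insensitively;
    an absent language is reported as None.
    """
    match = EXT_VALUE_RE.fullmatch(ext_value)
    if match is None:
        return None
    charset = match.group("charset").lower()
    language = match.group("language").lower() or None
    return charset, language, match.group("value_chars")
```

The value-chars part is returned still percent-encoded; decoding it is a separate step that depends on the character set.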
246 Inside the value part, characters not contained in attr-char are
247 encoded into an octet sequence using the specified character set.
248 That octet sequence is then percent-encoded as specified in Section
249 2.1 of [RFC3986].
251 Producers MUST use the "UTF-8" ([RFC3629]) character set. Extension
252 character sets (mime-charset) are reserved for future use.
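A producer following the two steps above (encode to octets, then percent-encode everything outside attr-char) might look like the following Python sketch. The function name encode_ext_value and its signature are hypothetical and chosen for illustration only; the sketch does not validate the language tag.

```python
# Octet values of the attr-char production; anything else is percent-encoded.
ATTR_CHARS = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
    b"!#$&+-.^_`|~"
)


def encode_ext_value(text, language=""):
    """Build an ext-value for `text` using the UTF-8 character set.

    `language` is inserted as given (assumed to be a valid language tag
    or the empty string); this sketch does not check it.
    """
    pieces = []
    for octet in text.encode("utf-8"):
        if octet in ATTR_CHARS:
            pieces.append(chr(octet))
        else:
            # Percent-encode as in RFC 3986, Section 2.1.
            pieces.append("%{:02X}".format(octet))
    return "UTF-8'" + language + "'" + "".join(pieces)
```

Uppercase hexadecimal digits are used here as a matter of style; recipients have to accept both cases.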
254 Note: recipients should be prepared to handle encoding errors,
255 such as malformed or incomplete percent escape sequences, or non-
256 decodable octet sequences, in a robust manner. This specification
257 does not mandate any specific behavior; for instance, the
258 following strategies are all acceptable:
260 * ignoring the parameter,
262 * stripping a non-decodable octet sequence,
264 * substituting a non-decodable octet sequence by a replacement
265 character, such as the Unicode character U+FFFD (Replacement
266 Character).
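The three strategies listed above can be sketched in Python as follows. This is an illustration, not mandated behavior: the function name decode_value_chars and the on_error parameter are hypothetical, only UTF-8 is handled, and the choice to ignore the parameter whenever a percent escape is malformed is an assumption of this sketch.

```python
import re

_PCT_TRIPLET = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_value_chars(value_chars, on_error="replace"):
    """Decode percent-encoded UTF-8 value-chars into a string.

    on_error selects the handling of non-decodable octet sequences:
      "ignore"  -> return None (the caller ignores the parameter)
      "strip"   -> drop the offending octets
      "replace" -> substitute U+FFFD for the offending octets
    Malformed or incomplete percent escapes always yield None
    (an assumption of this sketch).
    """
    if on_error not in ("ignore", "strip", "replace"):
        raise ValueError("unknown on_error strategy: " + repr(on_error))
    try:
        raw = value_chars.encode("ascii")
    except UnicodeEncodeError:
        return None
    # Any "%" left after removing well-formed triplets is a broken escape.
    if b"%" in _PCT_TRIPLET.sub(b"", raw):
        return None
    octets = _PCT_TRIPLET.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    if on_error == "ignore":
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            return None
    if on_error == "strip":
        return octets.decode("utf-8", errors="ignore")
    return octets.decode("utf-8", errors="replace")
```

Which strategy is appropriate depends on the header field; a file name, for example, may tolerate a replacement character better than a security-relevant value would.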
268 Note: the RFC 2616 token production ([RFC2616], Section 2.2)
269 differs from the production used in RFC 2231 (imported from
270 Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are
271 excluded. Thus, these two characters are excluded from the attr-
272 char production as well.
274 Note: the ABNF defined here differs from the one in
275 Section 2.3 of [RFC2978] in that it does not allow the single
276 quote character (see also RFC Errata ID 1912 [Err1912]). In
277 practice, no character set names using that character have been
278 registered at the time of this writing.
280 Note: [RFC5987] did require support for ISO-8859-1, too; for
281 compatibility with legacy code, recipients are encouraged to
282 support this encoding as well.
284 3.2.2. Examples
286 Non-extended notation, using "token":
288 foo: bar; title=Economy
290 Non-extended notation, using "quoted-string":
292 foo: bar; title="US-$ rates"
294 Extended notation, using the Unicode character U+00A3 (POUND SIGN):
296 foo: bar; title*=utf-8'en'%C2%A3%20rates
298 Note: the Unicode pound sign character U+00A3 was encoded into the
299 octet sequence C2 A3 using the UTF-8 character encoding, then
300 percent-encoded. Also, note that the space character was encoded as
301 %20, as it is not contained in attr-char.
303 Extended notation, using the Unicode characters U+00A3 (POUND SIGN)
304 and U+20AC (EURO SIGN):
306 foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates
308 Note: the Unicode pound sign character U+00A3 was encoded into the
309 octet sequence C2 A3 using the UTF-8 character encoding, then
310 percent-encoded. Likewise, the Unicode euro sign character U+20AC
311 was encoded into the octet sequence E2 82 AC, then percent-encoded.
312 Also note that HEXDIG allows both lowercase and uppercase characters,
313 so recipients must understand both, and that the language information
314 is optional, while the character set is not.
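The two extended-notation examples above can be checked with the Python standard library. The helper name decode_example is hypothetical; the sketch assumes a syntactically valid ext-value with the UTF-8 character set and relies on urllib.parse.unquote for the percent-decoding.

```python
from urllib.parse import unquote


def decode_example(ext_value):
    """Return (language, text) for a well-formed UTF-8 ext-value.

    Assumes valid input; a missing language is reported as None.
    """
    charset, language, value_chars = ext_value.split("'", 2)
    if charset.lower() != "utf-8":
        raise ValueError("this sketch only handles UTF-8")
    text = unquote(value_chars, encoding="utf-8", errors="strict")
    return (language or None), text


# The two header field values from the examples above.
print(decode_example("utf-8'en'%C2%A3%20rates"))
print(decode_example("UTF-8''%c2%a3%20and%20%e2%82%ac%20rates"))
```

Both the uppercase and the lowercase hexadecimal forms decode to the same characters, and the charset name is compared case-insensitively.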
316 3.3. Language Specification in Encoded Words
318 Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to
319 also support language specification in encoded words. Although the
320 HTTP/1.1 specification does refer to RFC 2047 ([RFC2616], Section
321 2.2), it's not clear to which header field exactly it applies, and
322 whether it is implemented in practice (see
323 for details).
325 Thus, this specification does not include this feature.
327 4. Guidelines for Usage in HTTP Header Field Definitions
329 Specifications of HTTP header fields that use the extensions defined
330 in Section 3.2 ought to clearly state that. A simple way to achieve
331 this is to normatively reference this specification, and to include
332 the ext-value production into the ABNF for that header field.
334 For instance:
336 foo-header = "foo" LWSP ":" LWSP token ";" LWSP title-param
337 title-param = "title" LWSP "=" LWSP value
338 / "title*" LWSP "=" LWSP ext-value
339 ext-value = <ext-value, see Section 3.2.1 of this specification>
341 Note: The Parameter Value Continuation feature defined in Section
342 3 of [RFC2231] makes it impossible to have multiple instances of
343 extended parameters with identical parmname components, as the
344 processing of continuations would become ambiguous. Thus,
345 specifications using this extension are advised to disallow this
346 case for compatibility with RFC 2231.
348 4.1. When to Use the Extension
350 Section 4.2 of [RFC2277] requires that protocol elements containing
351 human-readable text are able to carry language information. Thus,
352 the ext-value production ought always to be used when the parameter
353 value is textual in nature and its language is known.
355 Furthermore, the extension ought to also be used whenever the
356 parameter value needs to carry characters not present in the US-ASCII
357 ([USASCII]) character set (note that it would be unacceptable to
358 define a new parameter that would be restricted to a subset of the
359 Unicode character set).
361 4.2. Error Handling
363 Header field specifications need to define whether multiple instances
364 of parameters with identical parmname components are allowed, and how
365 they should be processed. This specification suggests that a
366 parameter using the extended syntax takes precedence. This would
367 allow producers to use both formats without breaking recipients that
368 do not understand the extended syntax yet.
370 Example:
372 foo: bar; title="EURO exchange rates";
373 title*=utf-8''%e2%82%ac%20exchange%20rates
375 In this case, the sender provides an ASCII version of the title for
376 legacy recipients, but also includes an internationalized version for
377 recipients understanding this specification -- the latter obviously
378 ought to prefer the new syntax over the old one.
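The suggested precedence can be sketched in Python as follows. The function names are hypothetical, the parameters are assumed to have been split into (name, value) pairs already, and falling back to the regular form when the extended form cannot be decoded is a design choice of this sketch rather than something this specification requires.

```python
from urllib.parse import unquote


def _decode_ext(ext_value):
    """Decode a UTF-8 ext-value; return None if it cannot be decoded."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3 or parts[0].lower() != "utf-8":
        return None
    try:
        return unquote(parts[2], encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def select_parameter(params, name):
    """Pick the value for `name`, preferring the extended form `name*`.

    `params` is a list of (name, value) pairs in header order; values of
    names ending in "*" are ext-values, all others are already unquoted.
    """
    regular = None
    for key, value in params:
        key = key.lower()
        if key == name + "*":
            decoded = _decode_ext(value)
            if decoded is not None:
                return decoded
        elif key == name and regular is None:
            regular = value
    return regular


params = [
    ("title", "EURO exchange rates"),
    ("title*", "utf-8''%e2%82%ac%20exchange%20rates"),
]
print(select_parameter(params, "title"))
```

A recipient that does not understand the extended syntax would simply see the "title" parameter and skip "title*", which is the compatibility property the example relies on.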
380 Note: at the time of this writing, many implementations failed to
381 ignore the form they did not understand, or prioritized the ASCII
382 form even though the extended syntax was present.
384 5. Security Considerations
386 The format described in this document makes it possible to transport
387 non-ASCII characters, and thus enables character "spoofing"
388 scenarios, in which a displayed value appears to be something other
389 than it is.
391 Furthermore, there are known attack scenarios relating to decoding
392 UTF-8.
394 See Section 10 of [RFC3629] for more information on both topics.
396 In addition, the extension specified in this document makes it
397 possible to transport multiple language variants for a single
398 parameter, and such use might allow spoofing attacks, where different
399 language versions of the same parameter are not equivalent. Whether
400 such an attack is useful depends on the parameter
401 specified.
403 6. Acknowledgements
405 Thanks to Martin Duerst and Frank Ellermann for help figuring out
406 ABNF details, to Graham Klyne and Alexey Melnikov for general review,
407 to Chris Newman for pointing out an RFC 2231 incompatibility, and to
408 Benjamin Carlyle, Roar Lauritzsen, and Eric Lawrence for
409 implementer's feedback.
411 7. References
413 7.1. Normative References
415 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
416 Requirement Levels", BCP 14, RFC 2119, March 1997.
418 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
419 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
420 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
422 [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration
423 Procedures", BCP 19, RFC 2978, October 2000.
425 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
426 10646", STD 63, RFC 3629, November 2003.
428 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
429 "Uniform Resource Identifier (URI): Generic Syntax",
430 STD 66, RFC 3986, January 2005.
432 [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for
433 Syntax Specifications: ABNF", STD 68, RFC 5234,
434 January 2008.
436 [RFC5646] Phillips, A., Ed. and M. Davis, Ed., "Tags for
437 Identifying Languages", BCP 47, RFC 5646,
438 September 2009.
440 [USASCII] American National Standards Institute, "Coded Character
441 Set -- 7-bit American Standard Code for Information
442 Interchange", ANSI X3.4, 1986.
444 7.2. Informative References
446 [Err1912] RFC Errata, "Errata ID 1912, RFC 2978",
447 .
449 [ISO-8859-1] International Organization for Standardization,
450 "Information technology -- 8-bit single-byte coded
451 graphic character sets -- Part 1: Latin alphabet No.
452 1", ISO/IEC 8859-1:1998, 1998.
454 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet
455 Mail Extensions (MIME) Part One: Format of Internet
456 Message Bodies", RFC 2045, November 1996.
458 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
459 Extensions) Part Three: Message Header Extensions for
460 Non-ASCII Text", RFC 2047, November 1996.
462 [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
463 Encoded Word Extensions: Character Sets, Languages, and
464 Continuations", RFC 2231, November 1997.
466 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
467 Languages", BCP 18, RFC 2277, January 1998.
469 [RFC2388] Masinter, L., "Returning Values from Forms: multipart/
470 form-data", RFC 2388, August 1998.
472 [RFC5987] Reschke, J., "Character Set and Language Encoding for
473 Hypertext Transfer Protocol (HTTP) Header Field
474 Parameters", RFC 5987, August 2010.
476 URIs
478 [1]
480 [2]
482 Appendix A. Changes from RFC 5987
484 This section summarizes the changes compared to [RFC5987]:
486 o The document title was changed to "Indicating Character Encoding
487 and Language for HTTP Header Field Parameters".
489 o The requirement to support the "ISO-8859-1" encoding was removed.
491 Appendix B. Change Log (to be removed by RFC Editor before publication)
493 B.1. Since RFC5987
495 Only editorial changes for the purpose of starting the revision
496 process (obs5987).
498 B.2. Since draft-reschke-rfc5987bis-00
500 Resolved issues "iso-8859-1" and "title" (title simplified). Added
501 and resolved issue "historic5987".
503 Appendix C. Resolved issues (to be removed by RFC Editor before
504 publication)
506 Issues that were either rejected or resolved in this version of this
507 document.
509 C.1. iso-8859-1
511 Type: change
513 julian.reschke@greenbytes.de (2011-04-15): Remove requirement to
514 support ISO-8859-1? It doesn't really help, and it is not
515 implemented in IE9.
517 Resolution (2011-09-07): Removed requirement; adjusted examples;
518 explain that RFC 5987 required this so recipients may want to support
519 it anyway.
521 C.2. title
523 Type: edit
525 duerst@it.aoyama.ac.jp (2011-04-17): Proposed title: "Indicating
526 Character Encoding and Language for HTTP Header Field Parameters"
528 Resolution (2011-09-07): Done.
530 C.3. historic5987
532 In Section 1:
534 Type: change
536 julian.reschke@greenbytes.de (2011-09-08): Point out that RFC 5987
537 should be moved to "historic".
539 Resolution (2011-09-08): Done.
541 Appendix D. Open issues (to be removed by RFC Editor prior to
542 publication)
544 D.1. edit
546 Type: edit
548 julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for
549 editorial fixes/enhancements.
551 D.2. impls
553 Type: change
555 julian.reschke@greenbytes.de (2011-04-15): Add implementation report.
557 Author's Address
559 Julian F. Reschke
560 greenbytes GmbH
561 Hafenweg 16
562 Muenster, NW 48155
563 Germany
565 EMail: julian.reschke@greenbytes.de
566 URI: http://greenbytes.de/tech/webdav/